2017年12月29日 星期五

TensorFlow not running on GPU?

Problem

     You're sure you installed the GPU version of TensorFlow with:
pip install tensorflow-gpu
However, you find that the computation is extremely slow and your CPU load is pretty high.

Solution

    This might be caused by the environment variable that controls which graphics cards are used:  CUDA_VISIBLE_DEVICES should be set to 0 if you have only one GPU.  In my case, I had downloaded a deep learning project that assigned this variable to an empty string (CUDA_VISIBLE_DEVICES = ""), which resulted in the project running on the CPU.
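    A quick way to verify the fix (a minimal sketch, assuming TensorFlow 1.x and a single GPU) is to set the variable in code and list the devices TensorFlow can see:

import os
# Make the first (and only) GPU visible; set this before TensorFlow is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from tensorflow.python.client import device_lib
# A GPU entry (e.g. "/device:GPU:0") should appear in this list.
print(device_lib.list_local_devices())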

Batch Normalization in TensorFlow

Overview

    This article will not cover the basic concept of Batch Normalization.  Instead, it focuses on the implementation details of Batch Normalization in TensorFlow.

Introduction

    Batch Normalization can make the convergence of neural networks easier, and sometimes can even improve the accuracy.  The formula is shown below:

y = gamma * xHat + beta,    with xHat = (x - batchMean) / sqrt(batchVariance + epsilon)

where the batch statistics are computed over the current mini-batch of size m:

batchMean     = (1/m) * sum_i( x_i )
batchVariance = (1/m) * sum_i( (x_i - batchMean)^2 )

During training, the statistics of the current mini-batch are used directly; at test time, they are replaced by averages (over the training set) of the per-batch means and variances.

   The idea is simple and elegant.  However, when it comes to implementation, it becomes a little tricky because of the average parts (the average of the means and the average of the variances): calculating such averages over the whole training set seems infeasible.
     A common way to solve this is to use the Moving Average algorithm.  In this case, you cannot merely wrap Batch Normalization in a function and return its output tensor; you should also return the update operations of the Moving Average calculation, and session-run those update operations after you backpropagate the neural network in each step.  Therefore, you should expect some kind of update operation after each training step (at least an implicit one), no matter which Batch Normalization API you use.
     TensorFlow provides two Batch Normalization APIs, tf.nn.batch_normalization() and tf.layers.batch_normalization().  The following sections will use these APIs to implement a function called BatchNormalization() and compare their performance.

tf.nn.batch_normalization:

    This is the low-level API for Batch Normalization:  you not only have to session-run the update operations at each training step, but you also need to calculate the averages of the mean and variance and create variables such as gamma and beta on your own.  The code is shown as follows (ref: [1], [2]):
import tensorflow as tf

def BatchNormalization(isTraining_, currentStep_, inputTensor_, isConvLayer_, layerName_="BatchNorm"):
        with tf.variable_scope(layerName_):
                # Compute the mean and variance of the current mini-batch.  For a
                # convolution layer, average over the batch, height and width axes.
                if isConvLayer_:
                        currentBatchMean, currentBatchVariance = tf.nn.moments(inputTensor_, [0, 1, 2])
                else:
                        currentBatchMean, currentBatchVariance = tf.nn.moments(inputTensor_, [0])

                # Track moving averages of the batch statistics; these are used at test time.
                averageCalculator = tf.train.ExponentialMovingAverage(decay=0.99,
                                                                      num_updates=currentStep_)
                updateVariablesOperation = averageCalculator.apply( [currentBatchMean, currentBatchVariance] )

                # Training: use the current batch statistics; inference: use the moving averages.
                totalMean = tf.cond(isTraining_,
                                    lambda: currentBatchMean, lambda: averageCalculator.average(currentBatchMean) )

                totalVariance = tf.cond(isTraining_,
                                        lambda: currentBatchVariance, lambda: averageCalculator.average(currentBatchVariance) )

                # Learnable scale (gamma) and offset (betta) parameters, one per channel.
                outputChannels = int(inputTensor_.shape[-1])
                gamma = tf.Variable( tf.ones([outputChannels]) )
                betta = tf.Variable( tf.zeros([outputChannels]) )
                epsilon = 1e-5
                outputTensor = tf.nn.batch_normalization(inputTensor_, mean=totalMean, variance=totalVariance, offset=betta,
                                                         scale=gamma, variance_epsilon=epsilon)
                return outputTensor, updateVariablesOperation
Note that, as remarked in ref [1], one should assign the current training step to num_updates in the construction of tf.train.ExponentialMovingAverage() to "prevent averaging across non-existing iterations".  I think this just means that the variables are randomly initialized in the first steps, so their importance should be scaled down.  If you don't assign num_updates, according to the TensorFlow documentation, the moving mean is calculated simply by:
totalMean = (1 - decay)*currentBatchMean + decay*totalMean

And if you assign the current training step to num_updates, the mean will be calculated as follows:
decay = min(decay, (1 + step)/(10+step) )
totalMean = (1 - decay)*currentBatchMean + decay*totalMean
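
To make the effect of num_updates concrete, here is a tiny sketch of the two update rules above (plain Python; the helper name is mine, not a TensorFlow API):

def UpdateMovingAverage(totalMean, currentBatchMean, decay=0.99, step=None):
        # Without num_updates: a plain exponential moving average.
        # With num_updates: the decay is damped during the early steps, so the
        # randomly-initialized initial value contributes less to the average.
        if step is not None:
                decay = min(decay, (1.0 + step) / (10.0 + step))
        return (1 - decay) * currentBatchMean + decay * totalMean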

Usage:

You can build your net as follows:
class AlexnetBatchNorm(SubnetBase):
        def __init__(self, isTraining_, trainingStep_, input_, ...):
                self.isTraining = isTraining_
                self.trainingStep = trainingStep_
                self.input = input_
                ...

        def Build(self):
                net = ConvLayer(self.input, 3, 8, stride_=1, padding_='SAME', layerName_='conv1')
                net, updateVariablesOp1 = BatchNormalization(self.isTraining, self.trainingStep, net, isConvLayer_=True)
                net = tf.nn.relu(net)

                net = ConvLayer(net, 3, 16, stride_=1, padding_='SAME', layerName_='conv2')
                net, updateVariablesOp2 = BatchNormalization(self.isTraining, self.trainingStep, net, isConvLayer_=True)
                net = tf.nn.relu(net)

                ...

                updateOperations = tf.group(updateVariablesOp1, updateVariablesOp2, ...)
                return net, updateOperations
where ConvLayer() is a simple wrapper for a convolution layer, and isTraining_, trainingStep_, and input_ are the placeholders that will be fed when you session-run.
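For reference, the following is a minimal sketch of what such a ConvLayer() wrapper might look like, inferred from the calls above (the real implementation lives in src/layers/BasicLayers.py and may differ):

def ConvLayer(inputTensor_, kernelSize_, numberOfFilters_, stride_=1, padding_='SAME', layerName_="Conv"):
        with tf.variable_scope(layerName_):
                inputChannels = int(inputTensor_.shape[-1])
                weights = tf.get_variable("weights",
                                          shape=[kernelSize_, kernelSize_, inputChannels, numberOfFilters_],
                                          initializer=tf.truncated_normal_initializer(stddev=0.1))
                # No bias term here: the beta of Batch Normalization already plays that role.
                return tf.nn.conv2d(inputTensor_, weights,
                                    strides=[1, stride_, stride_, 1], padding=padding_)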
     Finally, session-run the update operation at each training step:
while step < MAX_TRAINING_STEPS:
        session.run( trainOp,
                     feed_dict={self.net.isTraining : True,
                                self.net.trainingStep : step,
                                self.net.input : x,
                                ...})

        session.run( self.updateNetOp,
                     feed_dict={self.net.isTraining : False,
                                self.net.trainingStep : step,
                                self.net.input : x,
                                ...})

You can refer to our GitHub (files: Train.py, src/subnet/AlexBatchNorm.py, src/layers/BasicLayers.py) for more details.

Performance:

     The following figure shows the training and validation curves of models that apply Batch Normalization (the green and pink curves) and of models that do not (the blue and orange curves):
One can see that the model with Batch Normalization converges very fast and even improves the result by a small percentage.

Recover:

    In the above implementation, we used tf.train.ExponentialMovingAverage to calculate the averages of the mean and variance.  However, its documentation suggests that when you try to restore the graph from checkpoints, you should do something like:
variables_to_restore = ema.variables_to_restore()
saver = tf.train.Saver(variables_to_restore)
For Batch Normalization, however, it seems that we can restore the network as usual (perhaps tf.nn.batch_normalization() already handles it?).  If I try to restore the network as suggested above, I got the following error:
Therefore, one should just restore the network as follows:
modelLoader = tf.train.Saver()
modelLoader.restore(session, PATH_TO_MODEL_CHECKPOINT)
    One more proof of this is to re-train the model and see whether its loss starts from the same value as the pre-trained model:
As shown above, the blue curve is the validation of the pre-trained model.  The red curve is the model restored from the last step of the pre-trained model and trained again.  You can see that they match perfectly.  Therefore, we can conclude that the means and variances of the variables are perfectly recovered.


tf.layers.batch_normalization:

    TensorFlow also provides a high-level API for Batch Normalization.  However, its weird behavior made us decide not to use it in the end.
    The wrapper function of Batch Normalization that applies this API is simply:
def BatchNormalization(isTraining_, inputTensor_, layerName_=None):
        return tf.layers.batch_normalization(inputTensor_, training=isTraining_, name=layerName_)
    In this implementation, you can see that we don't need to specify whether the previous layer is a convolution layer, and the function just returns the output tensor.  However, this does not mean that you don't need to update the network: the update operations are stored in a TensorFlow collection (tf.GraphKeys.UPDATE_OPS), and you should pull them out and session-run them after each training step.  The documentation suggests that you can also declare dependencies between the update operations and the training operation, so that when you session-run the training operation, it automatically runs the update operations for you:
updateOps = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
optimizer = tf.train.AdamOptimizer(learning_rate=self.learningRate)
with tf.control_dependencies(updateOps):
        self.trainOp = optimizer.minimize(lossOp)

while step < MAX_TRAINING_STEPS:
        session.run( self.trainOp,
                     feed_dict={self.net.isTraining : True,
                                self.net.trainingStep : step,
                                self.net.input : x,
                                ...})

Performance:

The following is a comparison of the two implementations.
The two upper curves are the training & validation curves of the tf.layers.batch_normalization API, and the two lower curves are the training & validation curves of the tf.nn.batch_normalization API.  One can see that both of them converge to the same limit.  However, the tf.layers.batch_normalization API goes through a small hump around training epochs 45~50.  It is this strange behavior that made us finally choose the tf.nn.batch_normalization API.

Conclusion

     This article concentrates on the implementation details of Batch Normalization in TensorFlow.  Two approaches have been compared.  Moreover, this article also shows that one does not need to restore the tf.train.ExponentialMovingAverage variables manually (as the documentation suggests) when one restores the network.

2017年8月10日 星期四

AI + AR: The new generation of Augmented Reality

Overview

    AI is hot, while AR is awesome, so why not combine them?  From the very first time I understood how AI (strictly speaking, the Convolutional Neural Network (CNN)) works, I have wanted to apply it to enhance the object detection of traditional AR.  However, it was not until recently that I decided to make it come true.  This article will show you how AI improves the user experience by presenting a simple demo of a PokemonGo-like AR application.

Introduction

    Augmented Reality (AR) can be roughly separated into two categories: camera-based and location-based.

Camera-based AR

    In this category, the application interacts with the user through the camera.  Basically, traditional camera-based AR detects the Corners of an image, then uses machine learning algorithms to determine whether the target objects are present in the image given the distribution of those Corners.  The word 'Corner' here means a pixel that has a prominent color difference from its surrounding pixels, for example, the tip of a roof.  As shown below (left), Corners are marked by red circles:
    Traditional AR is usually used to recognize 2D images, as shown above (right).  However, when it comes to 3D object recognition, the distribution of Corners varies when the object is viewed from different perspectives, as shown below:

Therefore, it's hard to apply traditional AR to recognize 3D objects.

Location-based AR

    The famous application of location-based AR is PokemonGo.  It detects the location of the user and determines whether to place Game Objects to interact with the user.

    It also provides a camera-based AR mode of sorts, but there is no extra functionality whether the camera is open or not.  For example, if I find a Squirtle near a bush and then step back, the Squirtle does not look smaller or respond to my movement (such as attacking, chasing, or fleeing).  Moreover, why is the Squirtle in the bush and not in the pond?  Furthermore, if I keep stepping back, the Squirtle looks like it is being dragged by me rather than chasing me down the road.

    
This makes playing PokemonGo with the camera open feel cumbersome.  Furthermore, it is somewhat disappointing, since one of their advertisements a couple of years ago made it look as if the user could interact with his surroundings (such as finding a Pikachu near a bush or a Snorlax sleeping on a bridge).

   
     Nevertheless, with CNNs many of the problems discussed above can be solved.  The next section will show you how a CNN can improve the user experience tremendously:

Improve AR by CNN

     Besides classification, another basic functionality of a CNN is object detection.  It can output the x, y, width, height and probability (relative to the image coordinate) of the trigger objects (e.g. the bridge).  Such information can be used to place the Game Objects (e.g. the Snorlax) on the image, so that you can find the Snorlax only when you open the camera.
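     As an illustration of how such an output can be used, the following sketch (the helper is hypothetical, not part of any AR SDK) maps a detected bounding box to a screen position and scale for the Game Object:

def PlaceGameObject(detection, screenWidth, screenHeight):
        # detection: (x, y, width, height, probability), all relative to the
        # image coordinate (values in [0, 1]), as returned by the detector.
        x, y, width, height, probability = detection
        if probability < 0.5:
                return None  # the trigger object is not confident enough; show nothing

        # Anchor the Game Object at the center of the trigger object and scale
        # it with the apparent size of the trigger object (closer => larger).
        anchorX = (x + width / 2.0) * screenWidth
        anchorY = (y + height / 2.0) * screenHeight
        scale = height
        return anchorX, anchorY, scale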
     Furthermore, traditional camera-based AR can only recognize a certain bridge (e.g. the white bridge shown above) but not bridges in general, not to mention that it can only recognize that bridge from a certain perspective.  These disadvantages can be conquered by applying a CNN to perform the object detection.
    I spent a few weeks writing a simple application based on this idea: if a bush is detected, it triggers the MushBro (sorry, to avoid copyright issues, I didn't use Pikachu) to jump out and interact with the player, as shown below.  You can see that the MushBro keeps standing near the bush.  Although it is not shown in the video, if one steps back, the MushBro keeps shrinking.  And if one keeps stepping back, so that the bush becomes too small to be recognized, the MushBro flees.

Other application scenarios

Still not interested?  Consider the following scenarios:
"You know there's Lapras located in Loch Ness, but only if you point your camera to the center of Loch Ness so that the Lapras will emerge."
and
"You have heard that there will be an Ho-Oh on the roof of an old temple.  And you can only find it when you aim your cell phone to the roof of that temple."

as well as
"After raining, you can go fishing near the newly created puddle."
or
"You have a small probability to find the Mew with while you point your cell phone to the rainbow."

Note that in the last two scenarios, the appearance of the Pokemon is even beyond the developer's expectations:  the Pokemon are placed by nature, not located by human hands.

Implementation

    I put the recognition system on a server and wrote an application that keeps sending images to that server.  When the recognition is finished, the server sends the result back to the phone, and the phone application determines where to place the Game Object (the MushBro).
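
    A minimal sketch of the server side (assuming Flask for the HTTP part; DetectTriggerObjects() is a placeholder for the recognition model, not a real API):

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/detect", methods=["POST"])
def Detect():
        # The phone posts one camera frame (e.g. a JPEG) in the request body.
        imageBytes = request.get_data()

        # DetectTriggerObjects() stands for the CNN detector; it is assumed to
        # return a list of (x, y, width, height, probability) tuples.
        detections = DetectTriggerObjects(imageBytes)

        # Send the result back; the phone decides where to place the MushBro.
        return jsonify(detections=detections)

if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)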


Conclusion

    This article shows how a CNN can improve the user experience of traditional AR, and its potential to be extended to more fascinating scenarios.
    However, there are also some cons to this approach:

1. It's very expensive to perform the object recognition on the server.  In this case, reducing the recognition rate can reduce the burden on the server, for example by performing one object detection every several frames (a minimal sketch of this is given after this list), or by stopping sending new images once the trigger object is detected.  Note: the latter solution will not change the Game Object's size according to the movement of the user, and might therefore degrade the user experience.

2. The Game Object wiggles.  This can be solved by applying algorithms to stabilize the recognition result.  As future work, I'll try tracking algorithms such as a simple LSTM to perform this trick.

3. The user can use a certain image to cheat the object recognizer.  For example, if Ho-Oh appears at a certain temple in Tainan, the user can just show an image of that temple to the camera to trigger Ho-Oh without actually going there.  This can be solved by checking the location of the user first and only then performing the object recognition.  This can also be used to reduce the burden on the recognition server.
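
Regarding the first point, here is a minimal sketch of the frame-throttling idea on the phone side (plain pseudologic; SendFrameToServer() is a hypothetical helper):

DETECTION_INTERVAL = 15  # send roughly one frame out of every 15 to the server

def OnCameraFrame(frameIndex, frame, lastDetection):
        # Only send every N-th frame for recognition; reuse the last result in
        # between so the Game Object stays on screen while the server is queried less often.
        if frameIndex % DETECTION_INTERVAL == 0:
                return SendFrameToServer(frame)  # hypothetical network call
        return lastDetection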

2017年7月14日 星期五

How do you think about Neural Network, the intuition perspective or the parameter perspective?

Overview

    This article compares two perspectives on Neural Networks (NN).  With a different perspective, one might design the NN in a very different way and end up with very different accuracies.

Introduction

    Recently, there was a competition on counting the number of sea lions in pictures (as shown below).
The winner outperformed the other competitors with an error nearly 15% lower than the second place.  That's a huge success.  Here is a quote from a resident of Google Brain:
While everyone tried object detection/segmentation, winner is simple VGG16 regressor that directly outputs sea lion counts from raw images.

     This reminds me that my colleagues and I had a similar debate about another competition: The Nature Conservancy Fisheries Monitoring.


The Nature Conservancy Fisheries Monitoring contest

    In this competition, participants are asked to classify the fishes in the given pictures (as shown below).

There are many objects in each image, but only a small part of the image is relevant: the Neural Network (NN) should learn to ignore most of the irrelevant objects (such as the humans and tools), focus on the fish, and classify their species.
    In this circumstance, how should we design and label the output of the NN?  One might say: since classification is simpler than detection (which would also output bounding boxes of the fishes), it'd be better for the NN to just output the categories of the fishes.  Another might say: if the NN were a human, how could it learn without being given a hint (such as labeled bounding boxes, so that the NN learns to focus on the important part)?
    If we regard the first approach as "the parameter perspective" and the second approach as "the intuition perspective", the following lists several arguments for each perspective.

The parameter perspective

Supporters:  

    The number of parameters per output is larger for classification than for detection.  Namely, there are plenty of parameters available to optimize the classification.

Opponents:

    How does the NN learn to recognize fish if we don't give it a hint?  If we only label what kind of fish is in the image, the NN may end up learning some misleading features of the image.

The intuition perspective

Supporters:

    Like a human, if you have marked what's important in the picture (as shown below), the NN will learn to focus on the things that really matter.

Opponents:

    To do so, one decreases the parameters per output to one-fifth of the original (from predicting only the category to predicting the category, the upper-left point, and the bottom-right point).  This shifts the NN from optimizing only the category to optimizing the category as well as other variables that are irrelevant (at least to the competition).

We did not perform any experiment to test which perspective is correct.  However, the champion of the sea lion counting competition seems to support the parameter perspective.

Conclusion

    A Neural Network is a black box:  instead of designing an algorithm by hand, it lets the model automatically learn to solve the problem by fitting its inner parameters.  Therefore, it might not carry as much meaning as human-designed algorithms.  However, from time to time, people tend to give meaning to the NN or try to explain its behavior.  That's fine.  Some interpretations of NNs even have strong evidence.
    However, never forget that it is also a model that contains a large number of parameters!  When the two points of view (the human intuition perspective and the parameter perspective) conflict with each other, the parameter perspective seems the better choice in my opinion.

2017年6月12日 星期一

Connect C++ to Python

Background

    In order to support features such as function overloading and templates, C++ applies the name mangling technique.  However, the C++ standard does not specify how to do it.  Therefore, every vendor who implements a C++ compiler may do it in their own way.  This fact makes C++ hard to connect with other languages.
    With this in mind, I had always thought the only way to connect C++ with other languages was through a C interface, until I saw Boost.Python.

Boost.Python

    This module provides a convenient way to build such a connection.  With this module, you don't even need to change your C++ code at all (it is non-intrusive; see its Quick Start for more detail).  However, when it comes to more advanced features such as polymorphism and templates, one has to put in some effort to make it work...

2017年4月1日 星期六

Why the traditional classifier is better than the Neural Network?

Overview

    This article gives my interpretation of why a traditional classifier outperforms a (fully-connected) neural network when the input features are given.

Introduction

     To recognize certain objects in an image, a lot of research uses a Convolutional Neural Network (CNN) to extract features and then inputs those features to a traditional classifier to predict the result.  This still holds even when the data sets are large enough (i.e. the effect of overfitting can be excluded).
     This is hard to believe, though, since the CNN is trained for exactly this purpose and the last layer (such as the fully-connected layer or the softmax layer) is optimized together with the others as a whole.  Namely, during training, each layer adjusts its parameters so that they give the best result.  In this situation, the former layers will extract features that work best with the last layer, which performs the classification, and the last layer is especially tuned to accept those kinds of features and do the classification.
     However, there are plenty of papers that use a traditional classifier to give the final results.  And even the SVM (which is regarded as the simplest classifier, whose trivial version can only separate the data linearly) is considered better than the fully-connected layer.  Besides the cause of overfitting (the data being too few compared to the parameters to fit), the following is the interpretation I propose.

Interpretation

     Suppose we take 2D feature space to describe a system:
1. If the distribution of data looks like Fig. 1(a), it is very simple to separate it into two classes.
(All of the following graphs are calculated by libsvm)
Fig. 1 (a)(b)

2. If the distribution of data looks like Fig. 2(a), it still makes sense.
Fig. 2 (a)(b)
For example, the horizontal axis represents the BMI of students, and the vertical axis is their sports performance.  This distribution can also be perfectly separated by SVM if a certain transformation is applied to the two axes.  For example, if you transform the BMI into some score that represents the health of the human body, the graph will look like Fig. 1 (which means a "health score" is more suitable for describing such a system).

3. If the distribution of data looks like Fig. 3(a), some red dots surrounded by blue dots may be noise (i.e. data that is mistakenly labeled).  You want your algorithm to separate your data and give a result like Fig. 3(b).  However, if your algorithm makes a fuss about such noise, the result will look like Fig. 3(c).  If this is the case, it will drag down the performance.
Fig. 3 (a)(b)(c)

4. If the distribution of data looks like Fig. 4(a), this feature space probably makes no sense, in my opinion.  It probably means this feature space is not adequate to describe the system.  If your algorithm tries hard to separate these data into many small groups, it is probably doing it wrong.
Fig. 4 (a) (b)
In this case, you should probably abandon one of the features (or even both of them).  Or, find new features that describe the system well by setting some restrictions on your model (such as Regularization).  This may also explain why Clustering can't give a better result than SVM:  although Clustering has relatively few parameters (compared to the input data), trying to separate every data point perfectly is meaningless.

    Now, let us go back to the high-dimensional feature space.  If your algorithm has many parameters that can fit many circumstances (such as a fully-connected neural network), it may get into trouble whenever the current data is noisy or some of the features describe the system poorly.  Instead, an algorithm that can ignore some noise and rely less on the features that do not help classify objects may give a better result.

Multi-tasks in Convolution Neural Network (CNN) - the reuse of the extracted features

Overview

     This article discusses one approach to performing multiple tasks in computer vision: extracting the features from a CNN and sending those features to traditional classification methods (such as SVM, Joint Bayesian... etc).  This article also shows the range of accuracy, so that one can judge whether this approach satisfies one's own purposes.

Motivation

     In some circumstances, we want more than one task to be executed.  For example, suppose we want to develop an APP that can recommend different products to different consumer groups by examining age, gender, whether they wear glasses... etc., and judge their response to such advertisements by examining their facial expressions.  It seems that multiple CNNs should be executed at a time.
     However, passing an image through even one CNN is time-consuming, not to mention that there are so many tasks to be performed.  There should be other alternatives.

Introduction

     As shown below, a Convolutional Neural Network (CNN) is usually composed of several Convolution Layers at the front of the network, as well as Fully-Connected Layers at its end.

The front of the network is believed to perform the feature extraction (i.e. to extract the characteristics of the image), while the later layers, especially the Fully-Connected Layers, are believed to perform the classification of the image by using the previously derived features.
     Due to this characteristic of CNNs, it is straightforward to replace the classification part (i.e. the Fully-Connected Layers) by traditional classifiers (such as SVM), for efficiency and even accuracy (see this article).
     In this article, we approach the multiple tasks by training a CNN for one task, and then perform the other tasks by extracting features from that CNN and inputting those features to traditional classifiers.  We believe that the features used by one task may be close to the features of another task if the two tasks are similar to each other.

CNN + traditional classifiers

     In our case, we have at least three tasks to perform: to infer whether the user is smiling, whether the user is wearing glasses, and the user's age.  We start from the DEX model (which is designed for age estimation), take features from the first Fully-Connected Layer (so that the size of the features is large enough for further processing) and, finally, input those features to a traditional classifier (SVM in our case).
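
     As a rough sketch of this pipeline (not our production code; ExtractFC1Features() is assumed to run the DEX network up to its first fully-connected layer and return one feature vector per image, and the classifier part uses scikit-learn):

import numpy as np
from sklearn.svm import SVC

def TrainTraditionalClassifier(trainImages, trainLabels):
        # ExtractFC1Features() is a placeholder for a forward pass of the DEX
        # model truncated at its first fully-connected layer.
        features = np.array([ExtractFC1Features(image) for image in trainImages])
        classifier = SVC(kernel="linear")
        classifier.fit(features, trainLabels)  # e.g. 0 = Not-Smile, 1 = Smile
        return classifier

def Predict(classifier, image):
        return classifier.predict([ExtractFC1Features(image)])[0]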


Result


Age

      Since we do not re-train the CNN, the error of Age should remain the same (i.e. from its paper: MAE = 3.221).
-------------------------------------------------------------------------------------------------------------------------------------------------------------

      At the beginning of this project, our team had a heated discussion about whether the result of the Smile-Task would be more accurate than the Glasses-Task.  One might say: since glasses are more apparent than any other texture of the human face, the Glasses-Task should get the better result.  On the other hand, another might say: since the DEX model is trained for estimating human age, it might tend to extract only the features that relate to it (e.g. wrinkles), and these features are more suitable for judging smiling.

Smile

      In this task, we only output two results: Smile or Not-Smile.  We collected about 3000 training samples for each class to train the SVM.
     The accuracy is around 88%, which could be better if we trained a model especially for detecting smiles.  However, this accuracy is suitable for our purpose: rather than letting users wait, some mistakes are tolerable.

Glasses

      In this task, we output three results: No Glasses, Glasses, and Sunglasses.  We also collected around 3000 training samples for each class.
      The accuracy is only around 84%.  Looks like the Smile-Task wins!  To give a brief interpretation after the fact: this might be because the DEX model is trained for predicting age.  Therefore, the later parts of the network might tend to ignore some features that belong to the glasses, since wearing glasses does not change the age of a human.
      Furthermore, the result for glasses might get better if we extracted features from earlier layers.  However, the features of the earlier layers are quite large (10 times as many features as we have used) and would drag down the efficiency.

Conclusions

    It's possible to use features extracted by a CNN that was trained for another purpose.  The performance is around 80%~90%, depending on how close the current task is to the original task (for which the CNN was designed).

2017年3月21日 星期二

Garbled Chinese characters in Qt

Problem

   If you want to type Chinese directly in Qt, you often run into garbled characters.  I tried quite a few methods found online and they all failed; this article summarizes a working solution to the problem.

Solution

Scenario 1

You want to write Chinese in the source code (*.cpp), but when it is displayed on the UI, it shows up garbled.
For example:
QString newInfo("你好");
this->displayPanel->myQLabel->setText(newInfo);

For the solution, see this article.


Scenario 2

Suppose there is a pre-compiled library (say, one that takes a staff ID as input and outputs the staff name) whose output type is std::string encoded in BIG-5 (the following assumes the Windows operating system).  To convert this type to QString, the following is all you need:
namespace QtExtensions
{
    QString ConvertToUTF8(const std::string& big5FormatString_)
    {
        // QString::fromLocal8Bit() decodes using the system locale codepage,
        // which is BIG-5 on a Traditional Chinese Windows installation.
        QString result = QString::fromLocal8Bit(big5FormatString_.c_str());

        return result;
    }
}//End of namespace

And when using it:
std::string GetStaffName(const std::string& staffID_);

int main()
{
    std::string staffNameInBig5Format = ::GetStaffName("01234567");
    QString staffNameInUTF8 = QtExtensions::ConvertToUTF8(staffNameInBig5Format);
    // staffInfoWindow is some window object defined elsewhere in the application.
    staffInfoWindow.SetStaffName(staffNameInUTF8);
}

Can't create a 64-bit virtual machine on VirtualBox

Problem

    I have an exported Ubuntu 14.10 64-bit machine that runs on VirtualBox.  After I upgraded my host OS (i.e. Win7 to Win10), that exported Ubuntu crashes when I try to boot it.
    Furthermore, when I want to create a new virtual machine, it does not have the "64-bit" options.  I know this issue!  Normally, this problem can be solved by enabling the Virtualization settings in the BIOS.  However, I found the Virtualization settings were already enabled...

Solution

    After several searches, I found this article.  What causes this problem is that only one virtual machine manager (hypervisor) can be active at a time, but we have two (one is our VirtualBox, the other is Hyper-V, which is used by Microsoft (I still wonder if it has to do with the virtual machine for my HoloLens?)).  Although I can't change the settings the way the above article did (my OS is Win10), the comments of that article provide a way to turn off Hyper-V:

bcdedit /set hypervisorlaunchtype off
After typing the above command in the command line and rebooting, VirtualBox can create the 64-bit OS and run it successfully.  However, I found that I had to type the above command every time I rebooted my computer.  Finally, I solved this by removing the "Android-" something that relates to Visual Studio...

No picture on display (black screen) after Updating Win10

Problem

    After the first update of Win10 (Professional, maybe), I got no picture on the display (even after waiting for a long time), while the computer seemed to still be running.  By forcing the computer to reboot (usually I needed to reboot many times), the computer could recover to its last status (i.e. I could use my computer again, but the update did not seem to have been applied at all...).  This bothered me for a long time...

Solution

    Recently, I couldn't stand it anymore and began to face the problem.  Finally, I found that it might be caused by a driver issue with my Graphics Card (the Win7 driver not being compatible with Win10?).  Anyway, after removing the Graphics Card and rebooting, I saw the updating procedure after the reboot.  When it finished, I put my Graphics Card back, and everything works fine.

The History of my Computer

    I started from Win7 and installed Win10 Home (bought from a vendor).  After that, I bought Win10 Professional online (to enable the development of HoloLens).

My Hardware

CPU: Intel R. i7-4790K
Motherboard: GIGABYTE Z97MX-Gaming 5
Graphics Card: MSI GTX780Ti