Overview
This article compares two perspectives on neural networks (NNs). Depending on which perspective one adopts, one might design an NN very differently and end up with different accuracy.
Introduction
Recently, there was a competition to count the number of sea lions in pictures (as shown below). The winner surpassed the other competitors, with an error nearly 15% lower than that of the second place. That's a huge success. Here is a quote from a Google Brain resident:
While everyone tried object detection/segmentation, winner is simple VGG16 regressor that directly outputs sea lion counts from raw images.
This reminds me of a similar debate that my colleagues and I had about another competition: The Nature Conservancy Fisheries Monitoring.
The Nature Conservancy Fisheries Monitoring contest
In this competition, participants are asked to classify the fish in the given pictures (as shown below). There are many objects in each image, but only a small part of it is relevant: the neural network (NN) should learn to ignore the irrelevant objects (such as the people and tools), focus on the fish, and classify their species.
In this circumstance, how should we design and label the output of the NN? One might say: since classification is simpler than detection (which also outputs the bounding boxes of the fish), it would be better for the NN to output just the categories of the fish. Another might say: if the NN were a human, how could it learn without being given a hint (such as labeled bounding boxes, so that the NN learns to focus on the important part)?
If we call the first approach "the parameters perspective" and the second "the intuition perspective", the following sections list several arguments for and against each.
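To make the two labeling choices concrete, here is a minimal sketch of what a single training label could look like under each one. The `FishLabel` class, the species name, and the pixel coordinates are all hypothetical placeholders, not taken from the competition data.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FishLabel:
    """One training label for one image (hypothetical structure)."""
    species: str
    # Optional bounding box as (x1, y1, x2, y2): the upper-left and
    # bottom-right corners in pixels. None means "category only".
    box: Optional[Tuple[int, int, int, int]] = None

    def num_targets(self) -> int:
        """Number of scalar values the NN has to predict for this label."""
        return 1 if self.box is None else 1 + len(self.box)


# Classification only: the NN predicts the category and nothing else.
class_only = FishLabel(species="species_a")

# Detection: the NN predicts the category plus the two corner points.
with_box = FishLabel(species="species_a", box=(120, 80, 340, 210))

print(class_only.num_targets(), with_box.num_targets())
```

The second label carries the "hint" about where the fish is, at the cost of four extra values that the NN must also learn to predict.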
The parameters perspective
Supporters:
The number of parameters shared per output is larger for classification than for detection. In other words, there are plenty of parameters available to optimize the classification.
Opponents:
How does the NN learn to recognize fish if we don't give it a hint? If we only label which kind of fish is in the image, the NN may end up learning to recognize misleading features in the image.
The intuition perspective
Supporters:
Like a human, if you mark what's important in the picture (as shown below), the NN will learn to focus on the things that really matter.
Opponents:
Doing so may decrease the parameters per output to one-fifth of the original (going from predicting the category alone to predicting the category, the upper-left point, and the bottom-right point). This shifts the NN from optimizing the category to optimizing the category together with other variables that are irrelevant (at least to the competition).
We did not perform any experiment to test which perspective is correct. However, the champion of the sea lion counting competition seems to support the parameters perspective.
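The one-fifth figure in the opponents' argument follows from simple arithmetic, sketched below. It assumes that every shared parameter is divided evenly among the predicted values, which is a simplification of how a real network behaves, and the parameter count is a hypothetical placeholder rather than a measurement of any particular model.

```python
def params_per_output(shared_params: int, num_outputs: int) -> float:
    """Shared parameters available per predicted value (even-split assumption)."""
    return shared_params / num_outputs


SHARED_PARAMS = 10_000_000  # hypothetical size of the shared network

# Classification: one predicted value, the category.
classification_outputs = 1
# Detection: category + upper-left (x1, y1) + bottom-right (x2, y2).
detection_outputs = 1 + 2 + 2

per_output_cls = params_per_output(SHARED_PARAMS, classification_outputs)
per_output_det = params_per_output(SHARED_PARAMS, detection_outputs)

# Fraction of the original per-output budget left after adding the box.
ratio = per_output_det / per_output_cls
print(f"{ratio:.2f}")  # 0.20
```

The ratio depends only on the number of predicted values, not on the size of the network, so the same one-fifth appears for any choice of `SHARED_PARAMS`.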
Conclusion
A neural network is a black box: instead of having a human design the algorithm by hand, it lets the model learn to solve the problem automatically by fitting its internal parameters. Therefore, its internals might not carry as much meaning as a human-designed algorithm. Still, from time to time, people tend to assign meaning to NNs or try to explain their behavior. That's fine. Some interpretations of NNs even have strong evidence.
However, never forget that an NN is also a model containing a large number of parameters! When the two points of view (the human intuition perspective and the parameters perspective) conflict, the parameters perspective seems the better choice, in my opinion.