Linear Classification
The approach has two major components: a score function, which maps the raw data to class scores, and a loss function, which quantifies the agreement between the predicted scores and the ground-truth labels.
Parameterized mapping from images to label scores
Interpreting the parameters:
- Each row of W is a classifier for one of the classes. Geometric interpretation of these numbers: as we change one of the rows of W, the corresponding line in pixel space rotates in different directions; the bias b allows the classifiers to translate these lines.
- Each row of W also corresponds to a template for one of the classes. The score of each class for an image is obtained by comparing each template with the image via a dot product, to find the one that fits best. In these terms, the linear classifier is doing template matching (a small sketch follows this list).
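A minimal NumPy sketch of this score mapping, assuming CIFAR-10-like shapes (10 classes, 3072-dimensional flattened images); all variable names here are illustrative:

```python
import numpy as np

num_classes, num_pixels = 10, 3072                    # CIFAR-10-like sizes (assumed)
W = np.random.randn(num_classes, num_pixels) * 0.001  # one row of weights per class
b = np.zeros(num_classes)                             # one bias per class
x = np.random.randn(num_pixels)                       # a single flattened image

# Score function f(x, W) = Wx + b: each class score is the dot product of
# that class's template (a row of W) with the image, shifted by the bias.
scores = W.dot(x) + b               # shape (10,), one score per class
predicted_class = np.argmax(scores)
```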
bias trick
See the original notes page for details.
Combine W and b into a single parameter matrix W. Originally W is [10 x 3072] and b is [10 x 1]; the combined W is [10 x 3073], and x is extended from [3072 x 1] to [3073 x 1] by appending a constant 1.
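A small sketch of the bias trick under the same assumed shapes:

```python
import numpy as np

num_classes, num_pixels = 10, 3072
W = np.random.randn(num_classes, num_pixels) * 0.001  # [10 x 3072]
b = np.zeros(num_classes)                              # [10 x 1]
x = np.random.randn(num_pixels)                        # [3072 x 1]

# Fold b into W as an extra column and append a constant 1 to x,
# so that the single matrix multiply reproduces Wx + b.
W_new = np.hstack([W, b[:, np.newaxis]])   # [10 x 3073]
x_new = np.append(x, 1.0)                  # [3073 x 1]
assert np.allclose(W_new.dot(x_new), W.dot(x) + b)
```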
loss function
Multiclass Support Vector Machine loss
The SVM loss is set up so that the SVM “wants” the correct class for each image to have a score higher than the incorrect classes by some fixed margin Δ. (The cs231n_Lecture03 slides set the margin to 1.)
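Concretely, writing s_j for the j-th class score of the i-th example and y_i for its correct class, the Multiclass SVM loss for that example is

$$ L_i = \sum_{j \neq y_i} \max(0,\; s_j - s_{y_i} + \Delta) $$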
The loss function quantifies our unhappiness with predictions on the training set.
Regularization
There is one bug with the loss function we presented above. The issue is that this set of W is not necessarily unique. We wish to encode some preference for a certain set of weights W over others to remove this ambiguity.
Because the W found by the approach above is not unique, regularization is introduced.
L2 penalty
The most appealing property is that penalizing large weights tends to improve generalization, because it means that no input dimension can have a very large influence on the scores all by itself.
This effect can improve the generalization performance of the classifiers on test images and lead to less overfitting.
Lastly, note that due to the regularization penalty we can never achieve loss of exactly 0.0 on all examples, because this would only be possible in the pathological setting of W=0.
Using this penalty makes the influence of any single parameter on the scores smaller. This effect can improve the generalization performance of the classifier on test images and leads to less overfitting.
Finally, because of the regularization penalty, the loss function can never be exactly 0.
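Putting the data loss and the L2 penalty together, the full Multiclass SVM loss over N training examples becomes

$$ L = \frac{1}{N}\sum_i L_i + \lambda \sum_k \sum_l W_{k,l}^2 $$

where λ is a hyperparameter that controls the regularization strength and is usually determined by cross-validation.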
loss function (without regularization), in both unvectorized and half-vectorized form
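A sketch of what these two forms can look like, assuming a single example x (a flattened image with the bias trick applied), its integer label y, and the weight matrix W:

```python
import numpy as np

def L_i(x, y, W):
  # Unvectorized: compute the multiclass SVM loss for a single example (x, y).
  # x: image data as a vector (e.g. 3073-dim with the bias trick)
  # y: integer index of the correct class
  # W: weight matrix (e.g. 10 x 3073)
  delta = 1.0                         # the margin (see "setting Delta" below)
  scores = W.dot(x)                   # class scores
  correct_class_score = scores[y]
  num_classes = W.shape[0]
  loss_i = 0.0
  for j in range(num_classes):
    if j == y:
      continue                        # the correct class contributes no loss
    loss_i += max(0, scores[j] - correct_class_score + delta)
  return loss_i

def L_i_half_vectorized(x, y, W):
  # Half-vectorized: no explicit loop over classes, still one example at a time.
  delta = 1.0
  scores = W.dot(x)
  margins = np.maximum(0, scores - scores[y] + delta)
  margins[y] = 0                      # zero out the margin of the correct class
  return np.sum(margins)
```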
All we have to do now is to come up with a way to find the weights that minimize the loss.
Practical Considerations
setting Delta. Usually Δ = 1.0 is used; it can safely be left at 1.0 in all cases, since the tradeoff between the data loss and the regularization loss is controlled by λ rather than by Δ.
All-vs-All (AVA) strategy Weston and Watkins 1999 (pdf)
Softmax Classifier
If you’ve heard of the binary Logistic Regression classifier before, the Softmax classifier is its generalization to multiple classes. Unlike the SVM which treats the outputs f(xi,W) as (uncalibrated and possibly difficult to interpret) scores for each class, the Softmax classifier gives a slightly more intuitive output (normalized class probabilities) and also has a probabilistic interpretation that we will describe shortly.
The Softmax classifier gives its results in the form of probabilities.
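In formula form, the Softmax classifier interprets the exponentiated and normalized scores as class probabilities and minimizes the cross-entropy loss

$$ L_i = -\log\!\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right) $$

where f_j denotes the j-th element of the vector of class scores f.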
In practice:
The normalization trick should be used for numeric stability; see the original notes for the formula.
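A small sketch of the trick: subtract the maximum score before exponentiating, which leaves the probabilities unchanged but avoids numeric overflow.

```python
import numpy as np

f = np.array([123, 456, 789])        # example: three classes with large scores
# p = np.exp(f) / np.sum(np.exp(f))  # Bad: np.exp(789) overflows

f = f - np.max(f)                    # shift so the highest score is 0: [-666, -333, 0]
p = np.exp(f) / np.sum(np.exp(f))    # safe: same probabilities, no overflow
```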
The performance difference between the SVM and the Softmax classifier is usually very small.
Compared to the Softmax classifier, the SVM is a more local objective, which could be thought of either as a bug or a feature.
The SVM objective is more “local”: once the margins are satisfied it no longer cares about the exact values of the individual scores, which can be seen either as a bug or as a feature.
the Softmax classifier is never fully happy with the scores it produces: the correct class could always have a higher probability and the incorrect classes always a lower probability and the loss would always get better.