- What is the fundamental idea behind Support Vector Machines?
The fundamental idea behind Support Vector Machines is to fit the widest possible "street" between the classes. In other words, the goal is to have the largest possible margin between the decision boundary that separates the two classes and the training instances. When performing soft margin classification, the SVM searches for a compromise between perfectly separating the two classes and having the widest possible street (i.e., a few instances may end up on the street). Another key idea is to use kernels when training on nonlinear datasets.
- What is a support vector?
After training an SVM, a support vector is any instance located on the "street", including its border. The decision boundary is entirely determined by the support vectors. Any instance that is not a support vector (i.e., is off the street) has no influence whatsoever; you could remove such instances, add more of them, or move them around, and as long as they stay off the street they won't affect the decision boundary. Computing the predictions only involves the support vectors, not the whole training set.
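The claim that only the support vectors matter can be checked with a small experiment. The following is a minimal sketch, assuming Scikit-Learn's SVC with a linear kernel; the six toy points and the value C=1000 (chosen to approximate hard margin classification) are made up for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny, linearly separable toy set (hypothetical data for illustration).
X = np.array([[0.0, 0.0], [-2.0, 1.0], [-3.0, -1.0],
              [2.0, 0.0], [4.0, 1.0], [5.0, -1.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C leaves almost no room for margin violations.
clf = SVC(kernel="linear", C=1000).fit(X, y)
print("support vector indices:", clf.support_)
print("all instances:  w =", clf.coef_[0], " b =", clf.intercept_[0])

# Refit on the support vectors only: every off-street instance is dropped.
sv = clf.support_
clf_sv = SVC(kernel="linear", C=1000).fit(X[sv], y[sv])
print("support only:   w =", clf_sv.coef_[0], " b =", clf_sv.intercept_[0])
```

Up to the solver's numerical tolerance, both fits should report the same weights and intercept, since the dropped instances were off the street.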
- Why is it important to scale the inputs when using SVMs?
SVMs try to fit the largest possible "street" between the classes, so if the training set is not scaled, the SVM will tend to neglect features whose values are small in scale.
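One common way to apply scaling is to put a scaler in front of the classifier. The sketch below assumes Scikit-Learn's StandardScaler and make_pipeline; the four toy points (where the second feature is on a much larger scale than the first) and C=100 are illustrative choices, not values from the text.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data: the second feature has a much larger scale than the first.
X = np.array([[1.0, 50.0], [5.0, 20.0], [3.0, 80.0], [5.0, 60.0]])
y = np.array([0, 0, 1, 1])

# Same classifier, once on the raw inputs and once behind a scaler.
unscaled = SVC(kernel="linear", C=100).fit(X, y)
scaled = make_pipeline(StandardScaler(), SVC(kernel="linear", C=100)).fit(X, y)

print("weights without scaling:", unscaled.coef_[0])
print("weights with scaling:   ", scaled.named_steps["svc"].coef_[0])
```

Wrapping the scaler and the SVM in one pipeline means the same scaling learned on the training set is applied automatically at prediction time.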
- Can an SVM classifier output a confidence score when it classifies an instance? What about a probability?
An SVM classifier can output the distance between the test instance and the decision boundary, and you can use this as a confidence score. However, this score cannot be directly converted into an estimation of the class probability. If you set probability=True when creating an SVM in Scikit-Learn, then after training it will calibrate the probabilities using Logistic Regression on the SVM's scores (trained by an additional five-fold cross-validation on the training data). This will add the predict_proba() and predict_log_proba() methods to the SVM.
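As a rough illustration of the two kinds of output, the sketch below trains Scikit-Learn's SVC with probability=True and prints both the raw scores and the calibrated probabilities. The synthetic make_blobs dataset and its parameters are assumptions made only for this example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping synthetic clusters (illustrative data only).
X, y = make_blobs(n_samples=200, centers=2, cluster_std=3.0, random_state=42)

# probability=True triggers the extra cross-validated calibration step.
clf = SVC(kernel="rbf", probability=True, random_state=42).fit(X, y)

scores = clf.decision_function(X[:5])  # signed scores: usable as confidence
proba = clf.predict_proba(X[:5])       # calibrated class probabilities

print("scores:", scores)
print("probabilities:\n", proba)
```

Note that the calibration makes training noticeably slower, so it is worth enabling only when probabilities are actually needed.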
- Should you use the primal or the dual form of the SVM problem to train a model on a training set with millions of instances and hundreds of features?
This question applies only to linear SVMs, since kernelized SVMs can only use the dual form. The computational complexity of the primal form of the SVM problem is proportional to the number of training instances m, while the computational complexity of the dual form is proportional to a number between m² and m³. So if there are millions of instances, you should definitely use the primal form, because the dual form will be much too slow.
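In Scikit-Learn, one way to request the primal formulation is LinearSVC with dual=False. The sketch below is only an illustration: the 20,000-instance synthetic dataset is a small stand-in for the millions of instances in the question, and LinearSVC by default uses the squared hinge loss, so it is not identical to the textbook SVM objective.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# Many instances, about a hundred features (synthetic stand-in data).
X, y = make_blobs(n_samples=20_000, n_features=100, centers=2, random_state=42)

# dual=False asks the solver to optimize the primal problem directly.
clf = LinearSVC(dual=False, C=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```

Timing this against SVC(kernel="linear") on progressively larger subsets is a simple way to see how differently the two formulations scale with m.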
- Say you've trained an SVM classifier with an RBF kernel, but it seems to underfit the training set. Should you increase or decrease γ (gamma)? What about C?
If an SVM classifier trained with an RBF kernel underfits the training set, there might be too much regularization. To decrease it, you need to increase gamma or C (or both).
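The effect of these two hyperparameters can be explored with a quick comparison. This is a minimal sketch, assuming Scikit-Learn's SVC and the make_moons toy dataset; the specific gamma and C values are arbitrary picks meant to contrast a heavily regularized model with a much less regularized one.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Nonlinear toy dataset (illustrative data only).
X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

# Small gamma and small C: strong regularization, prone to underfitting.
underfit = SVC(kernel="rbf", gamma=0.01, C=0.01).fit(X, y)
# Larger gamma and larger C: much weaker regularization.
better = SVC(kernel="rbf", gamma=5, C=100).fit(X, y)

acc_low = underfit.score(X, y)
acc_high = better.score(X, y)
print("training accuracy (gamma=0.01, C=0.01):", acc_low)
print("training accuracy (gamma=5, C=100):    ", acc_high)
```

Training accuracy alone only reveals underfitting; pushing gamma or C too high can overfit instead, so a validation set or cross-validation is still needed to choose the final values.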