1. Machine Learning Problems
(a) 1. BF, 2. C, 3. AD, 4. G, 5. AE, 6. A, 7. BF, 8. AE, 9. BG
(b) False. Training on a large amount of data can produce a model that performs very well on that data (the training data), but what we actually care about is the model's performance on unseen test data, that is, its generalization ability.
- Maximizing performance on the whole dataset may result in severe overfitting.
- In addition, training on all the data consumes more computation and time.
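The gap between training and test performance can be seen with a tiny experiment. The sketch below is an illustration, not part of the original solution: it assumes synthetic 1-D data with 20% label noise and uses a 1-nearest-neighbour classifier (a swapped-in example model), which memorizes the training set perfectly yet does noticeably worse on fresh test data.

```python
import random


def make_data(n, noise, rng):
    """Draw n points x ~ U(0, 1); label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y  # label noise that no model should be able to learn
        data.append((x, y))
    return data


def predict_1nn(train, x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label


def knn_accuracy(train, data):
    correct = sum(predict_1nn(train, x) == y for x, y in data)
    return correct / len(data)


rng = random.Random(0)  # fixed seed for reproducibility
train = make_data(200, noise=0.2, rng=rng)
test = make_data(200, noise=0.2, rng=rng)

train_acc = knn_accuracy(train, train)  # every point is its own nearest neighbour
test_acc = knn_accuracy(train, test)
print(f"train accuracy = {train_acc:.3f}, test accuracy = {test_acc:.3f}")
```

The perfect training accuracy says nothing about generalization; only the held-out test accuracy does.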
2. Bayes Decision Rule



3. Gaussian Discriminant Analysis and MLE


c.


4. Text Classification with Naive Bayes
a. Top 10 spam-indicative words:
ooking 9453
voip 9494
computron 13613
nbsp 30033
meds 37568
pills 38176
cialis 45153
sex 56930
php 65398
viagra 75526
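The solution does not state how the words were ranked. One common criterion, assumed here, is the smoothed log-likelihood ratio log P(w | spam) - log P(w | ham) from a multinomial Naive Bayes model with Laplace smoothing. A minimal sketch on hypothetical toy documents (the function name and the data are illustrative only):

```python
import math
from collections import Counter


def spam_indicative_words(spam_docs, ham_docs, k=10, alpha=1.0):
    """Rank words by log P(w | spam) - log P(w | ham) with Laplace smoothing."""
    spam_counts = Counter(w for doc in spam_docs for w in doc)
    ham_counts = Counter(w for doc in ham_docs for w in doc)
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values()) + alpha * len(vocab)
    ham_total = sum(ham_counts.values()) + alpha * len(vocab)

    def log_ratio(w):
        # Counter returns 0 for unseen words, so alpha keeps both probabilities > 0
        p_spam = (spam_counts[w] + alpha) / spam_total
        p_ham = (ham_counts[w] + alpha) / ham_total
        return math.log(p_spam) - math.log(p_ham)

    # highest ratio first; ties broken alphabetically so the output is deterministic
    return sorted(vocab, key=lambda w: (-log_ratio(w), w))[:k]


spam = [["viagra", "pills", "cheap"], ["viagra", "meds"]]
ham = [["meeting", "notes", "cheap"], ["project", "meeting"]]
top = spam_indicative_words(spam, ham, k=2)
print(top)  # ['viagra', 'meds']
```

A word like "cheap" that appears equally often in both classes gets a ratio of zero, so it never ranks as spam-indicative.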
b. accuracy = 0.9857315598548972
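For reference, a multinomial Naive Bayes classifier and its accuracy can be sketched as follows. This is an assumed, simplified version (Laplace smoothing, words given as token lists, unseen words ignored) rather than the code that produced the figure above; the toy documents are hypothetical.

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace (add-alpha) smoothing."""
    vocab = {w for doc in docs for w in doc}
    log_prior, log_lik = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_lik


def predict_nb(model, doc):
    """Return the class with the largest log-posterior (up to a constant)."""
    log_prior, log_lik = model

    def score(c):
        # summing logs avoids underflow from multiplying many small probabilities;
        # words never seen during training are ignored
        return log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])

    return max(log_prior, key=score)


def nb_accuracy(model, docs, labels):
    return sum(predict_nb(model, d) == y for d, y in zip(docs, labels)) / len(docs)


train_docs = [["viagra", "pills"], ["cheap", "meds"], ["meeting", "notes"], ["project", "meeting"]]
train_labels = ["spam", "spam", "ham", "ham"]
model = train_nb(train_docs, train_labels)
print(predict_nb(model, ["viagra", "meds", "unseen"]))  # spam
print(nb_accuracy(model, train_docs, train_labels))  # 1.0
```

In practice the accuracy should be measured on a separate test set, not on the training documents as in this toy usage.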
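c. False. When the ratio of spam to ham is 1:99, a filter that labels every email as ham already reaches 99% accuracy while catching no spam at all. Such a filter easily identifies ham emails but misclassifies spam emails as ham, so high accuracy alone does not make it a good spam filter.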
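The 1:99 argument can be checked directly. The sketch below assumes a hypothetical test set of 10 spam and 990 ham emails and a trivial filter that never predicts spam:

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# hypothetical test set with spam:ham = 1:99
y_true = ["spam"] * 10 + ["ham"] * 990
# a useless filter that labels every email as ham
y_pred = ["ham"] * len(y_true)

acc = accuracy(y_true, y_pred)
spam_caught = sum(t == p == "spam" for t, p in zip(y_true, y_pred))
print(acc)          # 0.99
print(spam_caught)  # 0
```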
d.
precision = 0.9750223015165032
recall = 0.9724199288256228
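Precision and recall follow from the confusion counts: precision = TP / (TP + FP) and recall = TP / (TP + FN), with spam as the positive class. A minimal sketch on hypothetical labels; returning 0.0 when a denominator is zero is an assumed convention.

```python
def precision_recall(y_true, y_pred, positive="spam"):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = ["spam"] * 4 + ["ham"] * 4
y_pred = ["spam", "spam", "ham", "ham"] + ["ham"] * 4
print(precision_recall(y_true, y_pred))  # (1.0, 0.5)
```

Here every email flagged as spam really is spam (precision 1.0), but half of the spam slips through (recall 0.5).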
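e. For a spam filter, precision matters more. High precision means that emails flagged as spam really are spam, so few legitimate emails are lost to the spam folder; letting an occasional spam email through is far less costly.
For a classifier that identifies drugs and bombs at an airport, I think recall is more important, because every bomb must be found to ensure safety; a false alarm only costs an extra manual check.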
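Both choices can be realized with the same scoring classifier by moving its decision threshold. The sketch assumes hypothetical scores (e.g. estimated P(positive | x)) and labels with 1 = positive; raising the threshold favours precision, lowering it favours recall.

```python
def precision_recall_at(scores, labels, threshold):
    """Flag an item as positive when its score >= threshold; labels use 1 = positive."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# hypothetical classifier scores and true labels
scores = [0.95, 0.85, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
for t in (0.9, 0.5, 0.2):
    p, r = precision_recall_at(scores, labels, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

With these numbers, threshold 0.9 gives precision 1.00 and recall 0.33 (the spam-filter setting), while threshold 0.2 gives precision 0.60 and recall 1.00 (the airport-screening setting).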