Machine Learning Notes, Week 09

Author: 我的名字叫清阳 | Published 2016-03-17 13:31 | Read 555 times
Reasons for feature selection
  • Feature selection aims to discover knowledge in the data and to reduce the dimensionality of the data.
    • With fewer features, the data is easier to interpret and to draw insight from.
    • The amount of data needed to solve an ML problem grows exponentially with the number of features (the curse of dimensionality), so it is better to reduce the number of features.
Quiz 1: How hard is the feature selection problem?
  • It is NP-hard: with n features there are 2^n candidate subsets, so exhaustive search is exponential in n.
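To make the exponential growth concrete, here is a minimal sketch that enumerates every feature subset by brute force. The feature names are made up for illustration.

```python
from itertools import combinations

def all_subsets(features):
    """Yield every subset of `features`, from the empty set to the full set."""
    for k in range(len(features) + 1):
        for subset in combinations(features, k):
            yield subset

features = ["a", "b", "c"]  # hypothetical feature names
subsets = list(all_subsets(features))
print(len(subsets))  # 2 ** 3 subsets

# The number of candidate subsets doubles with every extra feature.
for n in (10, 20, 30):
    print(n, 2 ** n)
```

Exhaustive search is only practical for a handful of features, which is why the filtering and wrapping heuristics matter.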

Filtering and Wrapping

  • Filtering is a forward flow: the search algorithm picks the features before learning starts, and there is no feedback from the learner to the search algorithm.
  • Wrapping puts the learning algorithm inside the search loop, so the learner's performance feeds back into the search algorithm.
Filtering example
  • Filtering
    • Pros: fast
    • Cons: 1. looks at features in isolation, so it can miss features that only help in combination; 2. ignores the learning problem
  • Wrapping
    • Pros: 1. takes the model's bias into account; 2. takes the learning problem into account
    • Cons: very slow
  • Example of filtering: use a decision tree (DT) to select the important features, then pass them to another learning algorithm (e.g. kNN).
How to do filtering and wrapping

Criteria for filtering:

  • Information gain
  • Variance, entropy
  • Independence / non-redundancy of features
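As an illustration of the first criterion, the sketch below ranks features by information gain without ever consulting a learner. The toy table (label = a AND b, with c fixed at 1) is an assumption made for this example, not data from the course.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(column, labels):
    """Entropy of the labels minus the expected entropy after splitting on `column`."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Assumed toy data: columns are (a, b, c), label = a AND b, c is constant.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
labels = [0, 0, 0, 1]
names = ["a", "b", "c"]

gains = {name: information_gain([r[i] for r in rows], labels)
         for i, name in enumerate(names)}
ranked = sorted(names, key=lambda name: gains[name], reverse=True)
print(gains)
print(ranked)
```

The constant feature c gets zero gain, so a filter based on this criterion would drop it. That is fast, but it never asks whether the downstream learner might still need the feature.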

How to do Wrapping:

  • hill climbing
  • randomized optimization
  • Forward search: start by finding the single best feature. Then, among the remaining features, find the one that gives the best score when combined with the selected feature, and keep it. Repeat, each time adding the feature that scores best together with those already selected, until the score stops improving.
  • Backward search: start with all features and remove one at a time. At each step, try every way of removing one feature, keep the subset that scores best, and repeat until removing another feature makes the score drop too much.
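Forward search can be sketched as a greedy loop around a scoring function. Everything below is an illustrative assumption: the "learner" is a stand-in lookup-table classifier scored by training accuracy, the six-row dataset is invented, and the stopping rule (stop when no candidate improves the score) is one reasonable choice. In practice the score would come from the real learner on held-out data.

```python
from collections import Counter, defaultdict

def table_accuracy(rows, labels, subset):
    """Training accuracy of a majority-vote lookup table built on `subset`."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        groups[key][label] += 1
    correct = sum(max(counts.values()) for counts in groups.values())
    return correct / len(labels)

def forward_search(n_features, score):
    """Greedily add the feature that raises `score` most; stop when nothing helps."""
    selected = []
    best = score(selected)
    while len(selected) < n_features:
        candidates = [i for i in range(n_features) if i not in selected]
        scored = [(score(selected + [i]), i) for i in candidates]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best:
            break  # no candidate improves the score
        selected.append(top_feature)
        best = top_score
    return selected, best

# Invented data: columns are (f0, f1, f2); label = f0 AND NOT f1; f2 is noise.
rows = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 0, 0), (1, 1, 1)]
labels = [0, 0, 0, 1, 1, 0]

selected, best = forward_search(3, lambda s: table_accuracy(rows, labels, s))
print(selected, best)
```

Because the search is greedy, it can stall when no single feature helps on its own even though a pair would. Backward search follows the same pattern with removal instead of addition.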
Quiz 2: using filtering, choose the features that give zero training error
  • For a DT, it's easy: if a == 0, the label is -; if a == 1, split on b: if b == 0, the label is -; if b == 1, the label is +. This is a AND b.
  • For the perceptron (w^T x > 0), the answer is harder to see. With only a and b the problem is not solvable, because there is no bias term and the decision boundary has to pass through the origin. Adding c with a weight of -1 makes the problem solvable. Although c does not offer any information, it is still useful in this case.
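The perceptron claim can be checked numerically. In this sketch c is assumed to be a constant feature equal to 1, and the weights of 1 on a and b are also an assumption; the notes only give the weight of -1 on c. The grid search over (w_a, w_b) is a sanity check, not a proof.

```python
from itertools import product

def perceptron_output(weights, x):
    """Thresholded linear unit with no bias term: fires when w^T x > 0."""
    return sum(w * xi for w, xi in zip(weights, x)) > 0

# Inputs are (a, b, c) with c assumed constant at 1; target is a AND b.
data = [((0, 0, 1), False), ((0, 1, 1), False), ((1, 0, 1), False), ((1, 1, 1), True)]

# With c weighted -1 (and assumed weights of 1 on a and b), all rows are correct.
weights_abc = (1, 1, -1)
solves_with_c = all(perceptron_output(weights_abc, x) == y for x, y in data)

# Using only a and b, no weight pair on a coarse grid separates the data.
grid = [w / 2 for w in range(-10, 11)]
solutions_ab = [(wa, wb) for wa, wb in product(grid, repeat=2)
                if all(perceptron_output((wa, wb), x[:2]) == y for x, y in data)]

print(solves_with_c)
print(solutions_ab)
```

The empty grid result matches the algebra: the rows (1, 0) and (0, 1) force both weights to be at most 0, while the row (1, 1) needs their sum to be positive. The constant c plays the role of the missing bias term.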
Relevance
  • B.O.C.: Bayes optimal classifier. Relevance is defined only with respect to the B.O.C.
  • Strongly relevant: x is strongly relevant if removing it degrades the B.O.C.
  • Weakly relevant: x is weakly relevant if it is not strongly relevant and there exists a subset of features such that adding x to it improves the B.O.C.
  • Irrelevant: neither strongly nor weakly relevant.
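These definitions can be tested on a small truth table. The sketch assumes every row is equally likely, so the best achievable score of a majority-vote lookup table stands in for the B.O.C. The features are illustrative assumptions: label = a AND b, c is constant, and e = NOT a is a hypothetical redundant copy of a.

```python
from collections import Counter, defaultdict
from itertools import combinations

def best_correct(rows, labels, subset):
    """Number of rows the best possible classifier gets right using only `subset`."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row[i] for i in subset)][label] += 1
    return sum(max(counts.values()) for counts in groups.values())

def relevance(rows, labels, n_features):
    """Label each feature index as 'strong', 'weak' or 'irrelevant'."""
    everything = tuple(range(n_features))
    result = {}
    for x in everything:
        others = tuple(i for i in everything if i != x)
        # Strong: removing x from the full feature set hurts the optimum.
        if best_correct(rows, labels, others) < best_correct(rows, labels, everything):
            result[x] = "strong"
            continue
        # Weak: some subset of the other features improves when x is added.
        weak = any(
            best_correct(rows, labels, s + (x,)) > best_correct(rows, labels, s)
            for k in range(len(others) + 1)
            for s in combinations(others, k)
        )
        result[x] = "weak" if weak else "irrelevant"
    return result

# Assumed columns: (a, b, c, e) with c = 1 always and e = NOT a; label = a AND b.
rows = [(0, 0, 1, 1), (0, 1, 1, 1), (1, 0, 1, 0), (1, 1, 1, 0)]
labels = [0, 0, 0, 1]
names = ["a", "b", "c", "e"]

result = {names[i]: kind for i, kind in relevance(rows, labels, 4).items()}
print(result)
```

Here b is strongly relevant, a and e are only weakly relevant because each can stand in for the other, and c is irrelevant to the B.O.C. even though the bias-free perceptron in Quiz 2 finds it useful.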
Usefulness Wrap up
2016-03-16
