data:image/s3,"s3://crabby-images/2a73d/2a73d35988a499fc5d9d75a976215934ea92604f" alt=""
- The aim of feature selection is to discover knowledge in the data and to reduce its dimensionality.
- With fewer features, the data is easier to interpret and to draw insight from.
- The amount of data needed to solve an ML problem grows exponentially with the number of features (the curse of dimensionality), so it's better to reduce the number of features.
data:image/s3,"s3://crabby-images/100e9/100e91684222f707446b3c59e35a767375c587de" alt=""
- Finding the best subset of features is NP-hard: with n features there are 2^n candidate subsets, so exhaustive search is exponential.
Filtering and Wrapping
data:image/s3,"s3://crabby-images/60436/6043652de5d2927284bea0d976cc8740bc340f42" alt=""
- Filtering is a forward flow: the search algorithm picks the features first and passes them to the learner, with no feedback from the learning algorithm to the search.
- Wrapping runs the search algorithm together with the learning algorithm, so the learner's results feed back into the search.
data:image/s3,"s3://crabby-images/9f97a/9f97a6d926b746540763940d5f016623f1906b48" alt=""
- Filtering
  - Pros: fast.
  - Cons: 1. looks at features in isolation; 2. ignores the learning problem.
- Wrapping
  - Pros: 1. takes the model bias into account; 2. takes the learning problem into account.
  - Cons: very slow.
- Example of filtering: use a decision tree (DT) to select the important features, then hand those features to another learning algorithm (e.g. kNN).
data:image/s3,"s3://crabby-images/63eed/63eedda5a698de9690a016edcaf97b259db0c758" alt=""
Filtering criteria:
- Information gain
- Variance, entropy
- Independent / non-redundant features
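
As a rough illustration of the information gain criterion, here is a minimal sketch in plain Python. The function names (`entropy`, `information_gain`, `filter_top_k`) and the tiny dataset (label = a AND b, plus a constant feature c) are assumptions made up for this example, not part of the lecture.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the expected entropy after splitting on the feature."""
    n = len(labels)
    remainder = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def filter_top_k(columns, labels, k):
    """Filter step: rank features by information gain and keep the k best names."""
    ranked = sorted(columns, key=lambda name: information_gain(columns[name], labels),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy data: label = a AND b; c is constant and carries no information.
columns = {"a": [0, 0, 1, 1], "b": [0, 1, 0, 1], "c": [1, 1, 1, 1]}
labels = [0, 0, 0, 1]
selected = filter_top_k(columns, labels, k=2)
```

Note that the ranking never consults the downstream learner, which is exactly why a filter is fast and also why it ignores the learning problem.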
How to do wrapping:
- Hill climbing
- Randomized optimization
- Forward search: start by finding the single best feature. Then, among the remaining features, find the one that gives the best score when combined with the features already selected, and keep it. Repeat until the score stops improving significantly.
- Backward search: start with all features. Try removing each feature in turn, keep the reduced subset that scores best, and repeat until removing another feature makes the score drop too much.
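
Forward search can be sketched as a greedy loop around a scoring function. In a real wrapper the score would come from training and validating the learner on the candidate subset; here `toy_score` is a hypothetical stand-in, and `forward_selection` and `min_gain` are names invented for this sketch.

```python
def forward_selection(features, score, min_gain=0.0):
    """Greedily add the feature that improves score(subset) most; stop when gain <= min_gain."""
    selected = []
    best = score(selected)
    remaining = list(features)
    while remaining:
        # Try each remaining feature together with what is already selected.
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        candidate_score = score(selected + [candidate])
        if candidate_score - best <= min_gain:
            break  # no feature improves the score enough
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected, best


def toy_score(subset):
    """Hypothetical score: 'a' and 'b' help, every extra feature costs a little."""
    useful = {"a": 0.3, "b": 0.25}
    return 0.5 + sum(useful.get(f, 0.0) for f in subset) - 0.01 * len(subset)


chosen, final_score = forward_selection(["a", "b", "c", "d"], toy_score)
```

Each round calls the score once per remaining feature, so the learner is retrained many times. That is the cost that makes wrapping slow compared with filtering.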
data:image/s3,"s3://crabby-images/be9ba/be9bafa0a5734c24aba1225b2d45e41093792b69" alt=""
- For a decision tree (DT), it's easy: when a == 0, the label is -; when a == 1, split on b: when b == 0, the label is -, and when b == 1, the label is +. This is a AND b.
- For the perceptron (wTx > 0), the result is not as easy to see. With only a and b the problem is not solvable, because the decision boundary has to pass through the origin. Adding c with a weight of -1 makes it solvable (e.g. a + b - c > 0). Although c does not offer any information, it is still useful in this case.
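
The perceptron case can be checked with a few lines of Python. This sketch assumes that c is a constant feature equal to 1 (so it acts as a bias term) and that the target is a AND b. The weights (1, 1, -1) are one choice that works, and the brute-force search over a small integer grid is only an illustration, not a proof.

```python
from itertools import product


def perceptron(weights, x):
    """Origin-limited perceptron: predicts + when w^T x > 0."""
    return sum(w * xi for w, xi in zip(weights, x)) > 0


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [a == 1 and b == 1 for a, b in inputs]  # a AND b

# With c = 1 appended (assumption) and weights (1, 1, -1), AND is representable.
with_c = [perceptron((1, 1, -1), (a, b, 1)) for a, b in inputs]

# Without c, no weight pair on this small grid reproduces AND:
# w1 <= 0 and w2 <= 0 are required, which rules out w1 + w2 > 0.
grid = range(-5, 6)
solvable_without_c = any(
    [perceptron((w1, w2), x) for x in inputs] == targets
    for w1, w2 in product(grid, repeat=2)
)
```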
data:image/s3,"s3://crabby-images/1d6e3/1d6e3897060d0c7c779d314ebc5596b2534a55f5" alt=""
- B.O.C: Bayes optimal classifier. Relevance is defined only with respect to the B.O.C.
- Strongly relevant: if removing x degrades the B.O.C, then x is strongly relevant.
- Weakly relevant: x is not strongly relevant, and there exists a subset of features such that adding x to it improves the B.O.C.
- Irrelevant: NOT (strongly or weakly relevant).
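
These three definitions can be tried out on a small discrete example. The sketch below assumes a made-up, uniformly distributed dataset with features a, b, d (a copy of a) and e (independent noise), and label a AND b. It approximates the B.O.C by predicting the most common label for each combination of the chosen features. The names `boc_correct` and `relevance` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations, product

FEATURES = ["a", "b", "d", "e"]
# Hypothetical data: d duplicates a, e is noise, label = a AND b.
ROWS = [({"a": a, "b": b, "d": a, "e": e}, a & b)
        for a, b, e in product([0, 1], repeat=3)]


def boc_correct(subset):
    """Number of rows the best possible classifier gets right using only `subset`."""
    groups = defaultdict(Counter)
    for x, y in ROWS:
        groups[tuple(x[f] for f in subset)][y] += 1
    return sum(max(counts.values()) for counts in groups.values())


def relevance(feature):
    others = [f for f in FEATURES if f != feature]
    # Strongly relevant: removing the feature from the full set hurts.
    if boc_correct(others) < boc_correct(FEATURES):
        return "strong"
    # Weakly relevant: some subset of the other features improves when it is added.
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            if boc_correct(list(subset) + [feature]) > boc_correct(list(subset)):
                return "weak"
    return "irrelevant"


result = {f: relevance(f) for f in FEATURES}
```

In this toy setting b is strongly relevant, a and d are only weakly relevant because each can stand in for the other, and e is irrelevant.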
data:image/s3,"s3://crabby-images/9f682/9f68207d3a1d5f49463c7c1180a321d709c057f3" alt=""
data:image/s3,"s3://crabby-images/2bb2b/2bb2bee648c911cb6f39d5eac94d65c94d426091" alt=""
2016-03-16