Feature Selection

Author: asuka_19d5 | Published: 2018-10-25 08:48


Ensemble Learning
  • bagging method and boosting method
Bagging
  • sampling with replacement (see the bootstrap sketch below)
  • decrease variance by introducing randomness into the model framework
  • random forest = bagging + decision tree
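
A minimal sketch of the one step that defines bagging, drawing a bootstrap sample of the rows with NumPy (the toy data and names are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))          # toy data: 100 samples, 5 features
    y = rng.integers(0, 2, size=100)

    # One bagging round: draw n row indices *with replacement*, so the
    # bootstrap sample has the same size n as the original training data.
    idx = rng.choice(len(X), size=len(X), replace=True)
    X_boot, y_boot = X[idx], y[idx]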
Random Forest
  • description of random forest
    there are n samples with m features in the training data
    • each round, draw n observations with replacement (rows are sampled with replacement)
    • from these n observations, take k features (k < m) and fit the best decision tree on them (columns are sampled without replacement)
    • repeat the above steps several times, then combine all the decision trees into a random forest
  • features:
    • decrease variance by introducing randomness into the model framework
    • the distribution of each bootstrap sample of n observations is the same as that of the original training data
    • we don't need to do pruning for each "weak" decision tree
    • less overfitting
    • parallel implementation
  • Feature Importance value in Random Forest
    importance(i) = performance(RF) - performance(RF^{random\,i}) (advanced topic: out-of-bag evaluation)
    how to measure the second term? replace the column of feature i with random values (e.g. a permutation of itself), score the model on this corrupted data, and take the loss
    • the value is not about a positive or negative effect; it only shows how much a particular feature influences the model (see the sketch after this list)
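
A sketch of the procedure above with scikit-learn's RandomForestClassifier, including the permutation-style importance(i) computation (the dataset and parameter values are illustrative assumptions):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Toy data: n samples with m features.
    X, y = make_classification(n_samples=500, n_features=8, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # max_features < m realizes "take k features" at each split.
    rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0)
    rf.fit(X_tr, y_tr)
    base = rf.score(X_te, y_te)

    # importance(i) = performance(RF) - performance(RF with feature i randomized):
    # shuffle one column so only that feature's information is destroyed.
    rng = np.random.default_rng(0)
    for i in range(X_te.shape[1]):
        X_perm = X_te.copy()
        X_perm[:, i] = rng.permutation(X_perm[:, i])
        print(f"feature {i}: importance = {base - rf.score(X_perm, y_te):.4f}")

Here a held-out test set stands in for the out-of-bag evaluation mentioned above.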
Support Vector Machine
  • SVM: maximize the minimum margin


    if the data are not linearly separable, add slack variables to tolerate noise, or map the data to a higher-dimensional space by applying a kernel function (see the sketch below)
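
A brief sketch of a kernelized maximum-margin classifier with scikit-learn's SVC (the RBF kernel and the C value are illustrative choices):

    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    # Toy data that is not linearly separable.
    X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

    # The RBF kernel implicitly maps the data to a higher-dimensional
    # space; C trades margin width against the slack-variable penalty.
    clf = SVC(kernel="rbf", C=1.0)
    clf.fit(X, y)
    print("training accuracy:", clf.score(X, y))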
Why Feature Selection?
  • reduce overfitting
  • better understanding your model
  • improve model stability (i.e. improve generalization)
    It depends on what you want to do. If you are running a study and want to understand each feature's contribution, you need to drop some features so that highly correlated features do not distort the model; if you only want prediction, you care only about accuracy, and dropping features is less important. Poor model stability means a tiny change in one feature causes a very large change in the coefficients; the model is unstable because its variance is very high, usually because the model is too complex or there are too many correlated features. The most direct remedy is regularization.
Pearson Correlation

measures the linear dependency between two features
\rho_{x_1, x_2} = \frac{cov(x_1, x_2)}{\sigma_{x_1} \sigma_{x_2}}

  • cov(x_1, x_2) means covariance and \sigma means standard deviation
  • covariance:
    cov(x_1, x_2) = E[(x_1 - E(x_1))(x_2 - E(x_2))] = E(x_1x_2) - E(x_1)E(x_2), where \sigma_{x_1}^2 = E(x_1^2) - E(x_1)^2 (checked numerically in the sketch below)
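
A quick numerical check of the two formulas above with NumPy (toy vectors; np.corrcoef computes the same coefficient):

    import numpy as np

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=1000)
    x2 = 0.8 * x1 + rng.normal(size=1000)      # linearly dependent on x1

    # cov(x1, x2) = E(x1 x2) - E(x1) E(x2)
    cov = np.mean(x1 * x2) - np.mean(x1) * np.mean(x2)
    rho = cov / (np.std(x1) * np.std(x2))

    print(rho, np.corrcoef(x1, x2)[0, 1])      # the two values agree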
Regularization Models

L1 tends to produce a sparse solution (many coefficients become exactly zero)
L2 tends to spread the weights out more equally across features

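A sketch contrasting the two penalties with scikit-learn's Lasso (L1) and Ridge (L2); the alpha value and dataset are illustrative assumptions:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso, Ridge

    # Toy regression problem where only a few features are informative.
    X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                           noise=5.0, random_state=0)

    lasso = Lasso(alpha=1.0).fit(X, y)   # L1: many coefficients exactly 0
    ridge = Ridge(alpha=1.0).fit(X, y)   # L2: coefficients shrunk but spread out

    print("L1 coefficients:", np.round(lasso.coef_, 2))
    print("L2 coefficients:", np.round(ridge.coef_, 2))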
Principal Component Analysis
