Feature Selection

Author: asuka_19d5 | Published 2018-10-25 08:48


    Ensemble Learning
    • bagging method and boosting method
    Bagging
    • sampling with replacement (the bootstrap; see the sketch below)
    • decreases variance by introducing randomness into the model framework
    • random forest = bagging + decision tree
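    A minimal sketch of the bootstrap (sampling rows with replacement), assuming NumPy; the toy data and seed are illustrative, not from the original notes.

      import numpy as np

      rng = np.random.default_rng(seed=0)
      X = rng.normal(size=(100, 5))          # toy data: 100 samples, 5 features
      y = rng.integers(0, 2, size=100)

      # One bootstrap sample: draw n row indices with replacement, so some
      # rows repeat and the rest are left "out of bag".
      idx = rng.choice(len(X), size=len(X), replace=True)
      X_boot, y_boot = X[idx], y[idx]

      oob = ~np.isin(np.arange(len(X)), idx)
      print(f"unique rows drawn: {len(np.unique(idx))}, out-of-bag rows: {oob.sum()}")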
    Random Forest
    • description of random forest (a runnable sketch follows this list)
      the training data has n samples with m features
      • each time, draw n observations with replacement (sampling rows with replacement)
      • within these n observations, draw k features (k < m) and fit the best decision tree on them (sampling columns without replacement)
      • repeat the above steps several times and combine all the decision trees into a random forest
    • properties:
      • decrease variance by introducing randomness into the model framework
      • the distribution of each bootstrap sample of n observations is the same as that of the original training data
      • we don't need to do pruning for each "weak" decision tree
      • less overfitting
      • parallel implementation
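    A minimal sketch of the procedure above, assuming scikit-learn; n_estimators is the number of bootstrapped trees, max_features plays the role of k < m, and the dataset is a synthetic stand-in.

      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import train_test_split

      X, y = make_classification(n_samples=500, n_features=10, random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

      # Each tree sees a bootstrap sample of the rows (bootstrap=True) and
      # considers only a random subset of the columns at each split (max_features).
      rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                  bootstrap=True, n_jobs=-1, random_state=0)
      rf.fit(X_tr, y_tr)
      print("test accuracy:", rf.score(X_te, y_te))

    Note that n_jobs=-1 trains the trees in parallel, which is the parallel-implementation advantage listed above.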
    • Feature Importance value in Random Forest
      importance(i) = performance(RF) - performance(RF^{random\,i}) (advanced topic: out-of-bag evaluation)
      how to measure this? replace the feature's column with random values, run the model on the modified data, and compare the resulting loss with the baseline
      • the value is neither inherently positive nor negative; it only shows how much a particular feature influences the model
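    A minimal sketch of this idea, assuming NumPy and any fitted scikit-learn-style model with a .score method; here the column is shuffled in place rather than replaced with fresh random numbers, which is the common permutation-importance variant.

      import numpy as np

      def permutation_importance(model, X, y, seed=0):
          """importance(i) = score(model) - score(model with column i shuffled)."""
          rng = np.random.default_rng(seed)
          base = model.score(X, y)           # baseline performance on intact data
          importances = []
          for i in range(X.shape[1]):
              X_perm = X.copy()
              rng.shuffle(X_perm[:, i])      # destroy the information in feature i
              importances.append(base - model.score(X_perm, y))
          return np.array(importances)

    For example, permutation_importance(rf, X_te, y_te) on the random forest above returns one importance value per feature; scikit-learn ships the same idea as sklearn.inspection.permutation_importance.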
    Support Vector Machine
    • SVM: maximize the minimum margin


      if the data are not linearly separable, introduce slack variables into the objective and/or implicitly map the data to a higher-dimensional space by applying a kernel function (the kernel trick); a sketch follows below
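    A minimal sketch of the kernel trick, assuming scikit-learn; make_circles is an illustrative dataset that no linear boundary can separate in the original 2-D space.

      from sklearn.datasets import make_circles
      from sklearn.svm import SVC

      # Concentric circles: not linearly separable in the original space.
      X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

      linear_svm = SVC(kernel="linear").fit(X, y)
      rbf_svm = SVC(kernel="rbf").fit(X, y)   # implicit map to a higher-dimensional space
      print("linear kernel accuracy:", linear_svm.score(X, y))
      print("RBF kernel accuracy:", rbf_svm.score(X, y))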
    Why Feature Selection?
    • reduce overfitting
    • better understanding your model
    • improve model stability (i.e. improve generalization)
      It depends on the goal. For an analysis that studies each feature's contribution, some highly correlated features should be removed so that they do not distort the model; for pure prediction, only the accuracy of the result matters, so removing features is less important. Poor model stability means that a tiny change in one feature causes the coefficients to change dramatically, i.e. the model has very high variance, which may come from an overly complex model or from too many correlated features. The most direct remedy is regularization.
    Pearson Correlation

    to measure the linear dependency between features
    \rho_{x_1, x_2} = \frac{ cov(x_1, x_2) }{\sigma_{x_1} \sigma_{x_2}}

    • cov(x_1, x_2) means covariance and \sigma means standard deviation
    • covariance:
      cov(x_1, x_2) = E[(x_1 - E(x_1))(x_2 - E(x_2))] = E(x_1x_2) - E(x_1)E(x_2), where \sigma_{x_1}^2 = E(x_1^2) - E(x_1)^2
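    A minimal numeric check of the formulas above, assuming NumPy; the toy variables are illustrative, with x2 constructed to correlate strongly with x1.

      import numpy as np

      rng = np.random.default_rng(0)
      x1 = rng.normal(size=1000)
      x2 = 0.8 * x1 + 0.2 * rng.normal(size=1000)         # strongly correlated with x1

      cov = np.mean(x1 * x2) - np.mean(x1) * np.mean(x2)  # E(x1 x2) - E(x1)E(x2)
      rho = cov / (np.std(x1) * np.std(x2))
      print(rho, np.corrcoef(x1, x2)[0, 1])               # the two values agree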
    Regularization Models

    L1 tends to produce sparse solutions
    L2 tends to spread the weights out more equally

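    A minimal sketch of this contrast, assuming scikit-learn; the toy regression has only two truly informative features, and alpha is an illustrative regularization strength.

      import numpy as np
      from sklearn.linear_model import Lasso, Ridge

      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 10))
      true_w = np.array([3.0, -2.0] + [0.0] * 8)          # only 2 features matter
      y = X @ true_w + 0.1 * rng.normal(size=200)

      lasso = Lasso(alpha=0.1).fit(X, y)                  # L1 penalty
      ridge = Ridge(alpha=0.1).fit(X, y)                  # L2 penalty
      print("L1:", np.round(lasso.coef_, 2))              # most coefficients exactly 0
      print("L2:", np.round(ridge.coef_, 2))              # small but non-zero everywhere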
    Principal Component Analysis
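    PCA projects the features onto orthogonal directions of maximal variance, so redundant (highly correlated) columns collapse into fewer components. A minimal sketch, assuming scikit-learn; the synthetic data and the 95% variance threshold are illustrative.

      import numpy as np
      from sklearn.decomposition import PCA

      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 5))
      X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=200)   # nearly duplicated column

      pca = PCA(n_components=0.95)      # keep components covering 95% of the variance
      X_reduced = pca.fit_transform(X)
      print("reduced from", X.shape[1], "to", X_reduced.shape[1], "dimensions")
      print("explained variance ratios:", np.round(pca.explained_variance_ratio_, 3))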
