Feature Selection
Ensemble Learning
- bagging method and boosting method
Bagging
- sampling with replacement
- decrease variance by introducing randomness into your model framework
- random forest = bagging + decision tree
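A minimal sketch of bagging for regression, assuming scikit-learn is available; the toy data, the DecisionTreeRegressor base learner, and the n_trees value are illustrative choices, not something fixed by these notes:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# toy data: y = x + noise (illustrative only)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] + rng.normal(scale=0.5, size=200)

n_trees = 50
trees = []
for _ in range(n_trees):
    # draw n rows WITH replacement (a bootstrap sample)
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeRegressor()  # unpruned "weak" learner
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# bagged prediction: average over all trees, which lowers variance
x_new = np.array([[1.0]])
print(np.mean([t.predict(x_new)[0] for t in trees]))
```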
Random Forest
- how a random forest is built:
- the training data has n samples with m features; each round, draw n observations with replacement (rows are sampled with replacement)
- from these n observations, take k features (k < m) and fit the best decision tree on them (columns are sampled without replacement)
- repeat the above steps several times and combine all the decision trees into a random forest
- properties:
- decrease variance by introducing randomness into the model framework
- the distribution of each bootstrap sample of n observations is the same as that of the original training data
- we don't need to do pruning for each "weak" decision tree
- less overfitting
- parallel implementation
- Feature Importance value in Random Forest
(advanced topic: out-of-bag evaluation)
how do we define a feature's importance? Replace that feature's column with random values (or shuffle it), run the model on the modified data, and compare the loss with the original loss. The result is neither positive nor negative in direction; it just shows how much that particular feature influences the model (see the sketch below).
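A rough sketch of this randomize-one-column idea (close to what is usually called permutation feature importance), assuming scikit-learn; make_classification and the log-loss metric are just stand-ins for real data and a real loss function:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)

# toy data: only a few of the features are actually informative
X, y = make_classification(n_samples=500, n_features=6,
                           n_informative=3, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

base_loss = log_loss(y, model.predict_proba(X))
for j in range(X.shape[1]):
    X_noised = X.copy()
    # shuffle feature j so its column carries no information
    X_noised[:, j] = rng.permutation(X_noised[:, j])
    loss = log_loss(y, model.predict_proba(X_noised))
    # the increase in loss measures how much feature j influences the model
    print(f"feature {j}: importance = {loss - base_loss:.4f}")
```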
Support Vector Machine
- SVM: maximize the minimum margin (the distance from the decision boundary to the closest training points)
- if the data are not linearly separable, add slack terms for noisy points and map the data into a higher-dimensional space by applying a kernel function (the kernel trick); see the sketch below
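A small illustration of why the kernel matters, assuming scikit-learn's SVC; the concentric-circle dataset and the parameter values are illustrative only:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# concentric circles: not separable by any line in the original 2-D space
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)  # kernel trick

print("linear kernel accuracy:", linear_svm.score(X, y))  # roughly chance level
print("rbf kernel accuracy:   ", rbf_svm.score(X, y))      # close to 1.0
```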
Why Feature Selection?
- reduce overfitting
- better understanding your model
- improve model stability (i.e. improve generalization)
It depends on what you want to do. If you are running a study and want to understand each feature's contribution, you need to remove some of the data so that highly correlated features do not distort the model; if you only want to make predictions, you care mainly about accuracy and rarely need to remove features. Poor model stability means that a tiny change in one feature causes the coefficients to change dramatically, which signals high variance; the cause is usually that the model is too complex or that there are too many correlated features. The most direct remedy is regularization (the sketch below illustrates the instability).
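A quick sketch of that stability problem, assuming scikit-learn: two nearly identical (highly correlated) features make plain least-squares coefficients swing wildly between runs, while Ridge regularization keeps them stable; the synthetic data and the alpha value are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

def fit_coefs(model, seed):
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=200)
    x2 = x1 + rng.normal(scale=0.01, size=200)  # almost an exact copy of x1
    y = 3 * x1 + rng.normal(scale=0.5, size=200)
    X = np.column_stack([x1, x2])
    return model.fit(X, y).coef_

for seed in range(3):
    print("OLS  ", fit_coefs(LinearRegression(), seed))  # coefficients swing wildly
for seed in range(3):
    print("Ridge", fit_coefs(Ridge(alpha=1.0), seed))    # roughly equal, stable across runs
```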
Pearson Correlation
measures the linear dependency between two features
- $\rho_{X,Y} = \dfrac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$, where $\operatorname{cov}$ means covariance and $\sigma$ means standard deviation
- covariance: $\operatorname{cov}(X, Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right]$, where $\mu_X = E[X]$ and $\mu_Y = E[Y]$
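A small check of the definition, assuming NumPy; the synthetic x and y are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.8 * x + rng.normal(scale=0.6, size=1000)  # linearly related, with noise

# rho = cov(X, Y) / (sigma_X * sigma_Y)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
rho = cov_xy / (x.std() * y.std())

print(rho)                      # Pearson correlation from the definition
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in value, should match
```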
Regularization Models
L1 regularization tends to produce sparse solutions (many coefficients become exactly zero)
L2 regularization tends to spread the weights out more equally (see the sketch below)
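A sketch contrasting the two penalties, assuming scikit-learn's Lasso (L1) and Ridge (L2); the data and the alpha values are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)

# 10 features, but only the first two actually drive y
X = rng.normal(size=(300, 10))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=300)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=10.0).fit(X, y)  # L2 penalty

print("L1 (Lasso):", np.round(lasso.coef_, 3))  # most coefficients exactly zero
print("L2 (Ridge):", np.round(ridge.coef_, 3))  # small but typically nonzero everywhere
```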