https://book.douban.com/subject/30243136/
Performance Metric
- F1 score: harmonic mean of precision (P) and recall (R): 2/F1 = 1/P + 1/R, i.e., F1 = 2PR/(P+R)
- Other interpretations of AUC:
  - Wilcoxon rank test: AUC is the probability that a randomly chosen positive is ranked above a randomly chosen negative (a normalized Mann-Whitney U statistic)
  - Gini index: Gini + 1 = 2*AUC
  - Not sensitive to the predicted scores themselves, only to their ranking (see the sketch below)
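A minimal sketch of the rank-statistic interpretation (numpy/scipy assumed; the helper name `auc_via_ranks` is made up here). It also shows the score-insensitivity point: any monotone rescaling of the scores leaves AUC unchanged.

```python
import numpy as np
from scipy.stats import rankdata

def auc_via_ranks(y_true, y_score):
    # Wilcoxon/Mann-Whitney rank form of AUC: the probability that a
    # randomly chosen positive is scored above a randomly chosen
    # negative (ties count as half a win, via average ranks).
    y_true = np.asarray(y_true)
    ranks = rankdata(y_score)                  # average ranks on ties
    n_pos = int((y_true == 1).sum())
    n_neg = len(y_true) - n_pos
    rank_sum = ranks[y_true == 1].sum()
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

y = np.array([0, 0, 1, 1, 1])
s = np.array([0.1, 0.4, 0.35, 0.8, 0.8])
print(auc_via_ranks(y, s))        # 0.8333...
print(auc_via_ranks(y, s ** 3))   # identical: AUC only sees the ranking
```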
Feature Engineering and Feature Selection
Continuous Variables
- Bucketing continuous variables, e.g., for logistic regression (equal-width or equal-percentile bins); see the first sketch below
- Missing value treatment (imputation, or a dummy missing-indicator variable)
- Feed tree leaf indices (e.g., from a random forest) to linear models as features; see the second sketch below
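A minimal bucketing sketch with pandas (the `income` column and the number of bins are illustrative choices):

```python
import numpy as np
import pandas as pd

x = pd.Series(np.random.default_rng(0).lognormal(size=1000), name="income")

by_width = pd.cut(x, bins=5)    # equal-width: a skewed feature piles into one bin
by_pct = pd.qcut(x, q=5)        # equal-percentile: ~200 rows per bucket

# One-hot the buckets so a linear model (e.g., logistic regression) can
# fit a piecewise-constant, hence non-linear, effect of the raw value.
X = pd.get_dummies(by_pct, prefix="income_bin")
print(X.sum())                  # roughly balanced bucket counts
```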
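A sketch of the leaf-index trick with scikit-learn (the data split and all hyperparameters here are arbitrary choices for illustration, not a recipe from the book):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_tree, y_tree = X[:1500], y[:1500]        # fit trees on a separate slice
X_lin, y_lin = X[1500:], y[1500:]          # to avoid leaking labels into leaves

rf = RandomForestClassifier(n_estimators=50, max_depth=4, random_state=0)
rf.fit(X_tree, y_tree)

# apply() returns, for each sample, the leaf index it lands in per tree;
# one-hot those indices to get sparse, highly non-linear features for LR.
enc = OneHotEncoder(handle_unknown="ignore")
leaf_features = enc.fit_transform(rf.apply(X_lin))
lr = LogisticRegression(max_iter=1000).fit(leaf_features, y_lin)
print(lr.score(leaf_features, y_lin))
```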
Discrete Variables
- Cross-interactions (combine two categorical features into one)
- Aggregate statistics (e.g., the number of unique values of B for each A); see the sketch below
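A small pandas sketch of both ideas above (the `user_id`/`ad_category` columns are made-up examples):

```python
import pandas as pd

df = pd.DataFrame({
    "user_id":     [1, 1, 1, 2, 2, 3],
    "ad_category": ["auto", "auto", "food", "food", "tech", "auto"],
})

# For each value of A (user_id), summarize B (ad_category): how many
# distinct categories the user touched, and how many impressions total.
stats = (df.groupby("user_id")["ad_category"]
           .agg(n_categories="nunique", n_impressions="count")
           .reset_index())
df = df.merge(stats, on="user_id", how="left")

# Cross-interaction: concatenate two categoricals into a single feature.
df["user_x_category"] = df["user_id"].astype(str) + "_" + df["ad_category"]
print(df)
```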
Time, Space, Text Features
Popular Models
Logistic Regression:
- Why not OLS: the squared loss is sensitive to outliers
- How to solve: gradient descent, or stochastic gradient descent (e.g., Google's FTRL for online, sparse learning); see the sketch below
- Advantages: fast, scalable
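A sketch of logistic regression fit by stochastic gradient descent in scikit-learn (hyperparameters are illustrative; `loss="log_loss"` assumes scikit-learn >= 1.1; FTRL itself is not in scikit-learn):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=5000, n_features=50, random_state=0)

# loss="log_loss" makes SGDClassifier plain logistic regression trained
# by SGD; partial_fit() would allow online updates on streaming data.
# FTRL is the online, sparsity-inducing SGD variant used in large-scale
# ad systems.
clf = make_pipeline(
    StandardScaler(),
    SGDClassifier(loss="log_loss", penalty="l2", alpha=1e-4, max_iter=20),
)
clf.fit(X, y)
print(clf.score(X, y))
```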
FM (Factorization Machines)
- Motivation:
  - Learn feature interactions automatically (rather than crossing features by hand)
  - A degree-2 polynomial kernel has too many parameters, and in sparse data most feature co-occurrences are never observed
- Approach:
  - Instead of learning an independent weight for every co-occurring pair (i, j), the pairwise weight is the dot product w_ij = <v_i, v_j>, where each v_i is a latent vector of dimension k (see the sketch after this list)
  - This imposes a low-rank assumption on the interaction matrix W so that it can be decomposed
  - The parameters for different combinations are no longer independent, so pairs that never co-occur in training can still receive a sensible weight
- Improvement:
  - FFM (field-aware FM): group similar features into fields, and learn a separate latent vector per (feature, field) pair
- Application:
  - The latent vectors serve as embeddings for neural networks (e.g., user-ad similarity)
  - Outperforms GBDT at learning complicated feature interactions when the combinations are sparse
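A numpy sketch of the FM score described under "Approach", including the standard O(nk) reformulation of the pairwise term (the names `w0`, `w`, `V` are illustrative):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    # Degree-2 FM: y = w0 + <w, x> + sum_{i<j} <v_i, v_j> x_i x_j,
    # where row V[i] is the k-dim latent vector for feature i.
    # The pairwise term uses the O(n*k) identity:
    #   sum_{i<j} <v_i, v_j> x_i x_j
    #     = 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ]
    s = V.T @ x                      # shape (k,)
    s2 = (V ** 2).T @ (x ** 2)       # shape (k,)
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - s2)

rng = np.random.default_rng(0)
n, k = 6, 3
x = rng.random(n)
w0, w, V = 0.1, rng.normal(size=n), rng.normal(size=(n, k))
print(fm_predict(x, w0, w, V))
```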
GBDT
Compared with linear models, GBDT naturally handles: missing values, attributes on very different scales, outliers, feature interactions, and non-linear decision boundaries (see the sketch below)
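A sketch of these points using scikit-learn's HistGradientBoostingClassifier, which accepts NaN inputs natively (the corruption of X below is synthetic, for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X[:, 0] *= 1e6                                   # wildly different scale
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.1] = np.nan            # 10% missing values

# Tree splits are invariant to monotone rescaling, robust to outliers,
# capture interactions, and this implementation routes NaNs down a
# learned default branch, so no imputation or scaling step is needed.
clf = HistGradientBoostingClassifier(random_state=0).fit(X, y)
print(clf.score(X, y))
```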