Notes on《美团机器学习实践》(Meituan's machine learning in practice)


Author: kingstone010148 | Published 2019-02-17 14:48

    https://book.douban.com/subject/30243136/

Performance Metrics

    • F1 score: 2/F = 1/P + 1/R
    • Other interpretations for AUC:
      • Wilcoxon Test of Ranks
      • Gini-index: Gini+1 = 2*AUC
      • Insensitive to the absolute predicted scores; only the ranking matters
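The three points above can be checked with a minimal sketch (toy scores and labels are made up): F1 as the harmonic mean of precision and recall, and AUC computed as the Wilcoxon rank statistic, which makes its scale-invariance explicit.

```python
def f1(precision, recall):
    # Harmonic mean: 2/F1 = 1/P + 1/R  =>  F1 = 2PR / (P + R)
    return 2 * precision * recall / (precision + recall)

def auc(scores, labels):
    # AUC = probability that a random positive outranks a random
    # negative (Wilcoxon rank-sum view); ties count as half a win.
    # Only the ordering of scores matters, not their values.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(f1(0.5, 1.0))                              # 0.666...
print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))   # 1.0 — perfect ranking
print(auc([9, 8, 3, 1], [1, 1, 0, 0]))           # 1.0 — same ranking, different scale
```

Rescaling the scores leaves the AUC unchanged, which is exactly the "not sensitive to predicted score" property; the Gini index then follows as 2*AUC − 1.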

    Feature Engineering and Feature Selection

    Continuous Variables

    • Bucketing for continuous variables in, for example, logistic regression (by width or by percentile)
    • Missing value treatment (imputation or code dummy variables)
    • Feed RF/GBDT leaf nodes to linear models as features
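The two bucketing schemes mentioned above can be sketched in a few lines (the price data here is a made-up skewed sample): equal-width bins split the range evenly, while equal-frequency (percentile) bins put roughly the same number of samples in each bucket.

```python
def bucket_by_width(x, lo, hi, n_bins):
    # Equal-width bins over [lo, hi); clamp the top edge into the last bin.
    width = (hi - lo) / n_bins
    return min(int((x - lo) / width), n_bins - 1)

def percentile_edges(values, n_bins):
    # Bin edges at equal-frequency quantiles of the observed values.
    s = sorted(values)
    return [s[len(s) * i // n_bins] for i in range(1, n_bins)]

def bucket_by_percentile(x, edges):
    return sum(x >= e for e in edges)

prices = [3, 5, 7, 8, 12, 15, 40, 200]   # skewed, as prices often are
edges = percentile_edges(prices, 4)
print([bucket_by_width(p, 0, 200, 4) for p in prices])   # [0, 0, 0, 0, 0, 0, 0, 3]
print([bucket_by_percentile(p, edges) for p in prices])  # [0, 0, 1, 1, 2, 2, 3, 3]
```

On skewed data, equal-width bucketing dumps almost everything into one bin, while percentile bucketing spreads the samples evenly — which is why percentile binning is usually preferred for features like price before feeding them to logistic regression.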

    Discrete Variables

    • Cross-interaction
    • Statistics over crossed keys (e.g., the number of unique values of B for each value of A)
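A sketch of the "unique values of B for each A" statistic — for instance, the number of distinct items each user has interacted with (the rows and column names here are hypothetical):

```python
from collections import defaultdict

# (A, B) pairs: e.g., (user, item) interaction log.
rows = [("u1", "item_a"), ("u1", "item_b"), ("u1", "item_a"),
        ("u2", "item_c")]

seen = defaultdict(set)
for a, b in rows:
    seen[a].add(b)          # collect distinct B values per A

n_unique_b = {a: len(bs) for a, bs in seen.items()}
print(n_unique_b)           # {'u1': 2, 'u2': 1}
```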

    Time, Space, Text Features

    Popular Models

    Logistic Regression:

    • Why not OLS: squared loss is sensitive to outliers
    • How to solve: gradient descent or stochastic gradient descent (e.g., Google's FTRL)
    • Advantages: fast, scalable
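A minimal stochastic-gradient-descent sketch for logistic regression (the full FTRL algorithm adds per-coordinate learning rates and L1 regularization; this shows only the plain SGD core, on a made-up toy dataset with the bias folded in as a constant feature):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_logreg(data, n_features, lr=0.1, steps=300, seed=0):
    rng = random.Random(seed)
    w = [0.0] * n_features
    for _ in range(steps):
        x, y = rng.choice(data)                 # one random sample per step
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        g = p - y                               # gradient of log loss wrt the logit
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
    return w

# Tiny separable toy set: label 1 iff the first feature is positive;
# the second feature is a constant 1.0 acting as the bias term.
data = [([2.0, 1.0], 1), ([1.5, 1.0], 1), ([-2.0, 1.0], 0), ([-1.5, 1.0], 0)]
w = sgd_logreg(data, 2)
preds = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > 0.5 for x, _ in data]
print(preds)
```

Because each step touches one sample, the same loop scales to streaming data — the property that makes SGD-style solvers (and FTRL in particular) attractive for large sparse CTR models.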

    FM

    • Motivation:
      • Feature interaction (not done manually)
      • Polynomial kernel (too many parameters, too sparse matrix)
    • Approach:
      • Instead of learning an independent weight w_ij for every co-occurring pair (i, j), the pairwise weight is computed as the dot product of latent vectors v_i and v_j of dimension k.
      • This imposes a low-rank assumption on the interaction matrix W so that it can be decomposed.
      • Parameters for different feature combinations are no longer independent, so rare pairs can borrow strength from shared factors.
    • Improvement:
      • FFM to map similar features into a field
    • Application:
      • Serve as embedding for NN (e.g., User and Ad similarity)
      • Outperforms GBDT at learning complicated feature interactions on sparse data (trees struggle with rarely co-occurring feature combinations)
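The factorized interaction described above can be sketched directly (all parameter values here are made-up toy numbers). The pairwise term is computed in O(kn) per sample using the standard rewrite 0.5 * Σ_f [(Σ_i v_if x_i)² − Σ_i (v_if x_i)²] rather than summing over all pairs:

```python
def fm_score(x, w0, w, V):
    # x: feature vector; w0: global bias; w: linear weights;
    # V: n x k latent matrix (one k-dim factor vector per feature).
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])
    interaction = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        interaction += 0.5 * (s * s - s2)   # = sum over pairs <v_i, v_j> x_i x_j
    return linear + interaction

x = [1.0, 1.0, 0.0]                 # sparse, one-hot-like input
w0, w = 0.1, [0.2, -0.1, 0.3]
V = [[0.5, 1.0], [0.4, -0.2], [0.0, 0.6]]
print(fm_score(x, w0, w, V))        # ≈ 0.2: <v_0, v_1> happens to be 0 here
```

Note that features 0 and 1 still get an interaction weight even if they never co-occurred in training — it is induced by their shared latent factors, which is the point of the factorization.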

    GBDT
    Compared with linear models, GBDT handles missing values, attributes on different scales, outliers, feature interactions, and non-linear decision boundaries.
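The "feed tree leaf nodes to linear models" trick from the feature-engineering section bridges the two model families: each tree routes a sample to exactly one leaf, and the leaf indices are one-hot encoded as new features for a linear model. A sketch of that encoding step (the trained tree ensemble itself is assumed and stands outside this snippet):

```python
def leaves_to_onehot(leaf_ids, n_leaves_per_tree):
    # leaf_ids[t] = index of the leaf that the sample landed in, for tree t.
    # Returns the concatenated one-hot encoding across all trees.
    out = []
    for leaf in leaf_ids:
        one_hot = [0] * n_leaves_per_tree
        one_hot[leaf] = 1
        out.extend(one_hot)
    return out

# Two trees with 4 leaves each: the sample fell into leaf 2 and leaf 0.
print(leaves_to_onehot([2, 0], 4))  # [0, 0, 1, 0, 1, 0, 0, 0]
```

Each tree thus acts as a learned non-linear bucketizer, and the downstream linear model only has to weight the resulting sparse binary features.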

    Data Mining

Permalink: https://www.haomeiwen.com/subject/qczmsqtx.html