Machine Learning Feature Engineering 3: Automated Feature Construction (Featuretools)

Author: scottlin | Published: 2018-07-15 21:49

    Introduction to Featuretools

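    Featuretools is a framework for automated feature engineering. It excels at turning a set of related tables into a feature matrix for machine learning. Feature construction operations fall into two categories: "transformations", which work on the columns of a single table, and "aggregations", which summarize many child rows for each parent row. The example below walks through how Featuretools is used.
    Example code: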
    https://github.com/scottlinlin/auto_feature_demo.git
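To make the two categories concrete, here is a minimal sketch in plain Python (standard library only, no Featuretools). The tiny tables and the `loan_amount` column are made up for illustration. A transformation yields one value per row of a single table, while an aggregation collapses many child rows into one value per parent row.

```python
from datetime import date

# Hypothetical toy tables; loan_amount and all values are assumptions.
clients = [
    {"client_id": 1, "joined": date(2015, 3, 14)},
    {"client_id": 2, "joined": date(2016, 11, 2)},
]
loans = [
    {"loan_id": 10, "client_id": 1, "loan_amount": 5000},
    {"loan_id": 11, "client_id": 1, "loan_amount": 7000},
    {"loan_id": 12, "client_id": 2, "loan_amount": 3000},
]

# Transformation: computed row by row from the columns of one table.
join_month = {c["client_id"]: c["joined"].month for c in clients}

# Aggregation: many child rows (loans) summarized per parent row (client).
max_loan = {}
for loan in loans:
    cid = loan["client_id"]
    max_loan[cid] = max(max_loan.get(cid, loan["loan_amount"]), loan["loan_amount"])

print(join_month)  # {1: 3, 2: 11}
print(max_loan)    # {1: 7000, 2: 3000}
```

Featuretools would name features of this kind along the lines of MONTH(joined) and MAX(loans.loan_amount).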

    Installation

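    pip install featuretools

    Note: the code in this article uses the Featuretools API as it was in 2018 (entity_from_dataframe, variable_types, target_entity). Featuretools 1.0 and later renamed these, so the snippets need an older release, or some adaptation, to run.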
    

    Quick start

    1. Import featuretools

    import pandas as pd  # needed for pd.read_csv in the next step
    import featuretools as ft
    

    2. Load the data

    # Load the three CSV tables, parsing the date columns as datetimes
    clients = pd.read_csv('data/clients.csv', parse_dates = ['joined'])
    loans = pd.read_csv('data/loans.csv', parse_dates = ['loan_start', 'loan_end'])
    payments = pd.read_csv('data/payments.csv', parse_dates = ['payment_date'])
    


    3. Create the entity set and its entities

    # Create an empty entity set
    es = ft.EntitySet(id = 'clients')
    
    # Add the clients entity
    es = es.entity_from_dataframe(entity_id = 'clients', dataframe = clients, 
                                  index = 'client_id', time_index = 'joined')
    
    # Add the loans entity
    es = es.entity_from_dataframe(entity_id = 'loans', dataframe = loans, 
                                  variable_types = {'repaid': ft.variable_types.Categorical},
                                  index = 'loan_id', 
                                  time_index = 'loan_start')
    
    
    # Add the payments entity; make_index = True creates the payment_id column
    es = es.entity_from_dataframe(entity_id = 'payments', 
                                  dataframe = payments,
                                  variable_types = {'missed': ft.variable_types.Categorical},
                                  make_index = True,
                                  index = 'payment_id',
                                  time_index = 'payment_date')
    # Display the entity set
    es
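The `payments` entity is the only one added with `make_index = True`, presumably because the CSV has no column that uniquely identifies a payment. Roughly, the option asks Featuretools to create such a column. A plain-Python sketch of the idea follows; the helper `make_index`, the sample rows, and the choice of sequential integers starting at 0 are assumptions for illustration, not the library's code.

```python
# Hypothetical payment rows without a unique identifier column.
payments = [
    {"loan_id": 10, "payment_date": "2015-04-01", "missed": 0},
    {"loan_id": 10, "payment_date": "2015-05-01", "missed": 1},
    {"loan_id": 11, "payment_date": "2015-04-15", "missed": 0},
]

def make_index(rows, index_name):
    """Return copies of rows with a new sequential integer column as unique key."""
    if any(index_name in row for row in rows):
        raise ValueError(f"column {index_name!r} already exists")
    return [{index_name: i, **row} for i, row in enumerate(rows)]

indexed = make_index(payments, "payment_id")
print([row["payment_id"] for row in indexed])  # [0, 1, 2]
```

The original rows are left untouched; only the returned copies carry the new key.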
    


    4. Add relationships between entities

    # Relate clients (parent) to loans (child) through client_id
    r_client_previous = ft.Relationship(es['clients']['client_id'],
                                        es['loans']['client_id'])
    es = es.add_relationship(r_client_previous)
    
    # Relate loans (parent) to payments (child) through loan_id
    r_payments = ft.Relationship(es['loans']['loan_id'],
                                 es['payments']['loan_id'])
    es = es.add_relationship(r_payments)
    
    # Display the entity set
    es
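Each relationship here is one-to-many: the parent column (`clients.client_id`, `loans.loan_id`) identifies a single parent row, and each child row points at one parent. The sketch below groups children under their parents in plain Python. The helper `link_children`, its checks, and the sample rows are hypothetical and only illustrate the data model.

```python
def link_children(parents, children, key):
    """Group child rows under their parent key; reject duplicate parents and orphans."""
    parent_keys = [p[key] for p in parents]
    known = set(parent_keys)
    if len(parent_keys) != len(known):
        raise ValueError(f"parent column {key!r} must be unique")
    grouped = {k: [] for k in parent_keys}
    for child in children:
        if child[key] not in known:
            raise KeyError(f"child row refers to unknown {key}={child[key]!r}")
        grouped[child[key]].append(child)
    return grouped

clients = [{"client_id": 1}, {"client_id": 2}]
loans = [
    {"loan_id": 10, "client_id": 1},
    {"loan_id": 11, "client_id": 1},
    {"loan_id": 12, "client_id": 2},
]

loans_by_client = link_children(clients, loans, "client_id")
loan_ids = {cid: [row["loan_id"] for row in rows] for cid, rows in loans_by_client.items()}
print(loan_ids)  # {1: [10, 11], 2: [12]}
```

This parent-to-children grouping is what lets an aggregation compute one value per client from that client's loans.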
    


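    5. Generate new features with the default primitives

    # Run deep feature synthesis (dfs) to build a feature matrix with one row per client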
    features, feature_names = ft.dfs(entityset = es, target_entity = 'clients')
    features.head()
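`ft.dfs` stands for deep feature synthesis: primitives are stacked across the relationships, so a feature on `clients` can be built from a feature on `loans` that is itself built from `payments`. A hand-written equivalent of one such stacked feature, roughly MEAN(loans.SUM(payments.payment_amount)), might look like the following. The `payment_amount` column and all values are assumptions for illustration.

```python
loans = [
    {"loan_id": 10, "client_id": 1},
    {"loan_id": 11, "client_id": 1},
    {"loan_id": 12, "client_id": 2},
]
payments = [
    {"loan_id": 10, "payment_amount": 100},
    {"loan_id": 10, "payment_amount": 200},
    {"loan_id": 11, "payment_amount": 500},
    {"loan_id": 12, "payment_amount": 50},
    {"loan_id": 12, "payment_amount": 150},
]

# Depth 1: SUM(payments.payment_amount) for each loan.
paid_per_loan = {}
for p in payments:
    paid_per_loan[p["loan_id"]] = paid_per_loan.get(p["loan_id"], 0) + p["payment_amount"]

# Depth 2: MEAN of the depth-1 feature over each client's loans.
totals_by_client = {}
for loan in loans:
    total = paid_per_loan.get(loan["loan_id"], 0)
    totals_by_client.setdefault(loan["client_id"], []).append(total)
mean_paid_per_client = {cid: sum(t) / len(t) for cid, t in totals_by_client.items()}

print(paid_per_loan)         # {10: 300, 11: 500, 12: 200}
print(mean_paid_per_client)  # {1: 400.0, 2: 200.0}
```

Writing every such combination by hand quickly becomes tedious, which is the work the library automates.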
    


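    6. Generate new features with specified aggregation and transformation primitives

    # Restrict dfs to the chosen aggregation (agg_primitives) and transformation (trans_primitives) primitives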
    features, feature_names = ft.dfs(entityset = es, target_entity = 'clients', 
                                     agg_primitives = ['mean', 'max', 'percent_true', 'last'],
                                     trans_primitives = ['years', 'month', 'subtract', 'divide'])
    features.head()
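Two of the aggregation primitives named above are less self-explanatory than `mean` and `max`. As a rough sketch, `percent_true` is the fraction of true values in a boolean column, and `last` is the final value once rows are ordered by time. The plain-Python version below uses invented rows for a single loan, with a hypothetical boolean column `on_time`; it illustrates the idea rather than Featuretools' own implementation.

```python
from datetime import date

# Invented payment rows for one loan, deliberately out of chronological order.
rows = [
    {"payment_date": date(2015, 6, 1), "on_time": False, "amount": 120},
    {"payment_date": date(2015, 4, 1), "on_time": True, "amount": 100},
    {"payment_date": date(2015, 5, 1), "on_time": True, "amount": 110},
    {"payment_date": date(2015, 7, 1), "on_time": True, "amount": 90},
]

def percent_true(values):
    """Fraction of truthy entries, between 0.0 and 1.0."""
    values = list(values)
    return sum(bool(v) for v in values) / len(values)

def last(rows, column, time_column):
    """Value of `column` in the row with the latest `time_column`."""
    return max(rows, key=lambda r: r[time_column])[column]

share_on_time = percent_true(r["on_time"] for r in rows)
latest_amount = last(rows, "amount", "payment_date")
print(share_on_time)  # 0.75
print(latest_amount)  # 90
```

Because `last` depends on row order, the time index chosen for each entity affects what it returns.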
    
