自动特征构建方法

作者: AntiGravity | 来源:发表于2020-05-15 16:29 被阅读0次

自动特征构建方法
半自动特征构建方法
DGA总结备忘
Maven自动化构建工具
算法笔记（19）自动特征选择及Python代码实现
[Python] 看binarytree源码深入学习二叉树
Paper Collection - Continuous In
Udacity 数据分析进阶课程笔记L42：特征选择
从0构建自动化测试平台(五)兼容性测试实现
NSMS V1.0 配置文档之Field字段用法

所谓自动构建，往往是参考了人的构建方法。

自动特征挖掘

一般是在人挖掘到再无可挖后，再进行自动特征挖掘的尝试。

交叉、特征构建函数比较有限
需要有衍生变量为基础

难点：

特征的构建跟模型有关（通常默认为逻辑回归）
组合优化（例如非连续变量，0,1的优化无法梯度下降，适合通过强化学习、神经网络的技巧）

遗传算法

遗传算法详解
简单来说，就是随机构造初代个体，每个个体都有编码（DNA）和解码（DNA对外的表现）。然后通过选择来淘汰、交叉来繁衍、变异来增加随机性。最终获得最优解。
以下是一个java实现的遗传算法主要部分。

package GeneticTSP;

import java.util.Random;

/**
 * 遗传算法类
 * 包含：
 *      1.run 开始跑算法
 *      2.createBeginningSpecies 创建种群
 *      3.calRate 计算每一种物种被选中的概率
 *      4.select  轮盘策略 选择适应度高的物种
 *      5.crossover 染色体交叉
 *      6.mutate 染色体变异
 *      7.getBest 获得适应度最大的物种
 */

public class GeneticAlgorithm {
    
        //开始遗传
        SpeciesIndividual run(SpeciesPopulation list)
        {
            //创建初始种群
            createBeginningSpecies(list);
            
            for(int i=1;i<=TSPData.DEVELOP_NUM;i++)
            {
                //选择
                select(list);
                
                //交叉
                crossover(list);
                
                //变异
                mutate(list);
            }   
            
            return getBest(list);
        }
        
        //创建初始种群
        void createBeginningSpecies(SpeciesPopulation list)
        {
            //100%随机
            int randomNum=(int)(TSPData.SPECIES_NUM);
            for(int i=1;i<=randomNum;i++)
            {
                SpeciesIndividual species=new SpeciesIndividual();//创建结点
                species.createByRandomGenes();//初始种群基因

                list.add(species);//添加物种
            }
            
//          //40%贪婪
//          int greedyNum=TSPData.SPECIES_NUM-randomNum;
//          for(int i=1;i<=greedyNum;i++)
//          {
//              SpeciesIndividual species=new SpeciesIndividual();//创建结点
//              species.createByGreedyGenes();//初始种群基因
    //
//              this.add(species);//添加物种
//          }
        }

        //计算每一物种被选中的概率
        void calRate(SpeciesPopulation list)
        {
            //计算总适应度
            float totalFitness=0.0f;
            list.speciesNum=0;
            SpeciesIndividual point=list.head.next;//游标
            while(point != null)//寻找表尾结点
            {
                point.calFitness();//计算适应度
                
                totalFitness += point.fitness;
                list.speciesNum++;

                point=point.next;
            }

            //计算选中概率
            point=list.head.next;//游标
            while(point != null)//寻找表尾结点
            {
                point.rate=point.fitness/totalFitness;
                point=point.next;
            }
        }
        
        //选择优秀物种（轮盘赌）
        void select(SpeciesPopulation list)
        {           
            //计算适应度
            calRate(list);
            
            //找出最大适应度物种
            float talentDis=Float.MAX_VALUE;
            SpeciesIndividual talentSpecies=null;
            SpeciesIndividual point=list.head.next;//游标

            while(point!=null)
            {
                if(talentDis > point.distance)
                {
                    talentDis=point.distance;
                    talentSpecies=point;
                }
                point=point.next;
            }

            //将最大适应度物种复制talentNum个
            SpeciesPopulation newSpeciesPopulation=new SpeciesPopulation();
            int talentNum=(int)(list.speciesNum/4);
            for(int i=1;i<=talentNum;i++)
            {
                //复制物种至新表
                SpeciesIndividual newSpecies=talentSpecies.clone();
                newSpeciesPopulation.add(newSpecies);
            }

            //轮盘赌list.speciesNum-talentNum次
            int roundNum=list.speciesNum-talentNum;
            for(int i=1;i<=roundNum;i++)
            {
                //产生0-1的概率
                float rate=(float)Math.random();
                
                SpeciesIndividual oldPoint=list.head.next;//游标
                while(oldPoint != null && oldPoint != talentSpecies)//寻找表尾结点
                {
                    if(rate <= oldPoint.rate)
                    {
                        SpeciesIndividual newSpecies=oldPoint.clone();
                        newSpeciesPopulation.add(newSpecies);
                        
                        break;
                    }
                    else
                    {
                        rate=rate-oldPoint.rate;
                    }
                    oldPoint=oldPoint.next;
                }
                if(oldPoint == null || oldPoint == talentSpecies)
                {
                    //复制最后一个
                    point=list.head;//游标
                    while(point.next != null)//寻找表尾结点
                        point=point.next;
                    SpeciesIndividual newSpecies=point.clone();
                    newSpeciesPopulation.add(newSpecies);
                }
                
            }
            list.head=newSpeciesPopulation.head;
        }
        
        //交叉操作
        void crossover(SpeciesPopulation list)
        {
            //以概率pcl~pch进行
            float rate=(float)Math.random();
            if(rate > TSPData.pcl && rate < TSPData.pch)
            {           
                SpeciesIndividual point=list.head.next;//游标
                Random rand=new Random();
                int find=rand.nextInt(list.speciesNum);
                while(point != null && find != 0)//寻找表尾结点
                {
                    point=point.next;
                    find--;
                }
            
                if(point.next != null)
                {
                    int begin=rand.nextInt(TSPData.CITY_NUM);

                    //取point和point.next进行交叉，形成新的两个染色体
                    for(int i=begin;i<TSPData.CITY_NUM;i++)
                    {
                        //找出point.genes中与point.next.genes[i]相等的位置fir
                        //找出point.next.genes中与point.genes[i]相等的位置sec
                        int fir,sec;
                        for(fir=0;!point.genes[fir].equals(point.next.genes[i]);fir++);
                        for(sec=0;!point.next.genes[sec].equals(point.genes[i]);sec++);
                        //两个基因互换
                        String tmp;
                        tmp=point.genes[i];
                        point.genes[i]=point.next.genes[i];
                        point.next.genes[i]=tmp;
                        
                        //消去互换后重复的那个基因
                        point.genes[fir]=point.next.genes[i];
                        point.next.genes[sec]=point.genes[i];
                        
                    }
                }
            }
        }
        
        //变异操作
        void mutate(SpeciesPopulation list)
        {   
            //每一物种均有变异的机会,以概率pm进行
            SpeciesIndividual point=list.head.next;
            while(point != null)
            {
                float rate=(float)Math.random();
                if(rate < TSPData.pm)
                {
                    //寻找逆转左右端点
                    Random rand=new Random();
                    int left=rand.nextInt(TSPData.CITY_NUM);
                    int right=rand.nextInt(TSPData.CITY_NUM);
                    if(left > right)
                    {
                        int tmp;
                        tmp=left;
                        left=right;
                        right=tmp;
                    }
                    
                    //逆转left-right下标元素
                    while(left < right)
                    {
                        String tmp;
                        tmp=point.genes[left];
                        point.genes[left]=point.genes[right];
                        point.genes[right]=tmp;

                        left++;
                        right--;
                    }
                }
                point=point.next;
            }
        }

        //获得适应度最大的物种
        SpeciesIndividual getBest(SpeciesPopulation list)
        {
            float distance=Float.MAX_VALUE;
            SpeciesIndividual bestSpecies=null;
            SpeciesIndividual point=list.head.next;//游标
            while(point != null)//寻找表尾结点
            {
                if(distance > point.distance)
                {
                    bestSpecies=point;
                    distance=point.distance;
                }

                point=point.next;
            }
            
            return bestSpecies;
        }

}

Symbolic Learning

通过遗传算法构造衍生变量
gplearn的案例

Auto Cross

主要目的：寻找交叉效应
创新：
- beam search（对贪婪算法优化，每一轮多留几个选项，避免陷入局部最优，不过不能完全避免）
- 简化的逻辑回归求解方式（新变量加入后，固定其他参数，只对新变量求解。近似估计，增加速度）
可以提升
- meta feature（特征的方差、缺失值、百分之多少是一样的）
- 更好的优化方法（遗传算法变异时给一定的方向（强化学习））

自动特征构建方法
所谓自动构建，往往是参考了人的构建方法。自动特征挖掘一般是在人挖掘到再无可挖后，再进行自动特征挖掘的尝试。交...
半自动特征构建方法
Target Mean Encoding 对离散变量最有效的编码方式之一，尤其对高维度，重点在于防止过拟合例如，...
DGA总结备忘
1、适合中小企业的DGA域名检测使用LSTM或者CNN构建的DGA检测模型，这种方法需要使用深度学习自动提取特征...
Maven自动化构建工具
Maven是Apache软件基金会的一个开源项目，用来处理Java工程的自动化构建，它有两个主要特征：构建自动...
算法笔记（19）自动特征选择及Python代码实现
自动特征选择常用方法包括使用单一变量法进行特征选择、基于模型的特征选择、迭代式特征选择。使用单一变量法进行特征选...
[Python] 看binarytree源码深入学习二叉树
1. binarytree 库 binarytree 1.1 运行环境 1.2 安装方法 1.3 自动构建随机二叉...
Paper Collection - Continuous In
github上面一个可以自动构建的项目 Xcode 自动构建命令——xcodebuild iOS自动打包并发布脚本
Udacity 数据分析进阶课程笔记L42：特征选择
方法一：加入新特征的通过直觉构建代码实现可视化评估重复上述过程警惕特征漏洞任何人都有可能犯错—要对你得到的结果持...
从0构建自动化测试平台(五)兼容性测试实现
往期文章从0构建自动化测试平台(一)之技术选型从0构建自动化测试平台(二)WEB服务器构建从0构建自动化测试...
NSMS V1.0 配置文档之Field字段用法
NSMS延用了 ANDX的数据库连接方式，提供了自动构建 MySQL查询语句的封装方法。NSMS为保障这些封装方法...

自动特征构建方法

自动特征挖掘

难点：

遗传算法

Symbolic Learning

Auto Cross

相关文章

自动特征构建方法

半自动特征构建方法

DGA总结备忘

Maven自动化构建工具

算法笔记（19）自动特征选择及Python代码实现

[Python] 看binarytree源码深入学习二叉树

Paper Collection - Continuous In

Udacity 数据分析进阶课程笔记L42：特征选择

从0构建自动化测试平台(五)兼容性测试实现

NSMS V1.0 配置文档之Field字段用法

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读