Decision Trees

Author: wwq2020 | Published 2020-09-18 16:28

This post covers the ID3 algorithm.

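The idea: for every feature, compute the information gain of splitting the data set on that feature (how much the split lowers the entropy of the class labels), and split on the feature with the largest gain first. Applying the same step recursively to each resulting subset, until every remaining subset contains a single class, builds a tree. That tree is the decision tree.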
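Written out, the quantities in the paragraph above are as follows. The notation is introduced here for illustration: D is the data set, p_k the fraction of rows in class k, and D_v the rows whose feature A takes value v. The numbers use the five-row sample data from createDataSet in the code below (2 "yes", 3 "no").

```latex
H(D) = -\sum_{k} p_k \log_2 p_k
\qquad
H(D \mid A) = \sum_{v} \frac{|D_v|}{|D|}\, H(D_v)
\qquad
\operatorname{Gain}(D, A) = H(D) - H(D \mid A)

% Sample data: 2 "yes", 3 "no"
H(D) = -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5} \approx 0.971

% A = no surfacing: value 1 -> {yes, yes, no} (H \approx 0.918), value 0 -> {no, no} (H = 0)
\operatorname{Gain}(D, \text{no surfacing}) \approx 0.971 - \tfrac{3}{5}(0.918) - \tfrac{2}{5}(0) \approx 0.420

% A = flippers: value 1 -> {yes, yes, no, no} (H = 1), value 0 -> {no} (H = 0)
\operatorname{Gain}(D, \text{flippers}) \approx 0.971 - \tfrac{4}{5}(1) - \tfrac{1}{5}(0) \approx 0.171
```

So ID3 splits on no surfacing (feature 0) first, which is the index chooseBestFeatureToSplit returns for this data. Note that calcShannonEnt accumulates p·log2(1/p), which is the same as −p·log2(p).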

The code comes from Machine Learning in Action.

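    import math
    
    
    def createDataSet():
        # Each row: two binary features followed by the class label
        dataSet = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'],
                   [0, 1, 'no']]
        # Feature names, in the same order as the feature columns
        labels = ['no surfacing', 'flippers']
        return dataSet, labels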
    
    
    def calcShannonEnt(dataSet):
        """Shannon entropy (base 2) of the class labels in the last column."""
        numEntries = len(dataSet)
        # Count how many rows fall into each class
        labelCounts = {}
        for featVec in dataSet:
            currentLabel = featVec[-1]
            labelCounts[currentLabel] = labelCounts.get(currentLabel, 0) + 1
    
        # H = sum(p * log2(1/p)) = -sum(p * log2(p))
        shannonEnt = 0.0
        for count in labelCounts.values():
            prob = count / numEntries
            shannonEnt -= prob * math.log2(prob)
        return shannonEnt
    
    
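    def splitDataSet(dataSet, axis, value):
        """Return the rows whose feature `axis` equals `value`, with that column removed."""
        retDataSet = []
        for featVec in dataSet:
            if featVec[axis] == value:
                # Build a new row without column `axis`; the original row is not modified
                reducedFeatVec = featVec[:axis]
                reducedFeatVec.extend(featVec[axis + 1:])
                retDataSet.append(reducedFeatVec)
        return retDataSet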
    
    
    def chooseBestFeatureToSplit(dataSet):
        """Return the index of the feature with the largest information gain (-1 if no split helps)."""
        numFeatures = len(dataSet[0]) - 1  # the last column is the class label
        baseEntropy = calcShannonEnt(dataSet)
        bestInfoGain = 0.0
        bestFeature = -1
    
        for i in range(numFeatures):
            # Distinct values taken by feature i
            uniqueVals = set(example[i] for example in dataSet)
            newEntropy = 0.0
    
            # Conditional entropy: entropy of each subset, weighted by its share of the rows
            for value in uniqueVals:
                subDataSet = splitDataSet(dataSet, i, value)
                prob = len(subDataSet) / len(dataSet)
                newEntropy += prob * calcShannonEnt(subDataSet)
    
            infoGain = baseEntropy - newEntropy
    
            if infoGain > bestInfoGain:
                bestInfoGain = infoGain
                bestFeature = i
        return bestFeature
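The functions above stop at choosing the feature for a single split; the recursive construction described at the top of the post is not part of the code. A minimal sketch of it follows, written to run on its own. The names entropy, split, best_feature, build_tree and classify are hypothetical and mirror calcShannonEnt, splitDataSet and chooseBestFeatureToSplit. Two choices are assumptions of this sketch rather than something the source specifies: a node with no features left, or with no split of positive gain, becomes a leaf labelled with the majority class; and the tree is stored as nested dicts of the form {feature name: {feature value: subtree or class}}.

```python
import math
from collections import Counter


def entropy(rows):
    """Shannon entropy (base 2) of the class labels in the last column."""
    total = len(rows)
    counts = Counter(row[-1] for row in rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def split(rows, axis, value):
    """Rows whose feature `axis` equals `value`, with that column removed."""
    return [row[:axis] + row[axis + 1:] for row in rows if row[axis] == value]


def best_feature(rows):
    """Index of the feature with the largest information gain, or -1 if none is positive."""
    base = entropy(rows)
    best_gain, best_index = 0.0, -1
    for i in range(len(rows[0]) - 1):
        conditional = 0.0
        for value in set(row[i] for row in rows):
            subset = split(rows, i, value)
            conditional += len(subset) / len(rows) * entropy(subset)
        if base - conditional > best_gain:
            best_gain, best_index = base - conditional, i
    return best_index


def build_tree(rows, labels):
    """Recursively build an ID3 tree as nested dicts {feature name: {value: subtree}}."""
    classes = [row[-1] for row in rows]
    if len(set(classes)) == 1:
        return classes[0]  # pure subset: leaf
    i = best_feature(rows)
    if i == -1:
        # Assumption: no features left or no useful split -> majority-class leaf
        return Counter(classes).most_common(1)[0][0]
    name = labels[i]
    remaining = labels[:i] + labels[i + 1:]  # feature names without the one used here
    return {name: {value: build_tree(split(rows, i, value), remaining)
                   for value in set(row[i] for row in rows)}}


def classify(tree, labels, sample):
    """Follow the branch matching the sample's feature value until a leaf (class) is reached."""
    while isinstance(tree, dict):
        name, branches = next(iter(tree.items()))
        tree = branches[sample[labels.index(name)]]  # KeyError for values never seen in training
    return tree


data = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
features = ['no surfacing', 'flippers']
tree = build_tree(data, features)
print(tree)  # {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
print(classify(tree, features, [1, 0]))  # no
```

Unlike the book-style code, build_tree does not modify the labels list it is given, so the same list can be passed to classify afterwards. classify raises KeyError for a feature value that never appeared in the training data; a fuller version would need a fallback for that case.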
    
    

Link: https://www.haomeiwen.com/subject/veejyktx.html