Plotting a Lift chart (Lift curve) in Python


Author: xiaogp | Published: 2020-04-17 11:42

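    Lift formula

    The Lift curve measures how much the hit rate (precision) of the samples the model flags above a given threshold improves over the hit rate obtained by flagging samples at random, without a model.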



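    \mathrm{Lift} = \frac{\mathrm{TP} / (\mathrm{TP} + \mathrm{FP})}{(\mathrm{TP} + \mathrm{FN}) / (\mathrm{TP} + \mathrm{FP} + \mathrm{TN} + \mathrm{FN})}

    Here TP, FP, TN and FN are the counts of true positives, false positives, true negatives and false negatives. The numerator is the precision at the chosen threshold; the denominator is the overall proportion of positive samples (the base rate).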
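As a quick check of the formula, the following minimal sketch computes Lift directly from confusion-matrix counts. The function name `lift_from_counts` and the counts used in the example are made up for illustration; they do not come from the article's data.

```python
def lift_from_counts(tp, fp, tn, fn):
    """Lift = precision / base rate, computed from confusion-matrix counts."""
    predicted_positive = tp + fp
    actual_positive = tp + fn
    total = tp + fp + tn + fn
    if predicted_positive == 0 or actual_positive == 0:
        raise ValueError("Lift is undefined without predicted and actual positives")
    precision = tp / predicted_positive   # hit rate among flagged samples
    base_rate = actual_positive / total   # hit rate of random flagging
    return precision / base_rate


# Hypothetical counts: 50 samples flagged, 30 of them truly positive,
# and 50 positives among 1000 samples overall.
example_lift = lift_from_counts(tp=30, fp=20, tn=930, fn=20)
print(round(example_lift, 2))
```

With these invented counts the precision is 0.6 and the base rate is 0.05, so the flagged group contains positives about 12 times as often as a random draw would.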

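    Input data

    Inputs: predictions, labels, threshold_list, cut_point

    predictions: the predicted value of each sample, a probability between 0 and 1
    labels: the true value (0 or 1) of each sample; in this example 1 marks a bad customer
    threshold_list: the list of thresholds to annotate on the plot
    cut_point: the number of evenly spaced threshold points between 0 and 1 used to draw the curve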

    Data preview: labels in the left column, predictions in the right column

    head -4 test_predict_res.txt
    0.0 0.831193
    0.0 0.088209815
    1.0 0.93411493
    0.0 0.022157196
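A file in this two-column, whitespace-separated layout can also be read with NumPy. The sketch below is one possible way to do it, not the article's own loader; the helper name `load_labels_and_predictions` is hypothetical, and an in-memory copy of the four preview rows stands in for test_predict_res.txt.

```python
import io

import numpy as np


def load_labels_and_predictions(source):
    """Read 'label prediction' rows from a path or file-like object."""
    data = np.loadtxt(source, ndmin=2)  # ndmin=2 keeps a 2-D shape even for one row
    labels = data[:, 0]                 # left column: true labels
    predictions = data[:, 1]            # right column: predicted probabilities
    return labels, predictions


# The four preview rows, held in memory in place of the real file.
sample = io.StringIO(
    "0.0 0.831193\n"
    "0.0 0.088209815\n"
    "1.0 0.93411493\n"
    "0.0 0.022157196\n"
)
labels, predictions = load_labels_and_predictions(sample)
print(labels.tolist())
print(predictions.tolist())
```

Passing the file name instead of the `io.StringIO` object reads the real file the same way.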
    

    Python implementation

    import matplotlib.pyplot as plt
    import numpy as np


    def lift_plot(predictions, labels, threshold_list, cut_point=100):
        base = len([x for x in labels if x == 1]) / len(labels)
        predictions_labels = list(zip(predictions, labels))
        lift_values = []
    
        x_axis_range = np.linspace(0, 1, cut_point)
        x_axis_valid = []
        for i in x_axis_range:
            hit_data = [x[1] for x in predictions_labels if x[0] > i]
            if hit_data:  # skip thresholds above which no sample is selected
                bad_hit = [x for x in hit_data if x == 1]
                precision = len(bad_hit) / len(hit_data)
                lift_value = precision / base
                lift_values.append(lift_value)
                x_axis_valid.append(i)
    
        plt.plot(x_axis_valid, lift_values, color="blue")  # lift curve
        plt.plot([0, 1], [1, 1], linestyle="-", color="darkorange", alpha=0.5, linewidth=2)  # baseline (lift = 1)
        
        for threshold in threshold_list:
            threshold_hit_data = [x[1] for x in predictions_labels if x[0] > threshold]
            if threshold_hit_data:
                threshold_bad_hit = [x for x in threshold_hit_data if x == 1]
                threshold_precision = len(threshold_bad_hit) / len(threshold_hit_data)
                threshold_lift_value = threshold_precision / base
                plt.scatter([threshold], [threshold_lift_value], color="white", edgecolors="blue", s=20, label="threshold:{} lift:{}".format(threshold, round(threshold_lift_value, 2)))  # threshold marker
                plt.plot([threshold, threshold], [0, 20], linestyle="--", color="black", alpha=0.2, linewidth=1)  # vertical guide line at the threshold
                plt.text(threshold - 0.02, threshold_lift_value + 1, round(threshold_lift_value, 2))
        plt.title("Lift plot")
        plt.legend(loc=2, prop={"size": 9})
        plt.grid()
        plt.show()
    
    
    if __name__ == "__main__":
        # read the true labels and the predicted values
        labels = []
        predictions = []
        with open("test_predict_res.txt", "r", encoding="utf8") as f:
            for line in f:
                fields = line.split()  # label in column 0, prediction in column 1
                labels.append(float(fields[0]))
                predictions.append(float(fields[1]))
    
        lift_plot(predictions, labels, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
    
    [Figure: Lift plot (lift图.png)]
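If only the curve values are needed, without the plot, the same computation can be sketched with NumPy arrays. This is an illustrative variant rather than the article's code: the function name `lift_curve` is an assumption, and it keeps the strict `>` comparison and the skipping of empty selections.

```python
import numpy as np


def lift_curve(predictions, labels, cut_point=100):
    """Return (thresholds, lift values) for thresholds that select at least one sample."""
    predictions = np.asarray(predictions, dtype=float)
    labels = np.asarray(labels, dtype=float)
    base = np.mean(labels == 1)  # overall share of positive (bad) samples
    if base == 0:
        raise ValueError("labels contain no positive samples, so Lift is undefined")

    valid_thresholds = []
    lift_values = []
    for threshold in np.linspace(0, 1, cut_point):
        selected = predictions > threshold
        if selected.any():  # skip thresholds above which nothing is selected
            precision = np.mean(labels[selected] == 1)
            valid_thresholds.append(threshold)
            lift_values.append(precision / base)
    return np.array(valid_thresholds), np.array(lift_values)


# Tiny made-up example: the two highest scores are the two positives.
toy_thresholds, toy_lifts = lift_curve([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0], cut_point=3)
print(toy_thresholds.tolist(), toy_lifts.tolist())
```

In the toy data, a threshold of 0 selects everything and gives a Lift of 1, while a threshold of 0.5 selects only the two positives and doubles it.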

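    Interpreting the Lift plot

    Take enterprise risk prediction as an example: the closer the predicted probability is to 1, the higher the risk of the enterprise. Given the real (very low) proportion of bad enterprises, the higher the predicted probability, the better the model is at hitting truly bad enterprises. For example, with 0.8 as the threshold, the model's hit rate is 4.23 times that of random guessing.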
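Since Lift is precision divided by the base rate, a Lift value can be turned back into a precision once the base rate is known. The article does not state the base rate of its data, so the 5% used below is purely a hypothetical figure for illustration, as is the helper name `precision_from_lift`.

```python
def precision_from_lift(lift, base_rate):
    """Invert Lift = precision / base_rate to recover the precision."""
    if not 0 < base_rate <= 1:
        raise ValueError("base_rate must be in (0, 1]")
    return lift * base_rate


assumed_base_rate = 0.05  # hypothetical: 5% of enterprises are bad
precision_at_08 = precision_from_lift(4.23, assumed_base_rate)
print(round(precision_at_08, 4))
```

Under that assumed base rate, a Lift of 4.23 at threshold 0.8 would mean that roughly one in five flagged enterprises is truly bad, compared with one in twenty when picking at random.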
