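Lift formula
The lift curve measures how much the model's hit rate (the share of truly bad samples among those it flags at a given threshold) improves over the hit rate of picking samples at random without a model.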
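Written out from the arithmetic that `lift_plot` performs further down, the lift at a threshold $t$ is the following. Here $p_i$ is the predicted probability of sample $i$, $y_i \in \{0, 1\}$ its true label (1 = bad), $N$ the number of samples, and a sample counts as flagged when $p_i > t$. The notation is ours, chosen to match the code:

$$
\mathrm{Lift}(t) = \frac{\mathrm{Precision}(t)}{\mathrm{Base}}
= \frac{\left|\{\, i : p_i > t,\ y_i = 1 \,\}\right| \,/\, \left|\{\, i : p_i > t \,\}\right|}{\left|\{\, i : y_i = 1 \,\}\right| \,/\, N}
= \frac{TP / (TP + FP)}{(TP + FN) / N}
$$

A lift of 1 means the model does no better than random selection; a lift of 2 means its flagged set contains bad samples at twice the overall rate.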
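Data input
Inputs: predictions, labels, threshold_list, cut_point
predictions: the predicted value for each sample, a probability between 0 and 1
labels: the true label (0 or 1) for each sample; in this example, 1 means a bad customer
threshold_list: the thresholds whose lift values are marked and labeled on the plot
cut_point: the number of evenly spaced thresholds between 0 and 1 at which the lift curve is evaluated (default 100)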
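To make the arithmetic concrete before the full plotting code, here is a minimal sketch that computes the lift at a single threshold on eight made-up samples. `lift_at` is a hypothetical helper name; it mirrors the precision/base computation that `lift_plot` performs for each threshold:

```python
def lift_at(predictions, labels, threshold):
    """Hypothetical helper: lift at one threshold, same arithmetic as lift_plot."""
    # Base rate: hit rate of random selection (share of label-1 samples)
    base = sum(1 for y in labels if y == 1) / len(labels)
    # Labels of the samples the model flags as bad at this threshold
    flagged = [y for p, y in zip(predictions, labels) if p > threshold]
    if not flagged:
        return None  # no prediction exceeds the threshold, so lift is undefined
    precision = sum(1 for y in flagged if y == 1) / len(flagged)
    return precision / base


# Made-up toy data: 3 bad samples out of 8, so the base rate is 0.375
predictions = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
print(lift_at(predictions, labels, 0.5))  # 0.75 / 0.375 = 2.0
```

At threshold 0.5 the model flags four samples, three of them bad, so its precision is 0.75, twice the base rate. At threshold 0 every sample is flagged and the lift is exactly 1.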
Data preview: the left column is labels, the right column is predictions

```
$ head -4 test_predict_res.txt
0.0 0.831193
0.0 0.088209815
1.0 0.93411493
0.0 0.022157196
```
Python implementation
```python
import numpy as np
import matplotlib.pyplot as plt


def lift_plot(predictions, labels, threshold_list, cut_point=100):
    # Base rate: share of bad customers (label 1) in the whole sample,
    # i.e. the hit rate of picking samples at random without a model
    base = len([x for x in labels if x == 1]) / len(labels)
    predictions_labels = list(zip(predictions, labels))
    lift_values = []
    x_axis_range = np.linspace(0, 1, cut_point)
    x_axis_valid = []
    for i in x_axis_range:
        # Labels of the samples the model flags as bad at threshold i
        hit_data = [x[1] for x in predictions_labels if x[0] > i]
        if hit_data:  # skip thresholds above every prediction (empty set)
            bad_hit = [x for x in hit_data if x == 1]
            precision = len(bad_hit) / len(hit_data)
            lift_value = precision / base
            lift_values.append(lift_value)
            x_axis_valid.append(i)
    plt.plot(x_axis_valid, lift_values, color="blue")  # lift curve
    plt.plot([0, 1], [1, 1], linestyle="-", color="darkorange", alpha=0.5, linewidth=2)  # baseline (lift = 1)
    for threshold in threshold_list:
        threshold_hit_data = [x[1] for x in predictions_labels if x[0] > threshold]
        if threshold_hit_data:
            threshold_bad_hit = [x for x in threshold_hit_data if x == 1]
            threshold_precision = len(threshold_bad_hit) / len(threshold_hit_data)
            threshold_lift_value = threshold_precision / base
            # Marker for the lift at this threshold
            plt.scatter([threshold], [threshold_lift_value], color="white", edgecolors="blue", s=20,
                        label="threshold:{} lift:{}".format(threshold, round(threshold_lift_value, 2)))
            # Vertical dashed guide line at the threshold
            plt.plot([threshold, threshold], [0, 20], linestyle="--", color="black", alpha=0.2, linewidth=1)
            plt.text(threshold - 0.02, threshold_lift_value + 1, round(threshold_lift_value, 2))
    plt.title("Lift plot")
    plt.legend(loc=2, prop={"size": 9})
    plt.grid()
    plt.show()


if __name__ == "__main__":
    # Read the true labels (first column) and predicted probabilities (second column)
    labels = []
    predictions = []
    with open("test_predict_res.txt", "r", encoding="utf8") as f:
        for line in f:
            label, prediction = line.strip().split()
            labels.append(float(label))
            predictions.append(float(prediction))
    lift_plot(predictions, labels, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
```
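The loop in `lift_plot` rescans every sample for every threshold, so its cost grows as N × cut_point. The sketch below computes the same curve (same thresholds, same strict `>` comparison, same empty-set guard) by sorting the predictions once and using NumPy's `argsort`, `cumsum`, and `searchsorted`. `lift_curve` is a hypothetical helper name; it only returns the points and leaves plotting to the code above:

```python
import numpy as np


def lift_curve(predictions, labels, cut_point=100):
    """Hypothetical vectorized helper: same lift definition as lift_plot, no plotting.

    Returns (thresholds, lifts) for every threshold that flags at least one sample.
    """
    preds = np.asarray(predictions, dtype=float)
    labs = (np.asarray(labels, dtype=float) == 1).astype(int)  # 1 = bad customer
    n = preds.size
    base = labs.sum() / n  # hit rate of random selection

    order = np.argsort(preds)
    sorted_preds = preds[order]
    # cum_bad[k] = number of bad samples among the k lowest predictions
    cum_bad = np.concatenate(([0], np.cumsum(labs[order])))

    thresholds = np.linspace(0, 1, cut_point)
    # idx[j] = number of samples with prediction <= thresholds[j]
    idx = np.searchsorted(sorted_preds, thresholds, side="right")
    n_flagged = n - idx                     # samples with prediction > threshold
    n_bad_hit = cum_bad[-1] - cum_bad[idx]  # bad samples among the flagged ones

    valid = n_flagged > 0  # same empty-set guard as the loop version
    precision = n_bad_hit[valid] / n_flagged[valid]
    return thresholds[valid], precision / base
```

The returned arrays can be passed directly to `plt.plot(thresholds, lifts, color="blue")`. Like the original, it assumes the labels contain at least one 1; otherwise the base rate is 0 and the division is undefined.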
[Figure: lift plot]
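Interpreting the lift plot
Take enterprise risk prediction as an example: the closer the predicted probability is to 1, the riskier the enterprise. Although the actual proportion of bad enterprises is very low, the higher the threshold on the predicted probability, the better the model is at hitting truly bad enterprises. For example, with 0.8 as the threshold, the model's hit rate is 4.23 times that of random guessing (lift = 4.23).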
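Because lift is precision divided by the base rate, the same lift can mean very different precision depending on how rare bad enterprises are. The post does not report its base rate, so the 2% below is a purely hypothetical value; `precision_from_lift` is likewise a hypothetical helper that just inverts the lift formula:

```python
def precision_from_lift(lift, base_rate):
    """Hypothetical helper: Lift = Precision / Base, so Precision = Lift * Base."""
    return lift * base_rate


# Hypothetical base rate of 2% bad enterprises (not taken from the original data)
p = precision_from_lift(4.23, 0.02)
print(f"{p:.2%}")  # 8.46%
```

Under that assumption, a lift of 4.23 still means fewer than one in ten flagged enterprises is actually bad. Since precision cannot exceed 1, lift is also capped at 1 / base rate (50 for a 2% base rate).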