tips:
- 将验证集中的query进行一下筛选
- 测试一下正负样本权重的作用
- 在之前的模型上测试一下效果
- 使用输出分数并添加阈值的方式来进行过滤可能效果会更好一点
- 如果有效果就加到mv-dssm上边去
dssm网络效果:
原始网络(全联接层 + bn + drop out):
acc: 0.87, auc: 0.5, precision: 0.87
使用hard samples:网络不收敛
添加attention:
acc:0.699424736337 auc:0.703279355108 precision:0.941307578009
只使用title进行attention:(说明有效果,可以加到mvdssm上去)
acc:0.63518696069 auc:0.6106462139 precision:0.91075604053
测试正样本权重效果
文章中加了权重的效果明显提升的原因可能是样本不均衡的问题,负样本太多。
pai -name tensorflow140 -Dscript="file:///home/hengsong/origin_deep_cluster_odps_8.tar.gz" -DentryFile="train_inference_v9.py" -Dcluster='{"worker":{"count":50, "cpu":200, "memory":4000}, "ps":{"count":10, "cpu":200, "memory":5000}}' -DuseSparseClusterSchema=True -DenableDynamicCluster=True -Dtables="odps://graph_embedding/tables/hs_train_data_dssm_v2_7,odps://graph_embedding/tables/hs_test_data_dssm_v2_7,odps://graph_embedding/tables/hs_tmp_267" -Doutputs="odps://graph_embedding/tables/hs_dssm_result_5" -DcheckpointDir="oss://bucket-automl/hengsong/?role_arn=acs:ram::1293303983251548:role/graph2018&host=cn-hangzhou.oss-internal.aliyun-inc.com" -DuserDefinedParameters="--learning_rate=3e-4 --batch_size=1024 --is_save_model=True --attention_type=1 --num_epochs=1 --ckpt=hs_ugc_video_4e_10.ckpt" -DuseSparseClusterSchema=True;
20190823095451775gczy8959
inference
pai -name tensorflow140 -Dscript="file:///home/hengsong/origin_deep_cluster_odps_8.tar.gz" -DentryFile="inference_v8.py" -Dcluster='{"worker":{"count":1, "cpu":200, "memory":4000}, "ps":{"count":1, "cpu":200, "memory":5000}}' -DuseSparseClusterSchema=True -DenableDynamicCluster=True -Dtables="odps://graph_embedding/tables/hs_train_data_dssm_v2_7,odps://graph_embedding/tables/hs_test_data_dssm_v2_7,odps://graph_embedding/tables/hs_tmp_267" -Doutputs="odps://graph_embedding/tables/hs_dssm_result_5" -DcheckpointDir="oss://bucket-automl/hengsong/?role_arn=acs:ram::1293303983251548:role/graph2018&host=cn-hangzhou.oss-internal.aliyun-inc.com" -DuserDefinedParameters="--learning_rate=3e-4 --batch_size=1024 --is_save_model=True --attention_type=1 --num_epochs=1 --ckpt=hs_ugc_video_4e_10.ckpt-1" -DuseSparseClusterSchema=True;
pai -name tensorflow140 -Dscript="file:///home/hengsong/origin_deep_cluster_odps_8.tar.gz" -DentryFile="inference_v9.py" -Dcluster='{"worker":{"count":1, "cpu":200, "memory":4000}, "ps":{"count":1, "cpu":200, "memory":5000}}' -DuseSparseClusterSchema=True -DenableDynamicCluster=True -Dtables="odps://graph_embedding/tables/hs_train_data_dssm_v2_7,odps://graph_embedding/tables/hs_test_data_dssm_v2_7,odps://graph_embedding/tables/hs_tmp_267" -Doutputs="odps://graph_embedding/tables/hs_dssm_result_3" -DcheckpointDir="oss://bucket-automl/hengsong/?role_arn=acs:ram::1293303983251548:role/graph2018&host=cn-hangzhou.oss-internal.aliyun-inc.com" -DuserDefinedParameters="--learning_rate=3e-4 --batch_size=1024 --is_save_model=True --attention_type=1 --num_epochs=1 --ckpt=hs_ugc_video_4e_10.ckpt-1" -DuseSparseClusterSchema=True;
网友评论