20190725 Work Progress


Author: Songger | Published: 2019-07-25 13:35
    1. Each optimizer seems to have its own best-suited learning rate. Why is that?

    2. Compare the computed AUC value against the one sklearn reports.

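    3. The discrepancy turned out to be a data-reading problem.


      The data right after being read in (figure not preserved)

      Fix: the delimiter was wrong; correcting it solved the problem.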
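A quick way to catch a wrong delimiter before training is to count how many fields each line splits into. The sketch below is only an illustration: the sample rows, the candidate delimiters and the expected column count of 3 are all assumptions, not the actual table layout used here.

```python
from collections import Counter


def field_count_profile(lines, delimiter):
    """Map 'number of fields' to 'number of lines that split into that many'."""
    return Counter(len(line.rstrip("\n").split(delimiter)) for line in lines)


def guess_delimiter(lines, candidates, expected_fields):
    """Return the candidates for which every line has exactly expected_fields columns."""
    return [d for d in candidates
            if set(field_count_profile(lines, d)) == {expected_fields}]


# Hypothetical sample rows: id, title, label separated by tabs.
sample = ["1\tred dress, long\t1\n", "2\tblue shoes\t0\n"]

# With the wrong delimiter the rows split inconsistently.
print(dict(field_count_profile(sample, ",")))
# Only the tab yields 3 fields on every row.
print(guess_delimiter(sample, [",", "\t", " "], expected_fields=3))
```

If no candidate gives a consistent field count, the problem is more likely in the data itself than in the reader.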

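    4. The AUC hovers around 0.5:
      Suspecting that the network parameters were behaving abnormally, I tested several learning rates:

    Learning rate 1e-2: AUC fluctuates around 0.5

    Learning rate 1e-4: stuck predicting all zeros

    Learning rate 1e-6: stuck predicting all ones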
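A constant score gives an AUC of 0.5 by definition, so a collapsed model and a random one can look alike on AUC alone. A small check on the share of predicted positives makes the "all zeros" and "all ones" cases explicit. This is a minimal sketch; the function name, the 0.5 decision threshold and the 1% tolerance are assumptions chosen for illustration.

```python
def prediction_collapse(scores, threshold=0.5, tol=0.01):
    """Classify a batch of scores as 'all_zero', 'all_one' or 'mixed'.

    threshold and tol are illustrative defaults, not values from the log.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    positive_rate = sum(1 for s in scores if s >= threshold) / len(scores)
    if positive_rate <= tol:
        return "all_zero"
    if positive_rate >= 1.0 - tol:
        return "all_one"
    return "mixed"


# Hypothetical score batches for the three situations.
print(prediction_collapse([0.01, 0.02, 0.00]))  # all_zero
print(prediction_collapse([0.97, 0.99, 0.98]))  # all_one
print(prediction_collapse([0.10, 0.90, 0.40]))  # mixed
```

Logging this label next to the AUC for every learning rate makes it easier to tell "random" from "stuck".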

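    1. The optimizer itself might be the problem (almost certainly not, but switching to the most commonly used one, Adam, is worth a try).

    Switched to Adam and set the learning rate to 3e-4:

    Learning rate 3e-4: stuck predicting all zeros

    Learning rate 1e-5:

    Learning rate 1e-5, batch size 2048: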

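    1. The network weights may be changing too much per step, so print them out to check. Since the last layer's weights directly determine the output, use the last layer's weights for this test.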

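    2. Because the input data contains only the title, remove the original concat and see what happens:
      The results improved markedly, which was unexpected.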
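For the weight check above, printing raw weights is hard to read; a single number per step, such as the size of the update relative to the size of the weights, is easier to follow. The sketch below assumes the last-layer weights have already been fetched and flattened into plain Python lists; the helper names and the sample values are hypothetical.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values))


def relative_update(prev_weights, curr_weights):
    """Size of one update step relative to the weights before the step."""
    delta = [c - p for p, c in zip(prev_weights, curr_weights)]
    denom = l2_norm(prev_weights)
    return l2_norm(delta) / denom if denom > 0 else float("inf")


# Hypothetical flattened last-layer weights at two consecutive steps.
before = [3.0, 4.0]
after = [3.0, 4.5]
print(relative_update(before, after))  # 0.1
```

A ratio that stays large from step to step would support the "weights change too much" hypothesis; a ratio near zero would point to stalled training instead.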

    Now test with different optimizers and different batch sizes:

    pai -name tensorflow140 -Dscript="file:///home/hengsong/origin_deep_cluster_odps_8.tar.gz" -DentryFile="train_v4.py" -Dcluster='{"worker":{"count":30, "cpu":200, "memory":4000}, "ps":{"count":10, "cpu":200, "memory":5000}}' -Dtables="odps://graph_embedding/tables/hs_train_data_dssm_2,odps://graph_embedding/tables/hs_test_data_dssm_2" -DcheckpointDir="oss://bucket-automl/hengsong/?role_arn=acs:ram::1293303983251548:role/graph2018&host=cn-hangzhou.oss-internal.aliyun-inc.com" -DuserDefinedParameters="--learning_rate=3e-4 --batch_size=256 --is_save_model=True --attention_type=1 --num_epochs=10000 --ckpt=hs_ugc_video.ckpt" -DuseSparseClusterSchema=True;

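    Learning rate 1e-5, batch size 256: many positive samples are still predicted as negative, but the samples predicted as positive are almost all correct (high precision, low recall).

    Adam, learning rate 3e-4, batch size 256: the results show that Adam performs clearly better.

    Adam, learning rate 3e-4, batch size 2048: the results show that a batch size of 2048 performs clearly better.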

    http://logview.odps.aliyun-inc.com:8080/logview/?h=http://service-corp.odps.aliyun-inc.com/api&p=graph_embedding&i=20190725080111552g98sqtvj2_e132b8f3_c0ba_4efa_beab_ec1d2e58381e&token=OXhadUtRZVBxM0JUTExObWF3NGlwaEg5N1gwPSxPRFBTX09CTzoxMjkzMzAzOTgzMjUxNTQ4LDE1NjQ2NDY0NzMseyJTdGF0ZW1lbnQiOlt7IkFjdGlvbiI6WyJvZHBzOlJlYWQiXSwiRWZmZWN0IjoiQWxsb3ciLCJSZXNvdXJjZSI6WyJhY3M6b2RwczoqOnByb2plY3RzL2dyYXBoX2VtYmVkZGluZy9pbnN0YW5jZXMvMjAxOTA3MjUwODAxMTE1NTJnOThzcXR2ajJfZTEzMmI4ZjNfYzBiYV80ZWZhX2JlYWJfZWMxZDJlNTgzODFlIl19XSwiVmVyc2lvbiI6IjEifQ==
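When comparing several learning rates and batch sizes, generating the parameter strings avoids copy-paste mistakes between runs. The sketch below only builds the `--learning_rate` and `--batch_size` part that goes inside `-DuserDefinedParameters`; the function name is hypothetical, and the optimizer is left out because the log does not show which flag selects it.

```python
from itertools import product


def parameter_strings(learning_rates, batch_sizes):
    """Yield one '--learning_rate=... --batch_size=...' string per combination."""
    for lr, bs in product(learning_rates, batch_sizes):
        yield "--learning_rate={} --batch_size={}".format(lr, bs)


# Learning rates are kept as strings so they appear exactly as typed.
for params in parameter_strings(["3e-4", "1e-5"], [256, 2048]):
    print(params)
```

Each printed string can then be combined with the remaining flags from the command above.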

    Testing with a different attention type: the predictions are all zero.

    Training on 7 billion records: batch size 256, learning rate 3e-4 with Adam

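    1. New problem: training and test accuracy differ a lot, by about 0.7.
      Possible cause: so far every metric has been computed as a running mean from the first epoch up to the current one, and the test metric is only updated every 50 epochs, so it may lag behind.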
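The lag described above is easy to reproduce with made-up numbers: a mean accumulated since the first evaluation keeps carrying the poor early values. The accuracies below are hypothetical and only illustrate the effect; they are not taken from the actual runs.

```python
def cumulative_means(values):
    """Mean of everything seen so far, the way a running metric reports it."""
    total, out = 0.0, []
    for n, v in enumerate(values, start=1):
        total += v
        out.append(total / n)
    return out


# Hypothetical per-evaluation accuracies: poor early on, good later.
per_eval_acc = [0.2, 0.2, 0.8, 0.8]
running = cumulative_means(per_eval_acc)

# The latest evaluation is 0.8, but the running mean is still only about 0.5.
print(per_eval_acc[-1], running[-1])
```

Resetting the accumulator before each evaluation, or reporting the latest value alongside the mean, would show whether the gap is real or just an averaging artifact.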

    2. Batch size 2048 works clearly better than 256; use 1024 for the test on the large dataset.

    3. Problems with the TensorFlow AUC code and how to correct them

    See the following link:
    https://zhoujiansun.wordpress.com/2018/08/06/tensorflow%EF%BC%9Aauc%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95%E4%BF%AE%E6%AD%A3/
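I have not reproduced the linked post here. One property of `tf.metrics.auc` in TensorFlow 1.x that is worth knowing is that it approximates the AUC from counts at a fixed number of thresholds (`num_thresholds`, default 200), so scores packed into a narrow range can fall between two thresholds and become indistinguishable. The pure-Python sketch below imitates that idea with evenly spaced thresholds and compares it with an exact pairwise AUC, which is the quantity `sklearn.metrics.roc_auc_score` computes. It is a simplified illustration under those assumptions, not TensorFlow's actual implementation, and the sample scores are made up.

```python
def pairwise_auc(labels, scores):
    """Exact AUC: share of (positive, negative) pairs ranked correctly, ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def thresholded_auc(labels, scores, num_thresholds=200):
    """Trapezoidal ROC area using only evenly spaced thresholds on [0, 1]."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    eps = 1e-7  # push the two end thresholds just outside [0, 1]
    inner = [i / (num_thresholds - 1) for i in range(1, num_thresholds - 1)]
    thresholds = [-eps] + inner + [1.0 + eps]
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s > t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s > t)
        points.append((fp / n_neg, tp / n_pos))
    points.reverse()  # ascending false-positive rate
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores squeezed into a band narrower than one threshold step.
labels = [0, 0, 1, 1]
scores = [0.5001, 0.5004, 0.50035, 0.5008]
print(pairwise_auc(labels, scores))     # 0.75
print(thresholded_auc(labels, scores))  # 0.5
```

When the two numbers disagree like this, spreading the scores out or raising the number of thresholds are the usual things to try before suspecting the model.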

    1. The two jobs kept in the end

    pai -name tensorflow140 -Dscript="file:///home/hengsong/origin_deep_cluster_odps_8.tar.gz" -DentryFile="train_v4.py" -Dcluster='{"worker":{"count":30, "cpu":200, "memory":4000}, "ps":{"count":10, "cpu":200, "memory":5000}}' -Dtables="odps://graph_embedding/tables/hs_train_data_dssm_2,odps://graph_embedding/tables/hs_test_data_dssm_2" -DcheckpointDir="oss://bucket-automl/hengsong/?role_arn=acs:ram::1293303983251548:role/graph2018&host=cn-hangzhou.oss-internal.aliyun-inc.com" -DuserDefinedParameters="--learning_rate=3e-4 --batch_size=1024 --is_save_model=True --attention_type=1 --num_epochs=100 --ckpt=hs_ugc_video.ckpt" -DuseSparseClusterSchema=True;

    pai -name tensorflow140 -Dscript="file:///home/hengsong/origin_deep_cluster_odps_8.tar.gz" -DentryFile="train_v4.py" -Dcluster='{"worker":{"count":10, "cpu":200, "memory":4000}, "ps":{"count":3, "cpu":200, "memory":5000}}' -Dtables="odps://graph_embedding/tables/hs_train_data_dssm_3,odps://graph_embedding/tables/hs_test_data_dssm_3" -DcheckpointDir="oss://bucket-automl/hengsong/?role_arn=acs:ram::1293303983251548:role/graph2018&host=cn-hangzhou.oss-internal.aliyun-inc.com" -DuserDefinedParameters="--learning_rate=3e-4 --batch_size=1024 --is_save_model=True --attention_type=1 --num_epochs=1000 --ckpt=hs_ugc_video.ckpt" -DuseSparseClusterSchema=True;

    Training on 400,000 records:

    batch size 256
    https://logview.alibaba-inc.com/logview/?h=http://service-corp.odps.aliyun-inc.com/api&p=graph_embedding&i=20190725155755428ga1tqtvj2_38a3203c_b5a3_4227_8842_6e57755a6965&token=WkkzczUyY2twcmo3enFNQXRFQ2ZOc3hjV3dvPSxPRFBTX09CTzoxMjkzMzAzOTgzMjUxNTQ4LDE1NjQ2NzUwNzcseyJTdGF0ZW1lbnQiOlt7IkFjdGlvbiI6WyJvZHBzOlJlYWQiXSwiRWZmZWN0IjoiQWxsb3ciLCJSZXNvdXJjZSI6WyJhY3M6b2RwczoqOnByb2plY3RzL2dyYXBoX2VtYmVkZGluZy9pbnN0YW5jZXMvMjAxOTA3MjUxNTU3NTU0MjhnYTF0cXR2ajJfMzhhMzIwM2NfYjVhM180MjI3Xzg4NDJfNmU1Nzc1NWE2OTY1Il19XSwiVmVyc2lvbiI6IjEifQ==

    batch size 2048
    http://logview.odps.aliyun-inc.com:8080/logview/?h=http://service-corp.odps.aliyun-inc.com/api&p=graph_embedding&i=20190725161452143guuvqtvj2_b4532ada_b1a0_4dae_a5b0_686d332af28f&token=L0EyOUJHdExxamUxS2w3NEdla3VVWmwrTmxzPSxPRFBTX09CTzoxMjkzMzAzOTgzMjUxNTQ4LDE1NjQ2NzYwOTQseyJTdGF0ZW1lbnQiOlt7IkFjdGlvbiI6WyJvZHBzOlJlYWQiXSwiRWZmZWN0IjoiQWxsb3ciLCJSZXNvdXJjZSI6WyJhY3M6b2RwczoqOnByb2plY3RzL2dyYXBoX2VtYmVkZGluZy9pbnN0YW5jZXMvMjAxOTA3MjUxNjE0NTIxNDNndXV2cXR2ajJfYjQ1MzJhZGFfYjFhMF80ZGFlX2E1YjBfNjg2ZDMzMmFmMjhmIl19XSwiVmVyc2lvbiI6IjEifQ==

    Training on 7 billion records:

    batch size 256
    http://logview.odps.aliyun-inc.com:8080/logview/?h=http://service-corp.odps.aliyun-inc.com/api&p=graph_embedding&i=20190725163818390gtgsqtvj2_8e005984_d97a_4f8e_a31f_fa6ea596ec0c&token=ejFlMEJmY2pxVzFZeFowYmk1czl4ZTAydEdjPSxPRFBTX09CTzoxMjkzMzAzOTgzMjUxNTQ4LDE1NjQ2Nzc1MDAseyJTdGF0ZW1lbnQiOlt7IkFjdGlvbiI6WyJvZHBzOlJlYWQiXSwiRWZmZWN0IjoiQWxsb3ciLCJSZXNvdXJjZSI6WyJhY3M6b2RwczoqOnByb2plY3RzL2dyYXBoX2VtYmVkZGluZy9pbnN0YW5jZXMvMjAxOTA3MjUxNjM4MTgzOTBndGdzcXR2ajJfOGUwMDU5ODRfZDk3YV80ZjhlX2EzMWZfZmE2ZWE1OTZlYzBjIl19XSwiVmVyc2lvbiI6IjEifQ==

    batch size 2048
    http://logview.odps.aliyun-inc.com:8080/logview/?h=http://service-corp.odps.aliyun-inc.com/api&p=graph_embedding&i=20190725161602347gn3wqtvj2_dad26a20_0055_4b3c_b4bb_afaf5772518c&token=eVZ3TFZmM1JQSDJFT3k3L2JDMEV3U1JQZEw0PSxPRFBTX09CTzoxMjkzMzAzOTgzMjUxNTQ4LDE1NjQ2NzYxNjQseyJTdGF0ZW1lbnQiOlt7IkFjdGlvbiI6WyJvZHBzOlJlYWQiXSwiRWZmZWN0IjoiQWxsb3ciLCJSZXNvdXJjZSI6WyJhY3M6b2RwczoqOnByb2plY3RzL2dyYXBoX2VtYmVkZGluZy9pbnN0YW5jZXMvMjAxOTA3MjUxNjE2MDIzNDdnbjN3cXR2ajJfZGFkMjZhMjBfMDA1NV80YjNjX2I0YmJfYWZhZjU3NzI1MThjIl19XSwiVmVyc2lvbiI6IjEifQ==


          Article link: https://www.haomeiwen.com/subject/feucrctx.html