PAI Competitive Product Research

Author: Nina_198c | Published: 2018-02-27 18:49

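    Product positioning: big data / machine learning platform

    Target users: developers / algorithm engineers

    This round of research focuses on the deep learning platform.

    Supported frameworks: TensorFlow, MXNet, Caffe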

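    For this test I used the code and data that the platform provides at https://help.aliyun.com/document_detail/50654.html?spm=a2c4g.11186623.6.591.zC378V and was able to get the job running by following the documentation.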

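    But several questions remain: Where is the trained model stored?

                           How is the trained model published?

                           Once it is published, how is it called? How do I run inference?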

    To be investigated further.

    However, the run then failed with the following output (I need to find someone to look at it):

    2018-02-27 18:50:47 INFO Current task status:RUNNING

    2018-02-27 18:50:47 INFO Start execute shell on node oxs-base-biz-gateway011193082232.nu29.

    2018-02-27 18:50:47 INFO Current working dir /home/admin/alisatasknode/taskinfo/20180227/phoenix/18/50/41/yau65ufags1a5bmdhknncewt

    2018-02-27 18:50:47 INFO Full Command ..

    2018-02-27 18:50:47 INFO -------------------------

    2018-02-27 18:50:47 INFO /opt/taobao/tbdpapp/paiwrapper/paiservice.sh /home/admin/alisatasknode/taskinfo//20180227/phoenix/18/50/41/yau65ufags1a5bmdhknncewt//910558 1829668957174154 DEV 910558 http://dms.cn-beijing.data.aliyun-inc.com/

    2018-02-27 18:50:47 INFO -------------------------

    2018-02-27 18:50:47 INFO List of passing environment ..

    2018-02-27 18:50:47 INFO -------------------------

    2018-02-27 18:50:47 INFO SKYNET_SOURCEID=null:

    2018-02-27 18:50:47 INFO SKYNET_ONDUTY=1829668957174154:

    2018-02-27 18:50:47 INFO SKYNET_ENVTYPE=1:

    2018-02-27 18:50:47 INFO SKYNET_PTYPE=1002:

    2018-02-27 18:50:47 INFO IS_NEW_SCHEDULE=true:

    2018-02-27 18:50:47 INFO SKYNET_TENANT_ID=198836943440800:

    2018-02-27 18:50:47 INFO SKYNET_SOURCENAME=group_198836943440800_dev:

    2018-02-27 18:50:47 INFO SKYNET_EXENAME=:

    2018-02-27 18:50:47 INFO TASK_WHITE_LIST=:

    2018-02-27 18:50:47 INFO SKYNET_CYCTIME=20180227000000:

    2018-02-27 18:50:47 INFO SKYNET_PRGNAME=:

    2018-02-27 18:50:47 INFO SKYNET_APP_ID=35192:

    2018-02-27 18:50:47 INFO SKYNET_SYSTEM_ENV=:

    2018-02-27 18:50:47 INFO SKYNET_PARAVALUE=1829668957174154 DEV 910558 http://dms.cn-beijing.data.aliyun-inc.com/:

    2018-02-27 18:50:47 INFO SKYNET_TASKID=1605844:

    2018-02-27 18:50:47 INFO SKYNET_RERUN_TIME=0:

    2018-02-27 18:50:47 INFO SKYNET_NODENAME=TensorFlow(V1.2)-2:

    2018-02-27 18:50:47 INFO SKYNET_ACTIONID=1:

    2018-02-27 18:50:47 INFO YUNQU_APP_NAME=:

    2018-02-27 18:50:47 INFO KILL_SIGNAL=SIGKILL:

    2018-02-27 18:50:47 INFO SKYNET_ID=-1:

    2018-02-27 18:50:47 INFO SKYNET_FLOW_PARAVALUE=group:adidas:

    2018-02-27 18:50:47 INFO SKYNET_PRIORITY=1:

    2018-02-27 18:50:47 INFO SKYNET_GMTDATE=:

    2018-02-27 18:50:47 INFO SKYNET_ONDUTY_WORKNO=1829668957174154:

    2018-02-27 18:50:47 INFO SKYNET_CYCTYPE=0:

    2018-02-27 18:50:47 INFO SKYNET_CONNECTION=***************:

    2018-02-27 18:50:47 INFO SKYNET_JOBID=193294:

    2018-02-27 18:50:47 INFO SKYNET_BIZDATE=20180226:

    2018-02-27 18:50:47 INFO ALISA_TASK_ID=T3_0001179129:

    2018-02-27 18:50:47 INFO ALISA_TASK_EXEC_TARGET=group_198836943440800_dev:

    2018-02-27 18:50:47 INFO ALISA_TASK_PRIORITY=1:

    2018-02-27 18:50:47 INFO --- Invoking Shell command line now ---

    2018-02-27 18:50:47 INFO =================================================================

    LOGBACK: No context given for ch.qos.logback.classic.encoder.PatternLayoutEncoder@77556fd

    JobId: 910558-1605844, Worker: null, JCS version: basein, max parallelism: 30

    Execution Plan:

    ____Nodes:

    ________ #1[odpscmd]

    ____Dependencies:

    [1] start subjob: #1[odpscmd]

    [1] Start OdpsCmdHandler:jobId=910558-1605844

    [1] local log file = /home/admin/alisatasknode/taskinfo//20180227/phoenix/18/50/41/yau65ufags1a5bmdhknncewt//T3_0001179129_jcs.log

    [1] user accessId :LTAImjOrNBOQ1F6Q

    [1] execute command : set biz_id=1829668957174154^alipay^LTAImjOrNBOQ1F6Q^2018-02-27; PAI -name tensorflow_ext121 -project algo_public -DossHost="oss-cn-beijing-internal.aliyuncs.com" -Dbuckets="oss://paitesting.oss-cn-beijing-internal.aliyuncs.com/train.tfrecords/" -DgpuRequired="100" -Darn="acs:ram::1829668957174154:role/aliyunodpspaidefaultrole" -Dscript="oss://paitesting.oss-cn-beijing-internal.aliyuncs.com/tensorflow_mnist.py";

    [1] execute endpoint : http://service.cn.maxcompute.aliyun.com/api

    [1] OK

    [1] ID = 20180227105050631gkspr8jc2

    [1] Odps Instance Id = 20180227105050631gkspr8jc2

    二月 27, 2018 6:50:51 下午 org.apache.http.client.protocol.ResponseProcessCookies processCookies

    警告: Cookie rejected [bs_n_lang="en_US", version:0, domain:aliyun.com, path:/, expiry:null] Illegal 'domain' attribute "aliyun.com". Domain of origin: "dms.cn-beijing.data.aliyun-inc.com"

    二月 27, 2018 6:50:52 下午 org.apache.http.client.protocol.ResponseProcessCookies processCookies

    警告: Cookie rejected [ck2="2f8709cf9971ac7d243abf3d39ff1244", version:0, domain:aliyun.com, path:/, expiry:null] Illegal 'domain' attribute "aliyun.com". Domain of origin: "dms.cn-beijing.data.aliyun-inc.com"

    [1] Sub Instance ID = 2018022718505347e8f0d8_62c4_496c_bc29_6a3c60d9e1f2

    二月 27, 2018 6:50:56 下午 org.apache.http.client.protocol.ResponseProcessCookies processCookies

    警告: Cookie rejected [bs_n_lang="en_US", version:0, domain:aliyun.com, path:/, expiry:null] Illegal 'domain' attribute "aliyun.com". Domain of origin: "dms.cn-beijing.data.aliyun-inc.com"

    二月 27, 2018 6:50:56 下午 org.apache.http.client.protocol.ResponseProcessCookies processCookies

    警告: Cookie rejected [ck2="776ba43efacf2af856e118ff3d1b44de", version:0, domain:aliyun.com, path:/, expiry:null] Illegal 'domain' attribute "aliyun.com". Domain of origin: "dms.cn-beijing.data.aliyun-inc.com"

    [1] http://logview.odps.aliyun.com/logview/?h=http://service.cn.maxcompute.aliyun.com/api&p=AI_project001&i=2018022718505347e8f0d8_62c4_496c_bc29_6a3c60d9e1f2&token=Zmt6QVU2aUpHVlQ3ZWRPdlh3blFGMzdldUpvPSxPRFBTX09CTzoxODI5NjY4OTU3MTc0MTU0LDE1MjAzMzM0NTYseyJTdGF0ZW1lbnQiOlt7IkFjdGlvbiI6WyJvZHBzOlJlYWQiXSwiRWZmZWN0IjoiQWxsb3ciLCJSZXNvdXJjZSI6WyJhY3M6b2RwczoqOnByb2plY3RzL2FpX3Byb2plY3QwMDEvaW5zdGFuY2VzLzIwMTgwMjI3MTg1MDUzNDdlOGYwZDhfNjJjNF80OTZjX2JjMjlfNmEzYzYwZDllMWYyIl19XSwiVmVyc2lvbiI6IjEifQ==

    [1] train: running

    [1] train: 2018-02-27 18:51:02 TensorflowTask_job:0/0/0[0%]

    [1] train: 2018-02-27 18:51:08 TensorflowTask_job:1/0/1[0%]

    [1] train: 2018-02-27 18:51:14 TensorflowTask_job:1/0/1[0%]

    [1] train: 2018-02-27 18:51:19 TensorflowTask_job:1/0/1[0%]

    [1] train: 2018-02-27 18:51:25 TensorflowTask_job:1/0/1[0%]

    [1] train: 2018-02-27 18:51:30 TensorflowTask_job:1/0/1[0%]

    [1] train: 2018-02-27 18:51:36 TensorflowTask_job:1/0/1[0%]

    [1] train: 2018-02-27 18:51:41 TensorflowTask_job:1/0/1[0%]

    [1] train: 2018-02-27 18:51:47 TensorflowTask_job:1/0/1[0%]

    [1] train: 2018-02-27 18:51:52 TensorflowTask_job:0/0/1[0%]

    [1] Instance 20180227105050631gkspr8jc2 Failed.

    [1] FAILED: Failed 2018022718505347e8f0d8_62c4_496c_bc29_6a3c60d9e1f2:ODPS-1202005:Algo Job Failed-User Error-Failed to execute system command.(1)

    [1] Execute Odpscmd Failed!

    [1] ERROR: run subjob: #1[odpscmd] failed!

    Run job failed, time taken: 77s

    2018-02-27 18:52:05 INFO =================================================================

    2018-02-27 18:52:05 INFO Exit code of the Shell command 1

    2018-02-27 18:52:05 INFO --- Invocation of Shell command completed ---

    2018-02-27 18:52:05 ERROR Shell run failed!

    2018-02-27 18:52:05 ERROR Current task status: ERROR

    2018-02-27 18:52:05 INFO Cost time is: 77.775s

    /home/admin/alisatasknode/taskinfo//20180227/phoenix/18/50/41/yau65ufags1a5bmdhknncewt/T3_0001179129.log-END-EOF
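
    In a log like the one above, the entry that matters is the `FAILED:` line, which carries the instance ID, the ODPS error code, and the message. The following is a minimal sketch of pulling those fields out of a saved log, useful for example when filing a ticket. The function name `find_failure` and the regular expression are illustrative assumptions based only on the line format seen in this one log; they are not part of any PAI or ODPS tooling.

```python
import re

# Assumed line shape, taken from the single log above:
#   "FAILED: Failed <instance id>:ODPS-<digits>:<message>"
FAILED_RE = re.compile(
    r"FAILED: Failed (?P<instance>[^:]+):(?P<code>ODPS-\d+):(?P<message>.*)"
)


def find_failure(log_text):
    """Return instance, code, and message of the first FAILED line, or None."""
    for line in log_text.splitlines():
        match = FAILED_RE.search(line)
        if match:
            return {key: value.strip() for key, value in match.groupdict().items()}
    return None


# Excerpt copied from the log above.
sample = """
[1] Instance 20180227105050631gkspr8jc2 Failed.
[1] FAILED: Failed 2018022718505347e8f0d8_62c4_496c_bc29_6a3c60d9e1f2:ODPS-1202005:Algo Job Failed-User Error-Failed to execute system command.(1)
[1] Execute Odpscmd Failed!
"""

failure = find_failure(sample)
if failure is not None:
    print(failure["code"], "|", failure["message"])
```

    The error text itself ("User Error-Failed to execute system command") does not name the cause, so the extracted code and instance ID are mainly something to quote to support.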

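    I filed a support ticket. It turned out that the documentation was wrong: the code and data must be placed directly under the directory 'oss://bucketname/', and the data source must be selected at that directory level.

    I tried again and the job ran without problems.

    So training does run to completion. But how should this model be published? Or, at the least, how can I test how well the model performs?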
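
    The failed run above passed `-Dbuckets` a path ending in `train.tfrecords/`, that is, a file name treated as a directory. As a rough sketch of the fix described by support, the snippet below assembles the same `PAI` command with `-Dbuckets` pointing at the bucket root and the script sitting directly under it. The flag names are copied from the logged command; the helper `build_pai_command` is hypothetical, and the assumption that the corrected command has exactly this shape is my reading of the ticket reply, since the successful run's command is not shown here.

```python
def build_pai_command(bucket_host, script_name, role_arn, oss_host, gpu_required="100"):
    """Assemble a PAI TensorFlow command string (hypothetical helper).

    Flag names mirror the command in the failed log; the bucket-root value
    for -Dbuckets is an assumption derived from the support ticket reply.
    """
    bucket_root = "oss://{}/".format(bucket_host)
    params = [
        ("ossHost", oss_host),
        ("buckets", bucket_root),                # directory level, not a file
        ("gpuRequired", gpu_required),
        ("arn", role_arn),
        ("script", bucket_root + script_name),   # script directly under the bucket
    ]
    args = " ".join('-D{}="{}"'.format(name, value) for name, value in params)
    return "PAI -name tensorflow_ext121 -project algo_public {};".format(args)


command = build_pai_command(
    bucket_host="paitesting.oss-cn-beijing-internal.aliyuncs.com",
    script_name="tensorflow_mnist.py",
    role_arn="acs:ram::1829668957174154:role/aliyunodpspaidefaultrole",
    oss_host="oss-cn-beijing-internal.aliyuncs.com",
)
print(command)
```

    This only builds the string for comparison against what the console generates; it does not submit anything.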
