Recitation 3 | Deep Learning Opt

The gradient along the y axis keeps decreasing as training proceeds, so a step size that was right at the start is wrong later, and different directions call for different step sizes.

Plain SGD uses the same LR for every param. To refine this process, Adagrad is introduced: it divides each parameter's step by the square root of that parameter's accumulated squared gradients, giving every parameter its own effective learning rate.


sparse data -> only a few params are frequently updated. This is where per-parameter rates help: rarely updated params accumulate little squared gradient, so they keep a comparatively large effective LR.
automatically decaying LR -> pro or con? Pro: steps shrink on their own as gradients accumulate, with no hand-tuned schedule. Con: the accumulator only grows, so the effective LR decays monotonically and training can stall before convergence (the motivation for moving-average variants such as RMSProp).
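A minimal sketch of the Adagrad update in NumPy, on a made-up elongated quadratic loss just for illustration (the names `adagrad`, `grad_fn` and all hyperparameter values here are example choices, not a reference implementation):

```python
import numpy as np

def adagrad(grad_fn, theta, lr=0.5, eps=1e-8, steps=200):
    """Adagrad: per-parameter step = lr / sqrt(accumulated squared grads)."""
    accum = np.zeros_like(theta)               # running sum of g^2 per parameter
    for _ in range(steps):
        g = grad_fn(theta)
        accum += g ** 2                        # only grows -> effective LR only decays
        theta = theta - lr * g / (np.sqrt(accum) + eps)
    return theta

# Toy loss 0.5*(x^2 + 10*y^2): the y gradient shrinks much faster than the x one,
# exactly the case where one global learning rate fits neither axis.
grad = lambda th: np.array([th[0], 10.0 * th[1]])
print(adagrad(grad, np.array([5.0, 5.0])))     # approaches the minimum at (0, 0)
```

Note that `accum` never shrinks, which is the "con" from the note above: the effective learning rate can only decay.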

Batch norm
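Batch norm normalizes each feature over the mini-batch to zero mean and unit variance, then rescales with a learned gain gamma and shift beta, which stabilizes the distribution of layer inputs and permits larger learning rates.

A minimal sketch of the training-time forward pass (`batch_norm`, `gamma`, and `beta` are illustrative names; running statistics for inference and the backward pass are omitted):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then rescale and shift."""
    mu = x.mean(axis=0)                        # per-feature batch mean
    var = x.var(axis=0)                        # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)      # zero mean, unit variance
    return gamma * x_hat + beta                # learned scale and shift

# Hypothetical usage: a batch of 4 examples with 3 badly scaled features.
x = np.random.randn(4, 3) * 10.0 + 5.0
out = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
print(out.mean(axis=0), out.var(axis=0))       # roughly 0 mean, unit variance
```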