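Week 1: Practical Aspects of Deep Learning
A traditional split is 7:3 for train/test, or 6:2:2 for train/dev/test. In the big-data era (a million examples or more) the dev and test sets can take a much smaller share: 98:1:1 is reasonable at around a million examples, and with well over a million it can go to 99.5:0.25:0.25.
The training and test data should also come from the same distribution. At the very least, the dev set and the test set must.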
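A minimal sketch of the 98:1:1 split described above, assuming the examples can be addressed by index. The function name `split_indices` and its parameters are made up for illustration. Shuffling before splitting is what keeps the three sets on the same distribution.

```python
import numpy as np

def split_indices(m, dev_frac=0.01, test_frac=0.01, seed=0):
    """Shuffle m example indices and split them into train/dev/test."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(m)              # random order, so all splits share one distribution
    n_dev = int(round(m * dev_frac))
    n_test = int(round(m * test_frac))
    dev = perm[:n_dev]
    test = perm[n_dev:n_dev + n_test]
    train = perm[n_dev + n_test:]          # everything left over is training data
    return train, dev, test

train, dev, test = split_indices(1_000_000)
print(len(train), len(dev), len(test))
```

With a million examples, 1% is still 10,000 examples, which is usually enough to compare models.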
data:image/s3,"s3://crabby-images/b8e48/b8e48e6e68e42afb946a0d00861c77a8575cbddb" alt=""
The first plot cannot fit the data well (underfitting, high bias); the third plot overfits (high variance).
data:image/s3,"s3://crabby-images/8636a/8636ac468f35909c26431085db4100e116d9d394" alt=""
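This introduces the optimal error (Bayes error). Here it is approximated by how well a person can classify the examples by eye, and it serves as the baseline against which the training and dev errors are judged.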
data:image/s3,"s3://crabby-images/5841e/5841edc18d744d69fd8be6286f20ebfa500e8a9d" alt=""
data:image/s3,"s3://crabby-images/704cb/704cb6ef254a56303f4fbfa568e23f36f397b6c4" alt=""
data:image/s3,"s3://crabby-images/a30a6/a30a6a5177c02ab8212b46e176f4de72edf0165e" alt=""
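Why does the L2 regularization term prevent overfitting? As λ grows, the entries of the weight matrices w are pushed toward 0 (close to zero, not exactly zero), so many hidden units have less influence and the network behaves like a simpler one. The model therefore moves from high variance toward high bias. The figure below uses a concrete activation function to make this easier to see: with small weights, z stays small and the activation works in its nearly linear region.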
data:image/s3,"s3://crabby-images/3cd2d/3cd2df623cea1f176b247cddb393e8050ed06b63" alt=""
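To see the shrinking effect in numbers, here is a small sketch. It assumes the usual form of the penalty, (λ/2m)·Σ‖W‖², whose gradient adds (λ/m)·W to dW. The data gradient is set to zero on purpose, only to isolate what the penalty does; `decay_only` is a hypothetical helper, not something from the course.

```python
import numpy as np

def l2_penalty(weights, lam, m):
    """L2 term added to the cost: (lam / (2m)) * sum of squared entries of every W."""
    return (lam / (2 * m)) * sum(float(np.sum(W ** 2)) for W in weights)

def decay_only(W, lam, m, alpha=0.1, steps=50):
    """Gradient steps that use only the penalty gradient (lam / m) * W.

    The data gradient is assumed to be zero here, purely for illustration.
    """
    W = W.copy()
    for _ in range(steps):
        W -= alpha * (lam / m) * W         # same as multiplying W by (1 - alpha * lam / m)
    return W

W0 = np.array([[1.0, -2.0], [0.5, 3.0]])
for lam in (0.0, 1.0, 10.0):
    W = decay_only(W0, lam, m=10)
    print(lam, float(np.abs(W).max()), l2_penalty([W0], lam, m=10))
```

A larger λ leaves smaller weights after the same number of steps, but they only approach zero and never reach it.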
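Dropout regularization. Every node in the network is randomly kept or dropped, for example with probability 0.5, which leaves a smaller, thinned-out network to train. The method is very simple, but it really does work.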
data:image/s3,"s3://crabby-images/fd97c/fd97c263428a2c6bb9cd0ef06f37d2cee02ea944" alt=""
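Dropout has an effect much like L2 regularization. It is used mostly in computer vision, where there is rarely enough data and overfitting comes easily. Layers that are prone to overfitting (those with many units) can be given a smaller keep prob, and different layers can use different keep prob values. Its drawback is that the cost function J is no longer well defined, so it is hard to check that J goes down on every iteration.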
data:image/s3,"s3://crabby-images/e641a/e641acc401904e462a165260865bd7fca536912f" alt=""
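A sketch of one common way to implement this, usually called inverted dropout. Dividing by `keep_prob` is an assumption about the variant in use; it keeps the expected value of the activations unchanged so nothing has to be rescaled at test time. `dropout_forward` is an illustrative name.

```python
import numpy as np

def dropout_forward(a, keep_prob, rng):
    """Apply inverted dropout to activations a; returns the result and the mask."""
    mask = rng.random(a.shape) < keep_prob   # True for units that are kept
    a_dropped = a * mask / keep_prob         # zero the dropped units, rescale the rest
    return a_dropped, mask

rng = np.random.default_rng(0)
a = np.ones((4, 100_000))
a_drop, mask = dropout_forward(a, keep_prob=0.8, rng=rng)
print(mask.mean(), a_drop.mean())
```

The mask keeps roughly 80% of the units and the mean activation stays close to 1. At test time dropout is simply switched off.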
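There are some other regularization methods too. 1. Data augmentation: add flipped copies of the images to the dataset, rotate or zoom them, warp digits, and so on. 2. Early stopping: plot the training-set and dev-set error curves and stop gradient descent at a suitable point, typically where the dev error starts to rise. The drawback is that J does not get optimized as far as it could, and one knob ends up doing two jobs at once (minimizing J and preventing overfitting), which hurts a bit because the two can no longer be tuned separately.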
data:image/s3,"s3://crabby-images/9f770/9f770bb177a7293f8337c18e6b8408905292c0e1" alt=""
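Early stopping can be sketched as a rule over the dev-set curve. The `patience` parameter and the numbers below are invented for illustration; the notes only say to stop "at a suitable point".

```python
def early_stopping_epoch(dev_losses, patience=3):
    """Return the epoch with the best dev loss seen before training is stopped.

    Training stops once the dev loss has failed to improve for `patience` epochs.
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(dev_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break                          # no improvement for `patience` epochs
    return best_epoch

# Made-up dev losses: they rise after epoch 3, then drop again at epoch 7.
dev_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.40]
print(early_stopping_epoch(dev_losses))               # stops early, never sees epoch 7
print(early_stopping_epoch(dev_losses, patience=10))  # runs to the end
```

The first call settles on epoch 3 and misses the lower loss at epoch 7, which is the drawback in miniature: stopping early can leave J higher than it needed to be.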
Normalizing inputs
data:image/s3,"s3://crabby-images/00ddd/00ddd3641a7748e981a3243c36a88b4e6cc24e7d" alt=""
data:image/s3,"s3://crabby-images/3a26c/3a26c2ae80d0952e265b792328c26028330ea266" alt=""
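A minimal sketch of input normalization, assuming the course's layout where X has shape (features, examples). The helper names are illustrative. The test set is normalized with the mean and standard deviation computed on the training set, not with its own.

```python
import numpy as np

def fit_normalizer(X_train, eps=1e-8):
    """Per-feature mean and standard deviation of the training set."""
    mu = X_train.mean(axis=1, keepdims=True)
    sigma = X_train.std(axis=1, keepdims=True) + eps   # eps guards against division by zero
    return mu, sigma

def normalize(X, mu, sigma):
    """Zero-center and scale each feature using the given statistics."""
    return (X - mu) / sigma

rng = np.random.default_rng(0)
X_train = rng.normal(loc=[[5.0], [-2.0]], scale=[[3.0], [0.5]], size=(2, 1000))
X_test = rng.normal(loc=[[5.0], [-2.0]], scale=[[3.0], [0.5]], size=(2, 200))

mu, sigma = fit_normalizer(X_train)
X_train_n = normalize(X_train, mu, sigma)
X_test_n = normalize(X_test, mu, sigma)    # reuse the training statistics
print(X_train_n.mean(axis=1), X_train_n.std(axis=1))
```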
In neural networks with many layers, gradients can easily vanish or explode.
data:image/s3,"s3://crabby-images/706b7/706b743ebd4bd946474f0f6f7966aea50c7e5cfa" alt=""
data:image/s3,"s3://crabby-images/e40e0/e40e0ce1bd5c20295358ee569b0d020a3c51ec39" alt=""
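A toy illustration of why depth causes this, under strong simplifying assumptions: a linear network with no biases whose weight matrices are all c·I. The activations then scale as c to the power of the number of layers, and gradients behave the same way. `forward_scale` is a hypothetical helper.

```python
import numpy as np

def forward_scale(c, n_layers, n_units=4):
    """Norm of the output after passing a vector of ones through n_layers of W = c * I."""
    W = c * np.eye(n_units)
    a = np.ones((n_units, 1))
    for _ in range(n_layers):
        a = W @ a                          # linear activation, no bias (toy assumption)
    return float(np.linalg.norm(a))

for c in (0.5, 1.0, 1.5):
    print(c, forward_scale(c, n_layers=50))
```

Weights slightly below 1 shrink the signal to almost nothing over 50 layers, and weights slightly above 1 blow it up.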
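The two-sided difference is more accurate than the one-sided one (its error is O(ε²) rather than O(ε)), so gradient checking uses the two-sided form and divides by 2ε.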
data:image/s3,"s3://crabby-images/a5cb3/a5cb315612ed9ed5f780317375313ffee09b9ce3" alt=""
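A sketch of gradient checking with the two-sided difference. The test function f(θ) = Σθ³ and the deliberately wrong gradient `bad_grad` are made up for illustration. The relative error used here is ‖num − ana‖ / (‖num‖ + ‖ana‖).

```python
import numpy as np

def numerical_grad(f, theta, eps=1e-7):
    """Two-sided difference (f(theta + eps) - f(theta - eps)) / (2 * eps), per component."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step.flat[i] = eps
        grad.flat[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

def grad_check(f, analytic_grad, theta, eps=1e-7):
    """Relative difference between the numerical and the analytic gradient."""
    num = numerical_grad(f, theta, eps)
    ana = analytic_grad(theta)
    return np.linalg.norm(num - ana) / (np.linalg.norm(num) + np.linalg.norm(ana))

f = lambda t: np.sum(t ** 3)
good_grad = lambda t: 3 * t ** 2
bad_grad = lambda t: 3 * t ** 2 + 0.01     # hypothetical bug in backprop

theta = np.array([1.0, -2.0, 0.5])
print(grad_check(f, good_grad, theta))     # tiny: gradient is correct
print(grad_check(f, bad_grad, theta))      # much larger: something is wrong

# One-sided versus two-sided at theta = 1 with eps = 0.01 (true derivative is 3).
g = lambda x: x ** 3
one_sided = (g(1.01) - g(1.0)) / 0.01
two_sided = (g(1.01) - g(0.99)) / 0.02
print(abs(one_sided - 3), abs(two_sided - 3))
```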
Some tips for gradient checking:
data:image/s3,"s3://crabby-images/e4537/e45374fb9157cafe4b3536b7722f682699b7ec1d" alt=""
data:image/s3,"s3://crabby-images/2b81d/2b81d124265b1bb7d492ca58f783cece160ae84f" alt=""
Week 2: Optimization Algorithms
Mini-batch
data:image/s3,"s3://crabby-images/09cff/09cff674e4208ea9a362a62c0ff9a26167fbba34" alt=""
data:image/s3,"s3://crabby-images/76175/76175cc4858b1a36af294e0cff692b116b57dbef" alt=""
data:image/s3,"s3://crabby-images/f1c76/f1c76367a75f41a6dd29bc9aa8dbb10b5a19c4a8" alt=""
data:image/s3,"s3://crabby-images/629c6/629c63621740c4b9e79ca0afe32abe8d0860024d" alt=""
data:image/s3,"s3://crabby-images/1dd60/1dd60d60dba6b859d8e491e732ead71d48a2e602" alt=""
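A sketch of how a training set might be cut into mini-batches, assuming X has shape (features, examples) and Y has shape (1, examples). The batch size of 64 and the name `random_mini_batches` are illustrative choices.

```python
import numpy as np

def random_mini_batches(X, Y, batch_size=64, seed=0):
    """Shuffle the examples (columns) and cut them into consecutive mini-batches."""
    m = X.shape[1]
    perm = np.random.default_rng(seed).permutation(m)
    X_s, Y_s = X[:, perm], Y[:, perm]      # same permutation keeps X and Y aligned
    return [(X_s[:, k:k + batch_size], Y_s[:, k:k + batch_size])
            for k in range(0, m, batch_size)]

X = np.arange(3 * 150, dtype=float).reshape(3, 150)
Y = np.arange(150).reshape(1, 150)
batches = random_mini_batches(X, Y, batch_size=64)
print([xb.shape[1] for xb, _ in batches])  # the last batch holds the remainder
```

One pass over all the mini-batches is one epoch, and each mini-batch gives one gradient step.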
Exponentially weighted averages
data:image/s3,"s3://crabby-images/e7d9f/e7d9f858f1edaef04873e960dee02c737e2fc5f1" alt=""
data:image/s3,"s3://crabby-images/ec028/ec028115ff594c5dfb75d36f5c2c158315159fd6" alt=""
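A short sketch of the recurrence v_t = β·v_{t−1} + (1 − β)·x_t. The optional division by (1 − βᵗ) is bias correction; including it here is an assumption about what the figures cover. It fixes the early estimates, which otherwise start too low because v starts at 0.

```python
def ewa(values, beta=0.9, bias_correction=True):
    """Exponentially weighted average of a sequence, optionally bias-corrected."""
    v, out = 0.0, []
    for t, x in enumerate(values, start=1):
        v = beta * v + (1 - beta) * x
        out.append(v / (1 - beta ** t) if bias_correction else v)
    return out

values = [10.0] * 5
print(ewa(values, bias_correction=False))  # starts near 1 and creeps upward
print(ewa(values, bias_correction=True))   # stays at about 10 from the first step
```

With β = 0.9 the average covers roughly the last 1/(1 − β) = 10 values.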
Gradient descent with momentum
data:image/s3,"s3://crabby-images/dc1c9/dc1c92be283c5d81eb326510eb42e105935cffa4" alt=""
data:image/s3,"s3://crabby-images/e1bcc/e1bcc7f8d71a76bce2643a565db5f9014f9dfa94" alt=""
data:image/s3,"s3://crabby-images/0aadd/0aaddb8aff4818c0728e7227f1d2936980352ed4" alt=""
data:image/s3,"s3://crabby-images/b2dc1/b2dc183d51f604eb1fb8150c1d1c8d0185367ebf" alt=""
data:image/s3,"s3://crabby-images/e3916/e3916ad1825feb6e0006f56cd4b12f5b4be55990" alt=""
data:image/s3,"s3://crabby-images/8903a/8903a050737e3485f41b61618a5b43b6795de0cb" alt=""
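A sketch of the momentum update, which applies an exponentially weighted average to the gradients: v = β·v + (1 − β)·dW, then W = W − α·v. The elongated quadratic bowl and the values of α and β are assumptions chosen for the demo.

```python
import numpy as np

def momentum_step(w, v, grad, alpha=0.1, beta=0.9):
    """One step of gradient descent with momentum."""
    v = beta * v + (1 - beta) * grad       # running average of the gradients
    w = w - alpha * v                      # move along the averaged direction
    return w, v

# Toy objective f(w) = 0.5 * (w0^2 + 25 * w1^2): steep in one direction, flat in the other.
curvature = np.array([1.0, 25.0])
grad_f = lambda w: curvature * w

w, v = np.array([1.0, 1.0]), np.zeros(2)
for _ in range(500):
    w, v = momentum_step(w, v, grad_f(w))
print(w)
```

Averaging damps the oscillation along the steep direction while progress along the flat direction keeps accumulating.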
Week 3: Hyperparameter Tuning
data:image/s3,"s3://crabby-images/ec84e/ec84e05b0d1aa4b15d4a77f7ea313e4f7db49242" alt=""
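Don't use a grid. Hyperparameters differ in importance, and with random sampling an important one such as α is tried at many more distinct values, so a better value is more likely to turn up. When searching, go from coarse to fine: search a wide range first, then narrow the range and search again.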
data:image/s3,"s3://crabby-images/794bc/794bc2312083ba663855228ad5a828a9c93fb997" alt=""
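The point about distinct values can be checked directly. This sketch compares 25 grid points with 25 random points over two hypothetical hyperparameters, both scaled to [0, 1] for simplicity, and counts how many different values of the first one each scheme tries.

```python
import numpy as np

def grid_points(n_per_axis):
    """All combinations of n_per_axis evenly spaced values for two hyperparameters."""
    axis = np.linspace(0.0, 1.0, n_per_axis)
    a, b = np.meshgrid(axis, axis)
    return np.column_stack([a.ravel(), b.ravel()])

def random_points(n, rng):
    """n points drawn uniformly at random in the unit square."""
    return rng.uniform(0.0, 1.0, size=(n, 2))

rng = np.random.default_rng(0)
grid = grid_points(5)                      # 5 x 5 = 25 trials
rand = random_points(25, rng)              # also 25 trials
print(len(np.unique(grid[:, 0])), len(np.unique(rand[:, 0])))
```

For the same budget of 25 trials, the grid tests only 5 values of the first hyperparameter while random sampling tests 25.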
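Sample on a logarithmic scale rather than a linear one. There is a small trick to it: sample the exponent uniformly, not the value itself.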
data:image/s3,"s3://crabby-images/cae88/cae88837d34a8b88efbc18d99a8adcfa71594bc1" alt=""
data:image/s3,"s3://crabby-images/07241/0724165f95fb7f78d76053cb66f591fea898145e" alt=""
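A sketch of log-scale sampling. The ranges (α between 0.0001 and 1, β between 0.9 and 0.999) are example values chosen for illustration. For β, the trick is to sample 1 − β on a log scale.

```python
import numpy as np

def sample_log_uniform(low, high, size, rng):
    """Draw values whose base-10 exponent is uniform between log10(low) and log10(high)."""
    exponents = rng.uniform(np.log10(low), np.log10(high), size)
    return 10.0 ** exponents

rng = np.random.default_rng(0)
alphas = sample_log_uniform(1e-4, 1.0, 10_000, rng)
betas = 1.0 - sample_log_uniform(1e-3, 1e-1, 10_000, rng)   # 1 - beta is log-uniform

# Each decade gets about the same share of samples.
print(np.mean(alphas < 1e-3), np.mean(alphas > 1e-1))
print(betas.min(), betas.max())
```

Sampling uniformly on a linear scale would put about 90% of the draws between 0.1 and 1 and almost none near 0.0001.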
How to organize the hyperparameter search depends, of course, on the application.
data:image/s3,"s3://crabby-images/7e217/7e21792fbcbf9359859657c4b1245ee366caf1f6" alt=""
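Normalizing the inputs speeds up training. The figure above shows input normalization with mean and variance for logistic regression.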
data:image/s3,"s3://crabby-images/3a9f4/3a9f4693ce6043a18c38b0556478e4b7b881962e" alt=""
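Batch normalization: normalize the hidden layers as well. It does not stop at the mean and variance; it also adds learnable parameters (γ and β) so that the normalized values can be rescaled and shifted to any mean and variance.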
data:image/s3,"s3://crabby-images/7fa1c/7fa1c8bc8372a28b187c854cde1bbd2d6b865c86" alt=""
Note: batch norm is applied before the activation function is computed, that is, to z rather than to a.
data:image/s3,"s3://crabby-images/59c3a/59c3a5523162ab617c81fdd2c97dd4519434eb76" alt=""
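A sketch of the batch norm forward pass at training time, assuming z has shape (units, examples) and the statistics are taken over the examples in the current mini-batch. ReLU is used as the activation only as an example, and the function name is illustrative.

```python
import numpy as np

def batchnorm_forward(Z, gamma, beta, eps=1e-8):
    """Normalize each unit over the mini-batch, then scale by gamma and shift by beta."""
    mu = Z.mean(axis=1, keepdims=True)
    var = Z.var(axis=1, keepdims=True)
    Z_norm = (Z - mu) / np.sqrt(var + eps)   # mean 0, variance 1 per unit
    return gamma * Z_norm + beta             # learnable mean (beta) and scale (gamma)

rng = np.random.default_rng(0)
Z = rng.normal(loc=4.0, scale=7.0, size=(3, 256))
gamma = np.full((3, 1), 2.0)
beta = np.full((3, 1), 3.0)

Z_tilde = batchnorm_forward(Z, gamma, beta)
A = np.maximum(0.0, Z_tilde)                 # activation comes after the normalization
print(Z_tilde.mean(axis=1), Z_tilde.std(axis=1))
```

Whatever the original mean and spread of z, the output has mean β and standard deviation γ for each unit.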
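Batch norm is used together with mini-batches. The parameter b can then be dropped, because subtracting the mini-batch mean cancels any constant added to z, and β takes over its role.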
data:image/s3,"s3://crabby-images/e11d0/e11d0fbc07cccda48fe56b9f59535b6cf9e16cb5" alt=""
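A quick numerical check that the bias disappears under normalization. The shapes and random values are arbitrary; `normalize_rows` is a stripped-down stand-in for the normalization step of batch norm, without γ and β.

```python
import numpy as np

def normalize_rows(Z, eps=1e-8):
    """Subtract each unit's mini-batch mean and divide by its standard deviation."""
    mu = Z.mean(axis=1, keepdims=True)
    var = Z.var(axis=1, keepdims=True)
    return (Z - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
A_prev = rng.normal(size=(5, 64))
b = rng.normal(size=(3, 1))

without_b = normalize_rows(W @ A_prev)
with_b = normalize_rows(W @ A_prev + b)      # b shifts every example of a unit equally
print(np.allclose(without_b, with_b))
```

Since b adds the same constant to every example of a unit, it also shifts the mean by that constant and the subtraction removes it.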
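What batch norm does: it reduces the effect of covariate shift, which here means that changes in the parameters of earlier layers change the inputs that later layers see and so disturb their training. Once every layer is normalized, those inputs shift less quickly, which helps speed up training. It also brings a small, unintended side effect: regularization.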
data:image/s3,"s3://crabby-images/50ae6/50ae621db77e4b3315cce183607de20d76f1216d" alt=""
data:image/s3,"s3://crabby-images/694f3/694f354d3d18ae6f7844478b113728f70ddd36fc" alt=""
Softmax: used when the final output is a classification over multiple classes.
data:image/s3,"s3://crabby-images/df2be/df2be3c9cccf76e37c8583e2ca58544949550b39" alt=""
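A sketch of the softmax activation, assuming z has shape (classes, examples). Subtracting the column maximum before exponentiating is a standard numerical-stability step and does not change the result. The input values are an arbitrary example.

```python
import numpy as np

def softmax(Z):
    """Column-wise softmax: exponentiate, then normalize so each column sums to 1."""
    Z_shift = Z - Z.max(axis=0, keepdims=True)   # avoids overflow in exp
    expZ = np.exp(Z_shift)
    return expZ / expZ.sum(axis=0, keepdims=True)

Z = np.array([[5.0], [2.0], [-1.0], [3.0]])      # scores for 4 classes, 1 example
probs = softmax(Z)
print(probs.ravel().round(3), probs.sum())
```

The largest score gets most of the probability mass, and the outputs can be read as class probabilities because they are positive and sum to 1.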