Convolutional Neural Network (CNN)
MNIST
CNN, gpu, deep network, dropout, ensembles
The results approach human-level visual recognition:
9,967 / 10,000 images classified correctly
The misclassified images are shown below:
data:image/s3,"s3://crabby-images/255c4/255c4616a2145645b985ab120809c217667af8df" alt=""
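Many of these are hard even for a human to recognize. In the neural networks covered earlier, every neuron in one layer is connected to every neuron in the adjacent layer (fully connected).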
data:image/s3,"s3://crabby-images/2d232/2d232c67277769ebd5d70fc925d32a5d9da2dd13" alt=""
Output layer: digits 0-9
A CNN is structured quite differently: the input is a two-dimensional grid of neurons (28x28):
data:image/s3,"s3://crabby-images/b04fa/b04fa1e499ec54781315577b084cec1bf38ace72" alt=""
Local receptive fields: each neuron in the next layer is connected only to a small square region of the input
data:image/s3,"s3://crabby-images/5fcf7/5fcf7258bdd4e164ff715cdd75663f9813429c27" alt=""
data:image/s3,"s3://crabby-images/e3205/e3205244962e1ef61aa9bea27a7c78bd64af7a49" alt=""
data:image/s3,"s3://crabby-images/fe6a0/fe6a043742ff68e81a0841960387e4b87ed3f4a2" alt=""
Shared weights and biases:
data:image/s3,"s3://crabby-images/d491e/d491e7fd678f461b37dcfd3c60f729be9f616210" alt=""
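In the first hidden layer, all neurons that share one set of weights and bias detect the same feature, just at different locations in the input.
Feature map: the mapping from the input layer to the hidden layer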
data:image/s3,"s3://crabby-images/60563/60563d9239298a17c49c5975b0b930941e44b201" alt=""
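As a rough illustration of how shared weights produce one feature map, here is a minimal NumPy sketch. The 5x5 receptive field matches the size used later in these notes; the stride of 1, the sigmoid activation, the random input, and the name `feature_map` are assumptions made for this example only.

```python
import numpy as np

def feature_map(image, weights, bias):
    """Slide one shared kernel over the image (stride 1, no padding)."""
    k = weights.shape[0]
    out_h = image.shape[0] - k + 1
    out_w = image.shape[1] - k + 1
    out = np.empty((out_h, out_w))
    for j in range(out_h):
        for l in range(out_w):
            # Local receptive field: a k x k patch of the input
            field = image[j:j + k, l:l + k]
            # Every hidden neuron reuses the same weights and bias
            z = np.sum(weights * field) + bias
            out[j, l] = 1.0 / (1.0 + np.exp(-z))  # sigmoid (assumed)
    return out

rng = np.random.default_rng(0)
image = rng.random((28, 28))           # stand-in for one MNIST digit
weights = rng.standard_normal((5, 5))  # one shared 5x5 kernel
bias = 0.0

hidden = feature_map(image, weights, bias)
print(hidden.shape)  # (24, 24)
```

A 5x5 field can be placed at 28 - 5 + 1 = 24 positions along each axis, which is where the 24x24 hidden layer comes from.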
Methods that perform well typically use more feature maps:
data:image/s3,"s3://crabby-images/a0755/a0755253e80864b464c4734b73a8299da2b8b0c7" alt=""
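Lighter blocks represent smaller (negative) weights.
The structure visible in these maps indicates that the CNN is learning.
Sharing weights and biases greatly reduces the number of parameters: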
data:image/s3,"s3://crabby-images/cb9db/cb9dbeaab9dd35b39ff49fc076b9bb163d7b637b" alt=""
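Each feature map needs 5x5 = 25 weights plus 1 bias b, for 26 parameters.
With 20 feature maps, a total of 26x20 = 520 parameters defines the convolutional layer.
With a fully connected network like the earlier ones, a 28x28 = 784-neuron input layer feeding a first hidden layer of 30 neurons needs 784x30 weights plus 30 biases b, for 23,550 parameters in total. That is more than 40 times as many.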
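The arithmetic above can be checked with a few lines of Python. The variable names are chosen for this example only.

```python
# Convolutional layer: each feature map shares one 5x5 kernel and one bias
kernel_size = 5
num_feature_maps = 20
conv_params = (kernel_size * kernel_size + 1) * num_feature_maps

# Fully connected layer: 784 inputs to 30 hidden neurons, one bias each
input_neurons = 28 * 28
hidden_neurons = 30
fc_params = input_neurons * hidden_neurons + hidden_neurons

print(conv_params)                        # 520
print(fc_params)                          # 23550
print(round(fc_params / conv_params, 1))  # 45.3
```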
data:image/s3,"s3://crabby-images/fab8c/fab8c238255f4073f148376c6aee033195920e42" alt=""
data:image/s3,"s3://crabby-images/61b98/61b988ce89912edb45b12d634fd0affb8f27c9e0" alt=""
Pooling: applying 2x2 pooling to a 24x24 feature map yields 12x12
With multiple feature maps:
data:image/s3,"s3://crabby-images/6bd8d/6bd8d1f7ed962f68f97565d61561ead8a7e6b9d9" alt=""
Other pooling methods: L2 pooling, which takes the square root of the sum of squares of each region
Putting all the steps above together:
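The two pooling variants can be sketched in NumPy as follows. These notes do not name the default pooling method, so max-pooling is assumed here as the common choice; the helper `pool` and its `mode` argument are hypothetical names for this example.

```python
import numpy as np

def pool(feature_map, size=2, mode="max"):
    """Pool non-overlapping size x size regions of a 2D feature map."""
    h, w = feature_map.shape
    if h % size or w % size:
        raise ValueError("feature map must divide evenly into regions")
    # blocks[i, a, j, b] == feature_map[i * size + a, j * size + b]
    blocks = feature_map.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "l2":
        # Square root of the sum of squares in each region
        return np.sqrt((blocks ** 2).sum(axis=(1, 3)))
    raise ValueError(f"unknown mode: {mode}")

small = np.arange(16.0).reshape(4, 4)
max_small = pool(small, mode="max")  # block maxima: [[5, 7], [13, 15]]
l2_small = pool(small, mode="l2")    # top-left entry: sqrt(0 + 1 + 16 + 25)

fm = np.random.default_rng(0).random((24, 24))
print(pool(fm, mode="max").shape)  # (12, 12)
print(pool(fm, mode="l2").shape)   # (12, 12)
```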
data:image/s3,"s3://crabby-images/50076/50076e16186b1a3967b2469795835c437c15580c" alt=""
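To make the combined pipeline concrete, the following sketch traces array shapes through one forward pass with random, untrained parameters. Several choices are assumptions for illustration rather than the configuration in the figure: 20 feature maps, 2x2 max-pooling, sigmoid activations, and a fully connected output layer of 10 neurons.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
image = rng.random((28, 28))  # stand-in for one MNIST digit

# Convolutional layer: 20 shared 5x5 kernels, one bias per feature map
kernels = rng.standard_normal((20, 5, 5)) * 0.1
conv_biases = np.zeros(20)
windows = sliding_window_view(image, (5, 5))  # shape (24, 24, 5, 5)
conv_out = sigmoid(
    np.einsum("ijab,nab->nij", windows, kernels) + conv_biases[:, None, None]
)

# 2x2 max-pooling applied to each feature map separately
pooled = conv_out.reshape(20, 12, 2, 12, 2).max(axis=(2, 4))

# Fully connected output layer: one neuron per digit 0-9
flat = pooled.reshape(-1)
out_weights = rng.standard_normal((10, flat.size)) * 0.01
out_biases = np.zeros(10)
output = sigmoid(out_weights @ flat + out_biases)

print(conv_out.shape)  # (20, 24, 24)
print(pooled.shape)    # (20, 12, 12)
print(output.shape)    # (10,)
predicted_digit = int(np.argmax(output))
```

With untrained weights the predicted digit is meaningless; the point is only how 28x28 becomes 20x24x24, then 20x12x12, then 10 outputs.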