ResNet with Keras

Author: 苟且偷生小屁屁 | Published 2017-11-01 16:11

    Parts of this article draw on https://zhuanlan.zhihu.com/p/21586417

    First, the most commonly seen ResNet figure:

    [Figure: the classic residual-block diagram]

    As network structures get deeper, they become harder and harder to train compared with shallow ones.
    There are common countermeasures such as BatchNormalization and Dropout; an earlier post on BN showed that without BN the network may simply diverge.
    The guiding principle in DL is that deeper is better: depth can be seen as a kind of entropy, in the sense that it reflects how abstract the network's features are, and the more abstract a feature, the more likely it carries semantic meaning. But how do we overcome the difficulty of training?

    • How can this be solved?

    If the added layers implement the identity mapping, i.e. x = x, the effective depth of the network does not actually increase.
    For a normal DL layer the mapping is x -> f(x); with the structure in the figure above it becomes x -> h(x) + x. For the two to be equal we need h(x) + x = f(x), i.e. h(x) = f(x) - x, which is where the term "residual" comes from. When h(x) = 0 the block is equivalent to x -> x, and at the same time f(x) = x: on the one hand the network behaves almost like an identity map and can therefore be extended very deep; on the other hand the nonlinear mapping we originally wanted is still propagated onward.
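
    As a minimal sketch of this idea (Keras 2 functional API; the 56x56x64 input shape is only for illustration), the shortcut is nothing more than an element-wise addition:

    from keras.layers import Input, Conv2D, Activation, add
    from keras.models import Model

    inp = Input(shape=(56, 56, 64))
    h = Conv2D(64, (3, 3), padding='same')(inp)  # the residual branch h(x)
    h = Activation('relu')(h)
    out = add([h, inp])                          # y = h(x) + x, the shortcut
    Model(inputs=inp, outputs=out).summary()

    If the branch learns h(x) = 0, the block collapses to the identity, which is exactly the property that keeps very deep stacks trainable.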

    • Another interpretation is that low-level features are fused with high-level features, and that this fusion is what yields the better results; this view also has some merit.
    • There is also a later paper arguing that ResNet does not truly increase effective depth; I haven't read it yet and will update this post once I have.
    • conv_block
    [Figure: conv_block structure]
    The code does not bother naming individual layers. Main path: 1x1, 3x3, 1x1 convolutions; shortcut: a single 1x1 convolution.
    from keras.layers import Conv2D, BatchNormalization, Activation, add

    def conv_block(input_tensor, filters):
        # Main path: 1x1 -> 3x3 -> 1x1 convolutions; shortcut: a single 1x1 conv.
        # All strides are 1, so unlike the canonical ResNet50 this block never
        # downsamples; that is what lets the 32x32 cifar10 input survive the
        # final 7x7 pooling layer.
        filter1, filter2, filter3 = filters

        x = Conv2D(filter1, (1, 1), strides=1)(input_tensor)
        x = BatchNormalization(axis=-1)(x)
        x = Activation('relu')(x)

        x = Conv2D(filter2, (3, 3), strides=1, padding='same')(x)
        x = BatchNormalization(axis=-1)(x)
        x = Activation('relu')(x)

        x = Conv2D(filter3, (1, 1), strides=1)(x)
        x = BatchNormalization(axis=-1)(x)
        x = Activation('relu')(x)

        # Shortcut: project the input to filter3 channels so the shapes match for the add
        y = Conv2D(filter3, (1, 1), strides=1)(input_tensor)
        y = BatchNormalization(axis=-1)(y)
        y = Activation('relu')(y)

        # Keras 1's merge([...], mode='sum') is gone in Keras 2; use add() instead
        out = add([x, y])
        z = Activation('relu')(out)

        return z
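
    A quick sanity check of the block (a sketch; the 56x56x64 input is again only illustrative):

    from keras.layers import Input
    from keras.models import Model

    inp = Input(shape=(56, 56, 64))
    out = conv_block(inp, [64, 64, 256])
    # Both the main path and the 1x1 shortcut end in 256 channels, so the add is valid
    Model(inputs=inp, outputs=out).summary()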
    
    
    • identity_block: the difference is that the shortcut path has no convolution
    [Figure: identity_block structure]
    def identity_block(input_tensor, filters):
        # Same 1x1 -> 3x3 -> 1x1 main path, but the shortcut is the identity:
        # no convolution, so input_tensor must already carry filter3 channels.
        filter1, filter2, filter3 = filters

        x = Conv2D(filter1, (1, 1), strides=1)(input_tensor)
        x = BatchNormalization(axis=-1)(x)
        x = Activation('relu')(x)

        x = Conv2D(filter2, (3, 3), strides=1, padding='same')(x)
        x = BatchNormalization(axis=-1)(x)
        x = Activation('relu')(x)

        x = Conv2D(filter3, (1, 1), strides=1)(x)
        x = BatchNormalization(axis=-1)(x)
        x = Activation('relu')(x)

        # The original also built a convolutional y branch here, but it was never
        # used in the merge; it has been removed as dead code.
        out = add([x, input_tensor])
        z = Activation('relu')(out)
        return z
    

    The overall structure of the network (shapes in N,C,H,W notation, following the canonical ResNet50 with a 224x224 input):

    data  1,3,224,224
    
    conv  filters=64, kernel_size=7, pad=3, stride=2  1,64,112,112
    
    bn
    
    activation('relu')
    
    maxpool kernel_size=3, stride=2  1,64,56,56
    
    # block 1  (64,64,256)
    conv_block      in=1,64,56,56    filters=(64,64,256)   out=1,256,56,56
    
    identity_block  in=1,256,56,56   filters=(64,64,256)   out=1,256,56,56
    
    identity_block  in=1,256,56,56   filters=(64,64,256)   out=1,256,56,56
    
    # block 2  (128,128,512)
    
    conv_block      in=1,256,56,56   filters=(128,128,512)  out=1,512,28,28
    
    identity_block  in=1,512,28,28   filters=(128,128,512)  out=1,512,28,28
    
    identity_block  in=1,512,28,28   filters=(128,128,512)  out=1,512,28,28
    
    identity_block  in=1,512,28,28   filters=(128,128,512)  out=1,512,28,28
    
    # block 3  (256,256,1024)
    
    conv_block      in=1,512,28,28   filters=(256,256,1024)  out=1,1024,14,14
    
    identity_block  in=1,1024,14,14  filters=(256,256,1024)  out=1,1024,14,14
    
    identity_block  in=1,1024,14,14  filters=(256,256,1024)  out=1,1024,14,14
    
    identity_block  in=1,1024,14,14  filters=(256,256,1024)  out=1,1024,14,14
    
    identity_block  in=1,1024,14,14  filters=(256,256,1024)  out=1,1024,14,14
    
    identity_block  in=1,1024,14,14  filters=(256,256,1024)  out=1,1024,14,14
    
    # block 4  (512,512,2048)
    
    conv_block      in=1,1024,14,14  filters=(512,512,2048)  out=1,2048,7,7
    
    identity_block  in=1,2048,7,7    filters=(512,512,2048)  out=1,2048,7,7
    
    identity_block  in=1,2048,7,7    filters=(512,512,2048)  out=1,2048,7,7
    
    maxpool kernel_size=7, stride=1  out=1,2048,1,1
    
    flatten
    
    dense(1,1000)
    
    activation('softmax')
    
    probability(1,1000)
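    Counting the weighted layers explains the name ResNet-50: 1 stem convolution + (3+4+6+3) blocks x 3 convolutions each = 49 convolutions, plus the final dense layer, gives 50 (the 1x1 shortcut convolutions are conventionally not counted).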
    
    

    Main script (training on cifar10):

    # coding:utf-8
    import keras
    from resnet_model import resnet_model
    from keras.datasets import cifar10
    from keras.utils import plot_model
    from keras.callbacks import TensorBoard, ModelCheckpoint, LearningRateScheduler
    import math
    
    if __name__ == '__main__':
    
        n_class = 10
        img_w = 32
        img_h = 32
        BATCH_SIZE = 128
        EPOCH = 100
    
        (x_train, y_train), (x_test, y_test) = cifar10.load_data()
    
        x_train = x_train.astype('float32')
        x_train /= 255.
        y_train = keras.utils.to_categorical(y_train, n_class)
    
        x_test = x_test.astype('float32')
        x_test /= 255.
        y_test = keras.utils.to_categorical(y_test, n_class)
    
    
        tb = TensorBoard(log_dir='log')
        cp = ModelCheckpoint(filepath='best_model.h5', monitor='val_loss', save_best_only=True, mode='auto')
    
    
        def step_decay(epoch):
            initial_lrate = 0.01
            drop = 0.5
            epochs_drop = 10.0
            lrate = initial_lrate * math.pow(drop, math.floor((1 + epoch) / epochs_drop))
            return lrate
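        # Example schedule: epochs 0-8 run at lr=0.01, epochs 9-18 at 0.005,
        # epochs 19-28 at 0.0025, and so on; the rate halves every 10 epochs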
    
        lr = LearningRateScheduler(step_decay)
        CB = [tb, cp, lr]
        input_shape = [x_train.shape[1], x_train.shape[2], x_train.shape[3]]
    
        model = resnet_model(out_class=n_class, input_shape = input_shape)
    
        plot_model(model, show_layer_names=True)
    
        model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['accuracy'])
    
        model.fit(x_train, y_train, batch_size=BATCH_SIZE, epochs=EPOCH, validation_split=0.3,
                  callbacks=CB, shuffle=True)
    
        loss, acc = model.evaluate(x_test, y_test, batch_size=BATCH_SIZE)
        print('test loss: %.4f, test acc: %.4f' % (loss, acc))
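
    To reuse the best checkpoint later, it can be reloaded from best_model.h5 (a sketch: the network has no custom layers, so a plain load_model is enough):

    from keras.models import load_model

    best = load_model('best_model.h5')  # written by the ModelCheckpoint callback above
    loss, acc = best.evaluate(x_test, y_test, batch_size=128)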
    

    Model definition (resnet_model.py):

    # coding: utf-8
    from keras.models import Model
    from keras.layers import Input, Conv2D, BatchNormalization, Activation, MaxPool2D, Flatten, Dense, add
    
    def conv_block(input_tensor, filters):
        # Main path: 1x1 -> 3x3 -> 1x1 convolutions; shortcut: a single 1x1 conv.
        filter1, filter2, filter3 = filters

        x = Conv2D(filter1, (1, 1), strides=1)(input_tensor)
        x = BatchNormalization(axis=-1)(x)
        x = Activation('relu')(x)

        x = Conv2D(filter2, (3, 3), strides=1, padding='same')(x)
        x = BatchNormalization(axis=-1)(x)
        x = Activation('relu')(x)

        x = Conv2D(filter3, (1, 1), strides=1)(x)
        x = BatchNormalization(axis=-1)(x)
        x = Activation('relu')(x)

        # Shortcut: project the input to filter3 channels so the shapes match
        y = Conv2D(filter3, (1, 1), strides=1)(input_tensor)
        y = BatchNormalization(axis=-1)(y)
        y = Activation('relu')(y)

        out = add([x, y])  # Keras 2 replacement for merge([x, y], mode='sum')
        z = Activation('relu')(out)

        return z
    
    
    
    
    def identity_block(input_tensor, filters):
        # Same main path; the shortcut is the identity, so input_tensor must
        # already carry filter3 channels (the unused y branch was removed).
        filter1, filter2, filter3 = filters

        x = Conv2D(filter1, (1, 1), strides=1)(input_tensor)
        x = BatchNormalization(axis=-1)(x)
        x = Activation('relu')(x)

        x = Conv2D(filter2, (3, 3), strides=1, padding='same')(x)
        x = BatchNormalization(axis=-1)(x)
        x = Activation('relu')(x)

        x = Conv2D(filter3, (1, 1), strides=1)(x)
        x = BatchNormalization(axis=-1)(x)
        x = Activation('relu')(x)

        out = add([x, input_tensor])
        z = Activation('relu')(out)
        return z
    
    
    
    def resnet_model(out_class, input_shape):

        inputs = Input(shape=input_shape)  # e.g. (32, 32, 3) for cifar10, channels-last

        # Stem: 7x7 conv, stride 2, then 3x3 max pooling, stride 2
        x = Conv2D(64, (7, 7), strides=2, padding='same')(inputs)
        x = BatchNormalization(axis=-1)(x)
        x = Activation('relu')(x)

        x = MaxPool2D(pool_size=(3, 3), strides=2)(x)  # 32x32 input -> 7x7 feature map

        # The blocks below all use stride 1, so the spatial size stays fixed from
        # here on; only the channel count grows (256 -> 512 -> 1024 -> 2048).

        # block 1 (64,64,256): 1 conv_block + 2 identity_blocks
        x = conv_block(x, [64, 64, 256])
        x = identity_block(x, [64, 64, 256])
        x = identity_block(x, [64, 64, 256])

        # block 2 (128,128,512): 1 conv_block + 3 identity_blocks
        x = conv_block(x, [128, 128, 512])
        x = identity_block(x, [128, 128, 512])
        x = identity_block(x, [128, 128, 512])
        x = identity_block(x, [128, 128, 512])

        # block 3 (256,256,1024): 1 conv_block + 5 identity_blocks
        x = conv_block(x, [256, 256, 1024])
        x = identity_block(x, [256, 256, 1024])
        x = identity_block(x, [256, 256, 1024])
        x = identity_block(x, [256, 256, 1024])
        x = identity_block(x, [256, 256, 1024])
        x = identity_block(x, [256, 256, 1024])

        # block 4 (512,512,2048): 1 conv_block + 2 identity_blocks
        x = conv_block(x, [512, 512, 2048])
        x = identity_block(x, [512, 512, 2048])
        x = identity_block(x, [512, 512, 2048])

        # 7x7 pooling collapses the 7x7 feature map to 1x1
        x = MaxPool2D(pool_size=(7, 7), strides=1)(x)

        x = Flatten()(x)

        # Classifier head, adapted for cifar10: out_class=10 instead of ImageNet's 1000
        x = Dense(out_class)(x)

        out = Activation('softmax')(x)

        model = Model(inputs=inputs, outputs=out)

        return model
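
    With the 32x32 cifar10 input the spatial sizes work out as follows: 32 -> 16 after the stride-2 stem conv ('same' padding), 16 -> 7 after the 3x3/stride-2 max pool ('valid'), 7 throughout all four block groups (stride 1 everywhere), and 7 -> 1 after the final 7x7 pool, leaving a 2048-dimensional vector for Flatten.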
    

    Training is running now. A GTX 1060 is really too limited for this; if you can afford it, go straight for a 1080 Ti.

    • epoch=300, about 166 s per epoch: the run took 13.8 hours in total.
    • The result on the training set is decent: 99.75% accuracy with a loss of 0.0082. Since I haven't run cifar10 training many times (VGG16 previously reached 1.000), it is hard to say whether this figure is genuinely high.

    • On the test set accuracy is only 74.39%: clearly overfitting, and the test loss also fluctuates heavily.

    • Possible remedies to consider: try adding Dropout(0.5); add learning-rate decay; and ask whether the model is simply too complex for the data, since ResNet performs best on ImageNet, whose image volume is far larger than cifar10's. A sketch of the Dropout idea follows.
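
    A minimal sketch of the first remedy (placing Dropout between Flatten and Dense inside resnet_model is my own choice here, not something from the original run):

    from keras.layers import Dropout

    # ... inside resnet_model, replacing the classifier head:
    x = Flatten()(x)
    x = Dropout(0.5)(x)  # randomly zero half the activations during training
    x = Dense(out_class)(x)
    out = Activation('softmax')(x)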
