美文网首页
Residual layer

Residual layer

作者: 海街diary | 来源:发表于2018-02-23 22:59 被阅读532次

    1. 介绍

    Residual layer用来解决神经网络训练过程中的gradient消失或爆炸等意外情况,其效果是提供了lower bound,即很容易学到indenty function(extra layers'weight = 0)

    2 原理

    residual layer

    注意由于a[L] 和 z[L+2] 要相加合并,因此必须确保两者的height, weight, channel都相同。

    而且也并不一定非为2层,也可以为3层。

    residual layer with 3 layers

    实现如下(第1层使用1x1 conv,结合第2层,可以加快计算;因此第1层padding=valid, 第2层padding=same;此外要注意F3 = # channels of X)

    def identity_block(X, f, filters, stage, block):
        """
        Implementation of the identity block as defined in Figure 3
        
        Arguments:
        X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
        f -- integer, specifying the shape of the middle CONV's window for the main path
        filters -- python list of integers, defining the number of filters in the CONV layers of the main path
        stage -- integer, used to name the layers, depending on their position in the network
        block -- string/character, used to name the layers, depending on their position in the network
        
        Returns:
        X -- output of the identity block, tensor of shape (n_H, n_W, n_C)
        """
        
        # defining name basis
        conv_name_base = 'res' + str(stage) + block + '_branch'
        bn_name_base = 'bn' + str(stage) + block + '_branch'
        
        # Retrieve Filters
        F1, F2, F3 = filters
        
        # Save the input value. You'll need this later to add back to the main path. 
        X_shortcut = X
        
        # First component of main path
        X = Conv2D(filters = F1, kernel_size = (1, 1), strides = (1,1), padding = 'valid', name = conv_name_base + '2a', kernel_initializer = glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
        X = Activation('relu')(X)
        
        ### START CODE HERE ###
        
        # Second component of main path (≈3 lines)
        X = Conv2D(filters = F2, kernel_size = (f, f), strides = (1,1), padding = 'same', name = conv_name_base + '2b', kernel_initializer = glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis = 3, name = bn_name_base + '2b')(X)
        X = Activation('relu')(X)
    
        # Third component of main path (≈2 lines)
        X = Conv2D(filters = F3, kernel_size = (1, 1), strides = (1,1), padding = 'valid', name = conv_name_base + '2c', kernel_initializer = glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis = 3, name = bn_name_base + '2c')(X)
    
        # Final step: Add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
        X = layers.add([X, X_shortcut])
        X = Activation('relu')(X)
        
        ### END CODE HERE ###
        
        return X
    

    3. 变形

    typical residual 由于输入和输出同shape,限制了其功能。但对short_path 稍作改进,即可实现convolution 功能。

    convolutional residual layer

    实现如下(short_path的#filters 与 F3一致;同时参数s控制整个convolution layer是否减少height and weight)

    def convolutional_block(X, f, filters, stage, block, s = 2):
        """
        Implementation of the convolutional block as defined in Figure 4
        
        Arguments:
        X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
        f -- integer, specifying the shape of the middle CONV's window for the main path
        filters -- python list of integers, defining the number of filters in the CONV layers of the main path
        stage -- integer, used to name the layers, depending on their position in the network
        block -- string/character, used to name the layers, depending on their position in the network
        s -- Integer, specifying the stride to be used
        
        Returns:
        X -- output of the convolutional block, tensor of shape (n_H, n_W, n_C)
        """
        
        # defining name basis
        conv_name_base = 'res' + str(stage) + block + '_branch'
        bn_name_base = 'bn' + str(stage) + block + '_branch'
        
        # Retrieve Filters
        F1, F2, F3 = filters
        
        # Save the input value
        X_shortcut = X
    
    
        ##### MAIN PATH #####
        # First component of main path 
        X = Conv2D(F1, (1, 1), strides = (s,s), name = conv_name_base + '2a', kernel_initializer = glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
        X = Activation('relu')(X)
        
        ### START CODE HERE ###
    
        # Second component of main path (≈3 lines)
        X = Conv2D(F2, (f, f), strides = (1,1), name = conv_name_base + '2b', padding='same', kernel_initializer = glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis = 3, name = bn_name_base + '2b')(X)
        X = Activation('relu')(X)
    
        # Third component of main path (≈2 lines)
        X = Conv2D(F3, (1,1), strides = (1, 1), name = conv_name_base + '2c', padding='valid', kernel_initializer = glorot_uniform(seed=0))(X)
        X = BatchNormalization(axis = 3, name = bn_name_base + '2c')(X)
    
        ##### SHORTCUT PATH #### (≈2 lines)
        X_shortcut = Conv2D(F3, (1,1), strides = (s, s), name = conv_name_base + '1', padding='valid', kernel_initializer = glorot_uniform(seed=0))(X_shortcut)
        X_shortcut = BatchNormalization(axis=3, name = bn_name_base + '1')(X_shortcut)
    
        # Final step: Add shortcut value to main path, and pass it through a RELU activation (≈2 lines)
        X = layers.add([X, X_shortcut])
        X = Activation('relu')(X)
        
        ### END CODE HERE ###
        
        return X

    相关文章

      网友评论

          本文标题:Residual layer

          本文链接:https://www.haomeiwen.com/subject/sysfxftx.html