One-Hour Training: Introduction to Neural Networks

Author: 林檎果 | Published 2018-04-09 22:16

    Series outline

    ➡️ Neural Networks ⬅️

    Convolutional Neural Networks

    Recurrent Neural Networks

    Generative Adversarial Networks

    Neural Networks

    The simplest neural network -- a single neuron

    image
    • Components:
      • Inputs: denoted x
      • Weights: denoted w
      • Bias: denoted b
      • Activation function: denoted f(h)
    • Mathematical form:
      $\hat y = f(h) = f\left(\sum_i w_i x_i + b\right)$
      where f is the activation function, most commonly the sigmoid:
      $\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$
      The sigmoid outputs values between 0 and 1, much like probabilities, which makes it well suited for classification.
    image
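    A minimal sketch of a single neuron's forward pass in numpy (the input, weight, and bias values below are made up for illustration and are not from the original):

    import numpy as np

    def sigmoid(x):
        """
        Calculate sigmoid
        """
        return 1 / (1 + np.exp(-x))

    # Made-up example values: three inputs, three weights, one bias
    x = np.array([0.2, -0.5, 1.0])
    w = np.array([0.4, 0.6, -0.1])
    b = 0.05

    # Linear combination h = w . x + b, then apply the activation f(h)
    h = np.dot(w, x) + b
    y_hat = sigmoid(h)
    print(y_hat)  # a single value between 0 and 1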

    How do we find such a function?

    Approach:

    • Train with supervised learning:
      • Let the machine find the function that minimizes the error between predictions and the data's targets
        • Give the machine an error function that measures that error
        • Use gradient descent to update the neuron's weights (w) so that the error function is minimized

    Commonly used error functions:

    • For regression: the sum of squared errors (see the short numpy sketch after this list):

      $E = \frac{1}{2}\sum_{\mu}\sum_{j}\left(y_j^{\mu} - \hat y_j^{\mu}\right)^2$
      where $\hat y_j^{\mu} = f\left(\sum_i w_{ij}\, x_i^{\mu}\right)$
      The factor 1/2 is there purely for convenience: it cancels the 2 produced when the square is differentiated. μ indexes the rows (data records) and j indexes the columns (output units).
    • For classification: the cross entropy (covered later)
      image
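    A minimal sketch of the sum of squared errors computed with numpy (the target and prediction arrays are made-up illustrative values, not from the original):

    import numpy as np

    # Rows are data records (indexed by mu), columns are output units (indexed by j)
    y = np.array([[1.0, 0.0],
                  [0.5, 0.5]])        # targets (made-up values)
    y_hat = np.array([[0.9, 0.2],
                      [0.4, 0.6]])    # network predictions (made-up values)

    # E = 1/2 * sum over all records and output units of (y - y_hat)^2
    E = 0.5 * np.sum((y - y_hat) ** 2)
    print(E)  # 0.035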

    Gradient descent - the math

    Use the chain rule to compute the partial derivative of the error function with respect to each weight w; this derivative is then used to update the network's weights.
    A caveat:
    Following the gradient alone can leave you stuck in a local minimum.

    image

    The math:
    To update a weight, take the partial derivative of the error function E with respect to that weight w, and scale it by the learning rate to control the speed of learning.
    η is called the learning rate; it sets the step size of each weight update and thus controls how fast learning proceeds.

    $\Delta w_i = -\eta\,\frac{\partial E}{\partial w_i}$

    How to compute the partial derivative of the error function E with respect to a weight w:
    Goal:

    $\frac{\partial E}{\partial w_i}$, where for a single record $E = \frac{1}{2}\left(y - \hat y\right)^2$

    Apply the chain rule:

    $\frac{\partial E}{\partial w_i} = \frac{\partial E}{\partial \hat y}\,\frac{\partial \hat y}{\partial w_i} = -\left(y - \hat y\right)\frac{\partial \hat y}{\partial w_i}$

    Apply the chain rule again:

    $\frac{\partial \hat y}{\partial w_i} = f'(h)\,\frac{\partial h}{\partial w_i}$

    Substitute, then apply the chain rule once more:

    $\frac{\partial h}{\partial w_i} = \frac{\partial}{\partial w_i}\sum_j w_j x_j = x_i$

    Finally, substitute everything back:

    $\frac{\partial E}{\partial w_i} = -\left(y - \hat y\right) f'(h)\, x_i$

    Final result:

    $\Delta w_i = -\eta\,\frac{\partial E}{\partial w_i} = \eta\left(y - \hat y\right) f'(h)\, x_i$

    Definition:

    $\delta \equiv \left(y - \hat y\right) f'(h)$, so the update becomes $\Delta w_i = \eta\,\delta\,x_i$

    Gradient descent - code implementation

    import numpy as np
    
    def sigmoid(x):
        """
        Calculate sigmoid
        """
        return 1/(1+np.exp(-x))
    
    def sigmoid_prime(x):
        """
        # Derivative of the sigmoid function
        """
        return sigmoid(x) * (1 - sigmoid(x))
    
    learnrate = 0.5
    x = np.array([1, 2, 3, 4])
    y = 0.5
    
    # Initial weights
    w = np.array([0.5, -0.5, 0.3, 0.1])
    
    ### Calculate one gradient descent step for each weight
    ### Note: Some steps have been consolidated, so there are
    ###       fewer variable names than in the above sample code
    
    # TODO: Calculate the node's linear combination of inputs and weights
    h = np.dot(x, w)
    
    # TODO: Calculate output of neural network
    nn_output = sigmoid(h)
    
    # TODO: Calculate error of neural network
    error = y - nn_output
    
    # TODO: Calculate the error term
    #       Remember, this requires the output gradient, which we haven't
    #       specifically added a variable for.
    error_term = error * sigmoid_prime(h)
    # Note: The sigmoid_prime function calculates sigmoid(h) twice,
    #       but you've already calculated it once. You can make this
    #       code more efficient by calculating the derivative directly
    #       rather than calling sigmoid_prime, like this:
    # error_term = error * nn_output * (1 - nn_output)
    
    # TODO: Calculate change in weights
    del_w = learnrate * error_term * x
    
    print('Neural Network output:')
    print(nn_output)
    print('Amount of Error:')
    print(error)
    print('Change in Weights:')
    print(del_w)
    
    Neural Network output:
    0.689974481128
    Amount of Error:
    -0.189974481128
    Change in Weights:
    [-0.02031869 -0.04063738 -0.06095608 -0.08127477]
    

    Training procedure

    Iterate until the error is minimized:

    1. Forward pass to get predictions: propagate through the network with matrix multiplications to compute the prediction $\hat y$.
    2. Backward pass to get each layer's error gradient: use $\hat y$ to compute the error function, then propagate the error backwards.
    3. Update the weights according to the error.
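    A minimal sketch of these three steps as a training loop for a single sigmoid neuron (the toy features, targets, and hyperparameters below are made up for illustration, not taken from the example that follows):

    import numpy as np

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    # Made-up toy data: 3 records, 2 features each, with binary targets
    features = np.array([[0.5, -0.2], [1.0, 0.3], [-0.4, 0.8]])
    targets = np.array([1, 0, 1])

    weights = np.zeros(2)
    learnrate, epochs = 0.5, 100

    for e in range(epochs):
        del_w = np.zeros(weights.shape)
        for x, y in zip(features, targets):
            output = sigmoid(np.dot(x, weights))               # 1. forward pass
            error_term = (y - output) * output * (1 - output)  # 2. error term from the gradient
            del_w += error_term * x
        weights += learnrate * del_w / len(features)           # 3. update the weights

    print(weights)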

    Example: predicting graduate-school admission - single-neuron version

    image

    Load the raw data

    import numpy as np
    import pandas as pd
    admissions=pd.read_csv("entry_admission.csv")
    admissions.head()
    

       admit  gre   gpa  rank
    0      0  380  3.61     3
    1      1  660  3.67     3
    2      1  800  4.00     1
    3      1  640  3.19     4
    4      0  520  2.93     4

    Data preprocessing

    # Make dummy variables for rank
    data = pd.concat([admissions, pd.get_dummies(admissions['rank'], prefix='rank')], axis=1)
    data = data.drop('rank', axis=1)
    
    # Standardize features
    for field in ['gre', 'gpa']:
        mean, std = data[field].mean(), data[field].std()
        data.loc[:,field] = (data[field]-mean)/std
        
    # Split off random 10% of the data for testing
    np.random.seed(42)
    sample = np.random.choice(data.index, size=int(len(data)*0.9), replace=False)
    data, test_data = data.loc[sample], data.drop(sample)
    
    # Split into features and targets
    features, targets = data.drop('admit', axis=1), data['admit']
    features_test, targets_test = test_data.drop('admit', axis=1), test_data['admit']
    
    features.head()
    

              gre       gpa  rank_1  rank_2  rank_3  rank_4
    209 -0.066657  0.289305       0       1       0       0
    280  0.625884  1.445476       0       1       0       0
    33   1.837832  1.603135       0       0       1       0
    210  1.318426 -0.131120       0       0       0       1
    93  -0.066657 -1.208461       0       1       0       0

    targets.head()
    
    209    0
    280    0
    33     1
    210    0
    93     0
    Name: admit, dtype: int64
    

    Single-neuron version

    def sigmoid(x):
        """
        Calculate sigmoid
        """
        return 1 / (1 + np.exp(-x))
    
    # TODO: We haven't provided the sigmoid_prime function like we did in
    #       the previous lesson to encourage you to come up with a more
    #       efficient solution. If you need a hint, check out the comments
    #       in solution.py from the previous lecture.
    
    # Use the same seed to make debugging easier
    np.random.seed(42)
    
    n_records, n_features = features.shape
    last_loss = None
    
    # Initialize weights
    weights = np.random.normal(scale=1 / n_features**.5, size=n_features)
    
    # Neural Network hyperparameters
    epochs = 1000
    learnrate = 0.5
    
    for e in range(epochs):  # loop over the training epochs
        del_w = np.zeros(weights.shape)
        for x, y in zip(features.values, targets):
            # Loop through all records, x is the input, y is the target
    
            # Note: We haven't included the h variable from the previous
            #       lesson. You can add it if you want, or you can calculate
            #       the h together with the output
    
            # TODO: Calculate the output
            output = sigmoid(np.dot(x,weights))
    
            # TODO: Calculate the error
            error = y-output
    
            # TODO: Calculate the error term
            error_term = error*output*(1-output)
    
            # TODO: Calculate the change in weights for this sample
            #       and add it to the total weight change
            del_w += error_term*x
    
        # TODO: Update weights using the learning rate and the accumulated change in weights
        weights += learnrate*del_w
    
        # Printing out the mean square error on the training set
        if e % (epochs / 10) == 0:
            out = sigmoid(np.dot(features, weights))
            loss = np.mean((out - targets) ** 2)
            if last_loss and last_loss < loss:
                print("Train loss: ", loss, "  WARNING - Loss Increasing")
            else:
                print("Train loss: ", loss)
            last_loss = loss
    
    
    # Calculate accuracy on test data
    test_out = sigmoid(np.dot(features_test, weights))
    predictions = test_out > 0.5
    accuracy = np.mean(predictions == targets_test)
    print("Prediction accuracy: {:.3f}".format(accuracy))
    
    Train loss:  0.286196010415
    Train loss:  0.257761346594
    Train loss:  0.257722034703
    Train loss:  0.257722749419   WARNING - Loss Increasing
    Train loss:  0.257722752361   WARNING - Loss Increasing
    Train loss:  0.257722752309
    Train loss:  0.257722752309
    Train loss:  0.257722752309   WARNING - Loss Increasing
    Train loss:  0.257722752309   WARNING - Loss Increasing
    Train loss:  0.257722752309   WARNING - Loss Increasing
    Prediction accuracy: 0.725
    

    Neural networks - built from neurons

    A neural network is a composition of neurons connected through nonlinear activation functions.
    This yields nonlinear functions overall, giving the network the capacity to approximate a wide variety of functions.

    image

    Mathematical form:
    Matrix multiplication; each hidden layer is represented by a weight matrix (the bias is not yet shown in the figure).

    image

    Neural networks - in code

    import numpy as np
    
    def sigmoid(x):
        """
        Calculate sigmoid
        """
        return 1/(1+np.exp(-x))
    
    # Network size
    N_input = 4
    N_hidden = 3
    N_output = 2
    
    np.random.seed(42)
    # Make some fake data
    X = np.random.randn(N_input)
    
    weights_input_to_hidden = np.random.normal(0, scale=0.1, size=(N_input, N_hidden))
    weights_hidden_to_output = np.random.normal(0, scale=0.1, size=(N_hidden, N_output))
    
    # TODO: Make a forward pass through the network
    hidden_layer_in = np.dot(X, weights_input_to_hidden)
    hidden_layer_out = sigmoid(hidden_layer_in)  # sigmoid is applied element-wise
    
    print('Hidden-layer Output:')
    print(hidden_layer_out)
    
    output_layer_in = np.dot(hidden_layer_out, weights_hidden_to_output)
    output_layer_out = sigmoid(output_layer_in)
    
    print('Output-layer Output:')
    print(output_layer_out)
    
    Hidden-layer Output:
    [ 0.41492192  0.42604313  0.5002434 ]
    Output-layer Output:
    [ 0.49815196  0.48539772]
    

    Backpropagation - propagate the error gradient back to every neuron in the network to update the weights w

    How backpropagation is computed:
    Start from the gradient at the output layer, then use the chain rule to compute each layer's gradient, working backwards.

    image

    Formulas:
    The error term of hidden unit j:

    $\delta_j = \left(\sum_k \delta_k\, w_{jk}\right) f'(h_j)$

    The Σ here means that if the next layer (layer k) has multiple neurons, the errors propagated back from them must be summed.
    The weight update for each weight w feeding into unit j:

    $\Delta w_{ij} = \eta\,\delta_j\,x_i$
    import numpy as np
    
    
    def sigmoid(x):
        """
        Calculate sigmoid
        """
        return 1 / (1 + np.exp(-x))
    
    
    x = np.array([0.5, 0.1, -0.2])
    target = 0.6
    learnrate = 0.5
    
    weights_input_hidden = np.array([[0.5, -0.6],
                                     [0.1, -0.2],
                                     [0.1, 0.7]])
    
    weights_hidden_output = np.array([0.1, -0.3])
    
    ## Forward pass
    hidden_layer_input = np.dot(x, weights_input_hidden)
    hidden_layer_output = sigmoid(hidden_layer_input)
    
    output_layer_in = np.dot(hidden_layer_output, weights_hidden_output)
    output = sigmoid(output_layer_in)
    
    ## Backwards pass
    ## TODO: Calculate error
    error = target - output
    
    # TODO: Calculate error gradient for output layer
    del_err_output = error * output * (1 - output)
    # TODO: Calculate change in weights for hidden layer to output layer
    delta_weights_hidden_output = learnrate * del_err_output * hidden_layer_output
    
    
    # TODO: Calculate error gradient for hidden layer
    del_err_hidden = np.dot(del_err_output, weights_hidden_output) * \
                     hidden_layer_output * (1 - hidden_layer_output)
    # TODO: Calculate change in weights for input layer to hidden layer
    delta_weights_input_hidden = learnrate * del_err_hidden * x[:, None]
    
    print('Change in weights for hidden layer to output layer:')
    print(delta_weights_hidden_output)
    print('Change in weights for input layer to hidden layer:')
    print(delta_weights_input_hidden)
    
    
    Change in weights for hidden layer to output layer:
    [ 0.00804047  0.00555918]
    Change in weights for input layer to hidden layer:
    [[  1.77005547e-04  -5.11178506e-04]
     [  3.54011093e-05  -1.02235701e-04]
     [ -7.08022187e-05   2.04471402e-04]]
    

    Training procedure revisited

    Iterate until the error is minimized:

    1. Forward pass to get predictions: propagate through the network with matrix multiplications to compute the prediction $\hat y$.
    2. Backward pass to get each layer's error gradient: use $\hat y$ to compute the error function, then propagate the error backwards.
    3. Update the weights: update every layer's weights according to its error.

    Example: predicting graduate-school admission - neural network version

    Data preprocessing

    import numpy as np
    import pandas as pd
    
    admissions = pd.read_csv('entry_admission.csv')
    
    # Make dummy variables for rank
    data = pd.concat([admissions, pd.get_dummies(admissions['rank'], prefix='rank')], axis=1)
    data = data.drop('rank', axis=1)
    
    # Standardize features
    for field in ['gre', 'gpa']:
        mean, std = data[field].mean(), data[field].std()
        data.loc[:,field] = (data[field]-mean)/std
        
    # Split off random 10% of the data for testing
    np.random.seed(21)
    sample = np.random.choice(data.index, size=int(len(data)*0.9), replace=False)
    data, test_data = data.loc[sample], data.drop(sample)
    
    # Split into features and targets
    features, targets = data.drop('admit', axis=1), data['admit']
    features_test, targets_test = test_data.drop('admit', axis=1), test_data['admit']
    

    Neural network version

    import numpy as np
    
    np.random.seed(21)
    
    def sigmoid(x):
        """
        Calculate sigmoid
        """
        return 1 / (1 + np.exp(-x))
    
    
    # Hyperparameters
    n_hidden = 2  # number of hidden units
    epochs = 900
    learnrate = 0.005
    
    n_records, n_features = features.shape
    last_loss = None
    # Initialize weights
    weights_input_hidden = np.random.normal(scale=1 / n_features ** .5,
                                            size=(n_features, n_hidden))
    weights_hidden_output = np.random.normal(scale=1 / n_features ** .5,
                                             size=n_hidden)
    
    for e in range(epochs):
        del_w_input_hidden = np.zeros(weights_input_hidden.shape)
        del_w_hidden_output = np.zeros(weights_hidden_output.shape)
        for x, y in zip(features.values, targets):
            ## Forward pass ##
            # TODO: Calculate the output
            hidden_input = np.dot(x, weights_input_hidden)
            hidden_output = sigmoid(hidden_input)
    
            output = sigmoid(np.dot(hidden_output,
                                    weights_hidden_output))
    
            ## Backward pass ##
            # TODO: Calculate the network's prediction error
            error = y - output
    
            # TODO: Calculate error term for the output unit
            output_error_term = error * output * (1 - output)
    
            ## propagate errors to hidden layer
    
            # TODO: Calculate the hidden layer's contribution to the error
            hidden_error = np.dot(output_error_term, weights_hidden_output)
    
            # TODO: Calculate the error term for the hidden layer
            hidden_error_term = hidden_error * hidden_output * (1 - hidden_output)
    
            # TODO: Update the change in weights
            del_w_hidden_output += output_error_term * hidden_output
            del_w_input_hidden += hidden_error_term * x[:,None]
    
        # TODO: Update weights
        weights_input_hidden += learnrate * del_w_input_hidden / n_records
        weights_hidden_output += learnrate * del_w_hidden_output / n_records
    
        # Printing out the mean square error on the training set
        if e % (epochs / 10) == 0:
            hidden_output = sigmoid(np.dot(x, weights_input_hidden))
            out = sigmoid(np.dot(hidden_output,
                                 weights_hidden_output))
            loss = np.mean((out - targets) ** 2)
    
            if last_loss and last_loss < loss:
                print("Train loss: ", loss, "  WARNING - Loss Increasing")
            else:
                print("Train loss: ", loss)
            last_loss = loss
    
    # Calculate accuracy on test data
    hidden = sigmoid(np.dot(features_test, weights_input_hidden))
    out = sigmoid(np.dot(hidden, weights_hidden_output))
    predictions = out > 0.5
    accuracy = np.mean(predictions == targets_test)
    print("Prediction accuracy: {:.3f}".format(accuracy))
    
    Train loss:  0.245943442947
    Train loss:  0.224108177301
    Train loss:  0.228908195703   WARNING - Loss Increasing
    Train loss:  0.230352461418   WARNING - Loss Increasing
    Train loss:  0.230651907986   WARNING - Loss Increasing
    Train loss:  0.230865845199   WARNING - Loss Increasing
    Train loss:  0.231183108301   WARNING - Loss Increasing
    Train loss:  0.231499116961   WARNING - Loss Increasing
    Train loss:  0.231737211823   WARNING - Loss Increasing
    Train loss:  0.231882889013   WARNING - Loss Increasing
    Prediction accuracy: 0.750
    

    About me:

    linxinzhe, full-stack engineer, currently working at a Fortune 500 telecom company. Enthusiast of artificial intelligence and blockchain.

    GitHub: https://github.com/linxinzhe

    Feel free to leave comments and discuss, and feel free to follow me~
    I'll follow you back!
