Deep Learning Notes (3): A NumPy Implementation of Feedforward Neural Networks

Author: Nino_Lau | Published 2019-03-15 15:42


    Back Propagation

    We introduce backpropagation; this note covers the NumPy implementation.

    If you have any questions or suggestions about backpropagation with NumPy, contact Jiaxin Zhuang by email (zhuangjx5@mail2.sysu.edu.cn).

    1. Simple expressions and interpretation of the gradient

    1.1 Simple expressions

    Let's start simply so that we can develop the notation and conventions for more complex expressions. Consider a simple multiplication function of two numbers f(x,y) = xy. It is a matter of simple calculus to derive the partial derivative with respect to either input:

    f(x,y) = x y \hspace{0.5in} \rightarrow \hspace{0.5in} \frac{\partial f}{\partial x} = y \hspace{0.5in} \frac{\partial f}{\partial y} = x

    # set some inputs
    x1 = -2; x2 = 5;
    
    # perform the forward pass
    f = x1 * x2 # f becomes -10
    
    # perform the backward pass (backpropagation) in reverse order:
    # backprop through f = x1 * x2
    dfdx1 = x2 # df/dx1 = x2, so gradient on x1 becomes 5
    print("gradient on x1 is {:2}".format(dfdx1))
    dfdx2 = x1 # df/dx2 = x1, so gradient on x2 becomes -2
    print("gradient on x2 is {:2}".format(dfdx2))
    
    gradient on x1 is  5
    gradient on x2 is -2
    

    1.2 Interpretation of the gradient

    Interpretation: the derivative indicates the rate of change of a function with respect to that variable in an infinitesimally small region around a particular point:
    \frac{df(x)}{dx} = \lim_{h\ \to 0} \frac{f(x + h) - f(x)}{h}
    In other words, the derivative on each variable tells you the sensitivity of the whole expression to its value. As mentioned, the gradient \nabla f is the vector of partial derivatives, so we have \nabla f = [\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}] = [y, x].
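
    As a quick sanity check, we can approximate this limit numerically with a small h and compare it against the analytic gradient. Below is a minimal sketch (the helper numerical_gradient and the step size h=1e-5 are our own illustrative choices, not part of the original notebook):

    def numerical_gradient(f, x, y, h=1e-5):
        """Approximate df/dx and df/dy with centered finite differences."""
        dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
        dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
        return dfdx, dfdy
    
    f = lambda x, y: x * y
    print(numerical_gradient(f, -2.0, 5.0))  # approximately (5.0, -2.0), i.e. (y, x)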

    2. Compound expressions with chain rule

    2.1 A simple example of the chain rule

    Let's now start to consider more complicated expressions that involve multiple composed functions, such as f(x,y,z) = (x + y) z.

    This expression is still simple enough to differentiate directly, but we'll take a particular approach to it that will be helpful for understanding the intuition behind backpropagation.

    In particular, note that this expression can be broken down into two expressions: q = x + y and f = qz. As seen in the previous section, f is just a multiplication of q and z, so \frac{\partial f}{\partial q} = z, \frac{\partial f}{\partial z} = q, and q is an addition of x and y, so \frac{\partial q}{\partial x} = 1, \frac{\partial q}{\partial y} = 1.

    However, we don’t necessarily care about the gradient on the intermediate value q - the value of \frac{\partial f}{\partial q} is not useful. Instead, we are ultimately interested in the gradient of f with respect to its inputs x,y,z.

    The chain rule tells us that the correct way to “chain” these gradient expressions together is through multiplication. For example, \frac{\partial f}{\partial x} = \frac{\partial f}{\partial q} \frac{\partial q}{\partial x}. In practice this is simply a multiplication of the two numbers that hold the two gradients. Let's see this with an example (note that the code below uses a slightly modified intermediate, q = 2x + y, so the local gradient \frac{\partial q}{\partial x} = 2):

    # set some inputs
    x = -2; y = 5; z = -4  
    
    # perform the forward pass
    q = 2*x + y # q becomes 1
    f = q * z # f becomes -4
    print(q, f)
    
    1 -4
    
    # perform the backward pass (backpropagation) in reverse order:
    # first backprop through f = q * z = (2*x+y) * z
    dfdz = q # df/dz = q, so gradient on z becomes 1
    dfdq = z # df/dq = z, so gradient on q becomes -4
    # now backprop through q = 2*x + y
    dfdx = 2.0 * dfdq # dq/dx = 2. And the multiplication here is the chain rule!
    dfdy = 1.0 * dfdq # dq/dy = 1
    print('df/dx is {:2}'.format(dfdx))
    print('df/dy is {:2}'.format(dfdy))
    
    df/dx is -8.0
    df/dy is -4.0
    

    2.2 Intuitive understanding of backpropagation

    Notice that backpropagation is a beautifully local process.
    Every gate in a circuit diagram gets some inputs and can right away compute two things (see the sketch below):

    1. its output value and
    2. the local gradient of its output with respect to its inputs.
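
    To make this locality concrete, here is a minimal sketch of a multiply "gate" (the class name and interface are our own illustration, not part of the notebook's code). It computes its output in forward and, given the upstream gradient, returns the gradients on its inputs in backward using only locally cached values:

    class MultiplyGate(object):
        """A single gate: it only knows its own inputs and local gradients."""
        def forward(self, x, y):
            self.x, self.y = x, y  # cache inputs for the backward pass
            return x * y
    
        def backward(self, dout):
            # chain rule: multiply the upstream gradient by the local gradients
            dx = self.y * dout  # d(xy)/dx = y
            dy = self.x * dout  # d(xy)/dy = x
            return dx, dy
    
    gate = MultiplyGate()
    out = gate.forward(-2, 5)  # -10
    print(gate.backward(1.0))  # (5.0, -2.0)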

    3. Practice: Writing a simple Feedforward Neural Network

    3.1 Outline

    We will implement a simple feedforward neural network using NumPy. Thus, we need to define the network and implement the forward pass as well as the backward propagation.

    1. Define a simple feedforward neural network with 1 hidden layer, and implement forward and backward passes.
    2. Load data from a local CSV file with pandas; it contains training and testing points generated from 3 different Gaussian distributions (with different means and standard deviations).
    3. Define some functions for visualization and training.
    4. Train and predict every epoch.
    5. Plot the distribution of the points' labels and the predictions.

    # Load necessary modules for later
    import numpy as np
    import pandas as pd
    np.random.seed(1024)
    

    3.2 Define a Feedforward Neural Network, implement forward and backward

    A simple Neural Network with 1 hidden layer.

                                       Network Structure
                             
                                        Input        Weights            Output
    Hidden layer                     [batch_size, 2] x [2, 5]  ->  [batch_size, 5]
    Activation function (sigmoid)    [batch_size, 5]           ->  [batch_size, 5]
    Classification layer             [batch_size, 5] x [5, 3]  ->  [batch_size, 3]
    Activation function (sigmoid)    [batch_size, 3]           ->  [batch_size, 3]
    

    In the training and testing data, each point lies in a two-dimensional space and there are three categories. Predictions will be one-hot vectors, like [0 0 1], [1 0 0], [0 1 0].

    w1_initialization = np.random.randn(2, 5)  
    w2_initialization = np.random.randn(5, 3)  
    
    w2_initialization
    
    array([[-0.06510141,  0.80681666, -0.5778176 ],
           [ 0.57306064, -0.33667496,  0.29700734],
           [-0.37480416,  0.15510474,  0.70485719],
           [ 0.8452178 , -0.65818079,  0.56810558],
           [ 0.51538125, -0.61564998,  0.92611427]])
    
    class FeedForward_Neural_Network(object):
        def __init__(self, learning_rate):
            self.input_channel = 2  #  number of input neurons
            self.output_channel = 3 #  number of output neurons
            self.hidden_channel = 5 # number of hidden neurons
            self.learning_rate = learning_rate
            
            # weights initialization
            # Usually, we use random or uniform initialization to initialize weights.
            # For simplicity, here we use the same fixed arrays to initialize.
    #         np.random.randn(self.input_channel, self.hidden_channel) 
            # (2x5) weight matrix from input to hidden layer
            self.weight1 = np.array([[ 2.12444863,  0.25264613,  1.45417876,  0.56923979,  0.45822365],
                                     [-0.80933344,  0.86407349,  0.20170137, -1.87529904, -0.56850693]])
             
            # (5x3) weight matrix from hidden to output layer
    #         np.random.randn(self.hidden_channel, self.output_channel)  
            self.weight2 = np.array([ [-0.06510141,  0.80681666, -0.5778176 ],
                                   [ 0.57306064, -0.33667496,  0.29700734],
                                   [-0.37480416,  0.15510474,  0.70485719],
                                   [ 0.8452178 , -0.65818079,  0.56810558],
                                   [ 0.51538125, -0.61564998,  0.92611427]])
    
        
        def forward(self, X):
            """forward propagation through our network
            """
            # dot product of X (input) and the first set of (2x5) weights
            self.h1 = np.dot(X, self.weight1)  
            # activation function
            self.z1 = self.sigmoid(self.h1)  
            # dot product of the hidden layer (z1) and the second set of (5x3) weights
            self.h2 = np.dot(self.z1, self.weight2) 
            # final activation function
            o = self.sigmoid(self.h2)
            return o
        
        def backward(self, X, y, o):
            """Backward, compute gradient and update parameters
            Inputs:
                X: data, [batch_size, 2]
                y: label, one-hot vector, [batch_size, 3]
                o: predictions, [batch_size, 3]
            """
            # backward propgate through the network
            self.o_error = y - o  # error in output
            # applying derivative of sigmoid to the error (delta of the output layer)
            self.o_delta = self.o_error * self.sigmoid_prime(o) 
    
            # z1 error: how much our hidden layer weights contributed to output error
            self.z1_error = self.o_delta.dot(self.weight2.T)  
            # applying derivative of sigmoid to z1 error
            self.z1_delta = self.z1_error * self.sigmoid_prime(self.z1)  
    
            # adjusting first set (input --> hidden) weights
            self.weight1 += X.T.dot(self.z1_delta) * self.learning_rate  
            # adjusting second set (hidden --> output) weights
            self.weight2 += self.z1.T.dot(self.o_delta) * self.learning_rate 
            
        def sigmoid(self, s):
            """activation function
            """
            return 1 / (1 + np.exp(-s))
    
        def sigmoid_prime(self, s):
            """derivative of sigmoid
            """
            return s * (1 - s)
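
    Before training on real data, it helps to sanity-check the shapes flowing through the network. A small smoke test (the random toy batch below is our own illustration):

    # Sanity check: forward pass on a random batch (illustrative only)
    toy_model = FeedForward_Neural_Network(learning_rate=0.1)
    toy_batch = np.random.randn(4, 2)  # [batch_size=4, 2]
    toy_out = toy_model.forward(toy_batch)
    print(toy_out.shape)  # expected: (4, 3)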
    

    3.3 Loading data from a local CSV file using pandas

    # Import Module
    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib.cm as cm
    import math
    
    train_csv_file = './labels/train.csv'
    test_csv_file = './labels/test.csv'
    # Load data from csv file, without header
    train_frame = pd.read_csv(train_csv_file, encoding='utf-8', header=None)
    test_frame = pd.read_csv(test_csv_file, encoding='utf-8', header=None)
    
    # show data in Dataframe format (defined in pandas)
    train_frame
    

                  0          1  2
    0     11.834241  11.866105  1
    1      8.101150   9.324800  1
    2     11.184679   1.196726  2
    3      8.911888  -0.044024  2
    4      9.863982   0.151162  2
    5      9.427897  -0.598807  2
    6     10.038352   2.133938  2
    7     11.149009  -0.726649  2
    8      9.041540   2.972213  2
    9     13.413336  -3.174030  2
    10    -0.385824   0.388751  0
    11    -0.192905   1.562469  0
    12    10.735249   7.702754  1
    13    -3.024363   2.518729  0
    14    10.694739  11.442958  1
    15    10.672035   0.163851  2
    16     9.717515  -0.673383  2
    17     7.757028  -2.540235  2
    18     0.195954   0.843201  0
    19    10.359054  11.489937  1
    20    10.245470  10.873774  1
    21     9.767327   9.450749  1
    22    12.402497  11.861342  1
    23     0.980769  -1.524264  0
    24    -2.113837   2.111235  0
    25     0.076416   0.650588  0
    26     0.670296  -0.344045  0
    27    10.452718   9.419734  1
    28    10.647860   8.271140  1
    29    -0.095686   2.692840  0
    ..          ...        ... ..
    180    0.239345  -2.378022  0
    181    1.497582  -2.700999  0
    182   -0.471785   0.856114  0
    183   13.690628  11.552953  1
    184   10.652533  10.357309  1
    185    8.714084   9.839341  1
    186   12.177913  10.932641  1
    187   10.049335   8.478106  1
    188    1.370425   2.321562  0
    189    2.189643   0.012325  0
    190    7.425213  10.904103  1
    191    6.836717  10.750923  1
    192    8.911069  11.032682  1
    193    8.819191  11.310835  1
    194   -0.807627  -1.435569  0
    195   -1.687238   1.345539  0
    196    9.856732  10.116610  1
    197    9.648434   8.059552  1
    198   -0.223917   1.003647  0
    199   10.004307   8.482203  1
    200   12.090931   9.942670  1
    201   10.983798  10.193395  1
    202    0.109491  -1.238625  0
    203   -1.068244  -0.996179  0
    204    0.341772  -0.582299  0
    205   -1.344687  -0.894215  0
    206   -0.711753  -2.676756  0
    207   -0.625906  -2.659784  0
    208    9.685143  10.292463  1
    209    9.921518  12.654102  1
    
    [210 rows x 3 columns]

    # obtain data from specific columns
    
    # obtain data from the first and second columns and convert into an ndarray
    train_data = train_frame.iloc[:,0:2].values 
    # obtain labels from the third column and convert into an ndarray
    train_labels = train_frame.iloc[:,2].values 
    # obtain data from the first and second columns and convert into an ndarray
    test_data = test_frame.iloc[:,0:2].values
    # obtain labels from the third column and convert into an ndarray
    test_labels = test_frame.iloc[:,2].values
    
    # train & test data shape
    print(train_data.shape)
    print(test_data.shape)
    # train & test labels shape
    print(train_labels.shape)
    print(test_labels.shape)
    
    (210, 2)
    (90, 2)
    (210,)
    (90,)
    

    3.4 Define some functions for visualization and training

    def plot(data, labels, caption):
        """plot the data distribution, !!YOU CAN READ THIS LATER, if you are interested
        """
        colors = cm.rainbow(np.linspace(0, 1, len(set(labels))))
        for i in set(labels):
            xs = []
            ys = []
            for index, label in enumerate(labels):
                if label == i:
                    xs.append(data[index][0])
                    ys.append(data[index][1])
            plt.scatter(xs, ys, color=colors[int(i)])
        plt.title(caption)
        plt.xlabel('x')
        plt.ylabel('y')
        plt.show()
    
    plot(train_data, train_labels, 'train_dataset')
    
    [Figure: scatter plot of the train_dataset]
    plot(test_data, test_labels, 'test_dataset')
    
    [Figure: scatter plot of the test_dataset]
    def int2onehot(label):
        """conver labels into one-hot vector, !!YOU CAN READ THIS LATER, if you are interested
        Args:
            label: [batch_size]
        Returns:
            onehot: [batch_size, categories]
        """
        dims = len(set(label))
        imgs_size = len(label)
        onehot = np.zeros((imgs_size, dims))
        onehot[np.arange(imgs_size), label] = 1
        return onehot
    
    # convert labels into one hot vector
    train_labels_onehot = int2onehot(train_labels)
    test_labels_onehot = int2onehot(test_labels)
    print(train_labels_onehot.shape)
    print(test_labels_onehot.shape)
    
    (210, 3)
    (90, 3)
    
    def get_accuracy(predictions, labels):
        """Compute accuracy, !!YOU CAN READ THIS LATER, if you are interested
        Inputs: 
            predictions:[batch_size, categories] one-hot vector
            labels: [batch_size, categories]
        """
        predictions = np.argmax(predictions, axis=1)
        labels = np.argmax(labels, axis=1)
        all_imgs = len(labels)
        predict_true = np.sum(predictions == labels)
        return predict_true/all_imgs
    
    # Please read this function carefully; it relates to the implementation of GD, SGD, and mini-batch
    def generate_batch(train_data, train_labels, batch_size):
        """Generate batch
        when batch_size = len(train_data), it's GD
        when batch_size = 1, it's SGD
        when 1 < batch_size < len(train_data), it's mini-batch; usually batch_size = 2, 4, 8, 16, ...
        """
        iterations = math.ceil(len(train_data)/batch_size)
        for i in range(iterations):
            index_from = i*batch_size
            index_end = (i+1)*batch_size
            yield (train_data[index_from:index_end], train_labels[index_from:index_end])
    
    def show_curve(ys, title):
        """plot curve for Loss and Accuacy, !!YOU CAN READ THIS LATER, if you are interested
        Args:
            ys: loss or acc list
            title: Loss or Accuracy
        """
        x = np.array(range(len(ys)))
        y = np.array(ys)
        plt.plot(x, y, c='b')
        plt.axis()
        plt.title('{} Curve:'.format(title))
        plt.xlabel('Epoch')
        plt.ylabel('{} Value'.format(title))
        plt.show()
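
    To see how the batch size controls the number of weight updates per epoch, here is a small check using generate_batch (assuming the 210 training samples loaded above; the counts in the comment follow from math.ceil):

    # Number of iterations (weight updates) per epoch for each strategy
    for bs in [len(train_data), 1, 8]:
        n_iters = sum(1 for _ in generate_batch(train_data, train_labels_onehot, bs))
        print('batch_size={:3d} -> {} iterations per epoch'.format(bs, n_iters))
    # With 210 samples: GD gives 1, SGD gives 210, mini-batch (8) gives 27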
    

    3.5 Train the model and make predictions

    learning_rate = 0.1
    
    epochs = 400 # training epoch
    
    batch_size = len(train_data) # GD
    # batch_size = 1               # SGD
    # batch_size = 8               # mini-batch
    
    model = FeedForward_Neural_Network(learning_rate) # declare a simple feedforward neural model
    
    losses = []
    accuracies = []
    
    for i in range(epochs):
        loss = 0
        for index, (xs, ys) in enumerate(generate_batch(train_data, train_labels_onehot, batch_size)):
            predictions = model.forward(xs) # forward phase
            loss += 1/2 * np.mean(np.sum(np.square(ys-predictions), axis=1)) # Mean square error
            model.backward(xs, ys, predictions) # backward phase
            
        losses.append(loss)
        
        # train dataset acc computation
        predictions = model.forward(train_data)
        # compute acc on train dataset
        accuracy = get_accuracy(predictions, train_labels_onehot)
        accuracies.append(accuracy)
        
        if i % 50 == 0:
            print('Epoch: {}, has {} iterations'.format(i, index+1))
            print('\tLoss: {:.4f}, \tAccuracy: {:.4f}'.format(loss, accuracy))
            
    test_predictions = model.forward(test_data)
    # compute acc on test dataset
    test_accuracy = get_accuracy(test_predictions, test_labels_onehot)
    print('Test Accuracy: {:.4f}'.format(test_accuracy))
    
    Epoch: 0, has 1 iterations
        Loss: 0.4185,   Accuracy: 0.3381
    Epoch: 50, has 1 iterations
        Loss: 0.0309,   Accuracy: 0.9571
    Epoch: 100, has 1 iterations
        Loss: 0.0334,   Accuracy: 0.9714
    Epoch: 150, has 1 iterations
        Loss: 0.0233,   Accuracy: 1.0000
    Epoch: 200, has 1 iterations
        Loss: 0.0044,   Accuracy: 1.0000
    Epoch: 250, has 1 iterations
        Loss: 0.0955,   Accuracy: 0.8286
    Epoch: 300, has 1 iterations
        Loss: 0.0322,   Accuracy: 0.9667
    Epoch: 350, has 1 iterations
        Loss: 0.0151,   Accuracy: 0.9476
    Test Accuracy: 0.9111
    

    3.6 Show results

    # Draw losses curve using losses 
    show_curve(losses, 'Loss')
    
    [Figure: Loss curve]
    # Draw Accuracy curve using accuracies
    show_curve(accuracies, 'Accuracy')
    
    [Figure: Accuracy curve]

    4. Problems

    4.1 Problem 1

    Describe the training procedure, based on the code above.

    The procedure above trains a feedforward neural network end to end: process the data, run forward and backward passes, and evaluate the model.

    4.1.1 Data Process

    We use pandas to load the CSV files and convert the raw data into DataFrame format.

    4.1.1.1 Files
    • label.csv contains all observations of the whole dataset, which is then split into two parts, train.csv and test.csv.
    • train.csv is the training set of this neural network. It contains 210 two-dimensional observations, each with a label ranging from 0 to 2.
    • test.csv is the test set of this neural network. It contains 90 two-dimensional observations, each with a label ranging from 0 to 2.
    4.1.1.2 Data and Labels

    After the I/O, we further split the raw data into two parts, i.e. data and labels. From the plots above, we can clearly see that the data lie in a 2-D plane and form 3 clusters according to their labels.

    In this process, each epoch uses only one iteration; in other words, the batch is the whole training set. We train for 400 epochs, meaning the training data are used 400 times, and we check the loss and accuracy of our network every 50 epochs. The learning rate \alpha is 0.1.

    4.1.1.3 One-Hot

    There is a trick here: one-hot encoding treats each state bit as a feature (see the small example after this list).

    • Advantages: First, it solves the problem that classifiers cannot handle discrete data well; second, to some extent it also expands the feature space (the number of features in the samples above is expanded from 3 to 9).
    • Disadvantages: Its shortcomings are very prominent for text features. First, it is a bag-of-words model that ignores the order between words (the order of words in text is also very important); second, it assumes that words are mutually independent (in most cases words interact with each other); lastly, the features it produces are discrete and sparse.
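
    For instance, with our three categories, each integer label becomes a 3-dimensional indicator vector:

    # Example: the three class labels and their one-hot encodings
    print(int2onehot(np.array([0, 1, 2])))
    # [[1. 0. 0.]
    #  [0. 1. 0.]
    #  [0. 0. 1.]]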

    4.1.2 Neural Network

    Our neural network takes 2-D input data, and the 5 neurons in the hidden layer help classify the data into 3 classes. Note that although we usually use random matrices to initialize the weights, for simplicity we use two pre-set matrices here.

    4.1.2.1 Forward

    The neural network predicts labels via the forward procedure. Multiplying the original input by weight1 and activating the result gives the intermediate product; multiplying that intermediate product by weight2 and activating the result gives the final prediction.

    4.1.2.2 Backward

    Notice that the training set carries important values: the labels. We calculate the mean squared error and use the gradient descent method to minimize it. To decrease the loss, we need to find the direction that decreases the MSE fastest. Finally, the weights are updated at each step.
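
    Concretely, writing the loss as L = \frac{1}{2}\|y - o\|^2 with sigmoid activations, the quantities computed in backward follow from the chain rule (a sketch of the derivation, in the notation of the code above):

    \delta_o = (y - o) \odot o(1 - o) \hspace{0.5in} \delta_{z_1} = (\delta_o W_2^{\top}) \odot z_1(1 - z_1)

    W_2 \leftarrow W_2 + \eta \, z_1^{\top} \delta_o \hspace{0.5in} W_1 \leftarrow W_1 + \eta \, X^{\top} \delta_{z_1}

    The plus sign appears because the code defines the error as y - o rather than o - y; adding \eta times that quantity is exactly gradient descent on L.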

    4.1.3 Test

    To verify the correctness of our method, we plot the curves over 400 epochs. We can see that, in general, the accuracy is increasing and the loss is decreasing. Strangely, however, the loss curve shows sudden jumps around epochs 150, 300 and 350. Why?

    After further exploration, we finally identify the problem: the learning step is too large, so our method cannot settle into the global optimum. That is a really interesting thing! We obtain better results in Problem 2 below.
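
    A one-dimensional toy example (our own illustration, not from the notebook) makes the effect visible: gradient descent on f(w) = w^2 converges for a small step size but oscillates, and even diverges, when the step is too large:

    # Gradient descent on f(w) = w**2, whose gradient is f'(w) = 2*w (illustrative only)
    for lr in [0.1, 0.99, 1.1]:
        w = 1.0
        for _ in range(20):
            w -= lr * 2 * w
        print('lr={:.2f} -> w after 20 steps: {:+.4f}'.format(lr, w))
    # lr=0.10 shrinks w towards 0; lr=0.99 flips sign each step and decays slowly; lr=1.10 diverges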

    4.2 Problem 2

    Set learning rate = 0.01, train the model, and show the two curves below.

    learning_rate = 0.01
    epochs = 400 # training epoch
    batch_size = len(train_data) # GD
    # batch_size = 1               # SGD
    # batch_size = 8               # mini-batch
    
    model = FeedForward_Neural_Network(learning_rate) # declare a simple feedforward neural model
    
    losses = []
    accuracies = []
    
    for i in range(epochs):
        loss = 0
        for index, (xs, ys) in enumerate(generate_batch(train_data, train_labels_onehot, batch_size)):
            predictions = model.forward(xs) # forward phase
            loss += 1/2 * np.mean(np.sum(np.square(ys-predictions), axis=1)) # Mean square error
            model.backward(xs, ys, predictions) # backward phase
            
        losses.append(loss)
        
        # train dataset acc computation
        predictions = model.forward(train_data)
        # compute acc on train dataset
        accuracy = get_accuracy(predictions, train_labels_onehot)
        accuracies.append(accuracy)
        
        if i % 50 == 0:
    #         print('Epoch: {}, has {} iterations'.format(i, index+1))
    #         print('\tLoss: {:.4f}, \tAccuracy: {:.4f}'.format(loss, accuracy))
            pass
            
    test_predictions = model.forward(test_data)
    # compute acc on test dataset
    test_accuracy = get_accuracy(test_predictions, test_labels_onehot)
    print('Test Accuracy: {:.4f}'.format(test_accuracy))
    
    # Draw losses curve using losses 
    show_curve(losses, 'Loss')
    
    # Draw Accuracy curve using accuracies
    show_curve(accuracies, 'Accuracy')
    
    Test Accuracy: 1.0000
    
    [Figures: Loss and Accuracy curves, learning rate 0.01]

    See? The loss decreases steadily with the epochs and the accuracy increases to one!

    4.3 Problem 3

    Use SGD and mini-batch to train the model and show the four curves below.

    4.3.1 SGD

    learning_rate = 0.1
    epochs = 400 # training epoch
    # batch_size = len(train_data) # GD
    batch_size = 1               # SGD
    # batch_size = 8               # mini-batch
    
    model = FeedForward_Neural_Network(learning_rate) # declare a simple feedforward neural model
    
    losses = []
    accuracies = []
    
    for i in range(epochs):
        loss = 0
        for index, (xs, ys) in enumerate(generate_batch(train_data, train_labels_onehot, batch_size)):
            predictions = model.forward(xs) # forward phase
            loss += 1/2 * np.mean(np.sum(np.square(ys-predictions), axis=1)) # Mean square error
            model.backward(xs, ys, predictions) # backward phase
            
        losses.append(loss)
        
        # train dataset acc computation
        predictions = model.forward(train_data)
        # compute acc on train dataset
        accuracy = get_accuracy(predictions, train_labels_onehot)
        accuracies.append(accuracy)
        
        if i % 50 == 0:
    #         print('Epoch: {}, has {} iterations'.format(i, index+1))
    #         print('\tLoss: {:.4f}, \tAccuracy: {:.4f}'.format(loss, accuracy))
            pass
            
    test_predictions = model.forward(test_data)
    # compute acc on test dataset
    test_accuracy = get_accuracy(test_predictions, test_labels_onehot)
    print('Test Accuracy: {:.4f}'.format(test_accuracy))
    
    # Draw losses curve using losses 
    show_curve(losses, 'Loss')
    
    # Draw Accuracy curve using accuracies
    show_curve(accuracies, 'Accuracy')
    
    Test Accuracy: 1.0000
    
    [Figures: Loss and Accuracy curves, SGD]

    4.3.2 Mini-Batch

    learning_rate = 0.1
    epochs = 400 # training epoch
    # batch_size = len(train_data) # GD
    # batch_size = 1               # SGD
    batch_size = 8               # mini-batch
    
    model = FeedForward_Neural_Network(learning_rate) # declare a simple feedforward neural model
    
    losses = []
    accuracies = []
    
    for i in range(epochs):
        loss = 0
        for index, (xs, ys) in enumerate(generate_batch(train_data, train_labels_onehot, batch_size)):
            predictions = model.forward(xs) # forward phase
            loss += 1/2 * np.mean(np.sum(np.square(ys-predictions), axis=1)) # Mean square error
            model.backward(xs, ys, predictions) # backward phase
            
        losses.append(loss)
        
        # train dataset acc computation
        predictions = model.forward(train_data)
        # compute acc on train dataset
        accuracy = get_accuracy(predictions, train_labels_onehot)
        accuracies.append(accuracy)
        
        if i % 50 == 0:
    #         print('Epoch: {}, has {} iterations'.format(i, index+1))
    #         print('\tLoss: {:.4f}, \tAccuracy: {:.4f}'.format(loss, accuracy))
            pass
            
    test_predictions = model.forward(test_data)
    # compute acc on test dataset
    test_accuracy = get_accuracy(test_predictions, test_labels_onehot)
    print('Test Accuracy: {:.4f}'.format(test_accuracy))
    
    # Draw losses curve using losses 
    show_curve(losses, 'Loss')
    
    # Draw Accuracy curve using accuracies
    show_curve(accuracies, 'Accuracy')
    
    Test Accuracy: 1.0000
    
    [Figures: Loss and Accuracy curves, mini-batch]

    From this perspective, we can see that a remedy for an overly large step length is to increase the number of weight updates. We can compensate for this problem by adopting mini-batch or SGD.
