Building a Logistic Regression (LR) Model with a Neural Network Mindset (Deep Learning Course, Week 2 Answers)


Author: CodingFish | Published 2018-01-09 20:08

    Logistic regression recap


    Taking derivatives on the LR computation graph
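
    For reference, the per-example derivatives obtained from the LR computation graph are the standard logistic regression results, where $a^{(i)} = \sigma(z^{(i)})$ is the prediction and $y^{(i)}$ the label:

    $$\frac{\partial \mathcal{L}}{\partial z^{(i)}} = a^{(i)} - y^{(i)}$$
    $$\frac{\partial \mathcal{L}}{\partial w} = x^{(i)}\,(a^{(i)} - y^{(i)}), \qquad \frac{\partial \mathcal{L}}{\partial b} = a^{(i)} - y^{(i)}$$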

    Algorithm structure

    Design a simple algorithm that classifies whether an image is a cat.
    We build a logistic regression model with a neural-network mindset; the figure below explains why LR is in fact a very simple neural network.

    [Figure: logistic regression drawn as a single-neuron neural network]

    Mathematical expression of the algorithm:

    For one example $x^{(i)}$:
    $$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
    $$\hat{y}^{(i)} = a^{(i)} = sigmoid(z^{(i)})\tag{2}$$
    $$ \mathcal{L}(a^{(i)}, y^{(i)}) = - y^{(i)} \log(a^{(i)}) - (1-y^{(i)} ) \log(1-a^{(i)})\tag{3}$$

    The cost is then computed by summing over all training examples:
    $$ J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)})\tag{6}$$
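
    As a quick sanity check of the loss in equation (3): if $y^{(i)} = 1$ and the model outputs $a^{(i)} = 0.9$, then $\mathcal{L} = -\log(0.9) \approx 0.105$; if instead $a^{(i)} = 0.1$, then $\mathcal{L} = -\log(0.1) \approx 2.303$. Confident wrong predictions are therefore penalized far more heavily than confident correct ones.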

    Building the parts of the algorithm

    The main steps for building a neural network are:

    1. Define the model structure (such as the number of input features)

    2. Initialize the model's parameters

    3. Loop:

      • Calculate the current loss (forward propagation)
      • Calculate the current gradients (backward propagation)
      • Update the parameters (gradient descent)

    You usually build steps 1-3 separately and then integrate them into a single function we call model().

    1. Helper functions

    import numpy as np

    # GRADED FUNCTION: sigmoid
    def sigmoid(z):
        """
        Compute the sigmoid of z
    
        Arguments:
        z -- A scalar or numpy array of any size.
    
        Return:
        s -- sigmoid(z)
        """
        s = 1/(1+np.exp(-z))
        
        return s
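
    A quick sanity check of the helper (the test values here are only illustrative):

    print("sigmoid([0, 2]) = " + str(sigmoid(np.array([0, 2]))))   # expected ≈ [0.5, 0.88079708]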
    

    2. Initializing parameters

    # GRADED FUNCTION: initialize_with_zeros
    def initialize_with_zeros(dim):
        """
        This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
        
        Argument:
        dim -- size of the w vector we want (or number of parameters in this case)
        
        Returns:
        w -- initialized vector of shape (dim, 1)
        b -- initialized scalar (corresponds to the bias)
        """
        
        w = np.zeros((dim,1))
        b = 0
    
        assert(w.shape == (dim, 1))
        assert(isinstance(b, float) or isinstance(b, int))
        
        return w, b
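
    A minimal usage example (the dimension 2 below is chosen only for illustration):

    w, b = initialize_with_zeros(2)
    print("w = " + str(w))   # a (2, 1) array of zeros
    print("b = " + str(b))   # 0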
    

    3. Forward and backward propagation

    Now that the parameters are initialized, you can perform the forward and backward propagation steps to learn the parameters.

    Exercise: implement the function propagate() that computes the cost function and its gradient.
    Hints:

    Forward Propagation:

    • You get X
    • You compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, ..., a^{(m-1)}, a^{(m)})$
    • You calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)})\right]$

    Here are the two formulas you will be using:

    $$ \frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T\tag{7}$$
    $$ \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})\tag{8}$$

    # GRADED FUNCTION: propagate
    
    def propagate(w, b, X, Y):
        """
        Implement the cost function and its gradient for the propagation explained above
    
        Arguments:
        w -- weights, a numpy array of size (num_px * num_px * 3, 1)
        b -- bias, a scalar
        X -- data of size (num_px * num_px * 3, number of examples)
        Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)
    
        Return:
        cost -- negative log-likelihood cost for logistic regression
        dw -- gradient of the loss with respect to w, thus same shape as w
        db -- gradient of the loss with respect to b, thus same shape as b
        
        Tips:
        - Write your code step by step for the propagation. np.log(), np.dot()
        """
        
        m = X.shape[1]
        
        # FORWARD PROPAGATION (FROM X TO COST)
        A = sigmoid(np.dot(w.T,X)+b)                             # compute activation
        cost = -np.sum((Y*np.log(A)+(1-Y)*np.log(1-A)))/m    # compute cost
        
        # BACKWARD PROPAGATION (TO FIND GRAD)
        dw = np.dot(X,(A-Y).T)/m
        db = np.sum((A-Y))/m
        assert(dw.shape == w.shape)
        assert(db.dtype == float)
        cost = np.squeeze(cost)
        assert(cost.shape == ())
        
        grads = {"dw": dw,
                 "db": db}
        
        return grads, cost
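
    A small worked example with toy values (these numbers only exercise the function and are not part of the assignment data):

    w, b = np.array([[1.], [2.]]), 2.
    X = np.array([[1., 2.], [3., 4.]])
    Y = np.array([[1, 0]])
    grads, cost = propagate(w, b, X, Y)
    print("dw = " + str(grads["dw"]))     # ≈ [[1.0], [2.0]]
    print("db = " + str(grads["db"]))     # ≈ 0.5
    print("cost = " + str(cost))          # ≈ 6.0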
    

    4. Optimization

    • The parameters have been initialized.
    • We can also compute the cost function and its gradient.
    • Now we need to update the parameters using gradient descent.

    The goal is to learn $w$ and $b$ by minimizing the cost function $J$. For a parameter $\theta$, the update rule is $\theta = \theta - \alpha \text{ } d\theta$, where $\alpha$ is the learning rate.

    # GRADED FUNCTION: optimize
    
    def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
        """
        This function optimizes w and b by running a gradient descent algorithm
        
        Arguments:
        w -- weights, a numpy array of size (num_px * num_px * 3, 1)
        b -- bias, a scalar
        X -- data of shape (num_px * num_px * 3, number of examples)
        Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
        num_iterations -- number of iterations of the optimization loop
        learning_rate -- learning rate of the gradient descent update rule
        print_cost -- True to print the loss every 100 steps
        
        Returns:
        params -- dictionary containing the weights w and bias b
        grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
        costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.
        
        Tips:
        You basically need to write down two steps and iterate through them:
            1) Calculate the cost and the gradient for the current parameters. Use propagate().
            2) Update the parameters using gradient descent rule for w and b.
        """
        
        costs = []
        
        for i in range(num_iterations):
        
            # Cost and gradient calculation (≈ 1-4 lines of code)
            grads, cost = propagate(w, b, X, Y)
            # Retrieve derivatives from grads
            dw = grads["dw"]
            db = grads["db"]
            
            # update rule (≈ 2 lines of code)
            w = w-learning_rate*dw
            b = b-learning_rate*db
            
            # Record the costs
            if i % 100 == 0:
                costs.append(cost)
            
            # Print the cost every 100 iterations
            if print_cost and i % 100 == 0:
                print ("Cost after iteration %i: %f" %(i, cost))
        
        params = {"w": w,
                  "b": b}
        
        grads = {"dw": dw,
                 "db": db}
        
        return params, grads, costs
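
    Continuing the toy example above, a sketch of running the optimizer (the hyperparameters are arbitrary illustrations):

    params, grads, costs = optimize(w, b, X, Y, num_iterations=100, learning_rate=0.009, print_cost=False)
    print("w = " + str(params["w"]))
    print("b = " + str(params["b"]))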
    

    5. Prediction

    The previous function outputs the learned w and b, which we can use to predict the labels of a dataset X by implementing the predict() function. There are two steps to computing predictions:

    1. Calculate $\hat{Y} = A = \sigma(w^T X + b)$

    2. Convert the entries of A into 0 (if activation <= 0.5) or 1 (if activation > 0.5), and store the predictions in a vector Y_prediction. If you wish, you can use an if/else statement in a for loop (though there is also a way to vectorize this).

    # GRADED FUNCTION: predict
    
    def predict(w, b, X):
        '''
        Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)
        
        Arguments:
        w -- weights, a numpy array of size (num_px * num_px * 3, 1)
        b -- bias, a scalar
        X -- data of size (num_px * num_px * 3, number of examples)
        
        Returns:
        Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
        '''
        
        m = X.shape[1]
        Y_prediction = np.zeros((1,m))
        w = w.reshape(X.shape[0], 1)
        
        # Compute vector "A" predicting the probabilities of a cat being present in the picture
        A = sigmoid(np.dot(w.T,X)+b)
        for i in range(A.shape[1]):
            
            # Convert probabilities A[0,i] to actual predictions p[0,i]
            if A[0][i]>0.5:
                Y_prediction[0][i]=1            
                    
        assert(Y_prediction.shape == (1, m))
        
        return Y_prediction
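
    A quick usage check with the toy parameters from above, plus the vectorized alternative to the loop hinted at earlier (both are sketches):

    print("predictions = " + str(predict(params["w"], params["b"], X)))

    # Vectorized alternative to the for loop inside predict():
    # Y_prediction = (A > 0.5).astype(float)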
    

    6. Merging the parts into a model

    Now we put all the building blocks (the functions implemented in the previous parts) together, in the right order, to form the full model.

    # GRADED FUNCTION: model
    
    def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
        """
        Builds the logistic regression model by calling the function you've implemented previously
        
        Arguments:
        X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
        Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
        X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
        Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
        num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
        learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
        print_cost -- Set to true to print the cost every 100 iterations
        
        Returns:
        d -- dictionary containing information about the model.
        """
            
        # initialize parameters with zeros (≈ 1 line of code)
        w, b = initialize_with_zeros(X_train.shape[0])
    
        # Gradient descent (≈ 1 line of code)
        parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
        
        # Retrieve parameters w and b from dictionary "parameters"
        w = parameters["w"]
        b = parameters["b"]
        
        # Predict test/train set examples (≈ 2 lines of code)
        Y_prediction_test = predict(w, b, X_test)
        Y_prediction_train = predict(w, b, X_train)
    
        # Print train/test Errors
        print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
        print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))
    
        
        d = {"costs": costs,
             "Y_prediction_test": Y_prediction_test, 
             "Y_prediction_train" : Y_prediction_train, 
             "w" : w, 
             "b" : b,
             "learning_rate" : learning_rate,
             "num_iterations": num_iterations}
        
        return d
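
    Finally, a sketch of how model() would be called. Here train_set_x, train_set_y, test_set_x and test_set_y are placeholder names for a dataset that has already been flattened to shape (num_px * num_px * 3, m) and normalized:

    # Placeholder dataset names -- replace with your own preprocessed data
    d = model(train_set_x, train_set_y, test_set_x, test_set_y,
              num_iterations=2000, learning_rate=0.005, print_cost=True)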
    
