Stanford cs231n Assignment #1 (c

作者 (Author): 麦兜胖胖次 | Published 2016-12-09 22:10

    This post is about implementing classification with a softmax classifier. In many cases softmax and SVM actually give very similar classification results; they just come with different mathematical interpretations. Softmax presents its output as probabilities, so the final outputs are in percentage form.

    There is a good article on softmax and logistic regression and the connection between them; after reading it you can see that softmax differs from binary logistic regression only in how the probabilities are computed: http://blog.csdn.net/zhangliyao22/article/details/48379291
    And another one: http://www.cnblogs.com/guyj/p/3800519.html

    loss function:

    [image: the softmax loss function]

    For f(x, W) = Wx itself nothing changes; only the interpretation becomes probabilistic, and the loss function changes accordingly.

    [two images: the softmax function and the resulting loss]
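
    Since the pasted images do not reproduce here, the mapping they showed can be written out in the usual cs231n notation (this is my reconstruction, so the symbols may differ slightly from the original screenshots): the raw scores a = f(x_i, W) are turned into class probabilities by

    P(y_i = k \mid x_i; W) = \frac{e^{a_k}}{\sum_j e^{a_j}}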

    Personally, I like the entropy-based interpretation of softmax best:

    [image: the cross-entropy formula]

    To explain further: for a single example, the classifier computes a score for each of the n classes, but the "true" data distribution is simply a one-hot vector [0,0,...,0,1,0,...,0,0]. Using the formula in the screenshot, it is not hard to convert between the softmax loss and the cross-entropy formula shown above. Since the true distribution puts 0 on every class except yi, the per-example loss reduces to Li = - ( 1 * log (score(yi) / sum_score) ). When the correct class is predicted with probability 1 this loss is 0, which is the ideal minimum; during training we minimize this cross-entropy cost function.
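
    Written out (again my reconstruction in standard notation, not copied from the screenshots), the cross-entropy between the true distribution p and the predicted distribution q, and its reduction for a one-hot p, are:

    H(p, q) = -\sum_{x} p(x) \log q(x)

    L_i = -\log\!\left(\frac{e^{a_{y_i}}}{\sum_j e^{a_j}}\right)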

    gradient

    Here I followed an extremely well-written blog post: http://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/
    Conclusion:

    [image: the derivative of the softmax function]
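
    In symbols, the conclusion from that post is the softmax Jacobian (reconstructed here in the post's notation, with S = softmax(a) and a = Wx):

    \frac{\partial S_i}{\partial a_j} = S_i \, (\delta_{ij} - S_j),
    \qquad
    \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}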
    This result is the derivative of the softmax function itself, i.e. of the first half (without the minus sign) of the equivalent expression for the loss function, not of the version with the log applied. Here Si and Sj both denote scores computed by the matrix product Wx and then squashed by the softmax function into probabilities between 0 and 1; i indexes the class whose probability we are computing for the current input, and j indexes the component of a with respect to which the partial derivative is taken. The expression obtained here is not the final dL/dW; it is only dP/dA, where A is the raw value produced by the matrix product Wx before it is mapped to probabilities, and P is the softmax of A. The map from A to P is an N -> N mapping, so its Jacobian is an N by N matrix (N is the number of classes). By the chain rule, to get dL/dW we still need dL/dP and dA/dW.

    The detailed derivation is as follows:

    [image: softmax.png, the detailed derivation]
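
    Since softmax.png does not render here, a sketch of the chain-rule steps (my own write-up of the standard derivation, not a copy of the image):

    % per-example loss: L_i = -\log S_{y_i}, with S_k = e^{a_k} / \sum_m e^{a_m} and a_j = x_i \cdot W_{:,j}
    \frac{\partial L_i}{\partial a_j}
      = -\frac{1}{S_{y_i}} \frac{\partial S_{y_i}}{\partial a_j}
      = -\frac{1}{S_{y_i}} \, S_{y_i} (\delta_{y_i j} - S_j)
      = S_j - \delta_{y_i j}

    \frac{\partial L_i}{\partial W_{:,j}} = (S_j - \delta_{y_i j}) \, x_i

    which is exactly the update dW[:,j] += (p[j]/np.sum(p) - (j==y[i]))*X[i,:] used in the code below.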

    The code is as follows:

    import numpy as np


    def softmax_loss_naive(W, X, y, reg):
      """
      Softmax loss function, naive implementation (with loops)
    
      Inputs have dimension D, there are C classes, and we operate on minibatches
      of N examples.
    
      Inputs:
      - W: A numpy array of shape (D, C) containing weights.
      - X: A numpy array of shape (N, D) containing a minibatch of data.
      - y: A numpy array of shape (N,) containing training labels; y[i] = c means
        that X[i] has label c, where 0 <= c < C.
      - reg: (float) regularization strength
    
      Returns a tuple of:
      - loss as single float
      - gradient with respect to weights W; an array of same shape as W
      """
      # Initialize the loss and gradient to zero.
      loss = 0.0
      dW = np.zeros_like(W)
    
      num_train = X.shape[0]
      num_classes = W.shape[1]
    
      #############################################################################
      # TODO: Compute the softmax loss and its gradient using explicit loops.     #
      # Store the loss in loss and the gradient in dW. If you are not careful     #
      # here, it is easy to run into numeric instability. Don't forget the        #
      # regularization!                                                           #
      #############################################################################
      for i in xrange(num_train):
        scores = X[i].dot(W)
        log_c = np.max(scores)           # subtract the max score for numerical stability
        p = []
        for j in xrange(num_classes):
          p.append(np.exp(scores[j] - log_c))
        # cross-entropy loss for this example: -log(p_yi / sum_j p_j)
        loss += -np.log(p[y[i]]/np.sum(p))
        for j in xrange(num_classes):
          # dL/dw_j = (p_j - 1{j == y[i]}) * x_i, following the derivation above
          dW[:,j] += (p[j]/np.sum(p) - (j==y[i]))*X[i,:]
      
      loss /= num_train
      loss += 0.5 * reg * np.sum(W * W)
      dW /= num_train
      dW += reg*W
          
      #############################################################################
      #                          END OF YOUR CODE                                 #
      #############################################################################
    
      return loss, dW
    
    
    def softmax_loss_vectorized(W, X, y, reg):
      """
      Softmax loss function, vectorized version.
    
      Inputs and outputs are the same as softmax_loss_naive.
      """
      # Initialize the loss and gradient to zero.
      loss = 0.0
      dW = np.zeros_like(W)
    
      num_train = X.shape[0]
      num_classes = W.shape[1]
    
      #############################################################################
      # TODO: Compute the softmax loss and its gradient using no explicit loops.  #
      # Store the loss in loss and the gradient in dW. If you are not careful     #
      # here, it is easy to run into numeric instability. Don't forget the        #
      # regularization!                                                           #
      #############################################################################
      scores = X.dot(W)
      # subtract the per-row max for numerical stability
      scores -= np.max(scores, axis=1).reshape(num_train, 1)
      scores_exp = np.exp(scores)
      sum_p = np.sum(scores_exp, axis=1).reshape(num_train, 1)
      p = scores_exp/sum_p                        # N x C matrix of class probabilities
      loss = np.mean(-np.log(p[np.arange(num_train), y]))
      binary = np.zeros(p.shape)
      binary[np.arange(num_train), y] = 1         # one-hot encoding of the labels
      dW = np.dot(X.transpose(), p - binary)
      loss += 0.5 * reg * np.sum(W * W)
      dW /= num_train
      dW += reg*W
      #############################################################################
      #                          END OF YOUR CODE                                 #
      #############################################################################
    
      return loss, dW
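
    A quick sanity check is to run both implementations on a small random problem and compare them; with near-zero weights the softmax loss should come out close to log(C). This is only a sketch with made-up toy sizes (N=5, D=10, C=4), not part of the assignment code:

    import numpy as np

    # Toy problem: N = 5 examples, D = 10 features, C = 4 classes (arbitrary sizes).
    np.random.seed(0)
    W = np.random.randn(10, 4) * 0.0001
    X = np.random.randn(5, 10)
    y = np.random.randint(4, size=5)
    reg = 0.1

    loss_naive, grad_naive = softmax_loss_naive(W, X, y, reg)
    loss_vec, grad_vec = softmax_loss_vectorized(W, X, y, reg)

    # The two versions should agree, and with tiny W the loss is roughly log(4) ~= 1.386.
    print 'naive loss:', loss_naive, 'vectorized loss:', loss_vec
    print 'gradient difference:', np.linalg.norm(grad_naive - grad_vec)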
    
