cs231n:assignment1——Q2: Training
Author: X_Y | Published 2016-11-13 22:03

    Multiclass Support Vector Machine exercise

    Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.

    In this exercise you will:

    • implement a fully-vectorized loss function for the SVM
    • implement the fully-vectorized expression for its analytic gradient
    • check your implementation using numerical gradient
    • use a validation set to tune the learning rate and regularization strength
    • optimize the loss function with SGD
    • visualize the final learned weights
    # Run some setup code for this notebook.
    
    import random
    import numpy as np
    from cs231n.data_utils import load_CIFAR10
    import matplotlib.pyplot as plt
    
    # This is a bit of magic to make matplotlib figures appear inline in the
    # notebook rather than in a new window.
    %matplotlib inline
    plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
    plt.rcParams['image.interpolation'] = 'nearest'
    plt.rcParams['image.cmap'] = 'gray'
    
    # Some more magic so that the notebook will reload external python modules;
    # see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
    %load_ext autoreload
    %autoreload 2
    

    CIFAR-10 Data Loading and Preprocessing

    # Load the raw CIFAR-10 data.
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
    
    # As a sanity check, we print out the size of the training and test data.
    print 'Training data shape: ', X_train.shape
    print 'Training labels shape: ', y_train.shape
    print 'Test data shape: ', X_test.shape
    print 'Test labels shape: ', y_test.shape
    
    Training data shape:  (50000, 32, 32, 3)
    Training labels shape:  (50000,)
    Test data shape:  (10000, 32, 32, 3)
    Test labels shape:  (10000,)
    
    # Visualize some examples from the dataset.
    # We show a few examples of training images from each class.
    classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
    num_classes = len(classes)
    samples_per_class = 7
    for y, cls in enumerate(classes):
        idxs = np.flatnonzero(y_train == y)
        idxs = np.random.choice(idxs, samples_per_class, replace=False)
        for i, idx in enumerate(idxs):
            plt_idx = i * num_classes + y + 1
            plt.subplot(samples_per_class, num_classes, plt_idx)
            plt.imshow(X_train[idx].astype('uint8'))
            plt.axis('off')
            if i == 0:
                plt.title(cls)
    plt.show()
    
    [output_4_0.png: a grid of sample training images, seven per CIFAR-10 class]
    # Split the data into train, val, and test sets. In addition we will
    # create a small development set as a subset of the training data;
    # we can use this for development so our code runs faster.
    num_training = 49000
    num_validation = 1000
    num_test = 1000
    num_dev = 500
    
    # Our validation set will be num_validation points from the original
    # training set.
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    
    # Our training set will be the first num_training points from the original
    # training set.
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    
    # We will also make a development set, which is a small subset of
    # the training set.
    mask = np.random.choice(num_training, num_dev, replace=False)
    X_dev = X_train[mask]
    y_dev = y_train[mask]
    
    # We use the first num_test points of the original test set as our
    # test set.
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]
    
    print 'Train data shape: ', X_train.shape
    print 'Train labels shape: ', y_train.shape
    print 'Validation data shape: ', X_val.shape
    print 'Validation labels shape: ', y_val.shape
    print 'Test data shape: ', X_test.shape
    print 'Test labels shape: ', y_test.shape
    
    Train data shape:  (49000, 32, 32, 3)
    Train labels shape:  (49000,)
    Validation data shape:  (1000, 32, 32, 3)
    Validation labels shape:  (1000,)
    Test data shape:  (1000, 32, 32, 3)
    Test labels shape:  (1000,)
    
    # Preprocessing: reshape the image data into rows
    X_train = np.reshape(X_train, (X_train.shape[0], -1))
    X_val = np.reshape(X_val, (X_val.shape[0], -1))
    X_test = np.reshape(X_test, (X_test.shape[0], -1))
    X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))
    
    # As a sanity check, print out the shapes of the data
    print 'Training data shape: ', X_train.shape
    print 'Validation data shape: ', X_val.shape
    print 'Test data shape: ', X_test.shape
    print 'dev data shape: ', X_dev.shape
    
    Training data shape:  (49000, 3072)
    Validation data shape:  (1000, 3072)
    Test data shape:  (1000, 3072)
    dev data shape:  (500, 3072)
    
    # Preprocessing: subtract the mean image
    # first: compute the image mean based on the training data
    mean_image = np.mean(X_train, axis=0)
    print mean_image[:10] # print a few of the elements
    plt.figure(figsize=(4,4))
    plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image
    plt.show()
    
    [ 130.64189796  135.98173469  132.47391837  130.05569388  135.34804082
      131.75402041  130.96055102  136.14328571  132.47636735  131.48467347]
    
    [output_7_1.png: the mean training image, reshaped to 32x32x3]
    # second: subtract the mean image from train and test data
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image
    X_dev -= mean_image
    
    # third: append the bias dimension of ones (i.e. bias trick) so that our SVM
    # only has to worry about optimizing a single weight matrix W.
    X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
    X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
    X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
    X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])
    
    print X_train.shape, X_val.shape, X_test.shape, X_dev.shape
    
    (49000, 3073) (1000, 3073) (1000, 3073) (500, 3073)
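    Why the bias trick works: appending a constant 1 feature folds the bias vector b into W, so the affine scores X.dot(W) + b become a single product X'.dot(W'), where X' = [X, 1] and W' stacks b as an extra row under W. A quick sanity check of this equivalence (toy shapes, not assignment code):

    # Toy check of the bias trick: [X, 1] dot [W; b] == X dot W + b
    X_toy = np.random.randn(5, 3)                    # 5 examples, 3 features
    W_toy = np.random.randn(3, 10)                   # weights
    b_toy = np.random.randn(10)                      # per-class biases
    X_aug = np.hstack([X_toy, np.ones((5, 1))])      # bias trick on the data
    W_aug = np.vstack([W_toy, b_toy])                # bias folded into the weights
    print np.allclose(X_aug.dot(W_aug), X_toy.dot(W_toy) + b_toy)  # True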
    

    SVM Classifier

    Your code for this section will all be written inside cs231n/classifiers/linear_svm.py.

    As you can see, we have prefilled the function svm_loss_naive, which uses for loops to evaluate the multiclass SVM loss function.
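    For reference, here is a minimal sketch of what the loss part of svm_loss_naive might look like, assuming the usual margin of 1 and a 0.5 * reg * np.sum(W * W) regularization term (the exact conventions are fixed by the starter code in linear_svm.py):

    def svm_loss_naive(W, X, y, reg):
        """Naive multiclass SVM loss. W is (D, C), X is (N, D), y is (N,)."""
        num_classes = W.shape[1]
        num_train = X.shape[0]
        dW = np.zeros(W.shape)  # gradient placeholder, filled in later
        loss = 0.0
        for i in xrange(num_train):
            scores = X[i].dot(W)
            correct_class_score = scores[y[i]]
            for j in xrange(num_classes):
                if j == y[i]:
                    continue
                margin = scores[j] - correct_class_score + 1  # delta = 1
                if margin > 0:
                    loss += margin
        loss /= num_train                    # average over the training examples
        loss += 0.5 * reg * np.sum(W * W)    # L2 regularization
        return loss, dW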

    # Evaluate the naive implementation of the loss we provided for you:
    from cs231n.classifiers.linear_svm import svm_loss_naive
    import time
    
    # generate a random SVM weight matrix of small numbers
    W = np.random.randn(3073, 10) * 0.0001 
    
    loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.00001)
    print 'loss: %f' % (loss, )
    #print 'gradient:\n',grad
    
    loss: 8.653856
    

    The grad returned from the function above is right now all zero. Derive the gradient for the SVM cost function and implement it inline inside the function svm_loss_naive. You will find it helpful to interleave your new code inside the existing function.
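    One way to derive it: for each example x_i, every class j != y_i with a positive margin contributes x_i to column j of dW and -x_i to column y_i; the regularization term contributes reg * W (the 0.5 factor cancelling the 2 from differentiating W squared). Interleaved into the loop structure sketched above, that amounts to the following (same assumptions as before):

    # inside the inner loop of svm_loss_naive:
    if margin > 0:
        loss += margin
        dW[:, j] += X[i]       # a descent step will lower this wrong class score
        dW[:, y[i]] -= X[i]    # ...and raise the correct class score
    # after both loops, average and add the regularization gradient:
    dW /= num_train
    dW += reg * W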

    To check that you have implemented the gradient correctly, you can numerically estimate the gradient of the loss function and compare the numerical estimate to the gradient that you computed. We have provided code that does this for you:

    # Once you've implemented the gradient, recompute it with the code below
    # and gradient check it with the function we provided for you
    
    # Compute the loss and its gradient at W.
    loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.0)
    
    # Numerically compute the gradient along several randomly chosen dimensions, and
    # compare them with your analytically computed gradient. The numbers should match
    # almost exactly along all dimensions.
    from cs231n.gradient_check import grad_check_sparse
    f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
    grad_numerical = grad_check_sparse(f, W, grad)
    
    # do the gradient check once again with regularization turned on
    # you didn't forget the regularization gradient did you?
    loss, grad = svm_loss_naive(W, X_dev, y_dev, 1e2)
    f = lambda w: svm_loss_naive(w, X_dev, y_dev, 1e2)[0]
    grad_numerical = grad_check_sparse(f, W, grad)
    
    numerical: 9.876634 analytic: 9.876634, relative error: 6.314484e-11
    numerical: 11.418154 analytic: 11.418154, relative error: 3.110099e-11
    numerical: -4.932795 analytic: -4.932795, relative error: 2.367806e-11
    numerical: 6.115792 analytic: 6.115792, relative error: 6.813146e-11
    numerical: -6.553106 analytic: -6.553106, relative error: 1.830888e-11
    numerical: -8.081339 analytic: -8.081339, relative error: 2.038405e-12
    numerical: 2.793968 analytic: 2.793968, relative error: 3.716540e-11
    numerical: 16.980493 analytic: 16.980493, relative error: 4.563996e-12
    numerical: -14.402479 analytic: -14.402479, relative error: 2.293079e-12
    numerical: -34.970885 analytic: -34.970885, relative error: 9.304989e-12
    numerical: -20.692355 analytic: -20.692355, relative error: 5.414653e-12
    numerical: -8.209565 analytic: -8.209565, relative error: 2.027556e-12
    numerical: -6.901006 analytic: -6.901006, relative error: 2.459033e-11
    numerical: 12.632919 analytic: 12.632919, relative error: 1.403082e-11
    numerical: 5.002348 analytic: 5.002348, relative error: 7.206023e-11
    numerical: 6.653082 analytic: 6.653082, relative error: 3.586485e-11
    numerical: 3.535120 analytic: 3.535120, relative error: 5.062464e-11
    numerical: -6.794936 analytic: -6.794936, relative error: 3.387246e-11
    numerical: -12.991590 analytic: -12.991590, relative error: 2.995940e-11
    numerical: -0.018506 analytic: -0.018506, relative error: 9.479644e-09
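    Under the hood, a sparse gradient check like this evaluates a centered difference at a handful of random coordinates and compares each to the analytic value. A simplified sketch of the idea (not the exact code in cs231n/gradient_check.py):

    def grad_check_sparse_sketch(f, W, analytic_grad, num_checks=10, h=1e-5):
        for _ in xrange(num_checks):
            ix = tuple([np.random.randint(m) for m in W.shape])
            old = W[ix]
            W[ix] = old + h
            fxph = f(W)                      # f(W + h) along dimension ix
            W[ix] = old - h
            fxmh = f(W)                      # f(W - h) along dimension ix
            W[ix] = old                      # restore the original value
            numeric = (fxph - fxmh) / (2 * h)
            analytic = analytic_grad[ix]
            rel_error = abs(numeric - analytic) / (abs(numeric) + abs(analytic))
            print 'numerical: %f analytic: %f, relative error: %e' % (
                numeric, analytic, rel_error)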
    

    Inline Question 1:

    It is possible that once in a while a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? Hint: the SVM loss function is not strictly speaking differentiable

    Your Answer: The hinge loss max(0, s_j - s_{y_i} + 1) is not differentiable where a margin is exactly zero (the kink of the max), so near such points the numerical estimate and the analytic (sub)gradient can disagree. An occasional mismatch in a few dimensions is therefore not a cause for concern. A simple 1-D example where the check can fail is f(x) = max(0, x) at x = 0.
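    A tiny demonstration of that 1-D failure mode (a toy snippet, not assignment code): at the kink of max(0, x) the centered difference returns 0.5, while any valid subgradient lies in [0, 1] and an analytic implementation typically reports 0 or 1.

    f = lambda x: max(0.0, x)
    h = 1e-5
    print (f(0.0 + h) - f(0.0 - h)) / (2 * h)  # 0.5, disagrees with subgradient 0 or 1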

    # Next implement the function svm_loss_vectorized; for now only compute the loss;
    # we will implement the gradient in a moment.
    tic = time.time()
    loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.00001)
    toc = time.time()
    print 'Naive loss: %e computed in %fs' % (loss_naive, toc - tic)
    
    from cs231n.classifiers.linear_svm import svm_loss_vectorized
    tic = time.time()
    loss_vectorized, _ = svm_loss_vectorized(W, X_dev, y_dev, 0.00001)
    toc = time.time()
    print 'Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic)
    
    # The losses should match but your vectorized implementation should be much faster.
    print 'difference: %f' % (loss_naive - loss_vectorized)
    
    Naive loss: 8.653856e+00 computed in 0.164609s
    Vectorized loss: 8.653856e+00 computed in 0.040048s
    difference: -0.000000
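    A hedged sketch of how the vectorized loss can be computed without explicit loops, using the same margin and regularization conventions as the naive version above:

    # all N x C scores in one matrix product
    num_train = X.shape[0]
    scores = X.dot(W)
    correct_class_scores = scores[np.arange(num_train), y].reshape(-1, 1)
    margins = np.maximum(0, scores - correct_class_scores + 1)  # delta = 1
    margins[np.arange(num_train), y] = 0   # the correct class contributes no loss
    loss = np.sum(margins) / num_train + 0.5 * reg * np.sum(W * W)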
    
    # Complete the implementation of svm_loss_vectorized, and compute the gradient
    # of the loss function in a vectorized way.
    
    # The naive implementation and the vectorized implementation should match, but
    # the vectorized version should still be much faster.
    tic = time.time()
    _, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.00001)
    toc = time.time()
    print 'Naive loss and gradient: computed in %fs' % (toc - tic)
    
    tic = time.time()
    _, grad_vectorized = svm_loss_vectorized(W, X_dev, y_dev, 0.00001)
    toc = time.time()
    print 'Vectorized loss and gradient: computed in %fs' % (toc - tic)
    
    # The loss is a single number, so it is easy to compare the values computed
    # by the two implementations. The gradient on the other hand is a matrix, so
    # we use the Frobenius norm to compare them.
    difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
    print 'difference: %f' % difference
    
    Naive loss and gradient: computed in 0.167615s
    Vectorized loss and gradient: computed in 0.004274s
    difference: 0.000000
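    The gradient can be vectorized with a binary mask over the positive margins: each positive margin adds one x_i to the column of its class, and the correct-class column of each example loses x_i once per positive margin. Continuing from the margins computed in the loss sketch above (same assumptions):

    mask = (margins > 0).astype(np.float64)                 # 1 where a margin is violated
    mask[np.arange(num_train), y] = -np.sum(mask, axis=1)   # correct-class coefficient
    dW = X.T.dot(mask) / num_train + reg * W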
    

    Stochastic Gradient Descent

    We now have vectorized, efficient expressions for the loss and the gradient, and our analytic gradient matches the numerical gradient. We are therefore ready to do SGD to minimize the loss.
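    A minimal sketch of what LinearClassifier.train() might look like with minibatch SGD (assuming the interface in linear_classifier.py, where self.loss() is supplied by the subclass; sampling the batch with replacement via np.random.choice is a common choice because it is faster):

    def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
              batch_size=200, verbose=False):
        num_train, dim = X.shape
        num_classes = np.max(y) + 1
        if self.W is None:
            self.W = 0.001 * np.random.randn(dim, num_classes)
        loss_history = []
        for it in xrange(num_iters):
            # sample a minibatch of examples (with replacement)
            batch_idx = np.random.choice(num_train, batch_size)
            X_batch, y_batch = X[batch_idx], y[batch_idx]
            loss, grad = self.loss(X_batch, y_batch, reg)
            loss_history.append(loss)
            self.W -= learning_rate * grad   # vanilla SGD update
            if verbose and it % 100 == 0:
                print 'iteration %d / %d: loss %f' % (it, num_iters, loss)
        return loss_history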

    # In the file linear_classifier.py, implement SGD in the function
    # LinearClassifier.train() and then run it with the code below.
    from cs231n.classifiers import LinearSVM
    svm = LinearSVM()
    tic = time.time()
    loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=5e4,
                          num_iters=1500, verbose=True)
    toc = time.time()
    print 'That took %fs' % (toc - tic)
    
    iteration 0 / 1500: loss 791.326093
    iteration 100 / 1500: loss 288.963939
    iteration 200 / 1500: loss 108.654002
    iteration 300 / 1500: loss 43.053838
    iteration 400 / 1500: loss 18.983433
    iteration 500 / 1500: loss 9.707113
    iteration 600 / 1500: loss 6.626212
    iteration 700 / 1500: loss 6.721240
    iteration 800 / 1500: loss 5.687028
    iteration 900 / 1500: loss 5.966098
    iteration 1000 / 1500: loss 4.982919
    iteration 1100 / 1500: loss 4.878817
    iteration 1200 / 1500: loss 5.248245
    iteration 1300 / 1500: loss 5.602829
    iteration 1400 / 1500: loss 5.545351
    That took 5.282246s
    
    # A useful debugging strategy is to plot the loss as a function of
    # iteration number:
    plt.plot(loss_hist)
    plt.xlabel('Iteration number')
    plt.ylabel('Loss value')
    plt.show()
    
    [output_19_0.png: loss value vs. iteration number]
    # Write the LinearSVM.predict function and evaluate the performance on both the
    # training and validation set
    y_train_pred = svm.predict(X_train)
    print 'training accuracy: %f' % (np.mean(y_train == y_train_pred), )
    y_val_pred = svm.predict(X_val)
    print 'validation accuracy: %f' % (np.mean(y_val == y_val_pred), )
    
    training accuracy: 0.369857
    validation accuracy: 0.377000
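    For reference, LinearSVM.predict reduces to an argmax over the class scores; a minimal sketch, assuming the same W layout as above:

    def predict(self, X):
        scores = X.dot(self.W)              # (N, C) class scores
        y_pred = np.argmax(scores, axis=1)  # highest-scoring class per example
        return y_pred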
    
    # Use the validation set to tune hyperparameters (regularization strength and
    # learning rate). You should experiment with different ranges for the learning
    # rates and regularization strengths; if you are careful you should be able to
    # get a classification accuracy of about 0.4 on the validation set.
    learning_rates = [1e-7, 5e-5]
    regularization_strengths = [5e4, 1e5]
    
    # results is dictionary mapping tuples of the form
    # (learning_rate, regularization_strength) to tuples of the form
    # (training_accuracy, validation_accuracy). The accuracy is simply the fraction
    # of data points that are correctly classified.
    results = {}
    best_val = -1   # The highest validation accuracy that we have seen so far.
    best_svm = None # The LinearSVM object that achieved the highest validation rate.
    
    ################################################################################
    # TODO:                                                                        #
    # Write code that chooses the best hyperparameters by tuning on the validation #
    # set. For each combination of hyperparameters, train a linear SVM on the      #
    # training set, compute its accuracy on the training and validation sets, and  #
    # store these numbers in the results dictionary. In addition, store the best   #
    # validation accuracy in best_val and the LinearSVM object that achieves this  #
    # accuracy in best_svm.                                                        #
    #                                                                              #
    # Hint: You should use a small value for num_iters as you develop your         #
    # validation code so that the SVMs don't take much time to train; once you are #
    # confident that your validation code works, you should rerun the validation   #
    # code with a larger value for num_iters.                                      #
    ################################################################################
    num_splt = 10
    for i in xrange(num_splt):
        for j in xrange(num_splt):
            learning_rate_ij = learning_rates[0] + i * (learning_rates[1] - learning_rates[0]) / num_splt
            reg_ij = regularization_strengths[0] + j * (regularization_strengths[1] - regularization_strengths[0])/ num_splt
            svm = LinearSVM()
            loss_hist = svm.train(X_train, y_train, learning_rate=learning_rate_ij, reg=reg_ij,
                                  num_iters=1500, verbose=False)
            
            y_train_pred = svm.predict(X_train)
            accuracy_train = np.mean(y_train == y_train_pred)
            y_val_pred = svm.predict(X_val)
            accuracy_val = np.mean(y_val == y_val_pred)
            
            results[(learning_rate_ij, reg_ij)] = (accuracy_train, accuracy_val) 
            
            if accuracy_val > best_val:
                best_val = accuracy_val
                best_svm = svm
    ################################################################################
    #                              END OF YOUR CODE                                #
    ################################################################################
        
    # Print out results.
    for lr, reg in sorted(results):
        train_accuracy, val_accuracy = results[(lr, reg)]
        print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
                    lr, reg, train_accuracy, val_accuracy)
        
    print 'best validation accuracy achieved during cross-validation: %f' % best_val
    
    lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.372939 val accuracy: 0.374000
    lr 1.000000e-07 reg 5.500000e+04 train accuracy: 0.362510 val accuracy: 0.378000
    lr 1.000000e-07 reg 6.000000e+04 train accuracy: 0.366796 val accuracy: 0.368000
    ... (intermediate combinations omitted)
    lr 4.501000e-05 reg 8.500000e+04 train accuracy: 0.100265 val accuracy: 0.087000
    lr 4.501000e-05 reg 9.000000e+04 train accuracy: 0.100265 val accuracy: 0.087000
    lr 4.501000e-05 reg 9.500000e+04 train accuracy: 0.100265 val accuracy: 0.087000
    best validation accuracy achieved during cross-validation: 0.378000
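    One design note on the grid above: the candidates are linearly spaced between the endpoints, but good learning rates and regularization strengths tend to spread over orders of magnitude, so a log-spaced grid (a hedged variant with the same endpoints) usually covers the range more evenly:

    # log-spaced alternative to the linear grid used above
    lr_candidates = np.logspace(np.log10(1e-7), np.log10(5e-5), num_splt)
    reg_candidates = np.logspace(np.log10(5e4), np.log10(1e5), num_splt)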
    
    # Visualize the cross-validation results
    import math
    x_scatter = [math.log10(x[0]) for x in results]
    y_scatter = [math.log10(x[1]) for x in results]
    
    # plot training accuracy
    marker_size = 100
    colors = [results[x][0] for x in results]
    plt.subplot(2, 1, 1)
    plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
    plt.colorbar()
    plt.xlabel('log learning rate')
    plt.ylabel('log regularization strength')
    plt.title('CIFAR-10 training accuracy')
    
    # plot validation accuracy
    colors = [results[x][1] for x in results] # default size of markers is 20
    plt.subplot(2, 1, 2)
    plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
    plt.colorbar()
    plt.xlabel('log learning rate')
    plt.ylabel('log regularization strength')
    plt.title('CIFAR-10 validation accuracy')
    plt.show()
    
    [output_22_0.png: training (top) and validation (bottom) accuracy over the log learning rate / log regularization strength grid]
    # Evaluate the best svm on test set
    y_test_pred = best_svm.predict(X_test)
    test_accuracy = np.mean(y_test == y_test_pred)
    print 'linear SVM on raw pixels final test set accuracy: %f' % test_accuracy
    
    linear SVM on raw pixels final test set accuracy: 0.361000
    
    # Visualize the learned weights for each class.
    # Depending on your choice of learning rate and regularization strength, these may
    # or may not be nice to look at.
    w = best_svm.W[:-1,:] # strip out the bias
    w = w.reshape(32, 32, 3, 10)
    w_min, w_max = np.min(w), np.max(w)
    classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
    for i in xrange(10):
      plt.subplot(2, 5, i + 1)
        
      # Rescale the weights to be between 0 and 255
      wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
      plt.imshow(wimg.astype('uint8'))
      plt.axis('off')
      plt.title(classes[i])
    
    [output_24_0.png: learned weight templates for the 10 classes]

    Inline question 2:

    Describe what your visualized SVM weights look like, and offer a brief explanation for why they look the way they do.

    Your answer: Each weight vector, reshaped into an image, looks like a blurry template of its class. A linear classifier scores an image by an inner product with one template per class, so training pushes each column of W toward something like an average of that class's training images; for example, the car template is a reddish blob, and the horse template can appear two-headed because it averages left- and right-facing horses.
