Andrew Ng Machine Learning Notes


Author: pstrike | Published 2018-12-19 22:20

    Notation

    n = number of features
    m = number of training examples
    x^(i) = input of the i-th training example
    x_j^(i) = value of feature j in the i-th training example

    Model

    x^(i) denotes the “input” variables (living area in this example), also called input features
    y^(i) denotes the “output” or target variable
    A pair (x^(i), y^(i)) is called a training example
    A list of m training examples (x^(i), y^(i)), i = 1, ..., m, is called a training set

    Cost Function
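
    For linear regression with hypothesis h_θ(x) = θ_0 + θ_1·x, the course uses the squared-error cost:

        J(θ_0, θ_1) = (1/2m) · Σ_{i=1..m} (h_θ(x^(i)) - y^(i))^2

    The goal is to choose the θ values that minimize J.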

    Gradient Descent

    The gradient descent algorithm is:
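
        repeat until convergence:
            θ_j := θ_j - α · ∂/∂θ_j J(θ_0, θ_1)    (update all θ_j simultaneously)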


    where j=0,1 represents the feature index number.

    Feature Scaling
    We can speed up gradient descent by keeping each of our input values in roughly the same range. Two techniques to help with this are feature scaling and mean normalization.

    • Feature scaling involves dividing the input values by the range (i.e. the maximum value minus the minimum value) of the input variable, resulting in a new range of just 1.
    • Mean normalization involves subtracting the average value for an input variable from the values for that input variable resulting in a new average value for the input variable of just zero.
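
    Combined, both adjustments can be written as:

        x_i := (x_i - μ_i) / s_i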

    where μ_i is the average of all the values for feature i and s_i is the range of values (max - min), or s_i is the standard deviation.

    Learning Rate

    • If α is too small: slow convergence.
    • If α is too large: J(θ) may not decrease on every iteration and thus may not converge (it may even diverge).
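
    A minimal sketch of batch gradient descent in NumPy (assuming X is an m×(n+1) design matrix whose first column is all ones, and y is the m-vector of targets); the choice of alpha follows the trade-off above:

        import numpy as np

        def gradient_descent(X, y, alpha=0.01, num_iters=1500):
            """Batch gradient descent for linear regression (illustrative sketch)."""
            m, n = X.shape
            theta = np.zeros(n)
            for _ in range(num_iters):
                predictions = X @ theta                     # h_theta(x) for every example
                gradient = (X.T @ (predictions - y)) / m    # partial derivatives of J
                theta -= alpha * gradient                   # simultaneous update of all theta_j
            return theta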

    Polynomial Regression
    Our hypothesis function need not be linear (a straight line) if that does not fit the data well. We can change the behavior or curve of our hypothesis function by making it a quadratic, cubic or square root function (or any other form).
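
    For instance, the hypothesis h_θ(x) = θ_0 + θ_1·x + θ_2·x^2 + θ_3·x^3 is still linear in the parameters, so the same machinery applies once the extra features are created. A small illustrative sketch (the feature values here are made up):

        import numpy as np

        size = np.array([1.0, 2.0, 3.0])                       # original feature x
        X_poly = np.column_stack([size, size**2, size**3])     # x, x^2, x^3 as new features
        # Feature scaling matters here, since x^3 covers a much larger range than x.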

    Normal Equation

    In the "Normal Equation" method, we will minimize J by explicitly taking its derivatives with respect to the θj ’s, and setting them to zero. This allows us to find the optimum theta without iteration. The normal equation formula is given below:
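
        θ = (X^T X)^(-1) X^T y

    where X is the m×(n+1) design matrix of training inputs (with a leading column of ones) and y is the m-vector of target values.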


    In practice, when n exceeds 10,000 it might be a good time to go from a normal solution to an iterative process.
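
    A one-line NumPy sketch (assuming the same X and y as above); the pseudo-inverse is used so that a non-invertible X^T·X, e.g. from redundant features, still yields a solution:

        import numpy as np

        def normal_equation(X, y):
            # theta = (X^T X)^(-1) X^T y, computed with the pseudo-inverse
            return np.linalg.pinv(X.T @ X) @ X.T @ y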

    Logistic Regression
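
    The logistic regression hypothesis passes θ^T·x through the sigmoid (logistic) function, so its output can be read as a probability:

        h_θ(x) = g(θ^T x),   g(z) = 1 / (1 + e^(-z))

    h_θ(x) is interpreted as the probability that y = 1 given input x, and we predict y = 1 when h_θ(x) ≥ 0.5 (i.e. when θ^T x ≥ 0).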





    For multiclass classification (one-vs-all), we divide the problem into binary classification problems; in each one, we predict the probability that y is a member of one of our classes.


    We are basically choosing one class and then lumping all the others into a single second class. We do this repeatedly, applying binary logistic regression to each case, and then use the hypothesis that returned the highest value as our prediction.
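
    A prediction-side sketch of one-vs-all (assuming all_theta holds one already-trained theta vector per class, and X is the design matrix with a leading column of ones):

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def predict_one_vs_all(all_theta, X):
            probabilities = sigmoid(X @ all_theta.T)   # (m, num_classes): h_theta(x) per class
            return np.argmax(probabilities, axis=1)    # pick the class with the highest value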

    Overfitting




    Overfitting can be addressed by reducing the number of features or by regularization, which keeps all the features but shrinks the parameter values θ_j. The λ, or lambda, is the regularization parameter: it determines how much the costs of our theta parameters are inflated.
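
    For regularized linear regression, for example, the cost becomes:

        J(θ) = (1/2m) · [ Σ_{i=1..m} (h_θ(x^(i)) - y^(i))^2 + λ · Σ_{j=1..n} θ_j^2 ]

    Note that the bias term θ_0 is not regularized. A larger λ smooths the hypothesis more (risking underfitting); a very small λ leaves overfitting in place.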



    Neural Networks



    Reference

    • Coursera: Machine Learning, Stanford University (Andrew Ng)
