
Deep Learning Basics 2

Author: Rain师兄 | Published 2022-02-07 19:24

Stochastic Gradient Descent

How to Train a Neural Network

Training a neural network means adjusting its weights.

Two ingredients are introduced here: a loss function, which measures how good the network's predictions are, and an "optimizer", which tells the network how to change its weights.

The Loss Function
The loss function measures the disparity between the true value and the predicted value.

A common loss function for regression problems is the mean absolute error, or MAE.
For each prediction y_pred, MAE measures the disparity from the true target y_true by an absolute difference abs(y_true - y_pred).

Two other loss functions you might see for regression problems are the mean-squared error (MSE) and the Huber loss (both available in Keras).
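To make the difference concrete, here is a minimal sketch computing MAE and MSE by hand; the arrays are made-up numbers and NumPy is assumed, so this is purely illustrative.

import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # made-up true targets
y_pred = np.array([2.5, 0.0, 2.0, 8.0])    # made-up predictions

mae = np.mean(np.abs(y_true - y_pred))     # mean absolute error -> 0.5
mse = np.mean((y_true - y_pred) ** 2)      # mean squared error  -> 0.375
print(mae, mse)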

In other words, the loss function tells the network its objective.

Stochastic Gradient Descent

The optimizer is an algorithm that adjusts the weights to minimize the loss.

Virtually all of the optimization algorithms used in deep learning belong to a family called stochastic gradient descent.

One step of training goes like this:

  1. Sample some training data and run it through the network to make predictions.
  2. Measure the loss between the predictions and the true values.
  3. Finally, adjust the weights in a direction that makes the loss smaller.

Then just do this over and over until the loss is as small as you like (or until it won't decrease any further).

Each iteration's sample of training data is called a minibatch (or often just "batch"), while a complete round of the training data is called an epoch. The number of epochs you train for is how many times the network will see each training example.
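As a rough illustration of those three steps (separate from the Keras code used later), here is a minimal NumPy sketch of a single minibatch update for a linear model trained with MAE; the variable names, toy data, and learning rate are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256,))            # one minibatch of 256 examples
y = 3.0 * X + 1.0                      # the relationship the model should learn

w, b = 0.0, 0.0                        # initial weights
learning_rate = 0.1

# 1. Run the minibatch through the model to make predictions.
y_pred = w * X + b
# 2. Measure the loss between the predictions and the true values (MAE).
loss = np.mean(np.abs(y - y_pred))
# 3. Adjust the weights in the direction that makes the loss smaller.
sign = np.sign(y_pred - y)             # gradient of |y_pred - y| w.r.t. y_pred
w -= learning_rate * np.mean(sign * X)
b -= learning_rate * np.mean(sign)

Repeating this step over every minibatch once is one epoch.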

Learning Rate and Batch Size
During training, the fitted line in the figure shifts only gradually, because the weights are being adjusted a little at a time. But what controls how quickly it moves? The learning rate. A smaller learning rate means the network needs to see more minibatches before its weights converge to their best values.

The learning rate and the size of the minibatches are the two parameters that have the largest effect on how SGD training proceeds.

Their interaction is often subtle and the right choice for these parameters isn't always obvious.

Fortunately, for most work it won't be necessary to do an extensive hyperparameter search to get satisfactory results. Adam is an SGD algorithm that has an adaptive learning rate that makes it suitable for most problems without any parameter tuning (it is "self tuning", in a sense). Adam is a great general-purpose optimizer.
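If you do want to set the learning rate yourself, the string "adam" used in the compile calls below can be replaced with an optimizer object. A minimal sketch, assuming TensorFlow/Keras and an arbitrary tiny model just so the snippet stands alone:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(1, input_shape=[3]),   # arbitrary tiny model, for illustration only
])

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),  # 0.001 is Adam's default
    loss="mae",
)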

Adding the Loss and Optimizer
Add the loss function and the optimizer with model.compile:
model.compile(
    optimizer="adam",
    loss="mae",
)

Sometimes we need to scale the features first, because neural networks tend to perform best when their inputs are on a common scale.
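A minimal sketch of one common way to do this, assuming scikit-learn is installed and that X_train / X_valid are the same arrays passed to model.fit below:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)   # fit the scaler on the training data only
X_valid = scaler.transform(X_valid)       # apply the same scaling to the validation data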

After defining the model, call model.compile:
model.compile(
    optimizer='adam',
    loss='mae',
)

Then fit the model:
history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=256,
    epochs=10,
)

Plot the loss:

import pandas as pd

# convert the training history to a dataframe
history_df = pd.DataFrame(history.history)

# use Pandas native plot method
history_df['loss'].plot();

Evaluate Training

If you trained the model longer, would you expect the loss to decrease further?

This depends on how the loss has evolved during training: if the learning curves have levelled off, there won't usually be any advantage to training for additional epochs. Conversely, if the loss appears to still be decreasing, then training for longer could be advantageous.
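One way to check this, assuming the `history` object returned by the model.fit call above, is to plot the training and validation loss together:

import pandas as pd

history_df = pd.DataFrame(history.history)
history_df.loc[:, ['loss', 'val_loss']].plot()  # compare training and validation curves
print("Minimum validation loss:", history_df['val_loss'].min())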

With the learning rate and the batch size, you have some control over:

  • How long it takes to train a model
  • How noisy the learning curves are
  • How small the loss becomes

You probably saw that smaller batch sizes gave noisier weight updates and loss curves. This is because each batch is a small sample of data and smaller samples tend to give noisier estimates. Smaller batches can have an "averaging" effect though which can be beneficial.

Smaller learning rates make the updates smaller and the training takes longer to converge. Large learning rates can speed up training, but don't "settle in" to a minimum as well. When the learning rate is too large, the training can fail completely. (Try setting the learning rate to a large value like 0.99 to see this.)
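A minimal sketch of that experiment, assuming TensorFlow/Keras, the model defined earlier, and the same training arrays used in the fit call above; plain SGD is used here so the learning rate applies directly:

from tensorflow import keras

model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.99),  # deliberately too large
    loss="mae",
)
history = model.fit(
    X_train, y_train,
    validation_data=(X_valid, y_valid),
    batch_size=256,
    epochs=10,
)
# With a learning rate this large, the loss will typically bounce around or diverge
# instead of settling into a minimum.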

Key point: compile the model with a loss function and an optimizer:
model.compile(
    optimizer='adam',
    loss='mae',
)

