Purpose: activation functions keep a model (e.g., an MLP) from degenerating into a linear model.
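The claim above can be checked numerically. The following is a minimal sketch, not part of the original notes: the tensor sizes and the names `X`, `W1`, `b1`, `W2`, `b2` are hypothetical and chosen only for illustration. It shows that two stacked affine layers with no activation collapse into one affine layer, and that putting ReLU between them breaks that equivalence.

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical sizes, chosen only for illustration.
X = torch.randn(4, 3, dtype=torch.float64)
W1, b1 = torch.randn(3, 5, dtype=torch.float64), torch.randn(5, dtype=torch.float64)
W2, b2 = torch.randn(5, 2, dtype=torch.float64), torch.randn(2, dtype=torch.float64)

# Two stacked affine layers with no activation in between ...
stacked = (X @ W1 + b1) @ W2 + b2

# ... equal a single affine layer with merged weight and bias.
W, b = W1 @ W2, b1 @ W2 + b2
collapsed = X @ W + b
linear_matches = torch.allclose(stacked, collapsed)

# With ReLU between the layers, the merged layer no longer reproduces the output.
with_relu = torch.relu(X @ W1 + b1) @ W2 + b2
relu_matches = torch.allclose(with_relu, collapsed)

print(linear_matches, relu_matches)
```

However many affine layers are stacked, the same merging argument applies, which is why a nonlinearity is needed between them.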
ReLU
Rectified linear unit (ReLU)
data:image/s3,"s3://crabby-images/313f5/313f5fae029c112bf65f0fe216fd0ac06353935b" alt=""
Implementation:
%matplotlib inline
import torch
from d2l import torch as d2l
# Plot the ReLU function
x = torch.arange(-8.0, 8.0, 0.1, requires_grad=True)
y = torch.relu(x)
d2l.plot(x.detach(), y.detach(), 'x', 'relu(x)', figsize=(5, 2.5))
data:image/s3,"s3://crabby-images/ee5df/ee5df40fbec00962f5601eb63670a09a34b803ff" alt=""
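When the input is negative, the derivative of ReLU is 0; when the input is positive, the derivative is 1. Note that ReLU is not differentiable when the input is exactly 0. In that case we default to the left-hand derivative, i.e. the derivative at 0 is taken to be 0. We can ignore this case, because the input may never be exactly 0.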
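As a small check of the three cases just described, the sketch below evaluates the ReLU gradient at one negative input, at exactly 0, and at one positive input. The sample values are arbitrary and not from the original notes.

```python
import torch

# One negative input, exactly zero, and one positive input (arbitrary values).
x = torch.tensor([-2.0, 0.0, 3.0], requires_grad=True)
y = torch.relu(x)

# Summing gives a scalar, so backward() needs no explicit gradient argument.
y.sum().backward()

relu_grad = x.grad.tolist()
print(relu_grad)  # [0.0, 0.0, 1.0]
```

PyTorch's choice at 0 agrees with the left-hand-derivative convention stated above.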
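y.backward(torch.ones_like(x), retain_graph=True)  # torch.ones_like(x) returns a tensor of ones with the same shape as x
# After one backward pass, the buffers saved in the computation graph are freed, so a second backward call on the same graph raises an error. Passing retain_graph=True keeps those buffers, so backward can be called again.
d2l.plot(x.detach(), x.grad, 'x', 'grad of relu', figsize=(5, 2.5))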
data:image/s3,"s3://crabby-images/60d08/60d08aad178d56a82c2a0de541a1593c8ce45a40" alt=""
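The `retain_graph` behaviour mentioned in the code comment above can be seen in a short sketch. This is an illustration under assumptions rather than part of the original notes: it uses sigmoid (any operation that saves tensors for its backward pass would do) and a hypothetical two-element input. It also shows why the later snippets zero `x.grad` before each new backward call: gradients accumulate.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

# Without retain_graph, the saved buffers are freed after the first backward call.
y = torch.sigmoid(x)
y.backward(torch.ones_like(x))
try:
    y.backward(torch.ones_like(x))
    second_call_failed = False
except RuntimeError:
    second_call_failed = True

# With retain_graph=True, the same graph can be traversed a second time.
x.grad = None  # drop the gradient left over from the first experiment
z = torch.sigmoid(x)
z.backward(torch.ones_like(x), retain_graph=True)
first = x.grad.clone()
z.backward(torch.ones_like(x))  # succeeds, and the gradient is added to x.grad

accumulated = torch.allclose(x.grad, 2 * first)
print(second_call_failed, accumulated)
```

In the notes themselves, `y` is recomputed before every backward call, so each call runs on a fresh graph and `retain_graph=True` is harmless but not strictly required.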
PReLU
y = torch.prelu(x, torch.tensor([0.25]))
d2l.plot(x.detach(), y.detach(), 'x', 'prelu(x)', figsize = (5, 2.5))
data:image/s3,"s3://crabby-images/84dd2/84dd2ca1733282d3618ca9a2343a3f6097cd05ff" alt=""
Gradient of the PReLU function
x.grad.data.zero_()
y.backward(torch.ones_like(x), retain_graph=True)
d2l.plot(x.detach(), x.grad, 'x', 'grad of prelu', figsize = (5, 2.5))
data:image/s3,"s3://crabby-images/4b9bf/4b9bf1ffed7c4f0986f0408b2664541cd2407bc2" alt=""
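The notes do not state the PReLU formula, so the sketch below works from the standard definition, PReLU(x) = max(0, x) + a * min(0, x), with the same slope a = 0.25 used in the code above. The sample inputs are arbitrary. It compares `torch.prelu` with that formula and reads off the gradient on each side of 0.

```python
import torch

a = torch.tensor([0.25])  # same negative-side slope as in the code above
x = torch.tensor([-4.0, -1.0, 0.5, 2.0], requires_grad=True)  # arbitrary inputs

# Forward pass compared with max(0, x) + a * min(0, x).
y = torch.prelu(x, a)
manual = torch.clamp(x, min=0) + a * torch.clamp(x, max=0)
forward_matches = torch.allclose(y, manual)

# The gradient is a for negative inputs and 1 for positive inputs.
y.sum().backward()
prelu_grad = x.grad.tolist()
print(forward_matches, prelu_grad)
```

Unlike ReLU, the gradient for negative inputs is the nonzero slope a, so those units still receive a training signal.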
sigmoid
data:image/s3,"s3://crabby-images/e7571/e7571e395a948626e591c5afcbcc02b4d29fde97" alt=""
y = torch.sigmoid(x)
d2l.plot(x.detach(), y.detach(), 'x', 'sigmoid(x)', figsize = (5, 2.5))
data:image/s3,"s3://crabby-images/2738e/2738e40aa5fa785c77767547313947a8cf75f694" alt=""
Derivative of the sigmoid function
# Clear the previous gradients
x.grad.data.zero_()
y.backward(torch.ones_like(x), retain_graph = True)
d2l.plot(x.detach(), x.grad, 'x', 'grad of sigmoid', figsize = (5,2.5))
data:image/s3,"s3://crabby-images/d7b9b/d7b9bb6dc91261fcaf0384184f8087dd2e829b6d" alt=""
data:image/s3,"s3://crabby-images/f78dd/f78dd54742009747266ba2b40e06c3e744732da1" alt=""
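The sigmoid derivative has the closed form sigmoid(x) * (1 - sigmoid(x)). As a sanity check on the gradient plotted above, the following sketch compares the autograd result with that closed form over the same input range. The variable names are illustrative only.

```python
import torch

x = torch.arange(-8.0, 8.0, 0.1, requires_grad=True)
y = torch.sigmoid(x)
y.backward(torch.ones_like(x))

# Closed form: sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)).
s = y.detach()
closed_form = s * (1 - s)
sigmoid_grad_matches = torch.allclose(x.grad, closed_form, atol=1e-6)

# The derivative peaks at about 0.25 near x = 0 and decays toward 0 at both ends.
sigmoid_peak = x.grad.max().item()
print(sigmoid_grad_matches, sigmoid_peak)
```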
The tanh function
data:image/s3,"s3://crabby-images/96228/96228f1932d32afe9bfd0e6a4438d85d44c8ca59" alt=""
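Note that when the input is close to 0, the tanh function is close to a linear transformation. Its shape resembles that of the sigmoid function; the difference is that tanh is point-symmetric about the origin of the coordinate system.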
y = torch.tanh(x)
d2l.plot(x.detach(), y.detach(), 'x', 'tanh(x)', figsize = (5, 2.5))
data:image/s3,"s3://crabby-images/3b797/3b7970ba94abd0ba1762afeb8ae4246b7e88718f" alt=""
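The three properties noted above (near-linearity around 0, symmetry about the origin, and the resemblance to sigmoid) can be checked with a short sketch. The sample points are arbitrary, and the identity tanh(x) = 2 * sigmoid(2x) - 1 is a standard fact that the notes do not state themselves.

```python
import torch

# Near 0, tanh(x) is approximately x.
small = torch.tensor([-0.05, -0.01, 0.01, 0.05])
near_linear = torch.allclose(torch.tanh(small), small, atol=1e-4)

# tanh is an odd function: tanh(-x) = -tanh(x).
x = torch.linspace(-4.0, 4.0, 9)
is_odd = torch.allclose(torch.tanh(-x), -torch.tanh(x))

# tanh is a rescaled and shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1.
rescaled_sigmoid = torch.allclose(torch.tanh(x), 2 * torch.sigmoid(2 * x) - 1, atol=1e-6)

print(near_linear, is_odd, rescaled_sigmoid)
```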
Derivative of the tanh function:
# Clear the previous gradients
x.grad.data.zero_()
y.backward(torch.ones_like(x), retain_graph = True)
d2l.plot(x.detach(), x.grad, 'x', 'grad of tanh', figsize = (5, 2.5))
data:image/s3,"s3://crabby-images/8b5e4/8b5e4403e13dbbc8e0527e0b1d436fca612333ab" alt=""
data:image/s3,"s3://crabby-images/17a31/17a310674bce7ae640b25f5b411ca5d70783c726" alt=""
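The tanh derivative has the closed form 1 - tanh(x)^2. The sketch below, with illustrative variable names, compares the autograd gradient with that expression over the same input range as the plot above.

```python
import torch

x = torch.arange(-8.0, 8.0, 0.1, requires_grad=True)
y = torch.tanh(x)
y.backward(torch.ones_like(x))

# Closed form: tanh'(x) = 1 - tanh(x)^2.
closed_form = 1 - y.detach() ** 2
tanh_grad_matches = torch.allclose(x.grad, closed_form, atol=1e-6)

# The derivative peaks at about 1 near x = 0 and decays toward 0 as |x| grows.
tanh_peak = x.grad.max().item()
print(tanh_grad_matches, tanh_peak)
```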