Key concepts
Convolution operation
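The "convolution" used in image processing is not quite the same as the mathematical definition of convolution; strictly speaking, it is the cross-correlation operation. The kernel slides over the two-dimensional input array, and at each position the kernel is multiplied element-wise with the input sub-array at that position and the products are summed, giving the element at the corresponding position of the output array. This process is called the convolution operation. The principle is shown in the figure below: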
[Figure: convolution operation]
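To make the difference between the two operations concrete, here is a minimal pure-Python sketch. It uses plain nested lists instead of tensors, purely to keep the arithmetic visible. The helper names `cross_correlate2d` and `convolve2d` are illustrative and do not come from any library. Mathematical convolution is obtained here by flipping the kernel along both axes before cross-correlating.

```python
def cross_correlate2d(x, k):
    """Slide k over x; at each position multiply element-wise and sum."""
    h, w = len(k), len(k[0])
    out_h, out_w = len(x) - h + 1, len(x[0]) - w + 1
    return [
        [
            sum(x[i + a][j + b] * k[a][b] for a in range(h) for b in range(w))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def convolve2d(x, k):
    """Mathematical convolution: flip the kernel in both axes, then cross-correlate."""
    flipped = [row[::-1] for row in k[::-1]]
    return cross_correlate2d(x, flipped)


x = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
k = [[0, 1], [2, 3]]
print(cross_correlate2d(x, k))  # [[19, 25], [37, 43]]
print(convolve2d(x, k))         # [[5, 11], [23, 29]]
```

Because the kernel of a convolutional layer is learned, the flip makes no practical difference: the layer would simply learn the flipped kernel.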
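Convolutional layer
A convolutional layer is a network layer made up of n groups of kernels with m kernels in each group, where m is the number of channels of the input image and n determines the number of output channels of the layer. The kernels are the parameters to be learned. The input is cross-correlated with the kernels, a bias is added, and the result is passed through an activation function to the next layer: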
[Figure: convolutional layer]
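The description above can be sketched in pure Python as follows. This is a sketch under stated assumptions, not the layer's actual implementation: the activation is taken to be ReLU (the text does not name one), the helper names `corr2d` and `conv_layer` are illustrative, and the input, kernels and biases are made-up values with m = 2 input channels and n = 2 kernel groups.

```python
def corr2d(x, k):
    """2D cross-correlation of nested lists x (input) and k (kernel)."""
    h, w = len(k), len(k[0])
    return [
        [
            sum(x[i + a][j + b] * k[a][b] for a in range(h) for b in range(w))
            for j in range(len(x[0]) - w + 1)
        ]
        for i in range(len(x) - h + 1)
    ]


def conv_layer(x, kernels, biases):
    """x: m input channels; kernels: n groups of m kernels; biases: n scalars."""
    outputs = []
    for group, bias in zip(kernels, biases):
        # One cross-correlation per input channel, each with its own kernel.
        per_channel = [corr2d(xc, kc) for xc, kc in zip(x, group)]
        rows, cols = len(per_channel[0]), len(per_channel[0][0])
        # Sum over the m channels and add this group's bias.
        summed = [
            [sum(pc[i][j] for pc in per_channel) + bias for j in range(cols)]
            for i in range(rows)
        ]
        # Activation: ReLU (an assumption; the text does not name one).
        outputs.append([[max(0, v) for v in row] for row in summed])
    return outputs


# m = 2 input channels, n = 2 kernel groups (illustrative values).
x = [
    [[0, 1, 2], [3, 4, 5], [6, 7, 8]],
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
]
kernels = [
    [[[0, 1], [2, 3]], [[1, 2], [3, 4]]],
    [[[0, -1], [-2, -3]], [[-1, -2], [-3, -4]]],
]
biases = [1, 60]
y = conv_layer(x, kernels, biases)
print(len(y))  # 2 output channels, one per kernel group
print(y[0])    # [[57, 73], [105, 121]]
print(y[1])    # [[4, 0], [0, 0]]
```

Each output channel has its own group of m kernels and one bias, which is why n alone decides the number of output channels.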
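Padding and stride
Padding means adding elements (usually zeros) on both sides of the input along its height and width.
The kernel slides over the input array; the number of rows and columns it moves at each step is the stride.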
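How padding and stride change the output size can be worked out with simple arithmetic. The sketch below assumes the usual formula floor((n + 2p - k) / s) + 1 along each axis, where n is the input length, k the kernel length, p the padding added on each side and s the stride, with no dilation. The function names are illustrative.

```python
def conv_output_size(n, k, p=0, s=1):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


def conv_output_shape(in_hw, kernel_hw, padding_hw=(0, 0), stride_hw=(1, 1)):
    """Apply the one-axis formula to height and width."""
    return tuple(
        conv_output_size(n, k, p, s)
        for n, k, p, s in zip(in_hw, kernel_hw, padding_hw, stride_hw)
    )


print(conv_output_shape((3, 3), (2, 2)))                  # (2, 2)
print(conv_output_shape((6, 8), (1, 2)))                  # (6, 7)
print(conv_output_shape((3, 5), (3, 5), (1, 2), (1, 1)))  # (3, 5)
print(conv_output_shape((4, 4), (3, 3), (1, 1), (2, 1)))  # (2, 4)
```

The last two calls use the same input size, kernel, padding and stride as the `nn.Conv2d` and `nn.MaxPool2d` examples later in this article, so they give the height and width those examples should print.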
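1×1 convolution kernel
A 1×1 kernel can change the number of channels without changing the height and width. It does not recognize patterns formed by adjacent elements along the height and width dimensions; its computation takes place mainly along the channel dimension. See the figure above.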
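One way to see why a 1×1 kernel only mixes channels: at every pixel, each output channel is a weighted sum of the input channels at that same pixel. The pure-Python sketch below illustrates this with made-up values; `conv1x1` is an illustrative helper, and bias and activation are left out to keep it short.

```python
def conv1x1(x, weights):
    """x: c_in channels of H x W; weights: c_out rows of c_in scalars."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            # Weighted sum over input channels at pixel (i, j) only.
            [sum(w * x[c][i][j] for c, w in enumerate(row)) for j in range(width)]
            for i in range(height)
        ]
        for row in weights
    ]


# 3 input channels of size 2 x 2, mapped to 2 output channels.
x = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
    [[100, 200], [300, 400]],
]
weights = [[1, 1, 1], [1, 0, -1]]
y = conv1x1(x, weights)
print(len(y), len(y[0]), len(y[0][0]))  # 2 2 2
print(y[0])  # [[111, 222], [333, 444]]
print(y[1])  # [[-99, -198], [-297, -396]]
```

The height and width stay at 2 × 2 while the channel count goes from 3 to 2, which matches the behaviour described above.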
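Pooling layer
Pooling layers are mainly used to reduce the convolutional layer's excessive sensitivity to position. Max pooling or average pooling layers are typically used. A pooling layer directly computes the maximum or the average of the elements inside the pooling window; these operations are called max pooling and average pooling respectively, as shown in the figure below: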
[Figure: pooling layer]
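A minimal pure-Python sketch of both pooling modes follows. It assumes a stride of 1 and no padding, and `pool2d` is an illustrative helper rather than a library function. The second half shows the position tolerance mentioned above: a feature shifted by one column still produces the same value at one output position.

```python
def pool2d(x, size, mode="max"):
    """Pool a 2D nested list with a window of the given (rows, cols) size."""
    h, w = size
    out = []
    for i in range(len(x) - h + 1):
        row = []
        for j in range(len(x[0]) - w + 1):
            window = [x[i + a][j + b] for a in range(h) for b in range(w)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out


x = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(pool2d(x, (2, 2), "max"))  # [[4, 5], [7, 8]]
print(pool2d(x, (2, 2), "avg"))  # [[2.0, 3.0], [5.0, 6.0]]

# The same feature, shifted right by one column.
a = [[0, 1, 0, 0]]
b = [[0, 0, 1, 0]]
print(pool2d(a, (1, 2)))  # [[1, 1, 0]]
print(pool2d(b, (1, 2)))  # [[0, 1, 1]]
```

Both shifted inputs give 1 at output column 1, so a later layer reading that position sees the feature in either case.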
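Implementation from scratch
import torch
import numpy as np
import torch.nn as nn

# 2D cross-correlation (the "convolution" used by convolutional layers)
def corr2d(X, K):
    H, W = X.shape
    h, w = K.shape
    Y = torch.zeros(H - h + 1, W - w + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # Multiply the kernel element-wise with the window at (i, j), then sum
            Y[i, j] = (X[i: i + h, j: j + w] * K).sum()
    return Y

X = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
K = torch.tensor([[0, 1], [2, 3]])
Y = corr2d(X, K)
print(Y)  # values: [[19., 25.], [37., 43.]]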

# 2D convolutional layer
class Conv2D(nn.Module):
    def __init__(self, kernel_size):
        super(Conv2D, self).__init__()
        # Learnable kernel of shape kernel_size and a scalar bias
        self.weight = nn.Parameter(torch.randn(kernel_size))
        self.bias = nn.Parameter(torch.randn(1))

    def forward(self, x):
        return corr2d(x, self.weight) + self.bias

# Learn a kernel that detects edges
X = torch.ones(6, 8)
Y = torch.zeros(6, 7)
X[:, 2: 6] = 0
Y[:, 1] = 1
Y[:, 5] = -1
print(X)
print(Y)

# Train the edge detector
conv2d = Conv2D(kernel_size=(1, 2))
step = 100
lr = 0.01
for i in range(step):
    Y_hat = conv2d(X)
    l = ((Y_hat - Y) ** 2).sum()
    l.backward()
    # Gradient descent
    conv2d.weight.data -= lr * conv2d.weight.grad
    conv2d.bias.data -= lr * conv2d.bias.grad
    # Zero the gradients
    conv2d.weight.grad.zero_()
    conv2d.bias.grad.zero_()
    if (i + 1) % 5 == 0:
        print('Step %d, loss %.3f' % (i + 1, l.item()))
print(conv2d.weight.data)
print(conv2d.bias.data)
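A quick sanity check for the training result, written in pure Python so it runs without a trained model: the kernel [1, -1] with zero bias maps the X built above exactly onto the target Y, so the loss is zero at that point. The learned weight and bias should therefore move toward roughly these values; the exact printed numbers depend on the random initialisation and are not reproduced here.

```python
# Same data as the edge-detection example: ones, with zeros in columns 2..5.
X = [[0 if 2 <= j < 6 else 1 for j in range(8)] for _ in range(6)]
# Target: 1 at the 1 -> 0 edge (column 1), -1 at the 0 -> 1 edge (column 5).
Y = [[1 if j == 1 else -1 if j == 5 else 0 for j in range(7)] for _ in range(6)]

K = [1, -1]  # candidate 1 x 2 kernel, bias assumed to be 0

# Cross-correlate every row of X with K.
Y_check = [[row[j] * K[0] + row[j + 1] * K[1] for j in range(7)] for row in X]
print(Y_check[0])    # [0, 1, 0, 0, 0, -1, 0]
print(Y_check == Y)  # True
```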
Concise implementation
# Concise implementation of a convolutional layer
'''
Use the nn.Conv2d class in PyTorch to implement a 2D convolutional layer. The main constructor parameters are:
in_channels (python:int) – Number of channels in the input image
out_channels (python:int) – Number of channels produced by the convolution
kernel_size (python:int or tuple) – Size of the convolving kernel
stride (python:int or tuple, optional) – Stride of the convolution. Default: 1
padding (python:int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
'''
X = torch.rand(4, 2, 3, 5)
print(X.shape)
conv2d = nn.Conv2d(in_channels=2, out_channels=3, kernel_size=(3, 5), stride=1, padding=(1, 2))
Y = conv2d(X)
print('Y.shape: ', Y.shape)
print('weight.shape: ', conv2d.weight.shape)
print('bias.shape: ', conv2d.bias.shape)
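The shapes printed above can be predicted by hand. The sketch below assumes the default `groups=1`, in which case the weight of `nn.Conv2d` has shape (out_channels, in_channels, kernel_height, kernel_width) and the bias has one entry per output channel. `conv2d_param_shapes` is an illustrative helper, not part of PyTorch.

```python
def conv2d_param_shapes(in_channels, out_channels, kernel_size):
    """Weight shape, bias shape and total parameter count (assumes groups=1)."""
    kh, kw = kernel_size
    weight_shape = (out_channels, in_channels, kh, kw)
    bias_shape = (out_channels,)
    n_params = out_channels * in_channels * kh * kw + out_channels
    return weight_shape, bias_shape, n_params


# Same arguments as nn.Conv2d(in_channels=2, out_channels=3, kernel_size=(3, 5), ...)
shapes = conv2d_param_shapes(2, 3, (3, 5))
print(shapes)  # ((3, 2, 3, 5), (3,), 93)
```

With padding (1, 2) and stride 1 the 3 × 5 spatial size is preserved, so for the input of shape (4, 2, 3, 5) the output Y should have shape (4, 3, 3, 5).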
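# Concise implementation of a pooling layer
'''
Use nn.MaxPool2d in PyTorch to implement a max pooling layer. The main constructor parameters are:
kernel_size – the size of the window to take a max over
stride – the stride of the window. Default value is kernel_size
padding – implicit padding to be added on both sides (current PyTorch documentation describes it as negative-infinity padding, so padded cells never win the max)
'''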
X = torch.arange(32, dtype=torch.float32).view(1, 2, 4, 4)
pool2d = nn.MaxPool2d(kernel_size=3, padding=1, stride=(2, 1))
Y = pool2d(X)
print(X)
print(Y)
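The printed result can be checked by hand with a small pure-Python sketch of padded, strided max pooling. It assumes padding with negative infinity and floor rounding of the output size; `max_pool2d` here is an illustrative helper for a single channel, not the PyTorch function.

```python
NEG_INF = float("-inf")


def max_pool2d(x, kernel, padding, stride):
    """x: 2D nested list; kernel, padding, stride: (rows, cols) tuples."""
    (kh, kw), (ph, pw), (sh, sw) = kernel, padding, stride
    height, width = len(x), len(x[0])
    # Pad with -inf so that padded cells never win the max.
    padded = [[NEG_INF] * (width + 2 * pw) for _ in range(ph)]
    padded += [[NEG_INF] * pw + list(row) + [NEG_INF] * pw for row in x]
    padded += [[NEG_INF] * (width + 2 * pw) for _ in range(ph)]
    out_h = (height + 2 * ph - kh) // sh + 1
    out_w = (width + 2 * pw - kw) // sw + 1
    return [
        [
            max(padded[i * sh + a][j * sw + b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Channel 0 of torch.arange(32).view(1, 2, 4, 4) holds 0..15; channel 1 holds 16..31.
channel0 = [[4 * r + c for c in range(4)] for r in range(4)]
channel1 = [[16 + 4 * r + c for c in range(4)] for r in range(4)]
print(max_pool2d(channel0, (3, 3), (1, 1), (2, 1)))  # [[5, 6, 7, 7], [13, 14, 15, 15]]
print(max_pool2d(channel1, (3, 3), (1, 1), (2, 1)))  # [[21, 22, 23, 23], [29, 30, 31, 31]]
```

Pooling is applied to each channel separately, so the number of channels is unchanged and Y should have shape (1, 2, 2, 4).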