PyTorch Edition: Computer Vision, Part 4

Author: 深思海数_willschang | Published: 2021-08-12 10:07

Section 2 Object Classification and Detection

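In this second section we study more complex network architectures in order to solve more complex real-world problems, such as image classification and object detection.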

Chapter 4 Introducing Convolutional Neural Networks

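Problems with traditional deep neural networks

Using the model trained in Chapter 3, we shift the pixels of one image and check whether the traditional (fully connected) deep neural network still recognizes it. Concretely, we pick a single image and translate it horizontally by a few pixels.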

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torch.optim import Adam, SGD
from torchvision import datasets


device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Root directory for the data
data_folder = './data/FMNIST'
# Load the training data
fmnist = datasets.FashionMNIST(data_folder, download=True, train=True)
tr_images = fmnist.data
tr_targets = fmnist.targets

# One subplot per shift: 2 rows x 5 columns for shifts of -5..4 pixels
fig, axe = plt.subplots(2, 5, figsize=(10, 4))

ix = 24300
for tmp_plt, px in zip(axe.flatten(),range(-5,5)):
    img = tr_images[ix]/255.
    img = img.view(28, 28)
    img2 = np.roll(img, px, axis=1)
    tmp_plt.imshow(img2)
    
plt.tight_layout()
plt.show()

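The result is shown in the figure below:

[Figure: the same image shifted horizontally by different numbers of pixels]

The figure shows six of the shifted images; the trousers have clearly moved horizontally.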
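To make the shift itself concrete, here is a minimal sketch of what `np.roll` does along `axis=1`, using a tiny hypothetical 3x4 array instead of a real image. Note that `np.roll` wraps around: columns pushed off one edge reappear on the other. For FashionMNIST this is usually harmless because the borders are mostly black background, but that is an assumption about the data, not something `np.roll` guarantees.

```python
import numpy as np

# A tiny 3x4 "image" (illustrative values) so the shift is easy to read.
img = np.array([[0, 1, 2, 3],
                [4, 5, 6, 7],
                [8, 9, 10, 11]])

# Shift every row 1 pixel to the right; the last column wraps to the front.
shifted_right = np.roll(img, 1, axis=1)

# Shift every row 1 pixel to the left; the first column wraps to the back.
shifted_left = np.roll(img, -1, axis=1)

print(shifted_right)
print(shifted_left)
```

A positive shift moves the content to the right and a negative shift moves it to the left, which matches the `range(-5, 5)` loop used for the plot.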

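Next we feed the shifted images to the previously trained model. The heatmap of the results shows that within a shift of 2 pixels the model still predicts the correct class, but beyond 2 pixels the prediction becomes unreliable.

[Figure: heatmap of the predicted probability of each class for each shift]

Example code (Jupyter notebook)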

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from torch.optim import Adam, SGD
from torchvision import datasets

device = 'cuda' if torch.cuda.is_available() else 'cpu'


# Root directory for the data
data_folder = './data/FMNIST'
# Load the training data
fmnist = datasets.FashionMNIST(data_folder, download=True, train=True)
tr_images = fmnist.data
tr_targets = fmnist.targets
# Load the validation data
val_fmnist = datasets.FashionMNIST(data_folder, download=True, train=False)
val_images = val_fmnist.data
val_targets = val_fmnist.targets


# Dataset that scales pixels to [0, 1] and flattens each image to 784 values
class FMNISTDataset(Dataset):
    def __init__(self, x, y):
        x = x.float()/255
        x = x.view(-1,28*28)
        self.x, self.y = x, y 
    def __getitem__(self, ix):
        x, y = self.x[ix], self.y[ix]        
        return x.to(device), y.to(device)
    def __len__(self): 
        return len(self.x)

# Build the model, loss function, and optimizer
def get_model():
    model = nn.Sequential(
        nn.Linear(28 * 28, 1000),
        nn.ReLU(),
        nn.Linear(1000, 10)
    ).to(device)

    loss_fn = nn.CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=1e-3)
    return model, loss_fn, optimizer

# Train on a single batch
def train_batch(x, y, model, opt, loss_fn):
    model.train()
    prediction = model(x)
    batch_loss = loss_fn(prediction, y)
    batch_loss.backward()
    # Use the optimizer passed in as `opt` rather than the global `optimizer`
    opt.step()
    opt.zero_grad()
    return batch_loss.item()

# Evaluate the model: per-sample correctness for one batch
def accuracy(x, y, model):
    with torch.no_grad():
        prediction = model(x)
    max_values, argmaxes = prediction.max(-1)
    is_correct = argmaxes == y
    return is_correct.cpu().numpy().tolist()


def get_data():     
    train = FMNISTDataset(tr_images, tr_targets)     
    trn_dl = DataLoader(train, batch_size=32, shuffle=True)
    val = FMNISTDataset(val_images, val_targets)     
    val_dl = DataLoader(val, batch_size=len(val_images), shuffle=True)
    return trn_dl, val_dl


def val_loss(x, y, model):
    with torch.no_grad():
        prediction = model(x)
    val_loss = loss_fn(prediction, y)
    return val_loss.item()


trn_dl, val_dl = get_data()
model, loss_fn, optimizer = get_model()

train_losses, train_accuracies = [], []
val_losses, val_accuracies = [], []
for epoch in range(5):
    print(epoch)
    train_epoch_losses, train_epoch_accuracies = [], []
    for ix, batch in enumerate(iter(trn_dl)):
        x, y = batch
        batch_loss = train_batch(x, y, model, optimizer, loss_fn)
        train_epoch_losses.append(batch_loss)        
    train_epoch_loss = np.array(train_epoch_losses).mean()

    for ix, batch in enumerate(iter(trn_dl)):
        x, y = batch
        is_correct = accuracy(x, y, model)
        train_epoch_accuracies.extend(is_correct)
    train_epoch_accuracy = np.mean(train_epoch_accuracies)

    for ix, batch in enumerate(iter(val_dl)):
        x, y = batch
        val_is_correct = accuracy(x, y, model)
        validation_loss = val_loss(x, y, model)
    val_epoch_accuracy = np.mean(val_is_correct)

    train_losses.append(train_epoch_loss)
    train_accuracies.append(train_epoch_accuracy)
    val_losses.append(validation_loss)
    val_accuracies.append(val_epoch_accuracy)

ix = 24300
plt.imshow(tr_images[ix], cmap='gray')
plt.title(fmnist.classes[tr_targets[ix]])


img = tr_images[ix]/255.
img = img.view(28*28)
img = img.to(device)

np_output = model(img).cpu().detach().numpy()
np.exp(np_output)/np.sum(np.exp(np_output))

"""
array([2.3472281e-05, 9.9997413e-01, 3.2075516e-08, 2.0036161e-06,
       1.7420588e-08, 4.2372212e-13, 3.5205579e-07, 5.9302212e-19,
       6.5659894e-10, 4.1716305e-14], dtype=float32)
"""

print(tr_targets[ix])

preds = []
for px in range(-5, 6):
    img = tr_images[ix] / 255.
    img = img.view(28, 28)
    # Shift the image horizontally by px pixels (columns wrap around)
    img2 = np.roll(img, px, axis=1)
    plt.imshow(img2)
    plt.show()
    img3 = torch.Tensor(img2).view(28 * 28).to(device)
    np_output = model(img3).cpu().detach().numpy()
    # Softmax over the 10 logits to get class probabilities
    preds.append(np.exp(np_output) / np.sum(np.exp(np_output)))

import seaborn as sns
fig, ax = plt.subplots(1, 1, figsize=(12, 10))
plt.title('Probability of each class for various translations')
sns.heatmap(np.array(preds), annot=True, ax=ax, fmt='.2f', xticklabels=fmnist.classes,
            yticklabels=[f'{i} pixels' for i in range(-5, 6)], cmap='gray')
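The notebook turns logits into probabilities with a hand-written `np.exp(x) / np.sum(np.exp(x))`. That works for the small logits seen here, but it can overflow for large values. The following is a minimal sketch of a numerically stable variant; the function name `softmax` and the example logits are hypothetical, not taken from the original notebook.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtracting the row maximum leaves the result unchanged
    # but keeps np.exp from overflowing.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical logits: three shifted images, four classes.
example_logits = np.array([[2.0, 9.0, 1.0, 0.5],
                           [3.0, 4.0, 3.5, 0.0],
                           [1000.0, 999.0, 0.0, 0.0]])  # would overflow without the shift

probs = softmax(example_logits)
predicted = probs.argmax(axis=-1)  # most likely class per row
print(predicted)
```

Each row of `probs` sums to 1, so an array like this can be passed to the heatmap in the same way as `preds`.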

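Convolutional Neural Networks (CNN)

CNNs are an important architecture in computer vision (image classification, object detection, image segmentation, GANs), with advantages such as local connectivity and weight sharing.
The following points introduce the basics of CNNs (these concepts will be covered in dedicated posts later; here they are only noted briefly):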

Convolutions

A convolution slides a small matrix (the filter, or kernel) over the image and, at each position, multiplies the two element-wise and sums the result.
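As a rough illustration, the operation can be written by hand with NumPy. This is only a sketch with stride 1 and no padding; the helper name `conv2d_valid` and the input values are made up for the example. Strictly speaking it computes a cross-correlation (the kernel is not flipped), which is what deep learning frameworks conventionally call a convolution.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Illustrative 4x4 input and 2x2 kernel.
image = np.array([[1, 2, 3, 0],
                  [0, 1, 2, 3],
                  [3, 0, 1, 2],
                  [2, 3, 0, 1]], dtype=float)
kernel = np.array([[1, 0],
                   [0, 1]], dtype=float)

result = conv2d_valid(image, kernel)
print(result)
```

With this kernel each output value is the sum of the top-left and bottom-right entries of the 2x2 patch it covers, and the 4x4 input shrinks to a 3x3 output.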

Filters

A filter is a matrix of weights that is initialized randomly at the start.
The model learns the optimal weight values of a filter over increasing epochs.

In general, the more filters there are in a CNN, the more features of an image that the model can learn about.
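In PyTorch the number of filters corresponds to `out_channels` of `nn.Conv2d`. The sketch below uses illustrative sizes (8 filters of 3x3 on a single-channel 28x28 input, like one FashionMNIST image) to show where the randomly initialized filter weights live and how each filter produces one feature map.

```python
import torch
import torch.nn as nn

# One grayscale input channel, 8 filters of size 3x3 (sizes chosen for illustration).
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)

# The weights are randomly initialized; shape is (out_channels, in_channels, kH, kW).
print(conv.weight.shape)

# A batch holding a single random 28x28 "image".
x = torch.rand(1, 1, 28, 28)
feature_maps = conv(x)

# One 26x26 feature map per filter.
print(feature_maps.shape)
```

These weights are ordinary parameters, so an optimizer such as `Adam(conv.parameters())` would update them during training in the same way as the `nn.Linear` weights earlier.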

[Figure: Filters]

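Strides and padding
The stride is the step by which the kernel moves across the image; padding adds extra pixels around the border of the original image. The former is mainly about speed (a larger stride means fewer positions to compute), the latter compensates for the information lost at the edges.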
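The effect of both settings on the output size follows the usual formula `(n + 2 * padding - kernel_size) // stride + 1` along each spatial dimension (assuming no dilation). A small sketch, with the helper name `conv_output_size` chosen for this example:

```python
def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension (no dilation)."""
    return (n + 2 * padding - kernel_size) // stride + 1

# 28-pixel-wide input with a 3x3 kernel, as for a FashionMNIST image.
no_padding = conv_output_size(28, kernel_size=3)                     # edges are lost
same_padding = conv_output_size(28, kernel_size=3, padding=1)        # size preserved
strided = conv_output_size(28, kernel_size=3, stride=2, padding=1)   # size halved

print(no_padding, same_padding, strided)
```

Without padding the output shrinks from 28 to 26; one pixel of padding keeps it at 28; a stride of 2 roughly halves it to 14, which is where the speed-up comes from.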

Pooling

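Pooling aggregates information in a small patch.
This shrinks the feature maps, which in turn reduces the amount of computation and the number of parameters needed in later layers.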
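As a minimal sketch, here are max pooling and average pooling over 2x2 patches of a hand-made 4x4 input (the values are illustrative). In `nn.MaxPool2d` and `nn.AvgPool2d` the stride defaults to the kernel size, so the patches do not overlap.

```python
import torch
import torch.nn as nn

# A single-channel 4x4 input, shaped (batch, channels, height, width).
x = torch.tensor([[1., 2., 5., 6.],
                  [3., 4., 7., 8.],
                  [9., 10., 13., 14.],
                  [11., 12., 15., 16.]]).view(1, 1, 4, 4)

max_pool = nn.MaxPool2d(kernel_size=2)  # keeps the largest value of each 2x2 patch
avg_pool = nn.AvgPool2d(kernel_size=2)  # keeps the mean of each 2x2 patch

pooled_max = max_pool(x)
pooled_avg = avg_pool(x)

print(pooled_max.view(2, 2))
print(pooled_avg.view(2, 2))
```

The 4x4 input becomes 2x2 in both cases, so any layer that follows sees a quarter as many values.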

A complete CNN model is shown in the figure below.

[Figure: CNN]
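Putting the pieces together, a small CNN for 28x28 grayscale images might look like the sketch below. The layer sizes and the name `get_cnn_model` are assumptions made for illustration and are not the architecture from the figure. Unlike the fully connected model above, this network expects inputs of shape (N, 1, 28, 28), so a dataset class like `FMNISTDataset` would need to use `x.view(-1, 1, 28, 28)` instead of flattening to 784 values.

```python
import torch
import torch.nn as nn

def get_cnn_model():
    """A small illustrative CNN for 1x28x28 inputs and 10 classes."""
    return nn.Sequential(
        nn.Conv2d(1, 64, kernel_size=3),    # -> 64 x 26 x 26
        nn.ReLU(),
        nn.MaxPool2d(2),                    # -> 64 x 13 x 13
        nn.Conv2d(64, 128, kernel_size=3),  # -> 128 x 11 x 11
        nn.ReLU(),
        nn.MaxPool2d(2),                    # -> 128 x 5 x 5
        nn.Flatten(),                       # -> 3200 values per image
        nn.Linear(128 * 5 * 5, 256),
        nn.ReLU(),
        nn.Linear(256, 10),                 # one logit per class
    )

cnn = get_cnn_model()
dummy_batch = torch.rand(4, 1, 28, 28)  # four random "images"
logits = cnn(dummy_batch)
print(logits.shape)
```

The output has one row of 10 logits per image, so it can be trained with the same `nn.CrossEntropyLoss` and `Adam` setup used for the fully connected model.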
