Computer Vision with PyTorch, Part 4


Author: 深思海数_willschang | Published: 2021-08-12 10:07

    Section 2 Object Classification and Detection

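    In Section 2 we study more complex network architectures in order to solve more complex real-world problems such as image classification and object detection.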

    Chapter 4 Introducing Convolutional Neural Networks

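    Problems with traditional deep neural networks

    Using the model trained in Chapter 3, we shift the pixels of a single image and check whether the recognition accuracy of a traditional (fully connected) deep neural network is affected.
    That is, we pick one image (the listing below uses training image 24300) and translate its pixels horizontally.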

    %matplotlib inline
    import matplotlib.pyplot as plt
    import numpy as np
    import torch
    import torch.nn as nn
    from torch.utils.data import Dataset, DataLoader
    from torch.optim import Adam, SGD
    from torchvision import datasets
    
    
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    
    # Data root directory
    data_folder = './data/FMNIST'
    # Load the training data
    fmnist = datasets.FashionMNIST(data_folder, download=True, train=True)
    tr_images = fmnist.data
    tr_targets = fmnist.targets
    
    # One subplot per horizontal shift
    fig, axe = plt.subplots(2, 5, figsize=(10, 4))
    
    ix = 24300
    # range(-5, 5) yields 10 shifts (-5 .. 4), one per subplot
    for tmp_plt, px in zip(axe.flatten(), range(-5, 5)):
        img = (tr_images[ix] / 255.).numpy()   # scale to [0, 1]; shape (28, 28)
        img2 = np.roll(img, px, axis=1)        # shift columns by px pixels (wraps around)
        tmp_plt.imshow(img2)
        
    plt.tight_layout()
    plt.show()
    
    

    The result is shown below:

    [Figure: the selected image shifted horizontally by different pixel offsets]

    The screenshot above shows six of the shifted images; the trousers visibly move horizontally.
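
    One detail of the shift is worth spelling out: `np.roll` is a circular shift, so columns pushed past one edge reappear on the opposite edge. The toy array below is an illustrative assumption (not data from the article) that makes the wrap-around visible.

```python
import numpy as np

# A tiny 2x5 "image": each row is 0..4 so column movement is easy to see.
img = np.array([[0, 1, 2, 3, 4],
                [0, 1, 2, 3, 4]])

# Positive shift along axis=1 moves columns to the right; columns pushed
# past the right edge reappear on the left.
right = np.roll(img, 2, axis=1)
# Negative shift moves columns to the left, wrapping around on the right.
left = np.roll(img, -1, axis=1)

print(right[0])  # [3 4 0 1 2]
print(left[0])   # [1 2 3 4 0]
```

    For images whose borders are mostly background, a small shift wraps only near-empty columns, so the result looks like a plain translation.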

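    Next we run the shifted images through the previously trained model. The heatmap of the results shows that the model still predicts correctly for shifts of up to 2 pixels, but beyond 2 pixels the prediction becomes unreliable.

    [Figure: heatmap of predicted class probabilities for each pixel shift]

    Example code (Jupyter notebook)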

    %matplotlib inline
    import matplotlib.pyplot as plt
    import numpy as np
    import torch
    import torch.nn as nn
    from torch.utils.data import Dataset, DataLoader
    from torch.optim import Adam, SGD
    from torchvision import datasets
    
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    
    
    # Data root directory
    data_folder = './data/FMNIST'
    # Load the training data
    fmnist = datasets.FashionMNIST(data_folder, download=True, train=True)
    tr_images = fmnist.data
    tr_targets = fmnist.targets
    # Load the validation data
    val_fmnist = datasets.FashionMNIST(data_folder, download=True, train=False)
    val_images = val_fmnist.data
    val_targets = val_fmnist.targets
    
    
    # Dataset wrapper: scales pixels to [0, 1] and flattens each image to 784 values
    class FMNISTDataset(Dataset):
        def __init__(self, x, y):
            x = x.float() / 255
            x = x.view(-1, 28 * 28)
            self.x, self.y = x, y

        def __getitem__(self, ix):
            x, y = self.x[ix], self.y[ix]
            return x.to(device), y.to(device)

        def __len__(self):
            return len(self.x)
    
    # Build the model, loss function and optimizer
    def get_model():
        model = nn.Sequential(
            nn.Linear(28 * 28, 1000),
            nn.ReLU(),
            nn.Linear(1000, 10)
        ).to(device)
    
        loss_fn = nn.CrossEntropyLoss()
        optimizer = Adam(model.parameters(), lr=1e-3)
        return model, loss_fn, optimizer
    
    # Train on a single batch
    def train_batch(x, y, model, opt, loss_fn):
        prediction = model(x)
        batch_loss = loss_fn(prediction, y)
        batch_loss.backward()
        # Use the optimizer passed in as `opt`, not the global `optimizer`
        opt.step()
        opt.zero_grad()
        return batch_loss.item()
    
    # Model evaluation: returns a per-sample list of correct/incorrect flags
    def accuracy(x, y, model):
        with torch.no_grad():
            prediction = model(x)
        max_values, argmaxes = prediction.max(-1)
        is_correct = argmaxes == y
        return is_correct.cpu().numpy().tolist()
    
    
    def get_data():
        train = FMNISTDataset(tr_images, tr_targets)
        trn_dl = DataLoader(train, batch_size=32, shuffle=True)
        val = FMNISTDataset(val_images, val_targets)
        # The whole validation set is one batch, so shuffling is unnecessary
        val_dl = DataLoader(val, batch_size=len(val_images), shuffle=False)
        return trn_dl, val_dl
    
    
    def val_loss(x, y, model):
        # Relies on the module-level `loss_fn` returned by get_model()
        with torch.no_grad():
            prediction = model(x)
        loss = loss_fn(prediction, y)
        return loss.item()
    
    
    trn_dl, val_dl = get_data()
    model, loss_fn, optimizer = get_model()
    
    train_losses, train_accuracies = [], []
    val_losses, val_accuracies = [], []
    for epoch in range(5):
        print(epoch)
        train_epoch_losses, train_epoch_accuracies = [], []
        for ix, batch in enumerate(iter(trn_dl)):
            x, y = batch
            batch_loss = train_batch(x, y, model, optimizer, loss_fn)
            train_epoch_losses.append(batch_loss)        
        train_epoch_loss = np.array(train_epoch_losses).mean()
    
        for ix, batch in enumerate(iter(trn_dl)):
            x, y = batch
            is_correct = accuracy(x, y, model)
            train_epoch_accuracies.extend(is_correct)
        train_epoch_accuracy = np.mean(train_epoch_accuracies)
    
        for ix, batch in enumerate(iter(val_dl)):
            x, y = batch
            val_is_correct = accuracy(x, y, model)
            validation_loss = val_loss(x, y, model)
        val_epoch_accuracy = np.mean(val_is_correct)
    
        train_losses.append(train_epoch_loss)
        train_accuracies.append(train_epoch_accuracy)
        val_losses.append(validation_loss)
        val_accuracies.append(val_epoch_accuracy)
    
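    # Inspect one training image and its label
    ix = 24300
    plt.imshow(tr_images[ix], cmap='gray')
    plt.title(fmnist.classes[tr_targets[ix]])
    
    
    # Scale to [0, 1] and flatten to match the model input
    img = tr_images[ix]/255.
    img = img.view(28*28)
    img = img.to(device)
    
    # Softmax over the model's logits gives the class probabilities
    np_output = model(img).cpu().detach().numpy()
    np.exp(np_output)/np.sum(np.exp(np_output))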
    
    """
    array([2.3472281e-05, 9.9997413e-01, 3.2075516e-08, 2.0036161e-06,
           1.7420588e-08, 4.2372212e-13, 3.5205579e-07, 5.9302212e-19,
           6.5659894e-10, 4.1716305e-14], dtype=float32)
    """
    
    print(tr_targets[ix])
    
    # Predict class probabilities for shifts of -5 .. +5 pixels
    preds = []
    for px in range(-5, 6):
        img = (tr_images[ix] / 255.).numpy()
        img2 = np.roll(img, px, axis=1)      # shift horizontally by px pixels
        plt.imshow(img2)
        plt.show()
        img3 = torch.from_numpy(img2).view(28 * 28).to(device)
        np_output = model(img3).cpu().detach().numpy()
        preds.append(np.exp(np_output) / np.sum(np.exp(np_output)))  # softmax
    
    import seaborn as sns
    fig, ax = plt.subplots(1,1, figsize=(12,10))
    plt.title('Probability of each class for various translations')
    sns.heatmap(np.array(preds), annot=True, ax=ax, fmt='.2f', xticklabels=fmnist.classes,
                yticklabels=[f'{i} pixels' for i in range(-5, 6)], cmap='gray')
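
    The listing above turns logits into probabilities by hand with `np.exp(...) / np.sum(np.exp(...))`. A minimal sketch of the built-in alternative, `torch.softmax`, follows; the logits are made-up values standing in for the output of `model(img)`, not results from the article.

```python
import numpy as np
import torch

# Hypothetical logits standing in for the output of model(img).
logits = torch.tensor([2.0, 1.0, 0.1])

# Manual softmax, as written in the listing above.
np_output = logits.numpy()
manual = np.exp(np_output) / np.sum(np.exp(np_output))

# Built-in softmax; dim=-1 normalizes over the class dimension.
builtin = torch.softmax(logits, dim=-1).numpy()
print(np.allclose(manual, builtin, atol=1e-6))  # True

# With very large logits the manual form overflows in np.exp, whereas
# torch.softmax stays finite.
big = torch.softmax(torch.tensor([1000.0, 0.0]), dim=-1)
print(big)  # tensor([1., 0.])
```

    For the small logits produced by this model both forms agree, so the choice is mainly about robustness and readability.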
    

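    Convolutional Neural Networks (CNN)

    CNNs are a key building block in computer vision (image classification, object detection, image segmentation, GANs), with advantages such as local connectivity and weight sharing.
    The following points introduce CNNs. (These basic concepts will be covered in dedicated posts later; only brief notes are given here.)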

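    • Convolutions

      A convolution slides a small matrix (the filter) over the image; at each position the filter and the image patch beneath it are multiplied element-wise and the products are summed into a single output value.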

    • Filters

    A filter is a matrix of weights that is initialized randomly at the start.
    The model learns the optimal weight values of a filter over increasing epochs.

    In general, the more filters a CNN has, the more features of an image the model can learn.

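    [Figure: Filters]
    • Strides and padding
      The stride is the step size by which the kernel moves; padding adds pixels around the border of the original image. The former affects speed (a larger stride means fewer positions to compute), the latter compensates for information lost at the edges.
    • Pooling

    Pooling aggregates information in a small patch.
    This shrinks the feature maps and ultimately reduces the number of parameters and the amount of computation in later layers.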
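
    The convolution, stride, padding and pooling ideas listed above can be sketched in a few lines. The helper `conv2d_naive`, the 6x6 toy image and the hand-picked vertical-edge filter are all illustrative assumptions; in a real CNN the filter weights are learned. Like PyTorch, the sketch does not flip the kernel (strictly speaking it computes a cross-correlation), and the result is checked against `torch.nn.functional.conv2d`.

```python
import numpy as np
import torch
import torch.nn.functional as F


def conv2d_naive(image, kernel, stride=1, padding=0):
    """Slide `kernel` over `image`; multiply element-wise and sum at each position."""
    image = np.pad(image, padding)  # zero-pad all four borders
    kh, kw = kernel.shape
    # Output size: floor((n + 2p - k) / s) + 1 (padding is already applied here)
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + kh, left:left + kw]
            out[i, j] = np.sum(patch * kernel)
    return out


image = np.arange(36, dtype=np.float32).reshape(6, 6)   # toy 6x6 "image"
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=np.float32)        # hand-picked filter

out = conv2d_naive(image, kernel, stride=2, padding=1)
print(out.shape)  # (3, 3)

# Same operation with PyTorch; tensors need (batch, channel, H, W) layout.
x = torch.from_numpy(image)[None, None]
w = torch.from_numpy(kernel)[None, None]
ref = F.conv2d(x, w, stride=2, padding=1)[0, 0].numpy()
print(np.allclose(out, ref))  # True

# 2x2 max pooling keeps the largest value in each patch: 6x6 -> 3x3.
pooled = F.max_pool2d(x, kernel_size=2)
print(tuple(pooled.shape))  # (1, 1, 3, 3)
```

    With a 6x6 input, a 3x3 kernel, padding 1 and stride 2, the output side is floor((6 + 2 - 3) / 2) + 1 = 3, which matches the printed shape.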

    A complete CNN model is shown in the figure below.

    [Figure: CNN]
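
    As a rough counterpart to the figure, here is a minimal sketch of a small CNN for 28x28 grayscale inputs such as Fashion-MNIST. The function name `get_cnn_model` and all layer sizes are assumptions chosen for illustration; they are not taken from the article.

```python
import torch
import torch.nn as nn


def get_cnn_model():
    # Layer sizes are illustrative assumptions, not the article's model.
    return nn.Sequential(
        nn.Conv2d(1, 64, kernel_size=3),    # 1x28x28  -> 64x26x26
        nn.MaxPool2d(2),                    # 64x26x26 -> 64x13x13
        nn.ReLU(),
        nn.Conv2d(64, 128, kernel_size=3),  # 64x13x13 -> 128x11x11
        nn.MaxPool2d(2),                    # 128x11x11 -> 128x5x5
        nn.ReLU(),
        nn.Flatten(),                       # 128x5x5 -> 3200
        nn.Linear(128 * 5 * 5, 256),
        nn.ReLU(),
        nn.Linear(256, 10),                 # one logit per class
    )


model = get_cnn_model()
x = torch.randn(8, 1, 28, 28)   # a dummy batch of 8 single-channel images
logits = model(x)
print(logits.shape)  # torch.Size([8, 10])
```

    Unlike the fully connected model earlier in this post, this network expects inputs shaped (batch, 1, 28, 28), so a dataset class feeding it would reshape with `x.view(-1, 1, 28, 28)` rather than flattening to 784 values.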
