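AlexNet took first place in the image classification task of ILSVRC 2012. It has 60 million parameters and 650,000 neurons, and training it on two GTX 580 3GB GPUs took five to six days. This post implements and trains AlexNet on the CIFAR-10 dataset with a single GTX 1080 Ti GPU, which takes only about 30 minutes and gives a sense of how far the technology has come.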
The AlexNet architecture diagram is taken from https://learnopencv.com/understanding-alexnet/.
Following that diagram, an example implementation of the network in PaddlePaddle is:
import paddle
from paddle.nn import Sequential, Conv2D, ReLU, MaxPool2D, Linear, Dropout, Flatten
from paddle.static import InputSpec
from paddle.vision.transforms import Compose, Resize, Transpose, Normalize, ToTensor
from paddle.vision.datasets import Cifar10


# Build the AlexNet network.
# Sequential is an ordered container: sub-layers are added in the order they are
# passed to the constructor, either as Layers or as an iterable of (name, Layer) tuples.
class AlexNet(paddle.nn.Layer):
    def __init__(self, num_classes=10):
        super().__init__()
        # Conv2D positional arguments: in_channels, out_channels, kernel_size, stride, padding
        # MaxPool2D positional arguments: kernel_size, stride
        self.conv_relu_pool1 = Sequential(
            Conv2D(3, 96, 11, 4, 0),
            ReLU(),
            MaxPool2D(3, 2))
        self.conv_relu_pool2 = Sequential(
            Conv2D(96, 256, 5, 1, 2),
            ReLU(),
            MaxPool2D(3, 2))
        self.conv_relu3 = Sequential(
            Conv2D(256, 384, 3, 1, 1),
            ReLU())
        self.conv_relu4 = Sequential(
            Conv2D(384, 384, 3, 1, 1),
            ReLU())
        self.conv_relu_pool5 = Sequential(
            Conv2D(384, 256, 3, 1, 1),
            ReLU(),
            MaxPool2D(3, 2))
        self.flatten = Flatten()
        self.fc = Sequential(
            Linear(256 * 6 * 6, 4096),
            ReLU(),
            Dropout(0.5),
            Linear(4096, 4096),
            ReLU(),
            Dropout(0.5),
            Linear(4096, num_classes))

    def forward(self, x):
        x = self.conv_relu_pool1(x)
        x = self.conv_relu_pool2(x)
        x = self.conv_relu3(x)
        x = self.conv_relu4(x)
        x = self.conv_relu_pool5(x)
        x = self.flatten(x)
        x = self.fc(x)
        return x


alex_net = AlexNet(num_classes=10)

# Describe the input image and label so that summary() can infer every layer's shape.
input_spec = InputSpec([None, 3, 227, 227], 'float32', 'image')
label_spec = InputSpec([None, 1], 'int64', 'label')
model = paddle.Model(alex_net, input_spec, label_spec)
model.summary()
Layer (type)     Input Shape             Output Shape          Param #
===========================================================================
Conv2D-1         [[1, 3, 227, 227]]      [1, 96, 55, 55]       34,944
ReLU-1           [[1, 96, 55, 55]]       [1, 96, 55, 55]       0
MaxPool2D-1      [[1, 96, 55, 55]]       [1, 96, 27, 27]       0
Conv2D-2         [[1, 96, 27, 27]]       [1, 256, 27, 27]      614,656
ReLU-2           [[1, 256, 27, 27]]      [1, 256, 27, 27]      0
MaxPool2D-2      [[1, 256, 27, 27]]      [1, 256, 13, 13]      0
Conv2D-3         [[1, 256, 13, 13]]      [1, 384, 13, 13]      885,120
ReLU-3           [[1, 384, 13, 13]]      [1, 384, 13, 13]      0
Conv2D-4         [[1, 384, 13, 13]]      [1, 384, 13, 13]      1,327,488
ReLU-4           [[1, 384, 13, 13]]      [1, 384, 13, 13]      0
Conv2D-5         [[1, 384, 13, 13]]      [1, 256, 13, 13]      884,992
ReLU-5           [[1, 256, 13, 13]]      [1, 256, 13, 13]      0
MaxPool2D-3      [[1, 256, 13, 13]]      [1, 256, 6, 6]        0
Flatten-1        [[1, 256, 6, 6]]        [1, 9216]             0
Linear-1         [[1, 9216]]             [1, 4096]             37,752,832
ReLU-6           [[1, 4096]]             [1, 4096]             0
Dropout-1        [[1, 4096]]             [1, 4096]             0
Linear-2         [[1, 4096]]             [1, 4096]             16,781,312
ReLU-7           [[1, 4096]]             [1, 4096]             0
Dropout-2        [[1, 4096]]             [1, 4096]             0
Linear-3         [[1, 4096]]             [1, 10]               40,970
===========================================================================
Total params: 58,322,314
Trainable params: 58,322,314
Non-trainable params: 0
===========================================================================
Input size (MB): 0.59
Forward/backward pass size (MB): 11.11
Params size (MB): 222.48
Estimated Total Size (MB): 234.18
===========================================================================
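The output shapes and parameter counts in the summary can be cross-checked by hand. The following is a minimal sketch that is not part of the original post: it assumes the usual output-size formula floor((n + 2p - k) / s) + 1 for convolution and pooling layers, and the helper names `out_size`, `conv_params` and `linear_params` are made up for illustration.

```python
def out_size(n, kernel, stride, padding=0):
    """Spatial output size of a conv or pooling layer (floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1


def conv_params(in_ch, out_ch, kernel):
    """Kernel weights plus one bias per output channel."""
    return in_ch * out_ch * kernel * kernel + out_ch


def linear_params(in_features, out_features):
    """Weight matrix plus one bias per output feature."""
    return in_features * out_features + out_features


# Trace the spatial size through the network, starting from a 227 x 227 input.
n = 227
n = out_size(n, 11, 4, 0)  # Conv2D-1
n = out_size(n, 3, 2)      # MaxPool2D-1
n = out_size(n, 5, 1, 2)   # Conv2D-2
n = out_size(n, 3, 2)      # MaxPool2D-2
n = out_size(n, 3, 1, 1)   # Conv2D-3, -4 and -5 all preserve the size
n = out_size(n, 3, 2)      # MaxPool2D-3
flat_features = 256 * n * n

total_params = (
    conv_params(3, 96, 11)
    + conv_params(96, 256, 5)
    + conv_params(256, 384, 3)
    + conv_params(384, 384, 3)
    + conv_params(384, 256, 3)
    + linear_params(flat_features, 4096)
    + linear_params(4096, 4096)
    + linear_params(4096, 10)
)

print(n, flat_features, total_params)
```

The final spatial size of 6 is what fixes the `256*6*6` input width of the first `Linear` layer, so feeding images of a different size would require changing that number as well.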
The training code is as follows:
# Compose: chains the dataset preprocessing steps given as a list
# Resize: resizes the image
# Transpose: reorders the axes, e.g. HWC (image) -> CHW (network input)
# Normalize: normalizes the image data
# ToTensor: converts a PIL.Image or numpy.ndarray to a paddle.Tensor
# CIFAR-10 per-channel mean and standard deviation, computed by hand:
# mean = [125.31, 122.95, 113.86], std = [62.99, 62.08, 66.7]
# Source: https://www.jianshu.com/p/a3f3ffc3cac1
t = Compose([Resize(size=227),
             Normalize(mean=[125.31, 122.95, 113.86], std=[62.99, 62.08, 66.7], data_format='HWC'),
             Transpose(order=(2, 0, 1)),
             # The array is already CHW after Transpose, so data_format='HWC' is used here
             # only to stop ToTensor from transposing the axes a second time.
             ToTensor(data_format='HWC')])
train_dataset = Cifar10(mode='train', transform=t, backend='cv2')
test_dataset = Cifar10(mode='test', transform=t, backend='cv2')
BATCH_SIZE = 256
train_loader = paddle.io.DataLoader(train_dataset, shuffle=True, batch_size=BATCH_SIZE)
test_loader = paddle.io.DataLoader(test_dataset, batch_size=BATCH_SIZE)
# Prepare the model for training: set the optimizer, the loss function and the accuracy metric
learning_rate = 0.0001
loss_fn = paddle.nn.CrossEntropyLoss()
opt = paddle.optimizer.Adam(learning_rate=learning_rate, parameters=model.parameters())
model.prepare(optimizer=opt, loss=loss_fn, metrics=paddle.metric.Accuracy())
# Start training: pass the training data, the number of epochs and the log verbosity.
# The batch size is already fixed by the DataLoader; batch_size is repeated here only for clarity.
model.fit(train_loader, batch_size=BATCH_SIZE, epochs=20, verbose=1)
model.evaluate(test_loader, verbose=1)
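To make the normalization step concrete, here is a small sketch (not from the original post) of the per-channel arithmetic, (value - mean) / std, applied to a single pixel. The mean and standard deviation are the ones quoted above; they are on the 0-255 pixel scale, which is why the normalization is applied to the raw pixel values. The function name `normalize_pixel` is hypothetical, and the real `Normalize` transform works on whole image arrays rather than single pixels.

```python
CIFAR10_MEAN = [125.31, 122.95, 113.86]
CIFAR10_STD = [62.99, 62.08, 66.7]


def normalize_pixel(pixel, mean=CIFAR10_MEAN, std=CIFAR10_STD):
    """Apply (value - mean) / std to each channel of one pixel."""
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]


black = normalize_pixel([0, 0, 0])          # darkest possible pixel
average = normalize_pixel(CIFAR10_MEAN)     # a pixel equal to the dataset mean
white = normalize_pixel([255, 255, 255])    # brightest possible pixel

for name, values in [("black", black), ("average", average), ("white", white)]:
    print(name, [round(v, 2) for v in values])
```

A pixel equal to the dataset mean maps to zero in every channel, darker pixels become negative and brighter ones positive, so the network sees inputs roughly centred on zero.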
Training result: accuracy on the test set reaches 78.79%.
Epoch 20/20
step 196/196 [==============================] - loss: 0.0318 - acc: 0.9822 - 748ms/step
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 40/40 [==============================] - loss: 0.9298 - acc: 0.7879 - 641ms/step
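The step counts in the log follow directly from the dataset and batch sizes. The sketch below is an illustration added here, not part of the original post; it assumes the standard CIFAR-10 split of 50,000 training and 10,000 test images, and also computes the gap between the training and test accuracy reported in the log.

```python
import math

TRAIN_IMAGES = 50_000  # standard CIFAR-10 training split
TEST_IMAGES = 10_000   # standard CIFAR-10 test split
BATCH_SIZE = 256

# The last batch of each pass is smaller than BATCH_SIZE, hence the ceiling.
train_steps = math.ceil(TRAIN_IMAGES / BATCH_SIZE)
eval_steps = math.ceil(TEST_IMAGES / BATCH_SIZE)

# Accuracies copied from the log above.
train_acc = 0.9822
test_acc = 0.7879
gap = train_acc - test_acc

print(train_steps, eval_steps, round(gap, 4))
```

A gap of this size between training and test accuracy suggests that the model fits the training set considerably better than it generalizes, which may be worth keeping in mind when reading the 78.79% figure.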
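Takeaway: deep-learning image classification is by now very mature and can be implemented directly with the high-level API of the PaddlePaddle framework. Object detection networks, by contrast, have to combine several loss terms, so their training loop needs to be written by hand using the ordinary lower-level API.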