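Before building a neural network model, it is essential to explore the dataset, including:
- the dataset's type and format
- the dataset's mean and standard deviation
Example 1: Inspect the data type, format, and channel order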
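import paddle
from paddle.vision.datasets import Cifar10
# backend='cv2' makes the dataset return images as numpy arrays instead of PIL images
data_train = Cifar10(mode='train', backend='cv2')
# Each sample is an (image, label) pair
img, label = data_train[0]
print(type(img), img.shape, type(label), label)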
<class 'numpy.ndarray'> (32, 32, 3) <class 'numpy.ndarray'> 0
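The shape (32, 32, 3) printed above means each image is laid out as height, width, channel (HWC). Layers that expect channel-first input (CHW) would need the axes reordered first. Here is a minimal sketch of that reordering, assuming a synthetic array in place of a real CIFAR-10 sample (the names img_hwc and img_chw are illustrative only):

```python
import numpy as np

# Synthetic stand-in for one sample with the HWC shape printed above.
img_hwc = np.zeros((32, 32, 3), dtype=np.float32)
img_hwc[..., 0] = 1.0  # mark channel 0 so it can be tracked after the transpose

# Move the channel axis to the front: (H, W, C) -> (C, H, W).
img_chw = img_hwc.transpose(2, 0, 1)

print(img_hwc.shape, img_chw.shape)
```

After the transpose, img_chw[0] is the plane that was img_hwc[..., 0]; the pixel values themselves are unchanged.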
Example 2: Compute the dataset's mean and standard deviation
import numpy as np
import paddle
from paddle.vision.datasets import Cifar10
data_train = Cifar10(mode='train', backend='cv2')
# Stack every training image into one array of shape (N, H, W, C)
imgs = []
n = len(data_train)
for i in range(n):
    imgs.append(data_train[i][0])
imgs = np.array(imgs)
print(imgs.shape)
# Channel 0 (R): statistics over all pixels of all images
imgs_r = imgs[:, :, :, 0]
imgs_r_mean = np.mean(imgs_r)
imgs_r_std = np.std(imgs_r)
print(f"r mean:{imgs_r_mean}, std:{imgs_r_std}")
# Channel 1 (G)
imgs_g = imgs[:, :, :, 1]
imgs_g_mean = np.mean(imgs_g)
imgs_g_std = np.std(imgs_g)
print(f"g mean:{imgs_g_mean}, std:{imgs_g_std}")
# Channel 2 (B)
imgs_b = imgs[:, :, :, 2]
imgs_b_mean = np.mean(imgs_b)
imgs_b_std = np.std(imgs_b)
print(f"b mean:{imgs_b_mean}, std:{imgs_b_std}")
(50000, 32, 32, 3)
r mean:125.30689239501953, std:62.99320983886719
g mean:122.9505386352539, std:62.0887565612793
b mean:113.86553955078125, std:66.70484924316406
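The three per-channel blocks can also be collapsed into a single reduction over the sample, height, and width axes. The sketch below is one possible way to do that, not the author's code; it uses a small synthetic array so that it runs without downloading CIFAR-10, and its numbers are therefore not the dataset's statistics.

```python
import numpy as np

# Synthetic stand-in for the stacked dataset; the real array has shape (50000, 32, 32, 3).
rng = np.random.default_rng(0)
imgs = rng.integers(0, 256, size=(100, 32, 32, 3)).astype(np.float64)

# Reduce over samples, height and width; only the channel axis remains.
mean = imgs.mean(axis=(0, 1, 2))
std = imgs.std(axis=(0, 1, 2))

print(mean.shape, std.shape)
```

Each entry of mean and std matches what np.mean and np.std return for the corresponding imgs[:, :, :, c] slice.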
Rounding the printed values gives the dataset's per-channel mean and standard deviation (on the 0-255 pixel scale):
mean = [125.31, 122.95, 113.87]
std = [62.99, 62.09, 66.70]
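These statistics are typically used to standardize each channel before training. The following is a minimal NumPy sketch, assuming HWC images on the 0-255 scale and values rounded from the printed output; normalize_hwc is a hypothetical helper name, not part of Paddle.

```python
import numpy as np

# Per-channel statistics rounded from the printed output, on the 0-255 scale.
mean = np.array([125.31, 122.95, 113.87], dtype=np.float32)
std = np.array([62.99, 62.09, 66.70], dtype=np.float32)

def normalize_hwc(img):
    """Standardize an HWC image; mean and std broadcast over the last (channel) axis."""
    return (img.astype(np.float32) - mean) / std

# Hypothetical input: an image whose every pixel equals the channel means.
img = np.broadcast_to(mean, (32, 32, 3))
out = normalize_hwc(img)
print(out.shape)
```

An image equal to the channel means maps to all zeros, which is a quick sanity check that the broadcasting runs along the channel axis.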