1. Loading the data
Loading the MNIST dataset:
In [1]: import tensorflow as tf
In [2]: (x,y),(x_test,y_test) = tf.keras.datasets.mnist.load_data()
In [3]: x.shape,y.shape
Out[3]: ((60000, 28, 28), (60000,))
In [4]: x.max(), x.min(), x.mean()
Out[4]: (255, 0, 33.318421449829934)
In [5]: y[:4]
Out[5]: array([5, 0, 4, 1], dtype=uint8)
In [6]: tf.one_hot(y[:4],depth=10)
Out[6]:
<tf.Tensor: id=4, shape=(4, 10), dtype=float32, numpy=
array([[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>
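The pixel values above range from 0 to 255 and the labels are integer class ids, so a common next step is to scale the images to [0, 1] and one-hot encode the labels. A minimal sketch (the names x_norm and y_onehot are just illustrative):
x_norm = tf.cast(x, tf.float32) / 255.   # scale pixels from [0, 255] to [0, 1]
y_onehot = tf.one_hot(y, depth=10)       # integer labels -> one-hot vectors, shape (60000, 10)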
Loading the CIFAR10 dataset:
In [7]: (x,y),(x_test,y_test) = tf.keras.datasets.cifar10.load_data()
In [8]: x.shape,y.shape
Out[8]: ((50000, 32, 32, 3), (50000, 1))
In [9]: y[:4]
Out[9]:
array([[6],
[9],
[9],
[4]], dtype=uint8)
In [10]: db = tf.data.Dataset.from_tensor_slices(x_test)
In [11]: next(iter(db)).shape
Out[11]: TensorShape([32, 32, 3])
In [12]: next(iter(db)).shape
Out[12]: TensorShape([32, 32, 3])
2. tf.data.Dataset.from_tensor_slices
In [13]: db = tf.data.Dataset.from_tensor_slices((x_test,y_test))
In [14]: next(iter(db))[0].shape,next(iter(db))[1].shape
Out[14]: (TensorShape([32, 32, 3]), TensorShape([1]))
shuffle: use shuffle to randomize the order of the original samples.
In [15]: db = db.shuffle(10000)
batch: for mini-batch training, use batch to group the samples into batches of a fixed size.
In [16]: db2 = db.batch(128)
In [17]: res = next(iter(db2))
In [18]: res[0].shape, res[1].shape
Out[18]: (TensorShape([128, 32, 32, 3]), TensorShape([128, 1]))
map: use map to apply a preprocessing function to every element. You write the preprocessing logic yourself, for example casting dtypes, reshaping, or adding and removing dimensions, which builds on the tensor operations covered earlier; see the sketch below.
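As a sketch of how map might be wired into the pipeline above (the preprocess function and the db_pre name are illustrative, not from the original):
def preprocess(img, label):
    # cast and scale the image, squeeze the [1]-shaped label and one-hot encode it
    img = tf.cast(img, tf.float32) / 255.
    label = tf.one_hot(tf.cast(tf.squeeze(label), tf.int32), depth=10)
    return img, label

db_pre = db.map(preprocess).batch(128)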
3. Fully connected layers
A simple fully connected layer with 512 units: net is a layer object whose kernel attribute holds the weight matrix w and whose bias attribute holds the bias b. Note that these variables only exist after the layer has been called on some input; otherwise you have to build the layer explicitly (see the sketch after the snippet below).
In [20]: x = tf.random.normal([10, 784])
In [21]: x.shape
Out[21]: TensorShape([10, 784])
In [22]: net = tf.keras.layers.Dense(512)
In [23]: out = net(x)
In [24]: out.shape
Out[24]: TensorShape([10, 512])
In [25]: net.kernel.shape, net.bias.shape
Out[25]: (TensorShape([784, 512]), TensorShape([512]))
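The "other way" mentioned above is to build the layer before feeding it any data; a minimal sketch (net2 is just an illustrative name):
net2 = tf.keras.layers.Dense(512)
net2.build(input_shape=(None, 784))   # creates kernel (784, 512) and bias (512,) without calling the layer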
A multi-layer network:
In [2]: from tensorflow.keras import Sequential, layers
In [3]: x = tf.random.normal([3,3])
In [4]: model = Sequential([
...: layers.Dense(3, activation='relu'),
...: layers.Dense(3, activation='relu'),
...: layers.Dense(3)
...: ])
In [5]: model.build(input_shape=[None,3])
In [6]: model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) multiple 12
_________________________________________________________________
dense_1 (Dense) multiple 12
_________________________________________________________________
dense_2 (Dense) multiple 12
=================================================================
Total params: 36
Trainable params: 36
Non-trainable params: 0
_________________________________________________________________
In [7]: for i in model.trainable_variables:
...: print(i.name, i.shape)
...:
dense/kernel:0 (3, 3)
dense/bias:0 (3,)
dense_1/kernel:0 (3, 3)
dense_1/bias:0 (3,)
dense_2/kernel:0 (3, 3)
dense_2/bias:0 (3,)
Activation functions: relu, sigmoid, softmax, tanh
In [8]: a = tf.linspace(-5.,5.,10)
In [9]: a
Out[9]:
<tf.Tensor: id=14, shape=(10,), dtype=float32, numpy=
array([-5. , -3.8888888 , -2.7777777 , -1.6666665 , -0.55555534,
0.5555558 , 1.666667 , 2.7777781 , 3.8888893 , 5. ],
dtype=float32)>
In [10]: tf.nn.relu(a)
Out[10]:
<tf.Tensor: id=16, shape=(10,), dtype=float32, numpy=
array([0. , 0. , 0. , 0. , 0. , 0.5555558,
1.666667 , 2.7777781, 3.8888893, 5. ], dtype=float32)>
In [11]: tf.nn.sigmoid(a)
Out[11]:
<tf.Tensor: id=18, shape=(10,), dtype=float32, numpy=
array([0.00669286, 0.02005756, 0.05853692, 0.15886909, 0.36457652,
0.6354236 , 0.841131 , 0.9414632 , 0.97994244, 0.9933072 ],
dtype=float32)>
In [12]: tf.nn.softmax(a)
Out[12]:
<tf.Tensor: id=20, shape=(10,), dtype=float32, numpy=
array([3.0455043e-05, 9.2514209e-05, 2.8103351e-04, 8.5370446e-04,
2.5933252e-03, 7.8778267e-03, 2.3930728e-02, 7.2695136e-02,
2.2082832e-01, 6.7081696e-01], dtype=float32)>
In [13]: tf.reduce_sum(tf.nn.softmax(a))
Out[13]: <tf.Tensor: id=24, shape=(), dtype=float32, numpy=1.0>
In [14]: tf.nn.tanh(a)
Out[14]:
<tf.Tensor: id=26, shape=(10,), dtype=float32, numpy=
array([-0.99990916, -0.9991625 , -0.99229795, -0.9311096 , -0.5046722 ,
0.5046726 , 0.93110967, 0.99229795, 0.9991625 , 0.99990916],
dtype=float32)>
Computing the loss: mean squared error and the cross-entropy loss
In [15]: y = tf.constant([3,2,0,1,2])
In [16]: y = tf.cast(tf.one_hot(y,depth=4),dtype=tf.float32)
In [17]: y.shape
Out[17]: TensorShape([5, 4])
In [18]: y_pred = tf.random.normal([5,4])
In [19]: loss1 = tf.reduce_mean(tf.square(y-y_pred))
In [20]: loss2 = tf.square(tf.norm(y-y_pred))/(5*4)
In [21]: loss3 = tf.reduce_mean(tf.losses.MSE(y,y_pred))
In [22]: loss1, loss2, loss3
Out[22]:
(<tf.Tensor: id=42, shape=(), dtype=float32, numpy=2.3951304>,
<tf.Tensor: id=51, shape=(), dtype=float32, numpy=2.3951302>,
<tf.Tensor: id=56, shape=(), dtype=float32, numpy=2.3951306>)
In [23]: tf.losses.categorical_crossentropy(y,y_pred)
Out[23]:
<tf.Tensor: id=74, shape=(5,), dtype=float32, numpy=
array([16.118095 , 16.118095 , 2.406584 , 0.8261622, 0.4953419],
dtype=float32)>
In [24]: tf.reduce_mean(tf.losses.categorical_crossentropy(y,y_pred))
Out[24]: <tf.Tensor: id=92, shape=(), dtype=float32, numpy=7.192856>
In [25]: x = tf.random.normal([1,784])
In [26]: w = tf.random.normal([784,2])
In [27]: b = tf.zeros([2])
In [28]: logits = x@w + b
In [29]: logits
Out[29]: <tf.Tensor: id=16, shape=(1, 2), dtype=float32, numpy=array([[29.228487, -2.549611]], dtype=float32)>
In [30]: prob = tf.nn.softmax(logits,axis=1)
In [31]: prob
Out[31]: <tf.Tensor: id=18, shape=(1, 2), dtype=float32, numpy=array([[1.0000000e+00, 1.5810548e-14]], dtype=float32)>
In [32]: tf.losses.categorical_crossentropy([0,1], logits, from_logits=True)
Out[32]: <tf.Tensor: id=55, shape=(1,), dtype=float32, numpy=array([31.7781], dtype=float32)>
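Passing the raw logits with from_logits=True is the numerically stable choice; applying softmax yourself and feeding probabilities to the loss also works, but can under/overflow for extreme logits like the ones above. A sketch of the two call styles (loss_a and loss_b are illustrative names):
# stable: the loss applies softmax internally
loss_a = tf.losses.categorical_crossentropy([0, 1], logits, from_logits=True)
# works, but less stable: feed the already-softmaxed probabilities
loss_b = tf.losses.categorical_crossentropy([0, 1], prob)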
4. Gradients
GradientTape: computing gradients works differently from TensorFlow 1.x. Here you record the forward computation inside a GradientTape and then ask the tape for the gradients with respect to the watched tensors. A tape can only be used once; to call gradient again, create the tape with persistent=True.
In [33]: with tf.GradientTape() as tape:
...: tape.watch([w,b])
...: logits = x@w + b
In [34]: grad1 = tape.gradient(logits, [w,b])
In [35]: grad1
Out[35]:
[<tf.Tensor: id=71, shape=(784, 2), dtype=float32, numpy=
array([[ 1.218403 , 1.218403 ],
[-0.40789095, -0.40789095],
[-0.04514167, -0.04514167],
...,
[ 1.5529867 , 1.5529867 ],
[-0.91804993, -0.91804993],
[-0.6200174 , -0.6200174 ]], dtype=float32)>,
<tf.Tensor: id=69, shape=(2,), dtype=float32, numpy=array([1., 1.], dtype=float32)>]
In [36]: grad2 = tape.gradient(logits, [w,b])
RuntimeError: GradientTape.gradient can only be called once on non-persistent tapes.
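To call gradient more than once on the same computation, create the tape with persistent=True; a minimal sketch:
with tf.GradientTape(persistent=True) as tape:
    tape.watch([w, b])
    logits = x @ w + b
grad1 = tape.gradient(logits, [w, b])
grad2 = tape.gradient(logits, [w, b])   # a second call is now allowed
del tape                                # release the tape's resources when done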
Once we know how to compute gradients, we can take the gradient of a loss function: the forward pass computes the loss, and the backward pass computes the gradients.
MSE:
In [36]: x = tf.random.normal([2,4])
In [37]: w = tf.random.normal([4,3])
In [38]: b= tf.zeros([3])
In [39]: y = tf.constant([2,0])
In [40]: with tf.GradientTape() as tape:
...: tape.watch([w,b])
...: prob = tf.nn.softmax(x@w, axis=1)
...: loss = tf.reduce_mean(tf.losses.MSE(tf.one_hot(y,depth=3), prob))
In [41]: grads = tape.gradient(loss,[w,b])
In [42]: grads
Out[42]:
[<tf.Tensor: id=157, shape=(4, 3), dtype=float32, numpy=
array([[-0.02426479, 0.00430018, 0.0199646 ],
[ 0.01239181, 0.05346029, -0.06585211],
[ 0.02509922, -0.00778245, -0.01731675],
[-0.01928005, -0.02895365, 0.0482337 ]], dtype=float32)>, None]
Note that the gradient for b above is None: b does not appear in the forward computation (prob uses only x@w), so nothing flows back to it.
Cross-entropy:
In [43]: with tf.GradientTape() as tape:
...: tape.watch([w,b])
...: logits = x@w+b
...: loss = tf.reduce_mean(tf.losses.categorical_crossentropy(tf.one_hot(y,depth=3),logits,from_logits=True))
In [44]: grads = tape.gradient(loss,[w,b])
In [45]: grads
Out[45]:
[<tf.Tensor: id=89, shape=(4, 3), dtype=float32, numpy=
array([[-0.5620854 , 0.650493 , -0.08840758],
[-0.76033497, -0.05264251, 0.81297755],
[-0.5856365 , 0.17601593, 0.40962052],
[ 0.33403078, 0.29103735, -0.6250682 ]], dtype=float32)>,
<tf.Tensor: id=87, shape=(3,), dtype=float32, numpy=array([-0.4841027 , 0.43291122, 0.05119145], dtype=float32)>]
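Putting the pieces together, a typical training step combines the dataset pipeline, a model, the loss, and GradientTape. The sketch below is only illustrative: the two-layer model, the Adam optimizer, and the learning rate are assumptions, and db2 (the batched CIFAR10 pairs from section 2) is reused just to show the mechanics.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(10)])                      # outputs raw logits
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

for x_batch, y_batch in db2:                         # batched (image, label) pairs
    # flatten and scale the images, one-hot encode the [128, 1] labels
    x_batch = tf.reshape(tf.cast(x_batch, tf.float32) / 255., [-1, 32*32*3])
    y_onehot = tf.one_hot(tf.cast(tf.squeeze(y_batch, axis=1), tf.int32), depth=10)
    with tf.GradientTape() as tape:
        logits = model(x_batch)
        loss = tf.reduce_mean(
            tf.losses.categorical_crossentropy(y_onehot, logits, from_logits=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))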