TensorFlow 2.0 (4): Forward Propagation

Author: copain_sir | Published 2019-12-11 10:47

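The forward computation of each layer in a network can be written in the form $y = x*w + b$. This article shows how to carry out that forward computation, and the parameter updates that follow it, in TensorFlow 2.0.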

Forward computation

tf.constant() and tf.Variable()

There are two ways to create a tensor in TF2: tf.constant() and tf.Variable().

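tf.constant(): creates a constant tensor from a scalar or a list. Its value cannot be changed after creation, so it is typically used for quantities in the network that do not need to be updated.
tf.Variable(): builds on the ordinary tensor type and adds attributes such as name and trainable to support construction of the computation graph. Any tensor whose gradient must be computed and optimized should be wrapped with it.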

A quick experiment shows how the two relate:

x = tf.constant(1.)
<tf.Tensor: id=5732, shape=(), dtype=float32, numpy=1.0>

y = tf.Variable(1.)
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=1.0>
print(y.name, y.trainable)
Variable:0 True

The trainable argument of tf.Variable defaults to True; set it to True or False explicitly as the situation requires.
A constant can also be converted to a Variable:

x = tf.constant(1.)
<tf.Tensor: id=5732, shape=(), dtype=float32, numpy=1.0>

y = tf.Variable(x)
<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=1.0>
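The practical difference between the two shows up when gradients are requested. The following is a minimal sketch, assuming eager execution in TensorFlow 2; the names c, v, and frozen are made up for illustration. It relies on the fact that a GradientTape automatically watches only trainable Variables.

```python
import tensorflow as tf

c = tf.constant(1.0)                      # constant tensor: no assign() method
v = tf.Variable(c)                        # trainable defaults to True
frozen = tf.Variable(c, trainable=False)  # a Variable excluded from training

print(hasattr(c, "assign"), hasattr(v, "assign"))

with tf.GradientTape() as tape:
    # Only trainable Variables are watched automatically by the tape.
    out = c * v * frozen * 3.0

grad_v, grad_frozen, grad_c = tape.gradient(out, [v, frozen, c])
print(grad_v)       # gradient w.r.t. v: c * frozen * 3 = 3.0
print(grad_frozen)  # None, because frozen is not trainable
print(grad_c)       # None, because a constant is not watched
```

A constant can still be differentiated if it is registered explicitly with tape.watch(), but wrapping trainable parameters in tf.Variable avoids that extra step.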

$w$ and $b$ are defined with tf.Variable. The data is again fashion_mnist.

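import tensorflow as tf
from tensorflow import keras

# Data preparation
x, y = keras.datasets.fashion_mnist.load_data()[0]  # training split: (images, labels)
x = tf.reshape(x, [-1, 28 * 28])   # flatten each 28x28 image into a 784-vector
x = tf.cast(x, tf.float32)
x = x / 255.                       # scale pixel values to [0, 1]
y = tf.one_hot(y, depth=10)  # convert labels to one-hot encoding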

# Parameter definitions
w1 = tf.Variable(tf.random.truncated_normal([784, 128], stddev=0.1))
b1 = tf.Variable(tf.zeros([128]))

w2 = tf.Variable(tf.random.truncated_normal([128, 10], stddev=0.1))
b2 = tf.Variable(tf.zeros([10]))
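To see how these shapes fit together, here is a small sketch of the two-layer forward pass. It assumes a hypothetical batch of 4 random inputs in place of the real Fashion-MNIST data, and every name ending in _demo is invented for this illustration.

```python
import tensorflow as tf

# Hypothetical stand-in batch: 4 flattened 28x28 inputs with random values.
x_demo = tf.random.uniform([4, 28 * 28])

w1_demo = tf.Variable(tf.random.truncated_normal([784, 128], stddev=0.1))
b1_demo = tf.Variable(tf.zeros([128]))
w2_demo = tf.Variable(tf.random.truncated_normal([128, 10], stddev=0.1))
b2_demo = tf.Variable(tf.zeros([10]))

# [4, 784] @ [784, 128] -> [4, 128]; the bias of shape [128] is broadcast over rows.
h1_demo = tf.nn.relu(x_demo @ w1_demo + b1_demo)
# [4, 128] @ [128, 10] -> [4, 10]: one score per class for each input.
out_demo = h1_demo @ w2_demo + b2_demo

print(h1_demo.shape, out_demo.shape)
```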

tf.GradientTape()

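To compute gradients with TensorFlow 2's automatic differentiation, place the forward computation inside a tf.GradientTape() context. The gradient() method of the GradientTape object then computes the parameter gradients automatically, and an optimizers object can use them to update the parameters. The pattern looks like this: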

with tf.GradientTape() as tape:
    h1 = x@w1 + b1
    h1 = tf.nn.relu(h1)
    
    out = h1@w2 +b2
    
    loss = tf.square(y - out)
    loss = tf.reduce_mean(loss)

grads = tape.gradient(loss, [w1, b1, w2, b2])
print(grads)

[<tf.Tensor: id=5731, shape=(784, 128), dtype=float32, numpy=
 array([[-1.6100607e-03,  6.9914912e-03,  5.4573239e-04, ...,
          1.1394346e-02,  6.7010368e-03, -4.5570749e-04],
        [-4.6598859e-02,  8.9481764e-02,  1.0636392e-02, ...,
          8.8637583e-02,  6.8306737e-02,  3.1169101e-03],
        [-2.4604234e-01,  3.0288315e-01,  1.2252450e-02, ...,
          5.1393658e-01,  2.8754672e-01, -1.1293454e-02],
        ...,
        [-1.7132786e+01,  2.6959139e+01,  1.4552463e+00, ...,
          3.1937271e+01,  3.0912199e+01, -6.0341411e+00],
        [-5.6701980e+00,  7.6560483e+00,  5.6733060e-01, ...,
          8.5440788e+00,  9.5710907e+00, -1.1315466e+00],
        [-5.4063934e-01,  6.1388338e-01,  9.2204645e-02, ...,
          6.7820299e-01,  8.7314850e-01, -1.0959221e-01]], dtype=float32)>,
 <tf.Tensor: id=5730, shape=(128,), dtype=float32, numpy=
 array([-7.42730665e+00,  1.12210531e+01,  1.39284635e+00,  1.01494350e+01,
        -8.07421327e-01,  1.23828259e+01, -6.74541807e+00,  4.68487740e+00,
         2.48209095e+00,  6.72132134e-01,  1.45874703e+00,  3.87370616e-01,
         4.15165663e+00, -2.76716614e+00, -1.40455708e-01,  2.00560951e+00,
         2.68204361e-01, -9.28761959e+00, -8.48146820e+00,  4.43407488e+00,
         2.67940640e+00,  1.73321190e+01, -9.16018337e-02, -5.43434918e-01,
         2.07012024e+01,  2.75893402e+00,  1.56505895e+00,  2.19261336e+00,
         1.77217662e+00, -6.82915926e+00,  1.27866373e+01, -3.91614413e+00,
        -6.03287840e+00, -7.77332306e-01,  7.70938247e-02,  2.78442907e+00,
        -4.03842735e+00,  1.42863894e+00,  6.28497660e-01, -2.08575636e-01,
         1.29537325e+01,  1.57910907e+00,  6.15310764e+00, -1.28530553e-02,
        -5.76123714e-01,  8.60795438e-01,  1.76193161e+01,  7.69186687e+00,
         1.05096633e-02,  5.15164211e-02, -2.48046803e+00,  1.32270586e+00,
        -2.25072289e+00,  5.69219828e+00, -1.38959253e+00,  1.62736397e+01,
        -8.12963390e+00,  2.23423982e+00,  7.58430099e+00,  2.93936163e-01,
         4.16086674e+00,  9.06507683e+00,  7.49802440e-02,  9.94850695e-01,
        -5.30858874e-01,  2.83726931e+00,  6.86642528e-01,  1.60034144e+00,
         1.67230380e+00,  1.03161788e+00,  1.59616947e+01,  1.31335080e+00,
         4.84881115e+00,  6.14683032e-01, -8.90313816e+00, -2.12549075e-01,
        -2.11784393e-01, -1.48440564e+00,  4.82855380e-01, -9.86536026e-01,
        -3.39097095e+00, -6.08709872e-01,  6.58010149e+00, -1.88684082e+00,
        -1.24421378e-03,  1.05124637e-01,  5.68092060e+00, -5.80791092e+00,
         1.15198154e+01,  5.63493919e+00,  5.89573622e-01,  6.79265213e+00,
         6.17962408e+00, -8.74048519e+00,  5.99117374e+00,  1.38388929e+01,
        -3.47207069e-01,  4.13742256e+00,  9.31134319e+00,  7.82230973e-01,
         1.00510216e+01,  6.51883888e+00,  4.51659933e-02,  2.16893425e+01,
         1.84543401e-01, -3.50218683e-01,  9.49227095e-01,  4.19830494e-02,
        -8.80231977e-01,  1.04302864e+01,  1.07236528e+01, -3.01085401e+00,
         1.77256107e+00, -1.66283143e+00, -8.47181702e+00, -2.34353259e-01,
         1.53497944e+01,  7.91017103e+00,  1.79229784e+00,  5.81019521e-01,
         3.54849339e-01,  1.95794022e+00,  1.26474485e+01, -3.29144746e-01,
         1.15832796e+01,  1.18883247e+01,  8.93653870e+00, -8.44317555e-01],
       dtype=float32)>,
 <tf.Tensor: id=5721, shape=(128, 10), dtype=float32, numpy=
 array([[ 3.1336699e+03,  1.8366328e+03, -2.4875808e+03, ...,
         -4.4612781e+02, -2.6972102e+03,  4.0494846e+03],
        [ 1.2611420e+04,  9.4572920e+03, -8.0339561e+03, ...,
         -3.4947998e+02, -9.3134062e+03,  1.3998470e+04],
        [ 3.3486831e+03,  2.8550195e+03, -2.3804250e+03, ...,
         -3.6743832e+02, -3.3628354e+03,  4.0677268e+03],
        ...,
        [ 7.8448711e+03,  5.0678994e+03, -4.8276855e+03, ...,
         -1.4275117e+03, -7.3473486e+03,  1.0647826e+04],
        [ 1.6595094e+04,  1.1535066e+04, -1.0277844e+04, ...,
         -1.7369330e+03, -1.4152996e+04,  1.9956799e+04],
        [ 1.9223188e+03,  1.3224614e+03, -1.4478511e+03, ...,
         -5.4362202e+00, -1.2794425e+03,  2.1074729e+03]], dtype=float32)>,
 <tf.Tensor: id=5719, shape=(10,), dtype=float32, numpy=
 array([ 52.70592  ,  34.86287  , -35.887447 ,   2.6505039,  39.521217 ,
        -13.3063545,  46.65375  ,  -2.53192  , -39.36912  ,  62.809788 ],
       dtype=float32)>]

grads is a list holding the gradients of $w1$, $b1$, $w2$, and $b2$, in that order.
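The full gradient dump is hard to check by eye, so a tiny example whose gradients can be worked out by hand may help. This is only a sketch with made-up values (x_s, t_s, w_s, b_s are hypothetical names), using a single linear layer and the same mean-squared-error loss.

```python
import tensorflow as tf

x_s = tf.constant([[1.0, 2.0]])      # one sample with two features, shape [1, 2]
t_s = tf.constant([[1.0]])           # target value
w_s = tf.Variable([[0.5], [-0.5]])   # weights, shape [2, 1]
b_s = tf.Variable([0.0])             # bias

with tf.GradientTape() as tape:
    out_s = x_s @ w_s + b_s                           # 0.5*1 + (-0.5)*2 + 0 = -0.5
    loss_s = tf.reduce_mean(tf.square(t_s - out_s))   # (1 - (-0.5))^2 = 2.25

dw, db = tape.gradient(loss_s, [w_s, b_s])

# By hand: d(loss)/d(out) = -2 * (t - out) = -3,
# so dw = x^T * (-3) = [[-3], [-6]] and db = [-3].
print(dw.numpy(), db.numpy())
```

The returned list follows the order of the variables passed to gradient(), which is why dw and db can be unpacked directly.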

Updating parameters

As anyone familiar with deep learning knows, parameters are updated as follows: ${w}' = w - lr*\frac{\partial l}{\partial w}$
Use assign_sub to update the parameters:

lr = 0.01  # learning rate
w1.assign_sub(lr * grads[0])
b1.assign_sub(lr * grads[1])
w2.assign_sub(lr * grads[2])
b2.assign_sub(lr * grads[3])
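One reason to prefer assign_sub over writing w = w - lr * grad is that plain arithmetic on a Variable returns an ordinary tensor, so the name would no longer refer to a tf.Variable on the next iteration. A minimal sketch with a made-up gradient g:

```python
import tensorflow as tf

w = tf.Variable([1.0, 2.0])
g = tf.constant([0.5, 0.5])   # hypothetical gradient, for illustration only
lr = 0.1

w.assign_sub(lr * g)          # in-place update: w is still a tf.Variable
print(type(w).__name__, w.numpy())

w_new = w - lr * g            # plain arithmetic yields a tf.Tensor instead
print(isinstance(w_new, tf.Variable))
```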

If we keep updating the parameters in a loop, the loss should, in theory, keep decreasing:

for i in range(50):
    with tf.GradientTape() as tape:
        h1 = x@w1 + b1
        h1 = tf.nn.relu(h1)

        out = h1@w2 +b2

        loss = tf.square(y - out)
        loss = tf.reduce_mean(loss)

    grads = tape.gradient(loss, [w1, b1, w2, b2])
    lr = 0.01
    w1.assign_sub(lr * grads[0]) 
    b1.assign_sub(lr * grads[1]) 
    w2.assign_sub(lr * grads[2]) 
    b2.assign_sub(lr * grads[3])
    print(loss.numpy())

loss:
0.41989234
0.35508955
0.31600225
0.29151577
0.27551207
0.26453122
0.25657973
0.25048772
0.24556401
0.24139453
0.23772821
0.2344122
0.2313513
0.22848529
0.22577567
0.2231966
0.22073068
...

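The experiment bears this out.
When a network has many parameters, calling assign_sub on each one by hand is no longer practical; this is where an optimizer comes in: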

optimizer = keras.optimizers.SGD()
for i in range(50):
    with tf.GradientTape() as tape:
        h1 = x@w1 + b1
        h1 = tf.nn.relu(h1)

        out = h1@w2 +b2

        loss = tf.square(y - out)
        loss = tf.reduce_mean(loss)

    grads = tape.gradient(loss, [w1, b1, w2, b2])
    optimizer.apply_gradients(zip(grads, [w1, b1, w2, b2]))
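For plain SGD without momentum, apply_gradients should perform the same update as the manual assign_sub rule. The sketch below compares the two on a toy loss, the sum of squared weights. The variable names are hypothetical, and the learning rate is passed explicitly so that both paths use the same value.

```python
import tensorflow as tf
from tensorflow import keras

w_manual = tf.Variable([1.0, -2.0])
w_opt = tf.Variable([1.0, -2.0])
lr = 0.1
optimizer = keras.optimizers.SGD(learning_rate=lr)

for _ in range(3):
    # Manual update: w <- w - lr * d(loss)/dw
    with tf.GradientTape() as tape:
        loss_manual = tf.reduce_sum(tf.square(w_manual))
    w_manual.assign_sub(lr * tape.gradient(loss_manual, w_manual))

    # Same step, delegated to the optimizer.
    with tf.GradientTape() as tape:
        loss_opt = tf.reduce_sum(tf.square(w_opt))
    grads = tape.gradient(loss_opt, [w_opt])
    optimizer.apply_gradients(zip(grads, [w_opt]))

# Each step scales w by (1 - 2 * lr) = 0.8, so both end near [0.512, -1.024].
print(w_manual.numpy(), w_opt.numpy())
```

Switching to another optimizer such as keras.optimizers.Adam only changes the constructor line; the training loop stays the same.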

end

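What this article covers is simple and basic, but it is also essential: however fancy the technique, it still rests on these steps, and understanding them well makes building model code much easier.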

Article link: https://www.haomeiwen.com/subject/ylxogctx.html
