Basic Steps of Machine Learning

Solving a problem with machine learning usually involves the following basic steps:

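Obtain training data
Define the model
Define a loss function
Compute the loss on the training data
Compute the gradients of the loss with respect to the trainable parameters, and use an optimizer to update the variables
Evaluate the model

To get familiar with this workflow, we walk through a simple linear regression example.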

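Getting the data

Supervised learning requires data made up of input features and output labels. A model is trained on matched input-output pairs so that it can predict the output for a given input.
In TensorFlow, each input is represented as a tensor, often a vector. In supervised learning, the output is also represented as a tensor.
The code below generates such data by adding Gaussian noise to points along a straight line.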

import tensorflow as tf
import matplotlib.pyplot as plt

TRUE_W = 3.0
TRUE_B = 2.0

NUM_EXAMPLES = 201

x = tf.linspace(-2, 2, NUM_EXAMPLES)
x = tf.cast(x, tf.float32)


def f(x):
    return x * TRUE_W + TRUE_B


noise = tf.random.normal(shape=[NUM_EXAMPLES])

y = f(x) + noise

plt.plot(x, y, '.')
plt.show()

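Defining the model

Model parameters are usually represented with tf.Variable. A tf.Variable object stores a tensor value. tf.Module encapsulates variables and computation. A plain Python class would be enough for that, but subclassing tf.Module gives you some ready-made functionality, such as the variables property used in the code below. The code defines two variables, w and b: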

class MyModel(tf.Module):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.w = tf.Variable(5.0)
        self.b = tf.Variable(0.0)

    def __call__(self, x):
        return self.w * x + self.b


model = MyModel()

print("Variables:", model.variables)

assert model(3.0).numpy() == 15.0

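MyModel sets the initial values of its variables to fixed numbers. Keras also provides a variety of initializers (in tf.keras.initializers) that you are free to use instead.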
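As a rough illustration of the earlier remark that a plain Python class is enough to bundle parameters and computation, the sketch below mirrors MyModel without TensorFlow. The class name PlainLinearModel is hypothetical and not part of the original code. It stores ordinary floats, so it gets none of the variable tracking that tf.Module provides.

```python
class PlainLinearModel:
    """Hypothetical TensorFlow-free counterpart of MyModel."""

    def __init__(self, w=5.0, b=0.0):
        # Same fixed starting values as MyModel
        self.w = w
        self.b = b

    def __call__(self, x):
        # Same linear computation: w * x + b
        return self.w * x + self.b


plain_model = PlainLinearModel()
print(plain_model(3.0))
```

Unlike MyModel, this class has no variables property to collect its parameters, so you would have to keep track of w and b yourself.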

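Defining the loss function

A loss function measures how close the model's predictions are to the true labels. Training is the process of minimizing the loss. This chapter uses the L2 loss, that is, the mean squared error.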

def loss(target_y, predicted_y):
    return tf.reduce_mean(tf.square(target_y - predicted_y))
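
To make the formula concrete, here is a small dependency-free sketch of the same mean squared error, evaluated on made-up numbers. The function name mse and the sample values are illustrative assumptions, not part of the original code.

```python
def mse(target_y, predicted_y):
    """Mean of the squared differences, mirroring loss() on plain lists."""
    squared_errors = [(t - p) ** 2 for t, p in zip(target_y, predicted_y)]
    return sum(squared_errors) / len(squared_errors)


# Made-up targets and predictions, chosen only for illustration
targets = [1.0, 2.0, 3.0]
predictions = [1.0, 4.0, 2.0]

# The errors are 0, -2 and 1, so the squared errors are 0, 4 and 1,
# and their mean is 5/3
print(mse(targets, predictions))
```

Because the errors are squared, the sign of each error does not matter and large errors are penalized more heavily than small ones.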

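Training loop

Training consists of repeating the following four steps:

Feed a batch of data to the model to produce predictions
Compute the loss to measure the gap between the predictions and the true values
Use a gradient tape to obtain the gradients
Use the gradients to update the model's parameters

tf.keras.optimizers provides several common variants of gradient descent. Here, however, we implement gradient descent ourselves with the help of the tape. The complete code is as follows: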

import tensorflow as tf
import matplotlib.pyplot as plt

colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

TRUE_W = 3.0
TRUE_B = 2.0

NUM_EXAMPLES = 201

# A vector of evenly spaced x values
x = tf.linspace(-2, 2, NUM_EXAMPLES)
x = tf.cast(x, tf.float32)


def f(x):
    return x * TRUE_W + TRUE_B


# Generate some noise
noise = tf.random.normal(shape=[NUM_EXAMPLES])

# Calculate y
y = f(x) + noise


class MyModel(tf.Module):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.w = tf.Variable(5.0)
        self.b = tf.Variable(0.0)

    def __call__(self, x):
        return self.w * x + self.b


def loss(target_y, predicted_y):
    return tf.reduce_mean(tf.square(target_y - predicted_y))


def train(model, x, y, learning_rate):
    with tf.GradientTape() as tape:
        current_loss = loss(y, model(x))
    dw, db = tape.gradient(current_loss, [model.w, model.b])

    model.w.assign_sub(learning_rate * dw)
    model.b.assign_sub(learning_rate * db)


model = MyModel()
weights = []
biases = []
epochs = range(100)


def report(model, loss):
    return f"W = {model.w.numpy():1.2f}, b = {model.b.numpy():1.2f}, loss={loss:2.5f}"


def training_loop(model, x, y):
    for epoch in epochs:
        train(model, x, y, learning_rate=0.01)
        weights.append(model.w.numpy())
        biases.append(model.b.numpy())
        current_loss = loss(y, model(x))

        print(f"Epoch {epoch:2d}: ")
        print("    ", report(model, current_loss))


training_loop(model, x, y)

plt.plot(x, y, '.', label="Data")
plt.plot(x, f(x), label="Ground truth")
plt.plot(x, model(x), label="Predictions")
plt.legend()
plt.show()

print("Current loss: %1.6f" % loss(model(x), y).numpy())

Output:

Epoch  0: 
     W = 4.95, b = 0.04, loss=9.74990
Epoch  1: 
     W = 4.90, b = 0.08, loss=9.33798
Epoch  2: 
     W = 4.85, b = 0.12, loss=8.94586
Epoch  3: 
     W = 4.80, b = 0.16, loss=8.57255
Epoch  4: 
     W = 4.75, b = 0.19, loss=8.21715
......
Epoch 93: 
     W = 3.20, b = 1.72, loss=1.12299
Epoch 94: 
     W = 3.20, b = 1.72, loss=1.11778
Epoch 95: 
     W = 3.19, b = 1.73, loss=1.11279
Epoch 96: 
     W = 3.19, b = 1.73, loss=1.10802
Epoch 97: 
     W = 3.19, b = 1.74, loss=1.10346
Epoch 98: 
     W = 3.18, b = 1.75, loss=1.09909
Epoch 99: 
     W = 3.18, b = 1.75, loss=1.09492
Current loss: 1.094921

[Figure: predictions compared with the data and the ground truth]
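
For this particular model, the gradients that the tape returns can also be derived by hand. With the loss L = mean((w*x + b - y)^2), we get dL/dw = mean(2*(w*x + b - y)*x) and dL/db = mean(2*(w*x + b - y)). The sketch below runs the same update rule in plain Python as a cross-check. It is an illustration under stated assumptions, not the code used above: the helper names mse, gradients and train_step are hypothetical, and the data is noise-free so that the run is deterministic.

```python
TRUE_W = 3.0
TRUE_B = 2.0
NUM_EXAMPLES = 201

# Evenly spaced inputs in [-2, 2]. Noise is omitted here (an assumption
# made for reproducibility, unlike the noisy data in the main example).
xs = [-2.0 + 4.0 * i / (NUM_EXAMPLES - 1) for i in range(NUM_EXAMPLES)]
ys = [TRUE_W * x + TRUE_B for x in xs]


def mse(w, b):
    """Mean squared error of the line w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradients(w, b):
    """Hand-derived gradients of the mean squared error w.r.t. w and b."""
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    dw = sum(2.0 * r * x for r, x in zip(residuals, xs)) / len(xs)
    db = sum(2.0 * r for r in residuals) / len(xs)
    return dw, db


def train_step(w, b, learning_rate):
    """One gradient descent update, the counterpart of train() above."""
    dw, db = gradients(w, b)
    return w - learning_rate * dw, b - learning_rate * db


w, b = 5.0, 0.0  # same starting point as MyModel
initial_loss = mse(w, b)
for _ in range(100):
    w, b = train_step(w, b, learning_rate=0.01)
final_loss = mse(w, b)

print(f"W = {w:1.2f}, b = {b:1.2f}, loss = {final_loss:2.5f}")
```

The resulting W and b should land close to the values printed for the noisy run above. The loss will differ, because this sketch leaves out the noise.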

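A Keras-based solution

Next, we solve the same linear regression problem with Keras.
Code based on tf.keras.Model looks much like the modeling code above. Keep in mind that Keras models ultimately inherit from tf.Module. You configure training with model.compile() and train with model.fit(), and specifying the L2 loss and gradient descent takes very little code. You can also pass in a loss function and an optimizer defined elsewhere. Rewritten this way, the example above becomes: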

import tensorflow as tf

TRUE_W = 3.0
TRUE_B = 2.0

NUM_EXAMPLES = 201

# A vector of evenly spaced x values
x = tf.linspace(-2, 2, NUM_EXAMPLES)
x = tf.cast(x, tf.float32)


def f(x):
    return x * TRUE_W + TRUE_B


# Generate some noise
noise = tf.random.normal(shape=[NUM_EXAMPLES])

# Calculate y
y = f(x) + noise


class MyModelKeras(tf.keras.Model):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Initialize the weights to `5.0` and the bias to `0.0`
        # In practice, these should be randomly initialized
        self.w = tf.Variable(5.0)
        self.b = tf.Variable(0.0)

    def call(self, x):
        return self.w * x + self.b


keras_model = MyModelKeras()
keras_model.compile(
    # By default, fit() uses tf.function().  You can
    # turn that off for debugging, but it is on now.
    run_eagerly=False,

    # Using a built-in optimizer, configuring as an object
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),

    # Keras comes with built-in MSE error
    # However, you could use the loss function
    # defined above
    loss=tf.keras.losses.mean_squared_error,
)
print(x.shape[0])
keras_model.fit(x, y, epochs=100, batch_size=1000)

The fit method takes the data together with the batch size and the number of epochs. The code above trains for 100 epochs with a batch size of 1000.
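
One consequence worth spelling out: the batch size here is larger than the dataset, so each epoch amounts to a single gradient update over all 201 examples, much like the hand-written loop earlier. The sketch below does that arithmetic. The helper name steps_per_epoch is hypothetical, and it assumes the final partial batch is kept rather than dropped.

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Number of batches per epoch when a final partial batch is kept."""
    return math.ceil(num_examples / batch_size)


NUM_EXAMPLES = 201
EPOCHS = 100

# batch_size=1000 exceeds the 201 examples, so one batch covers everything
print(steps_per_epoch(NUM_EXAMPLES, 1000))           # 1
print(steps_per_epoch(NUM_EXAMPLES, 1000) * EPOCHS)  # 100 updates in total

# A smaller, purely illustrative batch size gives several updates per epoch
print(steps_per_epoch(NUM_EXAMPLES, 32))             # 7
```

With a smaller batch size, the model would be updated several times per epoch, each time on a subset of the data.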