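Underfitting

Underfitting occurs when the model's complexity is lower than that of the true underlying function, so the model cannot express the real relationship. If training accuracy and test accuracy both stay low and the loss will not go down no matter how long you train, underfitting is the likely cause. The remedy is a model with more capacity: more layers, or more units per layer.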

1.png

Increasing model capacity, as in the figure below, resolves underfitting. In practice, however, overfitting is the more common problem.

2.png

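Overfitting (and generalization performance)

Overfitting occurs when the model's complexity exceeds that of the true underlying function. The symptom is that training loss and training accuracy look good while test accuracy is poor, meaning the model generalizes badly to data it has not seen.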

4.png

5.png
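
The two symptom patterns described so far can be turned into a rough check on the numbers reported after training. The following is only a sketch: the function name diagnose_fit and the thresholds low_acc and max_gap are hypothetical and chosen for illustration; sensible values depend on the task.

```python
def diagnose_fit(train_acc, val_acc, low_acc=0.7, max_gap=0.1):
    """Classify a training run from its final train/validation accuracy.

    low_acc and max_gap are illustrative thresholds, not standard values.
    """
    # Low accuracy even on the training data: the model cannot fit the data.
    if train_acc < low_acc:
        return "underfitting"
    # Good training accuracy but a large drop on held-out data.
    if train_acc - val_acc > max_gap:
        return "overfitting"
    return "ok"


print(diagnose_fit(0.55, 0.53))  # both low
print(diagnose_fit(0.99, 0.80))  # large train/validation gap
print(diagnose_fit(0.95, 0.93))  # both high and close together
```

A check like this needs a held-out set, which is why the data is split before training.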

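How to detect overfitting:

Use a held-out validation set: split the data into Train, Validation, and Test parts. Validation is used to select the model and its hyperparameters; Test is used only for the final performance check.
Use K-fold cross-validation: split the training data into K parts, and each round take K-1 parts for training and the remaining part for validation, rotating which part is held out (for example, once per epoch). Every sample is then used for both training and validation, which makes it harder for the model to simply memorize one fixed split. The improvement is usually modest. Keras also offers a convenient shortcut, network.fit(x_train, y_train, epochs=6, validation_split=0.1, validation_freq=2), which holds out the last 10% of the data for validation and trains on the other 90%. Note that validation_split is a single fixed split rather than K-fold, and it only accepts NumPy arrays or tensors, not a tf.data.Dataset such as db_train.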

import  tensorflow as tf
from    tensorflow.keras import datasets, layers, optimizers, Sequential, metrics
 
 
def preprocess(x, y):
    """
    x is a single image, not a batch
    """
    x = tf.cast(x, dtype=tf.float32) / 255.
    x = tf.reshape(x, [28*28])
    y = tf.cast(y, dtype=tf.int32)
    y = tf.one_hot(y, depth=10)
    return x,y
 
 
batchsz = 128
(x, y), (x_test, y_test) = datasets.mnist.load_data()
print('datasets:', x.shape, y.shape, x.min(), x.max())
 
idx = tf.range(60000)
idx = tf.random.shuffle(idx)
x_train, y_train = tf.gather(x, idx[:50000]), tf.gather(y, idx[:50000])
x_val, y_val = tf.gather(x, idx[-10000:]) , tf.gather(y, idx[-10000:])
print(x_train.shape, y_train.shape, x_val.shape, y_val.shape)
## train
db_train = tf.data.Dataset.from_tensor_slices((x_train,y_train))
db_train = db_train.map(preprocess).shuffle(50000).batch(batchsz)
 
db_val = tf.data.Dataset.from_tensor_slices((x_val,y_val))
db_val = db_val.map(preprocess).shuffle(10000).batch(batchsz)
 
db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.map(preprocess).batch(batchsz) 
 
sample = next(iter(db_train))
print(sample[0].shape, sample[1].shape)
 
 
network = Sequential([layers.Dense(256, activation='relu'),
                     layers.Dense(128, activation='relu'),
                     layers.Dense(64, activation='relu'),
                     layers.Dense(32, activation='relu'),
                     layers.Dense(10)])
network.build(input_shape=(None, 28*28))
network.summary()
 
network.compile(optimizer=optimizers.Adam(learning_rate=0.01),  # `lr` is a deprecated alias
      loss=tf.losses.CategoricalCrossentropy(from_logits=True),
      metrics=['accuracy']
   )
 
network.fit(db_train, epochs=6, validation_data=db_val, validation_freq=2)
 
print('Test performance:') 
network.evaluate(db_test)
 
 
sample = next(iter(db_test))
x = sample[0]
y = sample[1] # one-hot
pred = network.predict(x) # [b, 10]
# convert back to number 
y = tf.argmax(y, axis=1)
pred = tf.argmax(pred, axis=1)
 
print(pred)
print(y)
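
The script above uses one fixed train/validation split. The K-fold rotation described earlier could be sketched as follows, using only the Python standard library. The helper name k_fold_indices is hypothetical; it yields index lists that could then be passed to tf.gather in the same way as idx above.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs, one pair per fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Fold i takes every k-th index starting at i, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        # All other folds together form the training set for this round.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```

Each sample lands in the validation part exactly once across the K rounds, so the validation score no longer depends on one particular split.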

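How to reduce overfitting

Principle: unless more is necessary, choose the smallest model that does the job.

Common approaches:

Provide more data
Reduce model complexity; dataset size and network size should be matched to each other
Dropout
Data augmentation
Early stopping: use the validation set to end training early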
Regularization
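
Of the approaches listed, early stopping is the easiest to show in a few lines. The sketch below works on a plain list of per-epoch validation losses; the function name early_stopping_epoch and the parameters patience and min_delta are assumptions made for illustration, and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Never triggered: training ran to the last epoch.
    return len(val_losses) - 1, best_epoch


history = [0.90, 0.60, 0.45, 0.40, 0.42, 0.43, 0.47, 0.50]  # illustrative values
stop_epoch, best_epoch = early_stopping_epoch(history, patience=3)
print(stop_epoch, best_epoch)  # → 6 3
```

In Keras the same idea is available as the tf.keras.callbacks.EarlyStopping callback, for example with monitor='val_loss', patience=3 and restore_best_weights=True, passed to fit through the callbacks argument.
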
Regularization

6.png

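Regularization adds a penalty on the size of the weights to the loss. This pushes the coefficients of high-order terms toward zero, so the network effectively degenerates to a lower-degree, lower-complexity model, which reduces overfitting. This approach is also known as weight decay.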
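
As a minimal sketch of why this is called weight decay, assume the common L2 form, where the total loss is the data loss plus lam * sum(w^2). The gradient of the penalty is 2 * lam * w, so every update pulls each weight toward zero. The names l2_penalty, sgd_step and lam are hypothetical, and the data-loss gradient is set to zero here purely to isolate the decay effect.

```python
def l2_penalty(weights, lam):
    """L2 regularization term: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)


def sgd_step(weights, grads, lr, lam):
    """One gradient step on data loss + L2 penalty."""
    # d/dw [lam * w^2] = 2 * lam * w, which shrinks w toward zero.
    return [w - lr * (g + 2.0 * lam * w) for w, g in zip(weights, grads)]


initial = [3.0, -2.0, 0.5]
weights = list(initial)
zero_grads = [0.0, 0.0, 0.0]  # assumption: flat data loss, so only the penalty acts

for _ in range(100):
    weights = sgd_step(weights, zero_grads, lr=0.1, lam=0.1)

print(l2_penalty(initial, 0.1), l2_penalty(weights, 0.1))
print(weights)  # every weight has shrunk by the factor (1 - 2 * lr * lam) ** 100
```

In Keras, one way to apply an L2 penalty to a layer is layers.Dense(256, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.001)); the coefficient 0.001 is only an example value.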