Yesterday I compressed the model down to 4.4 MB and wanted to shrink it further. The most direct idea was to replace the model's single-precision floats with half-precision floats: in theory this halves not only the model size but also the compute and GPU memory usage. Searching around, I found an experiment comparing classification accuracy with float16 versus float32, and the reported accuracy gap was tiny (I later realized that in that experiment only inference used float16, while training was still done in float32).
So I modified the model, changing the data type of every parameter in the network below from tf.float32 to tf.float16, and retrained. The resulting model was 2.2 MB, but repeated tests showed it failed to converge.
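In hindsight the divergence is consistent with float16's narrow dynamic range. A minimal sketch of the two failure modes, underflow and overflow (using NumPy here for illustration, independent of the TensorFlow model below):

```python
import numpy as np

# float16 spans roughly 6e-8 (smallest subnormal) to 65504 (max).
# Squaring a small gradient-sized value underflows to exactly zero,
# which can stall optimizers that track squared-gradient statistics.
g = np.float16(1e-4)
print(g * g)      # 0.0 -- underflow

# Values above 65504 overflow to infinity, which then propagates
# as NaN through subsequent arithmetic (inf - inf, inf / inf, ...).
big = np.float16(70000.0)
print(big)        # inf
print(big - big)  # nan
```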
def net(image, training):
    conv1 = relu(instance_norm(conv2d(image, 3, 32, 9, 1)))
    conv2 = relu(instance_norm(conv2d(conv1, 32, 64, 3, 2)))
    conv3 = relu(instance_norm(conv2d(conv2, 64, 128, 3, 2)))
    res1 = residual(conv3, 128, 3, 1)
    res2 = residual(res1, 128, 3, 1)
    res3 = residual(res2, 128, 3, 1)
    res4 = residual(res3, 128, 3, 1)
    res5 = residual(res4, 128, 3, 1)
    deconv1 = relu(instance_norm(resize_conv2d(res5, 128, 64, 3, 2, training)))
    deconv2 = relu(instance_norm(resize_conv2d(deconv1, 64, 32, 3, 2, training)))
    deconv3 = tf.nn.tanh(instance_norm(conv2d(deconv2, 32, 3, 9, 1)))
    return deconv3
Later I discovered that TensorFlow has another data type: tf.bfloat16, a TensorFlow-specific format called the truncated 16-bit floating point. It is produced by keeping only the upper 16 bits of a float32 (dropping the lower 16 mantissa bits). Unlike IEEE float16, it is intended mainly as a drop-in replacement for float32 in neural-network training: because its exponent range is identical to float32's, it avoids the NaN problems that float16 runs into.
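The relationship to float32 can be illustrated with plain bit manipulation. A sketch using Python's struct module (real implementations typically round-to-nearest; this version simply truncates, the simplest form):

```python
import struct

def float32_to_bfloat16_bits(x):
    # Reinterpret the float32 as a 32-bit integer and keep the upper
    # 16 bits: 1 sign bit, 8 exponent bits, 7 mantissa bits.
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return bits >> 16

def bfloat16_bits_to_float32(b):
    # Restore a float32 by padding the lower 16 mantissa bits with zeros.
    return struct.unpack('<f', struct.pack('<I', b << 16))[0]

# 3.140625 needs only 7 mantissa bits, so the round trip is exact.
print(hex(float32_to_bfloat16_bits(3.140625)))  # 0x4049
print(bfloat16_bits_to_float32(0x4049))         # 3.140625

# Because the 8-bit exponent matches float32's, even huge values survive
# (only precision is lost), unlike IEEE float16 which overflows to inf.
print(bfloat16_bits_to_float32(float32_to_bfloat16_bits(1e38)))
```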
So I changed the model's data type to tf.bfloat16. This triggered many errors, one of which came from the function below. The function replaces conv2d_transpose in order to reduce checkerboard artifacts in the generated images (click here for details), but the tf.image.resize_images function it relies on does not support tf.bfloat16, so before each call the data has to be cast to tf.float32, and cast back to tf.bfloat16 afterwards.
def resize_conv2d(x, input_filters, output_filters, kernel, strides, training):
    with tf.variable_scope('conv_transpose'):
        height = x.get_shape()[1].value if training else tf.shape(x)[1]
        width = x.get_shape()[2].value if training else tf.shape(x)[2]
        new_height = height * strides * 2
        new_width = width * strides * 2
        # tf.image.resize_images does not support bfloat16: cast to
        # float32 first, then cast the result back afterwards.
        x = tf.cast(x, tf.float32)
        x_resized = tf.image.resize_images(x, [new_height, new_width], tf.image.ResizeMethod.NEAREST_NEIGHBOR)
        x_resized = tf.cast(x_resized, tf.bfloat16)
        return conv2d(x_resized, input_filters, output_filters, kernel, strides)
Even after fixing these problems, the network still raised errors like this:
InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'Floor' with these attrs. Registered devices: [CPU,GPU], Registered kernels:
device='CPU'; T in [DT_DOUBLE]
device='CPU'; T in [DT_HALF]
device='CPU'; T in [DT_FLOAT]
device='GPU'; T in [DT_DOUBLE]
device='GPU'; T in [DT_HALF]
device='GPU'; T in [DT_FLOAT]
[[Node: model/model/dropout/Floor = Floor[T=DT_BFLOAT16, _device="/gpu:0"](model/model/dropout/add)]]
After many more code changes with no luck, a Google search revealed the cause: bfloat16 is only fully supported on Google's own TPUs (issue 21317). So I had to give up on that route.
bfloat16 support isn't complete for GPUs, as it's not supported natively by the devices. For performance you'll want to use float32 or float16 for GPU execution (though float16 can be difficult to train models with). TPUs support bfloat16 for effectively all operations (but you currently have to migrate your model to work on the TPU).
Although bfloat16 was a dead end, I did eventually find another way to shrink the model. Listing all saved variables shows that the checkpoint stores many variables that are useless for inference: the Adam optimizer's slot variables, whose names contain Adam, as shown below. After excluding them, the model shrank from 4.4 MB to 1.5 MB, and it still loads correctly.
<tf.Variable 'conv1/conv/weight/Adam:0' shape=(9, 9, 3, 32) dtype=float32_ref>
<tf.Variable 'conv1/conv/weight/Adam_1:0' shape=(9, 9, 3, 32) dtype=float32_ref>
<tf.Variable 'conv2/conv/weight/Adam:0' shape=(3, 3, 32, 64) dtype=float32_ref>
...
<tf.Variable 'deconv2/conv_transpose/conv/weight/Adam_1:0' shape=(3, 3, 64, 32) dtype=float32_ref>
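The filtering idea itself is just a name check. A minimal sketch in plain Python over hypothetical variable names mirroring the listing above (in TF 1.x you would pass the filtered variable list as var_list to tf.train.Saver when saving the final checkpoint):

```python
# Hypothetical checkpoint variable names, mirroring the listing above.
all_vars = [
    'conv1/conv/weight',
    'conv1/conv/weight/Adam',
    'conv1/conv/weight/Adam_1',
    'conv2/conv/weight',
    'conv2/conv/weight/Adam',
]

# Keep only variables whose name lacks the Adam slot suffix; these are
# the weights actually needed to run inference.
inference_vars = [v for v in all_vars if '/Adam' not in v]
print(inference_vars)  # ['conv1/conv/weight', 'conv2/conv/weight']
```

With real TF 1.x variables the same predicate applies to `v.name`, e.g. `tf.train.Saver(var_list=[v for v in tf.global_variables() if '/Adam' not in v.name])`, which is why the stripped checkpoint still loads: only optimizer state is dropped, never a weight.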