用全卷积网络(FCN)实现图像分割

作者: LabVIEW_Python | 来源:发表于2021-10-06 15:32 被阅读0次

用全卷积网络(FCN)实现图像分割
（20）图像分割——FCN
全连接卷积神经网络 FCN
语义分割
论文笔记 | CVPR2015 | Fully Convolut
全卷积网络FCN
图像分割：全卷积神经网络（FCN）详解
EAST: An Efficient and Accurate
Pytorch-UNet介绍
2019-05-29（卷积单元—更新中)

全卷积网络(FCN，Fully Convolutional Networks)，是2015年，由{jonlong,shelhamer,trevor}@cs.berkeley.edu提出的，这是深度学习在图像分割领域的开山之作，奠定了使用深度网络解决图像语义分割问题的基础框架。参考：Fully Convolutional Networks for Semantic Segmentation

它的意义在于：提出非常简单易懂的全卷积模型结构，先用卷积网络，例如MobileNetV2的特征提取层提取图像特征，然后移除最后的全连接层，接着使用上采样的转置卷积层（Transpose Convolution）将多次下采样的特征图恢复到和原图一样的大小，最后对每个像素生成一个分类的标签

全卷积网络

转置卷积层（Transpose Convolution）可以理解为一个可以被训练的自动插值器，实现将低分辨率的图片转换为高分辨率的图片。插值算法，例如：

最近邻插值(Nearest neighbor interpolation)
双线性插值(Bi-Linear interpolation)
双立方插值(Bi-Cubic interpolation)
已经很成熟了，这些插值算法类似于手动特征工程，依据设计者的经验，不可以被训练，并没有给神经网络学习的余地。
转置卷积这个方法不会使用预先定义的插值方法，它具有可以学习的参数

参考《Up-sampling with Transposed Convolution》

图像语义分割的目标是给每个像素赋予一个类别标签，目前经典语义分割模型在PASCAL VOC2012数据集上的测试结果如下：

经典语义分割模型测试结果

基于TensorFlow实现的语义分割模型源代码，在运行过程中，可以感受模型的预测是怎样随着训练而改善的

import tensorflow as tf 
from tensorflow_examples.models.pix2pix import pix2pix 

import tensorflow_datasets as tfds
tfds.disable_progress_bar()

import matplotlib.pyplot as plt 

dataset, info = tfds.load('oxford_iiit_pet:3.*.*', with_info=True)
print(info.splits['train'].num_examples)

TRAIN_LENGTH = info.splits['train'].num_examples
BATCH_SIZE = 64
BUFFER_SIZE = 1000
STEPS_PER_EPOCH = TRAIN_LENGTH // BATCH_SIZE

def normalize(input_image, input_mask):
    input_image = tf.cast(input_image, tf.float32) / 255.0 #图像标准化到 [0,1]
    input_mask -= 1                                        #分割掩码都减 1，得到了以下的标签：{0, 1, 2}
    return input_image, input_mask

@tf.function 
def load_image_train(datapoint):
    input_image = tf.image.resize(datapoint['image'], (128,128))
    input_mask  = tf.image.resize(datapoint['segmentation_mask'], (128,128))

    if tf.random.uniform(()) > 0.5:
        input_image = tf.image.flip_left_right(input_image)
        input_mask  = tf.image.flip_left_right(input_mask)
    
    input_image, input_mask = normalize(input_image, input_mask)

    return input_image, input_mask

def load_image_test(datapoint):
    # 测试数据与训练数据做一样的resize + normalize操作
    input_image = tf.image.resize(datapoint['image'], (128,128))
    input_mask  = tf.image.resize(datapoint['segmentation_mask'], (128,128)) 
    # 测试数据无需图像增强操作
    input_image, input_mask = normalize(input_image, input_mask)
    return input_image, input_mask

train = dataset['train'].map(load_image_train, num_parallel_calls=tf.data.AUTOTUNE)
test  = dataset['test'].map(load_image_test)

#  If you wish to randomize the iteration order, make sure to call shuffle after calling cache
train_dataset = train.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()
test_dataset  = test.batch(BATCH_SIZE)

def display(display_list):
    plt.figure(figsize=(9, 9))

    title = ['Input Image', 'True Mask', 'Predicted Mask']

    for i in range(len(display_list)):
        plt.subplot(1, len(display_list), i+1)
        plt.title(title[i])
        plt.imshow(display_list[i])
        plt.axis('off')
    plt.show()

for batch_images, batch_masks in train_dataset.take(1):
    print(batch_images.shape, batch_masks.shape)
    sample_image, sample_mask = batch_images[0].numpy(), batch_masks[0].numpy()
    print(type(sample_image),type(sample_mask))

#display([sample_image, sample_mask])

# 输出信道数量为 3 是因为每个像素有三种可能的标签。
# 把这想象成一个多类别分类，对每个像素进行分类，每个像素属于三种类别之一
OUTPUT_CHANNELS = 3

# 直接使用预训练模型： MobileNetV2 
base_model = tf.keras.applications.MobileNetV2(
    input_shape=[128,128,3],
    include_top=False
)

#base_model.summary()

# 使用这些层的激活设置
layer_names = [
    'block_1_expand_relu',   # 64x64
    'block_3_expand_relu',   # 32x32
    'block_6_expand_relu',   # 16x16
    'block_13_expand_relu',  # 8x8
    'block_16_project',      # 4x4
]

layers = [base_model.get_layer(name).output for name in layer_names]

# 创建特征提取模型
down_stack = tf.keras.Model(
    inputs=base_model.input,
    outputs=layers 
)

down_stack.trainable = False 
#down_stack.summary()

# 直接利用TensorFlow examples 的解码器/上取样器
up_stack = [
    pix2pix.upsample(512,3), # 4x4 -> 8x8
    pix2pix.upsample(256,3), # 8x8 -> 16x16
    pix2pix.upsample(128,3), # 16x16 -> 32x32
    pix2pix.upsample(64,3),  # 32x32 -> 64x64
]

def unet_model(output_channels):
    inputs = tf.keras.Input(shape=(128,128,3))
    x = inputs 

    # 在模型中降采样
    skips = down_stack(x)
    x = skips[-1]
    skips = reversed(skips[:-1])

    # 上采样然后建立跳跃连接
    for up, skip in zip(up_stack, skips):
        x = up(x)
        concat = tf.keras.layers.Concatenate()
        x = concat([x, skip])
    
    # 模型的最后一层
    last = tf.keras.layers.Conv2DTranspose(
        filters=output_channels,
        kernel_size=3,
        strides=2,
        padding='same'
    ) # 64x64 -> 128x128

    outputs = last(x) # Output shape (batch_size, new_rows, new_cols, filters)

    return tf.keras.Model(inputs=inputs, outputs=outputs)

print("build unet...")
model = unet_model(OUTPUT_CHANNELS)

# model.summary()

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

def create_mask(pred_mask):
    print(pred_mask.shape)
    pred_mask = tf.argmax(pred_mask, axis=-1)
    print(pred_mask.shape)
    pred_mask = pred_mask[..., tf.newaxis]
    print(pred_mask.shape)
    return pred_mask[0]

def show_predictions(dataset=None, num=1):
    if dataset:
        for image, mask in dataset.take(num):
            pred_mask = model.predict(image)
            print("pred_mask shape:",pred_mask.shape, pred_mask[0,0,0])
            display([image[0], mask[0], create_mask(pred_mask)])
    else:
        pred_mask = model.predict(sample_image[tf.newaxis, ...])
        print(sample_image.shape, sample_mask.shape, pred_mask.shape)
        pred_mask = create_mask(pred_mask)
        display([sample_image, sample_mask, pred_mask])
#print("show_predictions...")
#show_predictions()

class DisplayCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        #clear_output(wait=True)
        show_predictions()
        print ('\nSample Prediction after epoch {}\n'.format(epoch+1))

EPOCHS = 20
VAL_SUBSPLITS = 5
VALIDATION_STEPS = info.splits['test'].num_examples//BATCH_SIZE//VAL_SUBSPLITS

model_history = model.fit(train_dataset, epochs=EPOCHS,
                          steps_per_epoch=STEPS_PER_EPOCH,
                          validation_steps=VALIDATION_STEPS,
                          validation_data=test_dataset,
                          callbacks=[DisplayCallback()])

loss = model_history.history['loss']
val_loss = model_history.history['val_loss']

epochs = range(EPOCHS)

plt.figure()
plt.plot(epochs, loss, 'r', label='Training loss')
plt.plot(epochs, val_loss, 'bo', label='Validation loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss Value')
plt.ylim([0, 1])
plt.legend()
plt.show()

show_predictions(test_dataset, 3)

model.save("unet.h5")

用全卷积网络(FCN)实现图像分割
全卷积网络(FCN，Fully Convolutional Networks)，是2015年，由{jonlong,...
（20）图像分割——FCN
FCN是全卷积神经网络，用于图像语义分割，将图像级别的分类扩展到像素级别的分类。FCN将传统网络后面的全连接层换成...
全连接卷积神经网络 FCN
(一)全连接卷积神经网络(FCN) (1) 全连接卷积神经网络简介 FCN是深度神经网络用于语义分割的奠基性工作，...
语义分割
FCN FCN对图像像素级的分类，从而解决了语义级别的图像分割问题。与经典的CNN在卷积层之后使用全连接层得到固定...
论文笔记 | CVPR2015 | Fully Convolut
一为什么读这篇大名鼎鼎的FCN，第一个去全连接层的全卷积网络，虽然最初是用于图像分割领域，但是对后来的网络也有...
全卷积网络FCN
Paper Address 什么是全卷积网络? FCN是深度学习应用在图像分割的代表作, 是一种端到端(end t...
图像分割：全卷积神经网络（FCN）详解
转载请注明出处作为计算机视觉三大任务（图像分类、目标检测、图像分割）之一，图像分割已经在近些年里有了长足的发展。...
EAST: An Efficient and Accurate
1. Network Design 只包含两个阶段：全卷积网络(FCN)和NMS，其中FCN网络包括三个部分，特征...
Pytorch-UNet介绍
简介 UNet网络主要用在医学图像分割任务上，网络的结构特点就是：全卷积网络，没有全连接层，训练参数少，模型体积...
2019-05-29（卷积单元—更新中)
卷积神经网络的基本模块视觉基本问题：图像分类、目标识别、目标检测、图像分割、目标分割。卷积层卷积运算：对应位...