Data Augmentation: Mixup, Cutout, CutMix | Mosaic


Author: 毛十三_ | Published 2020-07-29 17:08

    CutMix paper: https://arxiv.org/abs/1905.04899v2

    1. Comparison of several data augmentation methods

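    • Mixup: blends two randomly chosen samples pixel by pixel in a given proportion; the labels are mixed in the same proportion;
    • Cutout: randomly removes a region of the sample and fills it with 0; the label is unchanged;
    • CutMix: removes a region but, instead of filling it with 0, fills it with the pixels of the same region from another training sample; the labels are mixed in proportion to the area each sample occupies.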
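    To make the three definitions concrete, here is a minimal NumPy sketch on a pair of toy images. It assumes single-channel H×W arrays, one-hot labels, and a fixed box (x1, y1, x2, y2) passed in by the caller instead of a randomly sampled one; the function names are illustrative only. In Mixup, lam is normally drawn from a Beta(alpha, alpha) distribution; here it is fixed so the result is easy to check.

```python
import numpy as np

def mixup(xa, ya, xb, yb, lam):
    """Blend every pixel; the labels are mixed with the same weight lam."""
    x = lam * xa + (1 - lam) * xb
    y = lam * ya + (1 - lam) * yb
    return x, y

def cutout(xa, ya, box):
    """Zero out the box; the label is unchanged."""
    x1, y1, x2, y2 = box
    x = xa.copy()
    x[y1:y2, x1:x2] = 0
    return x, ya

def cutmix(xa, ya, xb, yb, box):
    """Paste the box region of xb into xa; the labels are weighted by area."""
    x1, y1, x2, y2 = box
    x = xa.copy()
    x[y1:y2, x1:x2] = xb[y1:y2, x1:x2]
    h, w = xa.shape[:2]
    lam = 1 - (x2 - x1) * (y2 - y1) / (h * w)  # fraction of pixels still from xa
    y = lam * ya + (1 - lam) * yb
    return x, y

# Toy 4x4 "images" and one-hot labels for 2 classes
xa, ya = np.ones((4, 4)), np.array([1.0, 0.0])
xb, yb = np.full((4, 4), 3.0), np.array([0.0, 1.0])
box = (0, 0, 2, 2)  # (x1, y1, x2, y2): covers 4 of the 16 pixels

x_mix, y_mix = mixup(xa, ya, xb, yb, lam=0.75)
x_cut, y_cut = cutout(xa, ya, box)
x_cm, y_cm = cutmix(xa, ya, xb, yb, box)
```

    With the same box, Cutout and CutMix differ only in what fills the region (zeros vs. pixels of xb), while Mixup changes every pixel to a blend of the two images.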


    Differences
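    Cutout and CutMix differ only in what fills the removed region (zeros vs. pixels from another image). Mixup and CutMix differ in how they combine two samples: Mixup interpolates the two whole images, whereas CutMix pastes a patch of one image into the other, so it does not produce the unnatural blended images that Mixup can.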
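    Advantages of CutMix
    (1) No uninformative pixels appear during training, which improves training efficiency;
    (2) It keeps the advantage of regional dropout: the model learns to attend to the non-discriminative parts of objects;
    (3) Because the model must recognize objects from partial views, and the cut region is filled with content from another sample, its localization ability is further improved;
    (4) The mixed images do not look unnatural, which improves classification performance;
    (5) Training and inference costs stay the same.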

    2. What does the model learn with CutMix?

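    The authors answer this with heatmaps (class activation maps, CAM). CutMix lets the model recognize two objects from partial views within a single image, which makes training more efficient. The figure shows that Cutout makes the model attend to the less discriminative parts of an object (in the paper's example, the belly), but part of the image carries no information, which hurts training efficiency; Mixup uses every pixel, but introduces unnatural, artificial-looking pixels.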
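    For readers unfamiliar with CAM, a minimal NumPy sketch of how such a heatmap is computed, assuming a network that ends with global average pooling followed by a single linear classifier: the map for class c is the sum of the last convolutional feature maps weighted by the classifier weights of class c. The shapes (K, H, W) and (num_classes, K) and the function name are assumptions for illustration, not code from the CutMix repo.

```python
import numpy as np

def class_activation_map(features, fc_weights, cls):
    """CAM for class `cls`: sum over k of fc_weights[cls, k] * features[k].

    features:   (K, H, W) feature maps of the last conv layer
    fc_weights: (num_classes, K) weights of the linear layer after global average pooling
    """
    cam = np.tensordot(fc_weights[cls], features, axes=1)  # -> (H, W)
    # Rescale to [0, 1] so it can be displayed as a heatmap
    cam = cam - cam.min()
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

rng = np.random.default_rng(0)
feats = rng.random((8, 7, 7))
weights = rng.standard_normal((10, 8))
heatmap = class_activation_map(feats, weights, cls=3)
```

    In practice the (H, W) map is upsampled to the input resolution and overlaid on the image.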


    3. Code walkthrough

    CutMix code: https://github.com/clovaai/CutMix-PyTorch
    CutMix: generating the crop region

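    import numpy as np

    def rand_bbox(size, lam):
        """Sample a random box B.

        Inputs: size is the batch size (N, C, dim2, dim3); lam is the sampled lambda.
        """
        # Following the official repo, W is read from dim 2 and H from dim 3.
        # The box is indexed the same way in train.py (bbx on dim 2, bby on dim 3), so this is consistent.
        W = size[2]
        H = size[3]
        # 1. Eq. (2) in the paper: box width r_w and height r_h
        cut_rat = np.sqrt(1. - lam)
        cut_w = int(W * cut_rat)  # np.int was removed in NumPy 1.24; use the builtin int
        cut_h = int(H * cut_rat)

        # 2. Eq. (2): box centre (r_x, r_y), sampled uniformly
        cx = np.random.randint(W)
        cy = np.random.randint(H)

        # Clip the box so it stays inside the image
        bbx1 = np.clip(cx - cut_w // 2, 0, W)
        bby1 = np.clip(cy - cut_h // 2, 0, H)
        bbx2 = np.clip(cx + cut_w // 2, 0, W)
        bby2 = np.clip(cy + cut_h // 2, 0, H)
        # 3. Return the corner coordinates of box B
        return bbx1, bby1, bbx2, bby2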
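    For reference, the quantities rand_bbox computes follow the CutMix paper's mixing rule and box sampling (Eq. 2), as I read them. M is a binary mask that is 0 inside box B and 1 elsewhere, and the labels y are one-hot vectors:

```latex
\begin{aligned}
\tilde{x} &= \mathbf{M} \odot x_A + (\mathbf{1} - \mathbf{M}) \odot x_B, \qquad
\tilde{y} = \lambda\, y_A + (1 - \lambda)\, y_B, \qquad \lambda \sim \mathrm{Beta}(\alpha, \alpha) \\
r_x &\sim \mathrm{Unif}(0, W), \quad r_w = W\sqrt{1 - \lambda}, \qquad
r_y \sim \mathrm{Unif}(0, H), \quad r_h = H\sqrt{1 - \lambda} \\
\frac{r_w\, r_h}{W H} &= 1 - \lambda
\end{aligned}
```

    For example, with lambda = 0.75 and W = H = 32, cut_rat = 0.5 and the box is 16 × 16, i.e. 256 / 1024 = 0.25 = 1 - lambda of the image. The last identity only holds before clipping: a box near the border is cut off and covers less area, which is why the training loop recomputes lambda from the actual box coordinates.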
    

    CutMix: the training loop

    # train.py, lines 220-244 (excerpt: args, model, criterion, train_loader, data_time, end are defined elsewhere in the repo)
    for i, (input, target) in enumerate(train_loader):
        # measure data loading time
        data_time.update(time.time() - end)

        input = input.cuda()
        target = target.cuda()
        r = np.random.rand(1)
        if args.beta > 0 and r < args.cutmix_prob:
            # generate mixed sample
            # 1. Sample lambda from a Beta(beta, beta) distribution
            lam = np.random.beta(args.beta, args.beta)
            # 2. Pair every sample with a random partner from the same batch
            rand_index = torch.randperm(input.size()[0]).cuda()
            target_a = target              # labels of the original samples
            target_b = target[rand_index]  # labels of their shuffled partners
            # 3. Generate the crop region B
            bbx1, bby1, bbx2, bby2 = rand_bbox(input.size(), lam)
            # 4. Replace region B of each sample with region B of its partner
            input[:, :, bbx1:bbx2, bby1:bby2] = input[rand_index, :, bbx1:bbx2, bby1:bby2]
            # adjust lambda to exactly match pixel ratio
            # 5. Recompute lambda from the (possibly clipped) box
            lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (input.size()[-1] * input.size()[-2]))
            # compute output
            # 6. Feed the mixed batch to the model
            output = model(input)
            # 7. Weight the two losses by lambda
            loss = criterion(output, target_a) * lam + criterion(output, target_b) * (1. - lam)
        else:
            # compute output
            output = model(input)
            loss = criterion(output, target)
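    Step 7 mixes two losses rather than building a soft target. With cross-entropy the two are equivalent, because cross-entropy is linear in the target distribution: lam * CE(p, a) + (1 - lam) * CE(p, b) = CE(p, lam * onehot(a) + (1 - lam) * onehot(b)). The NumPy sketch below checks this numerically; it assumes criterion is plain cross-entropy averaged over the batch (no class weights, no label smoothing), and the helper names are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # subtract max for numerical stability
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def cross_entropy(logits, labels):
    """Mean cross-entropy with integer class labels."""
    logp = log_softmax(logits)
    return -logp[np.arange(len(labels)), labels].mean()

def soft_cross_entropy(logits, targets):
    """Mean cross-entropy against soft (probability) targets."""
    return -(targets * log_softmax(logits)).sum(axis=1).mean()

rng = np.random.default_rng(0)
num_classes, batch = 5, 4
logits = rng.standard_normal((batch, num_classes))
target_a = rng.integers(0, num_classes, size=batch)
rand_index = rng.permutation(batch)
target_b = target_a[rand_index]  # same labels, shuffled as in train.py
lam = 0.7

mixed_loss = cross_entropy(logits, target_a) * lam + cross_entropy(logits, target_b) * (1 - lam)
eye = np.eye(num_classes)
soft_targets = lam * eye[target_a] + (1 - lam) * eye[target_b]
soft_loss = soft_cross_entropy(logits, soft_targets)
```

    Mixing the losses avoids materializing soft targets and works with the standard integer-label loss.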
    

    Cutout code

    import torch
    import numpy as np
    
    
    class Cutout(object):
        """Randomly mask out one or more patches from an image.
        Args:
            n_holes (int): Number of patches to cut out of each image.
            length (int): The length (in pixels) of each square patch.
        """
        def __init__(self, n_holes, length):
            self.n_holes = n_holes
            self.length = length
    
        def __call__(self, img):
            """
            Args:
                img (Tensor): Tensor image of size (C, H, W).
            Returns:
                Tensor: Image with n_holes of dimension length x length cut out of it.
            """
            h = img.size(1)
            w = img.size(2)
    
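            # Start from an all-ones mask; cut-out patches are set to 0
            mask = np.ones((h, w), np.float32)

            for n in range(self.n_holes):
                # The patch centre can be anywhere in the image, so patches near
                # the border are clipped and may be smaller than length x length
                y = np.random.randint(h)
                x = np.random.randint(w)

                y1 = np.clip(y - self.length // 2, 0, h)
                y2 = np.clip(y + self.length // 2, 0, h)
                x1 = np.clip(x - self.length // 2, 0, w)
                x2 = np.clip(x + self.length // 2, 0, w)

                mask[y1: y2, x1: x2] = 0.

            # Broadcast the (H, W) mask over all channels and apply it
            mask = torch.from_numpy(mask)
            mask = mask.expand_as(img)
            img = img * mask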
    
            return img
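    Because the patch centre is sampled anywhere in the image, a patch near the border is partly outside it, so the masked area varies between roughly (length/2)² and length². A NumPy-only port for (C, H, W) arrays makes this easy to check without PyTorch; cutout_np and its rng argument are hypothetical names, not part of the original code.

```python
import numpy as np

def cutout_np(img, n_holes, length, rng):
    """NumPy version of Cutout for a (C, H, W) array; returns a masked copy."""
    _, h, w = img.shape
    mask = np.ones((h, w), dtype=img.dtype)
    for _ in range(n_holes):
        # Centre may lie anywhere, so the patch can be clipped at the border
        y, x = rng.integers(h), rng.integers(w)
        y1, y2 = np.clip([y - length // 2, y + length // 2], 0, h)
        x1, x2 = np.clip([x - length // 2, x + length // 2], 0, w)
        mask[y1:y2, x1:x2] = 0
    return img * mask  # the (H, W) mask broadcasts over the channel axis

rng = np.random.default_rng(0)
img = np.ones((3, 8, 8), dtype=np.float32)
out = cutout_np(img, n_holes=1, length=4, rng=rng)
```

    With length = 4 on an 8 × 8 image, one hole masks between 4 pixels (centre in a corner) and 16 pixels (patch fully inside) per channel.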
    

    4. Mosaic data augmentation

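    The Mosaic augmentation in YOLOv4 draws on CutMix, and the two are similar in principle. CutMix combines two images, whereas Mosaic combines four. According to the YOLOv4 paper, its main advantages are that it enriches the backgrounds (contexts) of the objects to be detected, and that batch normalization computes statistics over four images at once, which reduces the need for a large mini-batch.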

    Implementation outline

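    1. Read four images at a time.

    2. Apply flipping, scaling and color-space (HSV) jitter to each image separately, and place them in the four corners (top-left, bottom-left, bottom-right, top-right).

    3. Combine the images and their bounding boxes.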
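    Step 3, the image part, can be sketched in a few lines of NumPy: pick a split point (cutx, cuty) and take each quadrant from a different image. This assumes the four images have already been flipped, resized, jittered and pasted onto canvases of the same (h, w, 3) size, as get_random_data does in the full code; mosaic_stitch is a hypothetical helper, and the quadrant order follows the full code.

```python
import numpy as np

def mosaic_stitch(canvases, cutx, cuty):
    """Combine four (h, w, 3) canvases into one image split at (cutx, cuty).

    Order: 0 = top-left, 1 = bottom-left, 2 = bottom-right, 3 = top-right.
    """
    h, w, c = canvases[0].shape
    out = np.zeros((h, w, c), dtype=canvases[0].dtype)
    out[:cuty, :cutx] = canvases[0][:cuty, :cutx]
    out[cuty:, :cutx] = canvases[1][cuty:, :cutx]
    out[cuty:, cutx:] = canvases[2][cuty:, cutx:]
    out[:cuty, cutx:] = canvases[3][:cuty, cutx:]
    return out

h, w = 416, 416
rng = np.random.default_rng(0)
# Split point in the middle 20% of each axis, mirroring min_offset = 0.4 in the full code
cutx = int(rng.integers(int(w * 0.4), int(w * 0.6)))
cuty = int(rng.integers(int(h * 0.4), int(h * 0.6)))
# Constant canvases make it easy to see which quadrant came from which image
canvases = [np.full((h, w, 3), i, dtype=np.uint8) for i in range(4)]
mosaic = mosaic_stitch(canvases, cutx, cuty)
```

    Keeping the split point away from the edges guarantees that each of the four images contributes a reasonably large region.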


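    Full code (expects an annotation file such as 2007_train.txt in which each line is an image path followed by space-separated boxes x1,y1,x2,y2,class)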
    from PIL import Image, ImageDraw
    import numpy as np
    from matplotlib.colors import rgb_to_hsv, hsv_to_rgb
    import math
    def rand(a=0, b=1):
        return np.random.rand()*(b-a) + a
    
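    def merge_bboxes(bboxes, cutx, cuty):
        """Clip the boxes of image i to the quadrant it occupies; drop boxes that lie
        outside it or become narrower/shorter than 5 px after being clipped at a split line."""
        merge_bbox = []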
        for i in range(len(bboxes)):
            for box in bboxes[i]:
                tmp_box = []
                x1,y1,x2,y2 = box[0], box[1], box[2], box[3]
    
                # Image 0: top-left quadrant
                if i == 0:
                    if y1 > cuty or x1 > cutx:
                        continue
                    if y2 >= cuty and y1 <= cuty:
                        y2 = cuty
                        if y2-y1 < 5:
                            continue
                    if x2 >= cutx and x1 <= cutx:
                        x2 = cutx
                        if x2-x1 < 5:
                            continue
                    
                # Image 1: bottom-left quadrant
                if i == 1:
                    if y2 < cuty or x1 > cutx:
                        continue
    
                    if y2 >= cuty and y1 <= cuty:
                        y1 = cuty
                        if y2-y1 < 5:
                            continue
                    
                    if x2 >= cutx and x1 <= cutx:
                        x2 = cutx
                        if x2-x1 < 5:
                            continue
    
                # Image 2: bottom-right quadrant
                if i == 2:
                    if y2 < cuty or x2 < cutx:
                        continue
    
                    if y2 >= cuty and y1 <= cuty:
                        y1 = cuty
                        if y2-y1 < 5:
                            continue
    
                    if x2 >= cutx and x1 <= cutx:
                        x1 = cutx
                        if x2-x1 < 5:
                            continue
    
                # Image 3: top-right quadrant
                if i == 3:
                    if y1 > cuty or x2 < cutx:
                        continue
    
                    if y2 >= cuty and y1 <= cuty:
                        y2 = cuty
                        if y2-y1 < 5:
                            continue
    
                    if x2 >= cutx and x1 <= cutx:
                        x1 = cutx
                        if x2-x1 < 5:
                            continue
    
                tmp_box.append(x1)
                tmp_box.append(y1)
                tmp_box.append(x2)
                tmp_box.append(y2)
                tmp_box.append(box[-1])
                merge_bbox.append(tmp_box)
        return merge_bbox
    
    def get_random_data(annotation_line, input_shape, random=True, hue=.1, sat=1.5, val=1.5, proc_img=True):
        '''random preprocessing for real-time data augmentation'''
        h, w = input_shape
        min_offset_x = 0.4
        min_offset_y = 0.4
        scale_low = 1-min(min_offset_x,min_offset_y)
        scale_high = scale_low+0.2
    
        image_datas = [] 
        box_datas = []
        index = 0
    
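        # Top-left corner where each image is pasted on its canvas:
        # 0 = top-left, 1 = bottom-left, 2 = bottom-right, 3 = top-right
        place_x = [0,0,int(w*min_offset_x),int(w*min_offset_x)]
        place_y = [0,int(h*min_offset_y),int(h*min_offset_y),0]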
        for line in annotation_line:
            # Split the line: image path followed by boxes "x1,y1,x2,y2,class"
            line_content = line.split()
            # Open the image
            image = Image.open(line_content[0])
            image = image.convert("RGB") 
            # Original image size
            iw, ih = image.size
            # Parse the box coordinates and class labels
            box = np.array([np.array(list(map(int,box.split(',')))) for box in line_content[1:]])
            
            # image.save(str(index)+".jpg")
            # Randomly flip horizontally (the boxes are mirrored too)
            flip = rand()<.5
            if flip and len(box)>0:
                image = image.transpose(Image.FLIP_LEFT_RIGHT)
                box[:, [0,2]] = iw - box[:, [2,0]]
    
            # Rescale the image to a random fraction (scale_low to scale_high) of the input size
            new_ar = w/h
            scale = rand(scale_low, scale_high)
            if new_ar < 1:
                nh = int(scale*h)
                nw = int(nh*new_ar)
            else:
                nw = int(scale*w)
                nh = int(nw/new_ar)
            image = image.resize((nw,nh), Image.BICUBIC)
    
            # Color-space (HSV) jitter; sample into new names so the hue/sat/val
            # arguments are not overwritten for the next image in the loop
            dhue = rand(-hue, hue)
            dsat = rand(1, sat) if rand()<.5 else 1/rand(1, sat)
            dval = rand(1, val) if rand()<.5 else 1/rand(1, val)
            x = rgb_to_hsv(np.array(image)/255.)
            x[..., 0] += dhue
            x[..., 0][x[..., 0]>1] -= 1
            x[..., 0][x[..., 0]<0] += 1
            x[..., 1] *= dsat
            x[..., 2] *= dval
            x[x>1] = 1
            x[x<0] = 0
            image = hsv_to_rgb(x)
    
            image = Image.fromarray((image*255).astype(np.uint8))
            # Paste the image onto a gray canvas at the position reserved for its quadrant
            dx = place_x[index]
            dy = place_y[index]
            new_image = Image.new('RGB', (w,h), (128,128,128))
            new_image.paste(image, (dx, dy))
            image_data = np.array(new_image)/255
    
            # Image.fromarray((image_data*255).astype(np.uint8)).save(str(index)+"distort.jpg")
            
            index = index + 1
            box_data = []
            # Map the boxes through the resize and paste offset, then clip them to the canvas
            if len(box)>0:
                np.random.shuffle(box)
                box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx
                box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy
                box[:, 0:2][box[:, 0:2]<0] = 0
                box[:, 2][box[:, 2]>w] = w
                box[:, 3][box[:, 3]>h] = h
                box_w = box[:, 2] - box[:, 0]
                box_h = box[:, 3] - box[:, 1]
                box = box[np.logical_and(box_w>1, box_h>1)]
                box_data = np.zeros((len(box),5))
                box_data[:len(box)] = box
            
            image_datas.append(image_data)
            box_datas.append(box_data)
    
            # Debug only: uncomment to show each distorted canvas with its boxes
            # img = Image.fromarray((image_data*255).astype(np.uint8))
            # for j in range(len(box_data)):
            #     thickness = 3
            #     left, top, right, bottom  = box_data[j][0:4]
            #     draw = ImageDraw.Draw(img)
            #     for i in range(thickness):
            #         draw.rectangle([left + i, top + i, right - i, bottom - i],outline=(255,255,255))
            # img.show()
    
        
        # Pick a random split point and stitch the four quadrants together
        cutx = np.random.randint(int(w*min_offset_x), int(w*(1 - min_offset_x)))
        cuty = np.random.randint(int(h*min_offset_y), int(h*(1 - min_offset_y)))
    
        new_image = np.zeros([h,w,3])
        new_image[:cuty, :cutx, :] = image_datas[0][:cuty, :cutx, :]
        new_image[cuty:, :cutx, :] = image_datas[1][cuty:, :cutx, :]
        new_image[cuty:, cutx:, :] = image_datas[2][cuty:, cutx:, :]
        new_image[:cuty, cutx:, :] = image_datas[3][:cuty, cutx:, :]
    
        # Clip or drop the boxes that cross the split lines
        new_boxes = merge_bboxes(box_datas, cutx, cuty)
    
        return new_image, new_boxes
    
    def normal_(annotation_line, input_shape):
        '''Load one image and flip it horizontally (no randomness); used only by the commented-out demo below.'''
        line = annotation_line.split()
        image = Image.open(line[0])
        box = np.array([np.array(list(map(int,box.split(',')))) for box in line[1:]])
     
        iw, ih = image.size
        image = image.transpose(Image.FLIP_LEFT_RIGHT)
        box[:, [0,2]] = iw - box[:, [2,0]]
    
        return image, box
    
    if __name__ == "__main__":
        with open("2007_train.txt") as f:
            lines = f.readlines()
        a = np.random.randint(0,len(lines)-3)  # make sure four lines are available after a
        # index = 0
        # line_all = lines[a:a+4]
        # for line in line_all:
        #     image_data, box_data = normal_(line,[416,416])
        #     img = image_data
        #     for j in range(len(box_data)):
        #         thickness = 3
        #         left, top, right, bottom  = box_data[j][0:4]
        #         draw = ImageDraw.Draw(img)
        #         for i in range(thickness):
        #             draw.rectangle([left + i, top + i, right - i, bottom - i],outline=(255,255,255))
        #     img.show()
        #     # img.save(str(index)+"box.jpg")
        #     index = index+1
            
        line = lines[a:a+4]
        image_data, box_data = get_random_data(line,[416,416])
        img = Image.fromarray((image_data*255).astype(np.uint8))
        for j in range(len(box_data)):
            thickness = 3
            left, top, right, bottom  = box_data[j][0:4]
            draw = ImageDraw.Draw(img)
            for i in range(thickness):
                draw.rectangle([left + i, top + i, right - i, bottom - i],outline=(255,255,255))
        img.show()
        # img.save("box_all.jpg")
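    merge_bboxes handles each quadrant with its own branch. An alternative, shown here as a hedged sketch rather than the author's method, is to clip every box to its quadrant's rectangle in one vectorized step and drop boxes that become too thin. It assumes boxes in (x1, y1, x2, y2, class) format as in box_datas; clip_boxes_to_region, merge_quadrant_boxes and min_size are hypothetical names (min_size=5 mirrors the threshold in merge_bboxes). Unlike merge_bboxes, it also applies the size check to small boxes that never crossed a split line, so results can differ slightly.

```python
import numpy as np

def clip_boxes_to_region(boxes, x_min, y_min, x_max, y_max, min_size=5):
    """Clip (x1, y1, x2, y2, cls) boxes to a rectangle; drop boxes thinner than min_size."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 5).copy()
    boxes[:, [0, 2]] = boxes[:, [0, 2]].clip(x_min, x_max)
    boxes[:, [1, 3]] = boxes[:, [1, 3]].clip(y_min, y_max)
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    return boxes[(w >= min_size) & (h >= min_size)]

def merge_quadrant_boxes(box_datas, cutx, cuty, w, h):
    """Clip each image's boxes to the quadrant it occupies in the mosaic."""
    regions = [
        (0, 0, cutx, cuty),  # 0: top-left
        (0, cuty, cutx, h),  # 1: bottom-left
        (cutx, cuty, w, h),  # 2: bottom-right
        (cutx, 0, w, cuty),  # 3: top-right
    ]
    merged = [clip_boxes_to_region(b, *r) for b, r in zip(box_datas, regions)]
    return np.concatenate(merged, axis=0)

box_datas = [
    [[100, 50, 260, 150, 7]],  # image 0: crosses cutx = 200, gets clipped
    [[10, 10, 50, 50, 3]],     # image 1: lies in the top half, so it is dropped
    [],
    [],
]
merged = merge_quadrant_boxes(box_datas, cutx=200, cuty=200, w=416, h=416)
```

    The first box is clipped to [100, 50, 200, 150, 7]; the second collapses to zero height inside the bottom-left quadrant and is removed.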
    
    
