美文网首页
图像的相同比较

图像的相同比较

作者: xiaoyao_777 | 来源:发表于2018-12-12 14:58 被阅读0次

因为深度学习训练数据少,要生成样本。虽然在生成样本过程中,是以区间的形式生成,但是领导还是怕有重复,怕训练的时候出现问题,不贴合实际场景。这样,首先简单看下在生成样本过程中,会有多少重复样本。下面是简单判断在一个文件夹下,图像样本相同比较:

import os
import os.path as osp
import cv2
import numpy as np


def get_file_list(src_dir, ext='.jpg'):
    name_list = []
    for roots, dirs, files in os.walk(src_dir):
        if roots != src_dir:
            break

        for file_name in files:
            if file_name.endswith(ext):
                name_list.append(osp.splitext(file_name)[0])

    return name_list


def CheckImages(src_dir, im_name_list, src_ext):
    num = len(im_name_list)
    count = 0
    for i in range(0, num-1):
        im_name = src_dir + '/' + im_name_list[i] + src_ext
        print(i, '   of   ', num)
        img = cv2.imread(im_name, cv2.IMREAD_COLOR)

        fg = 0
        for j in range(i+1, num):
            im_name_t = src_dir + '/' + im_name_list[j] + src_ext
            img_t = cv2.imread(im_name_t, cv2.IMREAD_COLOR)

            # im_sub = img - img_t
            # p_min = np.min(im_sub)
            p_max = np.max(img - img_t)

            if 0 == p_max:
                fg = 1
                break

        if 1 == fg:
            count += 1

    print('ratio of the same', float(count) / num)


if __name__ == '__main__':
    src_dir = '/media/ada/m-disk-6t/pycharmprojects/mysamples/samples/data'
    ext = '.png'
    name_list = get_file_list(src_dir, ext=ext)
    CheckImages(src_dir, name_list, ext)

最终测试发现,1000张图像中有1张重复。

相关文章

网友评论

      本文标题:图像的相同比较

      本文链接:https://www.haomeiwen.com/subject/yltphqtx.html