Image Processing for CAPTCHAs

Author: 专注吃喝五十年 | Published 2018-10-24 14:47


I just discovered that you can insert code with Markdown ╮(╯▽╰)╭

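I recently needed to scrape data from a site that sits behind a login page, and tesserocr recognized the login CAPTCHA poorly. The original CAPTCHA is shown in Figure 1.

Figure 1. The original CAPTCHA
Without any preprocessing of the CAPTCHA image, the recognition result is as shown in Figure 2, which is far from ideal.

Figure 2. The result without preprocessing

Pillow's ImageEnhance module can be used to preprocess the image. The first step is to change the contrast: wrap the image with ImageEnhance.Contrast(image), then call enhance(factor). A factor of 1.0 returns the original image, smaller values lower the contrast, and larger values raise it: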

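from PIL import Image, ImageEnhance
image = Image.open("E:\\code.jpg")
enhancer = ImageEnhance.Contrast(image)
# Save one copy per contrast factor: 0.0, 0.5, ..., 4.0
for i in range(9):
    enhancer.enhance(i * 0.5).save("E:\\code_contrast_" + str(i * 0.5) + ".png")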

The image at each contrast factor is saved for comparison:

Figure 3. The image at different contrast factors
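
To get a feel for what the contrast factor does, here is a minimal pure-Python sketch. It assumes the enhancement behaves roughly like pushing each pixel away from the mean gray level, which is a simplification and not Pillow's actual code. The function name `stretch_contrast` and the sample values are made up for illustration.

```python
def stretch_contrast(pixels, factor):
    """Scale each grayscale value away from (or toward) the mean by `factor`."""
    mean = round(sum(pixels) / len(pixels))
    stretched = []
    for p in pixels:
        value = mean + factor * (p - mean)
        # Clamp to the valid 8-bit range.
        stretched.append(max(0, min(255, round(value))))
    return stretched


row = [100, 120, 130, 150]         # made-up low-contrast grayscale values
print(stretch_contrast(row, 1.0))  # [100, 120, 130, 150]
print(stretch_contrast(row, 4.0))  # [25, 105, 145, 225]
```

With a factor of 4 the dark and light values end up much further apart, which is what makes the later black-and-white conversion cleaner.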

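The next step uses the image with a contrast factor of 4.
Convert it from RGB mode to black-and-white ("1") or grayscale ("L") mode. The code below uses "1":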

enhancer = ImageEnhance.Contrast(image)
image = enhancer.enhance(4)
# save() returns None, so convert first and save in a separate statement
image = image.convert('1')
image.save("E:\\code_binary.png")

This gives:

Figure 4. The image reduced to pure black and white
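
One thing worth knowing: by default, Pillow's convert('1') applies Floyd-Steinberg dithering, which can itself scatter isolated dots across the image. A fixed threshold is a common alternative. The sketch below shows the idea on a plain nested list of grayscale values. The name `binarize`, the cutoff of 128, and the sample values are assumptions for illustration, not something taken from the steps above.

```python
def binarize(gray_rows, threshold=128):
    """Map each grayscale value to 0 (black) or 255 (white) using a fixed cutoff."""
    return [[255 if p >= threshold else 0 for p in row] for row in gray_rows]


# Made-up grayscale values, one inner list per image row.
gray = [
    [25, 105, 145, 225],
    [250, 30, 128, 127],
]
print(binarize(gray))
# [[0, 0, 255, 255], [255, 0, 255, 0]]
```

In Pillow, the same idea is usually written as `image.convert("L").point(lambda p: 255 if p >= 128 else 0, mode="1")`. Whether it beats the default dithering depends on the CAPTCHA, so it is worth comparing both outputs.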

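Next, the black-and-white image needs denoising.
For each black pixel, count the black pixels in its 3×3 neighborhood. If fewer than 3 of its neighbors are black (that is, fewer than 4 black pixels in the window once the pixel itself is counted), treat it as noise and set its value to 255 (white).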

# Denoise
image = Image.open("E:\\code_binary.png")
width, height = image.size

def remove_noise(image, x, y, width, height):
    # Note: getpixel takes the coordinate as a tuple
    # 255 is white; only black pixels are candidates for removal
    if image.getpixel((x, y)) == 255:
        return

    # Count black pixels in the 3x3 window around (x, y), including (x, y) itself
    black_num = 0
    for nx in range(x - 1, x + 2):
        for ny in range(y - 1, y + 2):
            if 0 <= nx < width and 0 <= ny < height:
                if image.getpixel((nx, ny)) == 0:
                    black_num += 1
    # Fewer than 4 (the pixel plus fewer than 3 black neighbors) means noise
    if black_num < 4:
        image.putpixel((x, y), 255)


for x in range(width):
    for y in range(height):
        remove_noise(image, x, y, width, height)

image.save("E:\\code_remove_noise.png")

This gives:

Figure 5. The result after denoising
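
The neighbor-counting rule is easy to try out without an image file. Here is a small sketch on a plain grid of 0/255 values. Unlike the in-place version, it reads counts from the untouched input and writes to a copy, so the scan order cannot affect the result. That difference is my own variation, and the helper names and the sample grid are made up for illustration.

```python
BLACK, WHITE = 0, 255


def count_black(grid, x, y):
    """Count black cells in the 3x3 window centered on (x, y), center included."""
    height, width = len(grid), len(grid[0])
    total = 0
    for ny in range(y - 1, y + 2):
        for nx in range(x - 1, x + 2):
            if 0 <= nx < width and 0 <= ny < height and grid[ny][nx] == BLACK:
                total += 1
    return total


def remove_noise(grid, min_black=4):
    """Return a copy of grid with sparsely connected black cells turned white."""
    cleaned = [row[:] for row in grid]
    for y, row in enumerate(grid):
        for x, value in enumerate(row):
            # Counts come from the untouched input, so scan order does not matter.
            if value == BLACK and count_black(grid, x, y) < min_black:
                cleaned[y][x] = WHITE
    return cleaned


W, B = WHITE, BLACK
noisy = [
    [B, W, W, W, W],   # the lone black cell in the corner is noise
    [W, W, B, B, W],
    [W, W, B, B, W],   # the 2x2 block stands in for part of a character
    [W, W, W, W, W],
]
for row in remove_noise(noisy):
    print("".join("#" if p == BLACK else "." for p in row))
# .....
# ..##.
# ..##.
# .....
```

The isolated corner pixel disappears while the 2×2 block survives, because every cell in the block sees exactly 4 black pixels in its window.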

Running recognition on the denoised image gives:

Figure 6. The tesserocr recognition result
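
OCR output often carries stray whitespace or punctuation, so it can help to tidy the string before submitting it to the login form. The sketch below assumes the CAPTCHA contains only ASCII letters and digits, which may not hold for every site. The name `clean_captcha_text`, the `expected_length` parameter, and the sample strings are all hypothetical.

```python
import re


def clean_captcha_text(raw, expected_length=None):
    """Keep only ASCII letters and digits; optionally check the length."""
    text = re.sub(r"[^0-9A-Za-z]", "", raw)
    if expected_length is not None and len(text) != expected_length:
        return None  # signal that the guess looks unreliable
    return text


# Made-up strings standing in for raw OCR output.
print(clean_captcha_text(" a3 Xk\n"))   # a3Xk
print(clean_captcha_text("a3.Xk\n", 4)) # a3Xk
print(clean_captcha_text("a3\n", 4))    # None
```

Returning None for a wrong-length guess makes it easy to request a fresh CAPTCHA and retry instead of submitting a result that is likely wrong.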

The complete code:

import tesserocr
from PIL import Image, ImageEnhance

# Escape the backslash: '\c' is not a valid escape sequence
image = Image.open("E:\\code.jpg")
# # Try several contrast factors; the image at factor 4 was chosen
# enhancer = ImageEnhance.Contrast(image)
# for i in range(9):
#     enhancer.enhance(i*0.5).save("E:\\code_contrast_"+str(i*0.5)+".png")

enhancer = ImageEnhance.Contrast(image)
image = enhancer.enhance(4)
# convert() changes "RGB" into another mode: "1" is a binary image with only black and white;
# "L" is grayscale, 8 bits per pixel, where 0 is black and 255 is white
image.convert('1').save("E:\\code_binary.png")

# Denoise
image = Image.open("E:\\code_binary.png")
width, height = image.size

def remove_noise(image, x, y, width, height):
    # Note: getpixel takes the coordinate as a tuple
    # 255 is white; only black pixels are candidates for removal
    if image.getpixel((x, y)) == 255:
        return

    # Count black pixels in the 3x3 window around (x, y), including (x, y) itself
    black_num = 0
    for nx in range(x - 1, x + 2):
        for ny in range(y - 1, y + 2):
            if 0 <= nx < width and 0 <= ny < height:
                if image.getpixel((nx, ny)) == 0:
                    black_num += 1
    # Fewer than 4 (the pixel plus fewer than 3 black neighbors) means noise
    if black_num < 4:
        image.putpixel((x, y), 255)


for x in range(width):
    for y in range(height):
        remove_noise(image, x, y, width, height)

image.save("E:\\code_remove_noise.png")

# Run OCR on the denoised image
result = tesserocr.image_to_text(image)
print(result)