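Environment setup
Platform: Linux/macOS with Python
1. Install the Python imaging library
pip install Pillow
2. Install the Python library pytesseract
pip install pytesseract
3. Install the Python library tesseract (optional: the script below imports only pytesseract, which calls the tesseract executable built in step 5)
pip install tesseract
4. Install leptonica (run these commands in the unpacked leptonica source directory)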
./configure
make
make install
5. Install tesseract-ocr (run these commands in the tesseract-ocr source directory)
./autogen.sh
CPPFLAGS="-I/usr/local/include" LDFLAGS="-L/usr/local/lib" ./configure
make
make install
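After installation, download the tessdata language files (the script below uses lang="eng", so it needs eng.traineddata) and place them in the tessdata directory.
Python script: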
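Before running the script, it can help to confirm from Python that each piece of the setup is visible. The sketch below is only an illustration: the helper names check_environment and find_traineddata are made up here, /usr/local/share/tessdata is assumed to be the data directory of a default source install, and TESSDATA_PREFIX is treated as naming either the tessdata directory or its parent, since that differs between Tesseract versions. The requests module is included because the script imports it.

```python
import importlib.util
import os
import shutil


def find_traineddata(language="eng", extra_dirs=()):
    """Return the path of <language>.traineddata, or None if it is not found."""
    candidates = []
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix:
        # Depending on the Tesseract version, the variable names either the
        # tessdata directory itself or its parent, so try both.
        candidates += [prefix, os.path.join(prefix, "tessdata")]
    # Assumed default location for a source install under /usr/local.
    candidates.append("/usr/local/share/tessdata")
    candidates += list(extra_dirs)
    for directory in candidates:
        path = os.path.join(directory, language + ".traineddata")
        if os.path.isfile(path):
            return path
    return None


def check_environment(language="eng"):
    """Collect what can be detected about the OCR setup without running OCR."""
    modules = {"Pillow": "PIL", "pytesseract": "pytesseract", "requests": "requests"}
    report = {name: importlib.util.find_spec(module) is not None
              for name, module in modules.items()}
    # Path of the tesseract executable, or None if it is not on PATH.
    report["tesseract_binary"] = shutil.which("tesseract")
    report["traineddata"] = find_traineddata(language)
    return report


if __name__ == "__main__":
    for item, value in check_environment().items():
        print(item, "->", value)
```

A False or None entry in the output points at the step that likely needs to be repeated.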
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# author: wdl
# time: 2017-03-08 pm
import requests
from PIL import Image
import pytesseract

code_url = "https://www.jiguang.cn/captcha/login/"


def identification_code(url):
    # Download the captcha and save it
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    with open("captcha.jpg", "wb") as f:
        f.write(response.content)
    # Open the image
    im = Image.open("captcha.jpg")
    # Convert the image to grayscale
    im = im.convert('L')

    def init_table(threshold=140):
        # Lookup table: 0 for gray levels below the threshold, 1 otherwise
        return [0 if i < threshold else 1 for i in range(256)]

    # Binarize the grayscale image
    binary_image = im.point(init_table(), '1')
    # Convert the image to text; page segmentation mode 7 treats it as one text line.
    # Tesseract 3.x spelled this flag "-psm 7"; newer releases expect "--psm 7".
    return pytesseract.image_to_string(binary_image, lang="eng", config="--psm 7")


print(identification_code(code_url))
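
The binarization step is where most of the tuning happens: gray levels below the threshold map to 0 and the rest map to 1, and a suitable threshold depends on the contrast of the captcha. A minimal sketch of that step as reusable helpers follows. The names build_threshold_table and binarize are illustrative and not part of Pillow or pytesseract, and binarize assumes it is handed a Pillow Image object.

```python
def build_threshold_table(threshold=140):
    """Return a 256-entry lookup table: 0 below the threshold, 1 at or above it."""
    if not 0 <= threshold <= 256:
        raise ValueError("threshold must be between 0 and 256")
    return [0 if level < threshold else 1 for level in range(256)]


def binarize(image, threshold=140):
    """Convert a Pillow image to grayscale, then to a 1-bit image."""
    gray = image.convert("L")
    # point() maps every gray level through the lookup table; mode "1" is 1-bit
    return gray.point(build_threshold_table(threshold), "1")
```

For example, binarize(Image.open("captcha.jpg"), threshold=120) returns an image that can be passed to pytesseract.image_to_string. Saving the result for a few different thresholds and looking at it is a simple way to pick a value.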