Recognizing Chinese text in images with Python PaddleOCR

Author: g_s_007 | Published 2021-01-19 16:59


Introduction and installation

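I won't repeat the introduction to OCR; a quick search turns up plenty of material. Installation guides are also easy to find, so here I just list the installation steps.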

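1. Install the Python packages:
pip install pytesseract
pip install Pillow
2. Download a stable release of Tesseract (Windows builds) from: https://digi.bib.uni-mannheim.de/tesseract/
3. After installing Tesseract, configure the environment variables and download the Chinese training data (chi_sim.traineddata).
For details, see this article: https://blog.csdn.net/qq_40062513/article/details/103123386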
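The environment-variable and training-data step is where a Tesseract setup most often goes wrong. The sketch below checks whether chi_sim.traineddata can be found. It assumes the tessdata folder is either passed in or given by the TESSDATA_PREFIX environment variable; because Tesseract 4+ expects TESSDATA_PREFIX to point at the tessdata folder itself while some older setups point at its parent, both locations are tried. The function name find_chinese_traineddata is hypothetical, not part of any library. Running `tesseract --list-langs` on the command line is another way to see which language packs are installed.

```python
import os
from pathlib import Path


def find_chinese_traineddata(tessdata_dir=None, lang="chi_sim"):
    """Return the path to <lang>.traineddata, or None if it cannot be found.

    Hypothetical helper. The search folder is tessdata_dir if given,
    otherwise the TESSDATA_PREFIX environment variable (assumption).
    """
    base = tessdata_dir or os.environ.get("TESSDATA_PREFIX")
    if not base:
        return None
    # Check the folder itself (Tesseract 4+ convention) and a tessdata/
    # subfolder (older convention where the variable names the parent).
    for folder in (Path(base), Path(base) / "tessdata"):
        candidate = folder / f"{lang}.traineddata"
        if candidate.is_file():
            return candidate
    return None
```

For example, `print(find_chinese_traineddata() or "chi_sim.traineddata not found, check TESSDATA_PREFIX")` reports whether the Chinese model is in place.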

Usage

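Once the installation above is finished, Tesseract can be used right away, but its recognition of Chinese is not very good; feel free to try it for yourself.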
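A minimal way to try Tesseract on a Chinese image is pytesseract.image_to_string with lang="chi_sim". A commonly reported quirk is that the chi_sim output contains spaces between Chinese characters, so the sketch also strips horizontal whitespace that sits between two CJK characters. The helper names and the character ranges chosen (CJK ideographs, CJK punctuation, full-width forms) are assumptions for illustration. On Windows, if tesseract.exe is not on PATH, pytesseract can be pointed at it by setting pytesseract.pytesseract.tesseract_cmd.

```python
import re

# Assumed ranges: CJK punctuation, CJK unified ideographs, full-width forms.
_CJK = r"[\u3000-\u303f\u4e00-\u9fff\uff00-\uffef]"
# Spaces or tabs sandwiched between two CJK characters (newlines are kept).
_CJK_GAP = re.compile(rf"(?<={_CJK})[ \t]+(?={_CJK})")


def strip_cjk_spaces(text: str) -> str:
    """Remove spaces that appear between adjacent Chinese characters."""
    return _CJK_GAP.sub("", text)


def tesseract_chinese(img_path: str) -> str:
    """Hypothetical helper: run Tesseract's simplified-Chinese model on one image."""
    import pytesseract  # imported lazily so strip_cjk_spaces works without it
    from PIL import Image

    raw = pytesseract.image_to_string(Image.open(img_path), lang="chi_sim")
    return strip_cjk_spaces(raw).strip()
```

Spaces next to Latin letters or digits are left alone, so mixed text such as "OCR 识别" keeps its word break.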

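If the machine has Internet access, a third-party service can also deliver highly accurate recognition; Baidu and Tencent, for example, both offer OCR APIs. But what if you want something that is simple to install, needs no external API, and still recognizes text well?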

I found three Chinese OCR libraries online that can be installed directly with pip: chineseocr_lite, cnocr and paddleocr. In the end I chose paddleocr.

chineseocr_lite: installation is fairly complicated, and I did not test its accuracy, so I can't say how good it is.

cnocr: a plain pip install works, but the models from cnocr-models have to be placed in the expected path. Setup is simple and accuracy is high.

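paddleocr: a plain pip install works (it also needs the PaddlePaddle framework, installed with pip install paddlepaddle), accuracy is high and, most importantly, the documentation is very thorough! That is the main reason I chose it.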
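To compare the accuracy of these libraries on your own images, one common metric is the character error rate (CER): the edit distance between the recognized text and a hand-typed reference, divided by the length of the reference. The sketch below is a plain-Python illustration of that metric, not something provided by any of the three libraries; ignoring whitespace is an assumption that suits Chinese text.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))  # distances from "" to each prefix of hyp
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # delete r
                           cur[j - 1] + 1,           # insert h
                           prev[j - 1] + (r != h)))  # substitute or match
        prev = cur
    return prev[-1]


def char_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / reference length, with whitespace removed first."""
    ref = "".join(ref.split())
    hyp = "".join(hyp.split())
    if not ref:
        # Convention chosen here: empty reference is 0.0 if hyp is empty, else 1.0.
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)
```

Lower is better: running each library on the same images and averaging char_error_rate against the typed reference gives a rough side-by-side comparison.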

All three are easy to find online; here is just the GitHub repository of paddleocr, the one I used:

https://github.com/PaddlePaddle/PaddleOCR

Recognition code and example

The image to recognize (shown as an image in the original post):

OCR1.png

Recognition code:

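from paddleocr import PaddleOCR, draw_ocr
from PIL import Image

# PaddleOCR supports Chinese, English, French, German, Korean and Japanese.
# Set the parameter `lang` to `ch`, `en`, `french`, `german`, `korean` or `japan`
# to switch to the corresponding language model.
ocr = PaddleOCR(use_angle_cls=True, lang='ch')  # run only once: downloads the models and loads them into memory
img_path = 'a.png'  # path of the image to recognize
result = ocr.ocr(img_path, cls=True)
# In the 2.0-era releases, result is a list of [box, (text, score)] entries.
# Newer releases wrap it in one list per image; in that case uncomment:
# result = result[0]
print(result)
for line in result:
    print(line)

# Draw the detected boxes and recognized text, then save the visualization.
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]      # four corner points of each text box
txts = [line[1][0] for line in result]    # recognized text
scores = [line[1][1] for line in result]  # confidence of each text
# draw_ocr needs a font that can render Chinese. simfang.ttf ships in the PaddleOCR
# repo under doc/fonts/; adjust this path to wherever the font is on your machine.
im_show = draw_ocr(image, boxes, txts, scores, font_path='simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')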
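If only the recognized text is needed, the result can be reduced to a list of strings. The sketch below is a hypothetical helper that accepts both the flat layout of the 2.0-era releases ([[box, (text, score)], ...]) and the per-image nested layout of newer 2.x releases, where an image with no detected text may appear as None. The 0.5 threshold mirrors draw_ocr's default drop_score. This is an assumption about those two layouts only; the 3.x releases use a different result API that is not covered here.

```python
def extract_texts(result, min_score=0.5):
    """Return recognized strings whose confidence is at least min_score.

    Hypothetical helper. Assumed layouts:
      flat:   [[box, (text, score)], ...]
      nested: [[[box, (text, score)], ...] or None, ...]  (one entry per image)
    """
    def is_line(item):
        # A line is [box, (text, score)]: two items, the second a (str, number) pair.
        return (isinstance(item, (list, tuple)) and len(item) == 2
                and isinstance(item[1], (list, tuple)) and len(item[1]) == 2
                and isinstance(item[1][0], str))

    if not result:
        return []
    pages = [result] if all(is_line(item) for item in result) else result
    texts = []
    for page in pages:
        for line in page or []:  # None means nothing was detected on that image
            text, score = line[1]
            if score >= min_score:
                texts.append(text)
    return texts
```

With the result from the recognition code, `print("\n".join(extract_texts(result)))` prints the confident lines one per row.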

Recognition result:

result.jpg