python freetype and tesseract

Author: 王国的荣耀 | Published: 2021-08-19 20:25

freetype

Use freetype-py to render the glyph bitmaps for a piece of text and save the result as an image.
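
A shorter route to the same goal, sketched with freetype-py's high-level `Face` wrapper rather than the raw ctypes bindings used in the listing that follows. The helper names (`bitmap_rows`, `render_char`, `save_rows_as_png`) and the output file name are illustrative and not part of freetype-py. The sketch assumes an 8-bit grayscale bitmap with a positive pitch, which is what the default render mode normally produces.

```python
def bitmap_rows(buffer, width, rows, pitch):
    """Split a flat 8-bit glyph buffer into `rows` rows of `width` pixels.

    FreeType may pad each row to `pitch` bytes; the padding is dropped here.
    """
    return [list(buffer[r * pitch:r * pitch + width]) for r in range(rows)]


def render_char(font_path, char, size_pt=48):
    """Render one character and return its grayscale pixels as a list of rows."""
    import freetype  # imported lazily so bitmap_rows() works without freetype-py

    face = freetype.Face(font_path)
    face.set_char_size(size_pt * 64)  # size is given in 26.6 fixed point
    face.load_char(char)              # default flags render the glyph
    bitmap = face.glyph.bitmap
    return bitmap_rows(bitmap.buffer, bitmap.width, bitmap.rows, bitmap.pitch)


def save_rows_as_png(rows, out_path):
    """Write the pixel rows to a grayscale PNG with Pillow."""
    from PIL import Image

    height, width = len(rows), len(rows[0])
    flat = bytes(value for row in rows for value in row)
    Image.frombytes("L", (width, height), flat).save(out_path)


# Possible usage (assumes Vera.ttf is in the working directory):
#   pixels = render_char("Vera.ttf", "A")
#   if pixels:  # glyphs such as the space character have an empty bitmap
#       save_rows_as_png(pixels, "glyph_A.png")
```

Keeping the row-splitting step separate from the FreeType call makes it easy to check the pitch handling on a hand-written buffer before involving a font file.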

# -*- coding: utf-8 -*-
# -----------------------------------------------------------------------------
#
#  pycairo/cairocffi-based FreeType example - Copyright 2017 Hin-Tak Leung
#  Distributed under the terms of the new BSD license.
#
#  rewrite of the numpy,matplotlib-based example from Nicolas P. Rougier
#
# -----------------------------------------------------------------------------
#
# Direct translation of example 1 from the freetype tutorial:
# http://www.freetype.org/freetype2/docs/tutorial/step1.html
#
# Except we use FreeType's own trigonometric functions instead of those
# from the system/python's math library.


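# ctypes helpers used below, imported explicitly
from ctypes import POINTER, byref, c_char, cast, create_string_buffer, pointer

from cairo import Context, ImageSurface, FORMAT_A8
# bitmap_to_surface.py is a helper module shipped in the freetype-py examples directory
from bitmap_to_surface import make_image_surface

from freetype.raw import *
from PIL import Image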

WIDTH, HEIGHT = 640, 480
image = ImageSurface(FORMAT_A8, WIDTH, HEIGHT)
ctx = Context(image)

def to_c_str(text):
    ''' Convert python strings to null terminated c strings. '''
    cStr = create_string_buffer(text.encode(encoding='UTF-8'))
    return cast(pointer(cStr), POINTER(c_char))

def draw_bitmap( bitmap, x, y):
    global image, ctx
    # cairo does not like zero-width surface
    if (bitmap.width > 0):
        glyph_surface = make_image_surface(bitmap)
        ctx.set_source_surface(glyph_surface, x, y)
        ctx.paint()

def main():

    library = FT_Library()
    matrix  = FT_Matrix()
    face    = FT_Face()
    pen     = FT_Vector()
    filename= 'Vera.ttf'
    text    = 'Hello World !'
    num_chars = len(text)
    # FT_Angle is a 16.16 fixed-point value expressed in degrees.
    angle   = FT_Angle(25 * 65536)

    # initialize library, error handling omitted
    error = FT_Init_FreeType( byref(library) )

    # create face object, error handling omitted
    error = FT_New_Face( library, to_c_str(filename), 0, byref(face) )


    # set character size: 50pt at 100dpi, error handling omitted
    error = FT_Set_Char_Size( face, 50 * 64, 0, 100, 0 )
    slot = face.contents.glyph

    # set up matrix
    matrix.xx = FT_Cos( angle )
    matrix.xy = - FT_Sin( angle )
    matrix.yx = FT_Sin( angle )
    matrix.yy = FT_Cos( angle )

    # the pen position in 26.6 cartesian space coordinates;
    # start at (200,300) relative to the upper left corner
    pen.x = 200 * 64
    pen.y = ( HEIGHT - 300 ) * 64

    for n in range(num_chars):
        # set transformation
        FT_Set_Transform( face, byref(matrix), byref(pen) )

        # load glyph image into the slot (erase previous one)
        charcode = ord(text[n])
        index = FT_Get_Char_Index( face, charcode )
        FT_Load_Glyph( face, index, FT_LOAD_RENDER )

        # now, draw to our target surface (convert position)
        draw_bitmap( slot.contents.bitmap,
                     slot.contents.bitmap_left,
                     HEIGHT - slot.contents.bitmap_top )

        # increment pen position
        pen.x += slot.contents.advance.x
        pen.y += slot.contents.advance.y

    FT_Done_Face(face)
    FT_Done_FreeType(library)

    image.flush()
    image.write_to_png("example_1-cairo.png")
    Image.open("example_1-cairo.png").show()


if __name__ == '__main__':
    main()
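
The listing above passes sizes, positions and angles to FreeType as fixed-point integers: `50 * 64` and `200 * 64` are 26.6 values, `25 * 65536` is a 16.16 angle in degrees, and the matrix entries are 16.16 as well. The sketch below spells out those conversions in plain Python. The function names are illustrative, and the rotation matrix is computed with the `math` module rather than `FT_Cos`/`FT_Sin`, so its entries may differ from FreeType's by a unit or so in the last place.

```python
import math


def to_26_6(value):
    """Convert points or pixels to 26.6 fixed point (units of 1/64)."""
    return int(round(value * 64))


def from_26_6(value):
    """Convert a 26.6 fixed-point integer back to a float."""
    return value / 64.0


def to_16_16(value):
    """Convert a number to 16.16 fixed point (units of 1/65536)."""
    return int(round(value * 65536))


def points_to_pixels(points, dpi):
    """A point is 1/72 inch, so the pixel size is points * dpi / 72."""
    return points * dpi / 72.0


def rotation_matrix_16_16(degrees):
    """Return (xx, xy, yx, yy) in 16.16 for a counter-clockwise rotation,
    in the same layout the listing assigns to the FT_Matrix fields."""
    rad = math.radians(degrees)
    c = to_16_16(math.cos(rad))
    s = to_16_16(math.sin(rad))
    return c, -s, s, c
```

For the values in the listing, `to_26_6(50)` gives 3200 and `to_16_16(25)` gives 1638400, and a 50pt size at 100dpi corresponds to roughly 69.4 pixels.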

tesseract

Installing tesseract

tesseract is not installed or it's not in your PATH. See README file for
Fix: install the tesseract package for your platform.

➜  ~ brew install tesseract
==> Downloading https://mirrors.aliyun.com/homebrew/homebrew-bottles/tesseract-4.1.1.big_sur.bottle.tar.gz
######################################################################## 100.0%
==> Pouring tesseract-4.1.1.big_sur.bottle.tar.gz
==> Caveats
This formula contains only the "eng", "osd", and "snum" language data files.
If you need any other supported languages, run `brew install tesseract-lang`.
==> Summary
🍺  /usr/local/Cellar/tesseract/4.1.1: 65 files, 29.7MB
➜  ~ tesseract -v
tesseract 4.1.1
 leptonica-1.80.0
  libgif 5.2.1 : libjpeg 9d : libpng 1.6.37 : libtiff 4.2.0 : zlib 1.2.8 : libwebp 1.2.0 : libopenjp2 2.4.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
➜  ~
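
Before calling pytesseract it can help to confirm that the executable is actually reachable, since the error above only appears at recognition time. The following is a minimal sketch: `find_tesseract` and `configure_pytesseract` are hypothetical helper names, while `shutil.which` and the `pytesseract.pytesseract.tesseract_cmd` attribute are the standard-library and pytesseract facilities they rely on.

```python
import shutil


def find_tesseract(name="tesseract"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


def configure_pytesseract(path):
    """Point pytesseract at an explicit executable path.

    Useful when tesseract is installed but its directory is not on PATH.
    """
    import pytesseract  # imported lazily; only needed when a path is set

    pytesseract.pytesseract.tesseract_cmd = path


# Possible usage:
#   path = find_tesseract()
#   if path is None:
#       print("tesseract not found; install it or add it to PATH")
#   else:
#       configure_pytesseract(path)
```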

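pytesseract psm options

Recognition of a single character was unreliable, so the page segmentation mode (psm) has to be chosen explicitly. The available modes: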

0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR.
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
10 Treat the image as a single character.
11 Sparse text. Find as much text as possible in no particular order.
12 Sparse text with OSD.
13 Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.

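import pytesseract  # `image` is a PIL.Image (or a file path) prepared beforehand
# --psm 10: treat the image as a single character; the whitelist keeps digits only
text = pytesseract.image_to_string(image, lang='eng',
    config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')
print(text)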
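
The config string passed to `image_to_string` is easy to mistype, so one option is to assemble it from named values. This is a sketch only: `build_config` and the `PSM_*` constants are hypothetical names, not part of pytesseract, and the range checks simply mirror the psm list above (0 to 13) and tesseract's engine modes (0 to 3).

```python
PSM_SINGLE_LINE = 7    # Treat the image as a single text line.
PSM_SINGLE_WORD = 8    # Treat the image as a single word.
PSM_SINGLE_CHAR = 10   # Treat the image as a single character.


def build_config(psm=3, oem=3, whitelist=None):
    """Build a tesseract config string such as '--psm 10 --oem 3 ...'."""
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    if not 0 <= oem <= 3:
        raise ValueError("oem must be between 0 and 3")
    parts = ["--psm {}".format(psm), "--oem {}".format(oem)]
    if whitelist:
        # Restrict recognition to the given characters.
        parts.append("-c tessedit_char_whitelist={}".format(whitelist))
    return " ".join(parts)
```

With these helpers, `build_config(PSM_SINGLE_CHAR, whitelist="0123456789")` reproduces the config string used above.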

Linux (yum-based distributions)

sudo yum install tesseract

macOS

brew install tesseract

Article link: https://www.haomeiwen.com/subject/vcjmbltx.html
