Extracting possible images from a PDF

Author: 饭桶2018 | Published: 2019-06-03 10:19

import re

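# PDF to scan, and the base name used for the exported image files
input_pdf = 'Ch05-2006.pdf'
output_base = input_pdf.replace('-2006','').split('.')[0]

# Read the whole PDF into memory as raw bytes
with open(input_pdf,'rb') as f:
    pdf = f.read()

# JPEG: start marker FF D8 up to end marker FF D9 followed by a newline
# PNG: magic bytes 89 50 4E 47 up to the CRC that closes the IEND chunk
jpg_pattern = re.compile(rb'\xff\xd8.*?\xff\xd9\x0a',re.DOTALL)
png_pattern = re.compile(rb'\x89\x50\x4e\x47.*?\xae\x42\x60\x82',re.DOTALL)
jpgs = jpg_pattern.findall(pdf)
pngs = png_pattern.findall(pdf)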

jpgn = len(jpgs)
pngn = len(pngs)
print('Found {} jpg and {} png in {}'.format(jpgn,pngn,input_pdf))

if jpgn:
    for i,jpg in enumerate(jpgs):
        output_jpg = '{}-{}.jpg'.format(output_base,str(i + 1).zfill(3))
        print('  Export {}'.format(output_jpg))
        with open(output_jpg,'wb') as f:
            f.write(jpg)
if pngn:
    # Iterate over the matched PNG byte strings and write each one to its own .png file
    for i,png in enumerate(pngs):
        output_png = '{}-{}.png'.format(output_base,str(i + 1).zfill(3))
        print('  Export {}'.format(output_png))
        with open(output_png,'wb') as f:
            f.write(png)
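
The script above needs a real PDF on disk, so it can help to try the two patterns on a small in-memory byte string first. The following is a minimal sketch: the helper name `find_images` and the synthetic `sample` bytes are made up for illustration and are not a real PDF. Note that this kind of byte scan is only a heuristic; it will miss images stored with other encodings and can cut a JPEG short if the end marker happens to appear inside the data.

```python
import re

# Same signatures as the script above: JPEG start/end markers followed by a
# newline, and the PNG magic bytes up to the CRC that closes the IEND chunk.
JPG_PATTERN = re.compile(rb'\xff\xd8.*?\xff\xd9\x0a', re.DOTALL)
PNG_PATTERN = re.compile(rb'\x89\x50\x4e\x47.*?\xae\x42\x60\x82', re.DOTALL)

def find_images(data):
    """Return (jpgs, pngs): lists of byte strings matched in data.

    Hypothetical helper, not part of the original script.
    """
    return JPG_PATTERN.findall(data), PNG_PATTERN.findall(data)

# Synthetic input standing in for PDF bytes (an assumption for the demo).
fake_jpg = b'\xff\xd8JPEGDATA\xff\xd9\x0a'
fake_png = b'\x89PNG\r\n\x1a\nPNGDATA\xaeB`\x82'
sample = b'%PDF-1.4\n' + fake_jpg + b'endstream\n' + fake_png + b'\n%%EOF'

jpgs, pngs = find_images(sample)
print('Found {} jpg and {} png'.format(len(jpgs), len(pngs)))
```

Keeping the matching in a function that takes bytes makes it easy to check the patterns separately from the file reading and writing.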

Article link: https://www.haomeiwen.com/subject/iyfizqtx.html