Processing PDF Documents with Python: Splitting & Merging

Author: 长旅当歌 | Published 2020-12-02 13:29

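This post uses Python to process PDF documents. Put the PDF files you want to process and the script into a newly created folder, then run the script.

S mode splits a single PDF document into single-page files and names them with numbers.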

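C mode merges documents. Beforehand, rename the files to numbers in the order you want them merged. The numbers do not need to be consecutive: the program joins the PDF files in ascending numeric order and writes the result to 合并成功.pdf.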
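Ascending numeric order is not the same as a plain alphabetical sort of the file names, which matters as soon as the numbers have different lengths. Below is a minimal sketch of that ordering using only the standard library. `numeric_order` is a hypothetical helper name, and silently skipping non-numeric names is an assumption made here for illustration (the script at the end of this post raises an error on such names instead).

```python
from pathlib import Path


def numeric_order(names):
    """Return the numerically named PDF files in ascending numeric order."""
    numbered = []
    for name in names:
        p = Path(name)
        # Assumption: ignore anything that is not "<digits>.pdf"
        if p.suffix.lower() == '.pdf' and p.stem.isdecimal():
            numbered.append(name)
    # Compare by integer value of the stem, not by string
    return sorted(numbered, key=lambda n: int(Path(n).stem))


files = ['10.pdf', '2.pdf', '100.pdf', 'notes.txt', '合并成功.pdf']
print(sorted(files))         # plain string sort, for comparison
print(numeric_order(files))  # numeric sort: 2, 10, 100
```

With a plain string sort, `10.pdf` and `100.pdf` come before `2.pdf`, which is why the merge relies on the integer value of each name.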

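Combining S and C modes lets you delete pages from a PDF: split the document with S mode, delete the single-page files you do not want, then merge the remaining files with C mode.
Combining S and C modes also lets you insert pages into a PDF: split the document with S mode, rename the PDF you want to insert to a number that lies between the numbers of the files before and after it, then merge with C mode.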
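Picking "a number between the neighbouring files" can be sketched as below. `name_between` is a hypothetical helper and not part of the original script. It assumes both neighbours are named `<integer>.pdf` and returns `None` when no free integer lies strictly between them.

```python
from pathlib import Path


def name_between(before, after):
    """Suggest a file name whose number lies strictly between two neighbours."""
    lo = int(Path(before).stem)
    hi = int(Path(after).stem)
    if hi - lo < 2:
        return None  # no free integer strictly between the two numbers
    return f'{(lo + hi) // 2}.pdf'


print(name_between('10.pdf', '20.pdf'))  # 15.pdf
print(name_between('3.pdf', '4.pdf'))    # None
```

This is where gaps in the numbering pay off: files numbered 10 and 20 leave room for an inserted page, while 3 and 4 do not.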
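Reference: https://zhuanlan.zhihu.com/p/98626155
The author of that answer wrote very detailed single-purpose modules. I modified them so that the PDF files are detected automatically, which removes the step of typing in each file name and should be friendlier to complete beginners. I also combined merging and splitting into one program.
A packaged exe is available on GitHub: https://github.com/fangxiang0727/PDF_combine_split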

Baidu Netdisk: https://link.zhihu.com/?target=https%3A//pan.baidu.com/s/1y4xZX5T4gbc3pMtBdcbZeA
Extraction code: 3msf

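# Merge and split PDF files
# Note: PdfFileReader/PdfFileWriter are the legacy PyPDF2 names; they were
# removed in PyPDF2 3.0, so this script needs PyPDF2 < 3.0.
from os import listdir, getcwd
from PyPDF2 import PdfFileReader, PdfFileWriter
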
def merge_pdfs(paths, output):
    pdf_writer = PdfFileWriter()
    for path in paths:
        pdf_reader = PdfFileReader(path)
        for page in range(pdf_reader.getNumPages()):
            # Add each page to the writer object
            pdf_writer.addPage(pdf_reader.getPage(page))

    # Save the merged PDF document
    with open(output, 'wb') as out:
        pdf_writer.write(out)

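def list_all_pdfs():
    # List all PDF files in the current folder, skipping the merge output file
    xlist = listdir(getcwd())
    pdflist = []
    for ele in xlist:
        if '合并成功' not in ele and ele.lower().endswith('.pdf'):
            pdflist.append(ele)

    # Sort the file names by numeric value so the files are merged in numeric order.
    # A file name that is not a number raises ValueError here.
    def takeNo(elem):
        x = elem.split('.')
        return int(x[0])
    pdflist.sort(key=takeNo)  # ascending order
    return pdflist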

def split_pdf(path):
    pdf = PdfFileReader(path)
    for page in range(pdf.getNumPages()):
        pdf_writer = PdfFileWriter()
        pdf_writer.addPage(pdf.getPage(page))

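        # Name each page by its index followed by a literal 0 (00.pdf, 10.pdf, 20.pdf, ...);
        # the gaps leave room to number an inserted file in between.
        output = f'{page}0.pdf'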
        with open(output, 'wb') as output_pdf:
            pdf_writer.write(output_pdf)
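if __name__ == '__main__':
    mode_selection = input('Select mode <C/S>: C merges files, S splits a file into single pages:\n').upper()
    if mode_selection == 'C':
        input('Make sure all files are named with numbers. The program joins the PDF files in numeric order and writes 合并成功.pdf as the final document. Press Enter to confirm; to rename files first, just close this window.')
        paths = list_all_pdfs()
        print(paths)
        merge_pdfs(paths, output='合并成功.pdf')
        input('Merge finished. Press Enter to exit.')
    elif mode_selection == 'S':
        print('The program splits the given PDF file into single pages and names them with numbers.')
        path = input('Enter the name of the PDF file to split, including the extension, e.g. XXX.pdf, then press Enter\n')
        # Fixed: the function is named split_pdf, not split
        split_pdf(path)
        input('Split finished. Press Enter to exit.')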
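
The script above uses the legacy PyPDF2 class names. For readers on a current installation, here is a rough sketch of the same merge and split steps, assuming the `pypdf` package (the successor of PyPDF2) is installed. It uses `PdfReader`, `PdfWriter`, `reader.pages` and `add_page` in place of the legacy names. `split_name` is a hypothetical helper added here to keep the file naming in one place, and the lazy imports are only a convenience so that the naming helper works without the third-party package.

```python
def split_name(page_index):
    """Same naming as the script above: page index followed by a 0."""
    return f'{page_index}0.pdf'


def merge_pdfs(paths, output):
    # Third-party dependency, imported lazily (assumption: pypdf is installed)
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    for path in paths:
        reader = PdfReader(path)
        for page in reader.pages:
            writer.add_page(page)  # append every page of every input file
    with open(output, 'wb') as out:
        writer.write(out)


def split_pdf(path):
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(path)
    for index, page in enumerate(reader.pages):
        writer = PdfWriter()  # one writer per page gives one file per page
        writer.add_page(page)
        with open(split_name(index), 'wb') as out:
            writer.write(out)
```

These functions take the same arguments as `merge_pdfs` and `split_pdf` in the script, so the interactive part under `if __name__ == '__main__':` could call them unchanged.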