Python in Practice: Merging Multiple PDF Files

Author: SelectMan | Published: 2018-07-08 01:11

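As an embedded engineer, I have had enough of how "low-level" C can be. Riding the trend, I want to learn Python, which has been hugely popular lately, in the hope of landing a good job next year. To keep things fun, I am learning mainly through hands-on projects.

Every month I claim a meal allowance, which means a lot of electronic invoices to print. It is infuriating every time, especially the time my girlfriend found me 60-odd e-invoices worth 60 yuan each. That feeling... (Of course I merged them with a tool and printed the result; clicking through them one by one was never going to happen.) A tool can do the job, but producing the merged file with a single script would be even better.

So let's implement it:

Python saves effort because it has many well-packaged modules ready to use. To work with PDFs, we naturally import a PDF module: PyPDF2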

import PyPDF2,os

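1. Install PyPDF2

pip install "PyPDF2<3.0"

The code below uses the PdfFileReader and PdfFileWriter names, which PyPDF2 removed in version 3.0, hence the version pin.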

2. Import the modules and build the list of PDF files in the directory

import PyPDF2, os

# 1. get all pdfs in the current location
pdfs = []
for filename in os.listdir('.'):  # look for pdf files among all files in the directory
    if filename.endswith('.pdf'):
        pdfs.append(filename)
if pdfs == []:
    print("no pdf in this directory, please double check")
pdfs.sort(key=str.lower)
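The loop above only matches a lowercase ".pdf" extension, and on a second run it would also pick up the merged output written later in this article (result1.pdf). A minimal sketch of a slightly more defensive listing step follows; the function name list_pdfs and its exclude parameter are hypothetical, introduced here only for illustration.

```python
import os


def list_pdfs(directory=".", exclude=("result1.pdf",)):
    """Return the PDF file names in `directory`, sorted case-insensitively."""
    # `exclude` is a hypothetical parameter: names to skip, e.g. a previous output file.
    skipped = {name.lower() for name in exclude}
    pdfs = [
        name
        for name in os.listdir(directory)
        # lower() lets "SCAN.PDF" match as well as "scan.pdf";
        # isfile() skips directories that happen to end in ".pdf".
        if name.lower().endswith(".pdf")
        and name.lower() not in skipped
        and os.path.isfile(os.path.join(directory, name))
    ]
    pdfs.sort(key=str.lower)
    return pdfs
```

Calling list_pdfs('.') in the invoice directory would give the same kind of list that the later steps iterate over.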

3. Call PyPDF2.PdfFileWriter() to hold the merged PDF

pdfWriter = PyPDF2.PdfFileWriter()

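4. Call PyPDF2.PdfFileReader() to read the contents of every PDF and add them to pdfWriter:

fileCount = 0
pageCount = 0
openFiles = []  # keep every handle so it can be closed in step 5
for filename in pdfs:
    pdfFileObj = open(filename, 'rb')
    openFiles.append(pdfFileObj)
    print("file:" + filename + " combining")
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    fileCount += 1
    # 3.1 get all pages from the file just read and add them to the writer
    for pageNum in range(0, pdfReader.numPages):
        pageObj = pdfReader.getPage(pageNum)
        pdfWriter.addPage(pageObj)
        pageCount += 1
    # do not call pdfFileObj.close() here; see the note in step 5
print("combine success: %d files, %d pages combined" % (fileCount, pageCount))

5. Write the merged PDF to a file, then close the files opened earlier

Note: I originally planned to close each source PDF inside the loop in step 4, but if the files are closed at that point, the merged file ends up as a PDF made of blank pages. Thinking it over, adding a page presumably only hands a "pointer" to pdfWriter; if the file is closed before pdfWriter writes to disk, the content can no longer be found. So the source files are closed only after the merged file has been written to disk.

# 4 save the written pages
pdfOutput = open('result1.pdf', 'wb')
pdfWriter.write(pdfOutput)
pdfOutput.close()

# 5 close the source pdf files (the handles opened in step 4, not newly opened ones)
for pdfFileObj in openFiles:
    pdfFileObj.close()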

OK, this merges all the PDF files in the directory into a single "result1.pdf".

The complete source code is as follows:
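The note in step 5 guesses that a page added to the writer only carries a reference to the open file rather than a copy of its bytes. The toy below illustrates that general idea with the standard library only. It is not PyPDF2's actual implementation: LazyPage is a hypothetical class, and it fails loudly on a closed stream, whereas the author observed blank pages.

```python
import io


class LazyPage:
    """Toy stand-in for a page object: it stores where its bytes live, not the bytes."""

    def __init__(self, stream, offset, length):
        self.stream = stream
        self.offset = offset
        self.length = length

    def read(self):
        # The data is fetched only now, so the stream must still be open.
        self.stream.seek(self.offset)
        return self.stream.read(self.length)


source = io.BytesIO(b"page-one|page-two")
page = LazyPage(source, offset=9, length=8)

print(page.read())  # works while the stream is open

source.close()
try:
    page.read()
except ValueError as exc:
    # Reading through the stale reference fails once the stream is closed.
    print("read after close failed:", exc)
```

Under this model, the safe order is the one the article settles on: write the merged output first, close the sources afterwards.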

#! python3
# combine multiple pdf files into one
import PyPDF2, os

# 1. get all pdfs in the current location
pdfs = []
for filename in os.listdir('.'):
    if filename.endswith('.pdf'):
        pdfs.append(filename)
if pdfs == []:
    print("no pdf in this directory, please double check")
pdfs.sort(key=str.lower)

# 2. create an empty pdf as the destination pdf
pdfWriter = PyPDF2.PdfFileWriter()

# 3. read all pdfs and combine them into one
fileCount = 0
pageCount = 0
openFiles = []  # keep every handle so it can be closed after writing
for filename in pdfs:
    pdfFileObj = open(filename, 'rb')
    openFiles.append(pdfFileObj)
    print("file:" + filename + " combining")
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    fileCount += 1
    # 3.1 get all pages from the file just read and add them to the writer
    for pageNum in range(0, pdfReader.numPages):
        pageObj = pdfReader.getPage(pageNum)
        pdfWriter.addPage(pageObj)
        pageCount += 1
    # do not call pdfFileObj.close() here; the pages are still needed when writing
print("combine success: %d files, %d pages combined" % (fileCount, pageCount))

# 4. save the written pages
pdfOutput = open('result1.pdf', 'wb')
pdfWriter.write(pdfOutput)
pdfOutput.close()

# 5. close the source pdf files (the handles opened in step 3, not newly opened ones)
for pdfFileObj in openFiles:
    pdfFileObj.close()
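The listing above relies on PdfFileReader and PdfFileWriter, names that PyPDF2 removed in version 3.0. As a rough sketch, assuming PyPDF2 3.x is installed, the same merge might look like the following. The helper name merge_pdfs is hypothetical, and contextlib.ExitStack is a choice made here to keep the source files open until the merged file has been written.

```python
from contextlib import ExitStack


def merge_pdfs(paths, output_path):
    """Merge the PDFs in `paths` (in order) into `output_path`; return the page count."""
    # Imported lazily so this module can still be loaded when PyPDF2 is missing.
    # Assumption: PyPDF2 3.x, which provides PdfReader / PdfWriter.
    from PyPDF2 import PdfReader, PdfWriter

    writer = PdfWriter()
    page_count = 0
    # ExitStack keeps every source file open until the merged file is written,
    # then closes them all, even if an exception is raised along the way.
    with ExitStack() as stack:
        for path in paths:
            source = stack.enter_context(open(path, "rb"))
            for page in PdfReader(source).pages:
                writer.add_page(page)
                page_count += 1
        with open(output_path, "wb") as output:
            writer.write(output)
    return page_count
```

With a sorted list of file names in pdfs, merge_pdfs(pdfs, 'result1.pdf') would stand in for steps 2 through 5 of the listing.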

Article link: https://www.haomeiwen.com/subject/sppnuftx.html
