Extracting all hyperlink URLs from a PDF with Python

Author: 沫明 | Published: 2021-01-12 18:42
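import PyPDF2

# Uses the legacy PyPDF2 API (PdfFileReader, getNumPages, getPage), which
# PyPDF2 3.0 removed; on newer releases use PdfReader and reader.pages.
key = '/Annots'   # per-page annotation array
uri = '/URI'      # link target inside the action dictionary
ank = '/A'        # action dictionary of an annotation
url_list = []

# Open the file in a with-block so it is closed even if parsing fails.
with open("status.pdf", 'rb') as PDFFile:
    PDF = PyPDF2.PdfFileReader(PDFFile)
    pages = PDF.getNumPages()
    for page in range(pages):
        print("Current Page: {}".format(page))
        pageSliced = PDF.getPage(page)
        pageObject = pageSliced.getObject()
        if key in pageObject.keys():
            ann = pageObject[key]
            for a in ann:
                u = a.getObject()
                # Not every annotation has an action (/A), and not every
                # action is a URI action, so check both before indexing.
                if ank in u.keys() and uri in u[ank].keys():
                    print(u[ank][uri])
                    url_list.append(u[ank][uri])

# Print the total count and the full list once, after all pages are scanned.
print(len(url_list), url_list)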
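
The traversal above (page, then /Annots, then /A, then /URI) can be tried without a PDF file or PyPDF2 installed. The following is only a minimal sketch: `resolve`, `collect_uris`, and `sample_pages` are hypothetical names that do not appear in the original script, and plain dicts with made-up URLs stand in for the page objects PyPDF2 would return. It also skips duplicate URLs, which the original script does not do.

```python
def resolve(obj):
    """Follow an indirect reference if the object offers one (hypothetical helper)."""
    getter = getattr(obj, "getObject", None) or getattr(obj, "get_object", None)
    return getter() if callable(getter) else obj


def collect_uris(pages):
    """Return the unique /URI targets of link annotations, in first-seen order."""
    seen = set()
    urls = []
    for page in pages:
        page = resolve(page)
        # A page without /Annots simply contributes nothing.
        for annot in resolve(page.get("/Annots", [])):
            # Annotations without /A (e.g. internal links) yield an empty action.
            action = resolve(resolve(annot).get("/A", {}))
            uri = action.get("/URI")
            if uri is not None and uri not in seen:
                seen.add(uri)
                urls.append(uri)
    return urls


# Plain dicts standing in for PDF page objects (made-up sample data).
sample_pages = [
    {"/Annots": [
        {"/Subtype": "/Link", "/A": {"/S": "/URI", "/URI": "https://example.com/a"}},
        {"/Subtype": "/Link", "/Dest": "chapter2"},    # internal link: no /A entry
        {"/Subtype": "/Link", "/A": {"/S": "/GoTo"}},  # action without a /URI
    ]},
    {},  # page with no annotations
    {"/Annots": [
        {"/Subtype": "/Link", "/A": {"/S": "/URI", "/URI": "https://example.com/a"}},
        {"/Subtype": "/Link", "/A": {"/S": "/URI", "/URI": "https://example.com/b"}},
    ]},
]

print(collect_uris(sample_pages))
```

Using `.get()` with a default instead of indexing means an annotation that lacks /A or /URI is skipped rather than raising a KeyError.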

Article link: https://www.haomeiwen.com/subject/pkpeaktx.html