Preface

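Getting hooked on Honor of Kings and Douyin is never good for you. If I choose comfort at the age when I should be putting in the hard work, I will surely regret it when I'm old and blame my younger self for not trying. So I dug out my long-neglected Kindle, planning to read properly and recharge a bit.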

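Good books on Amazon cost money, and readfree is full of literature titles. After thinking it over, I decided it would be better to find content I actually like and turn it into e-books myself. That way I decide exactly what goes in: every day I convert the good web pages I've collected into PDFs and send them to my Kindle, and when I get home from work I can sit down and read them.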
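The "send them to my Kindle" step can itself be scripted through Amazon's Send to Kindle e-mail service, which accepts PDF attachments mailed to the device's @kindle.com address, as long as the sender address is on the approved list in your Amazon account. The following is a minimal sketch using only the Python standard library. `build_kindle_mail`, `send_mail`, and every address, host and password shown are hypothetical placeholders, not part of the original setup.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_kindle_mail(pdf_path, sender, kindle_address):
    """Build an e-mail with a single PDF attached (hypothetical helper)."""
    pdf = Path(pdf_path)
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = kindle_address
    msg["Subject"] = pdf.stem
    msg.set_content("Daily reading: {}".format(pdf.name))
    # maintype/subtype are required when attaching raw bytes
    msg.add_attachment(pdf.read_bytes(), maintype="application",
                       subtype="pdf", filename=pdf.name)
    return msg


def send_mail(msg, host, port, user, password):
    """Send msg through an SMTP server using STARTTLS.

    Hypothetical: host, port and credentials depend entirely on your mail provider.
    """
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```

For example, `send_mail(build_kindle_mail("./today.pdf", "me@example.com", "your-name@kindle.com"), "smtp.example.com", 587, "me@example.com", "app-password")` would mail that day's PDF. Port 587 with STARTTLS is a common arrangement, but check what your provider expects.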

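No sooner said than done. A quick search showed the general approach, and after comparing implementations in several languages I found wkhtmltopdf (a command-line tool that renders HTML to PDF with the WebKit engine) together with pdfkit (a Python wrapper around it) the most pleasant to work with. Life is short, so I went with them.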

Installing wkhtmltopdf

Mac users can install it by following the link below; the process is similar on other platforms.
http://macappstore.org/wkhtmltopdf/

Press Command+Space, type Terminal, and press the Enter/Return key.
Install Homebrew by running this in the Terminal app:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
and press the Enter/Return key.
If you are prompted for a password, enter your Mac's user password to continue. The password is not shown on screen as you type, but the system still receives it, so just type it, press Enter/Return, and wait for the command to finish.
Run:

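brew install --cask wkhtmltopdf

(Older Homebrew releases used brew cask install wkhtmltopdf instead; that subcommand has since been removed.)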
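Before moving on, it is worth confirming that the binary is actually on PATH, because pdfkit only drives the wkhtmltopdf command-line tool and raises an error when it cannot find it. The following is a minimal standard-library sketch; `find_wkhtmltopdf` and `wkhtmltopdf_version` are hypothetical helper names.

```python
import shutil
import subprocess


def find_wkhtmltopdf():
    """Return the path of the wkhtmltopdf executable found on PATH, or None."""
    return shutil.which("wkhtmltopdf")


def wkhtmltopdf_version(path):
    """Run `wkhtmltopdf --version` and return the first line it prints."""
    result = subprocess.run([path, "--version"],
                            capture_output=True, text=True, check=True)
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else ""


path = find_wkhtmltopdf()
if path is None:
    print("wkhtmltopdf not found on PATH; pdfkit will not be able to call it")
else:
    print(path, "->", wkhtmltopdf_version(path))
```

If the binary is installed somewhere outside PATH, pdfkit lets you point at it explicitly with `pdfkit.configuration(wkhtmltopdf=path)` and pass the result as `configuration=` to the `from_url` / `from_file` / `from_string` functions.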

pdfkit

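There are plenty of ready-made guides online, so I won't repeat the details here. Below is the cnblogs article I used as a reference; it is concise and clear.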

pip install pdfkit requests beautifulsoup4

https://www.cnblogs.com/linzenews/p/6972192.html

Implementation

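#coding: utf8
import requests
import pdfkit
from bs4 import BeautifulSoup

# When generating the HTML file, image paths should be replaced with local absolute paths,
# and the local images deleted after the PDF is generated. This script does not do that yet:
# images are still loaded from their original URLs.
def gethtml(url):
    headers = {
        "Referer": "https://blog.csdn.net/marksinoberg",
        "Host": "blog.csdn.net",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36"
    }
    resp = requests.get(url, headers=headers)
    resp.raise_for_status()  # stop on 403/404 instead of converting an error page
    soup = BeautifulSoup(resp.text, "html.parser")
    # The article body is expected in <div id="article_content"> (CSDN's page layout)
    content = soup.find("div", {"id": "article_content"})
    if content is None:
        raise ValueError("article_content not found in " + url)
    return soup.title.string, str(content)

def savepdf(filename, title):
    # wkhtmltopdf command-line options, passed through by pdfkit
    options = {
        "page-size": "Letter",
        "encoding": "UTF-8",
        "custom-header": [
            ("Accept-Encoding", "gzip")
        ]
    }
    pdfkit.from_file(filename, "./{}.pdf".format(title), options=options)

if __name__ == "__main__":
    title, html = gethtml("https://blog.csdn.net/Marksinoberg/article/details/82700073")
    print(title)
    # Save the extracted article body as a temporary HTML file, then convert it to PDF
    with open("./test.html", "w", encoding="utf-8") as file:
        file.write(html)
    savepdf("./test.html", title)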
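The comment at the top of the script describes a step the code does not perform: saving the article's images locally, pointing the HTML at those absolute paths, and cleaning them up once the PDF exists. The following is a minimal standard-library sketch of that idea. `ImgSrcCollector`, `download` and `localize_images` are hypothetical names, and it assumes absolute http(s) image URLs and double-quoted `src` attributes, which is what BeautifulSoup's `str()` output produces.

```python
import os
import urllib.request
from html import escape
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the distinct src values of all <img> tags, in document order."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src and src not in self.srcs:
                self.srcs.append(src)


def download(url):
    """Default fetcher: return the raw bytes at an absolute http(s) URL."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def localize_images(markup, workdir, fetch=download):
    """Save each <img> into workdir and point its src at the local absolute path.

    Hypothetical helper: fetch(url) -> bytes can be swapped out, e.g. for testing.
    """
    collector = ImgSrcCollector()
    collector.feed(markup)
    collector.close()
    for i, src in enumerate(collector.srcs):
        # Keep the original extension (ignoring any query string) when there is one
        ext = os.path.splitext(src.split("?")[0])[1] or ".img"
        local = os.path.abspath(os.path.join(workdir, "img{}{}".format(i, ext)))
        with open(local, "wb") as f:
            f.write(fetch(src))
        # HTMLParser unescapes attribute values, so try both the raw and the
        # &amp;-escaped form; matching the double-quoted value leaves plain text alone
        for value in {src, escape(src, quote=False)}:
            markup = markup.replace('"{}"'.format(value), '"{}"'.format(local))
    return markup
```

In the script, this call would sit between `gethtml` and writing `test.html`, inside `with tempfile.TemporaryDirectory() as workdir:`, so the downloaded images are deleted automatically after `savepdf` returns. Depending on your wkhtmltopdf version, local files referenced from the page may be blocked unless `"enable-local-file-access": ""` is added to the options in `savepdf`. Check that flag against your installed version.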

Running the script

$ python testpdf.py
Go+PHP实现敏感词检测 - CSDN博客
Loading pages (1/6)
Counting pages (2/6)
Resolving links (4/6)
Loading headers and footers (5/6)
Printing pages (6/6)
Done
$
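The PDF is named after the page title, as the first output line shows, because `savepdf` writes `./{title}.pdf`. Titles can contain characters such as `/` or `:` that are not valid in file names, and a `/` would even be treated as a directory separator. The following is a minimal sketch of a hypothetical `safe_filename` helper that could be applied to `title` before calling `savepdf`.

```python
import re


def safe_filename(title, max_len=100):
    """Turn a page title into a file name usable on macOS, Linux and Windows (hypothetical helper)."""
    # Replace path separators and the other characters Windows forbids in file names
    name = re.sub(r'[\\/:*?"<>|]', "_", title or "")
    # Collapse runs of whitespace (the <title> text may contain newlines)
    name = re.sub(r"\s+", " ", name).strip()
    # Cap the length, and drop trailing dots/spaces, which Windows does not allow
    name = name[:max_len].rstrip(" .")
    return name or "untitled"
```

Usage: `savepdf("./test.html", safe_filename(title))`. A title with none of those characters, like the one in the output above, passes through unchanged.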

Checking the result