Python: Scraping Articles from ONE and Sending Them to Your Kindle

Author: bluescorpio | Published 2016-09-23 23:28 | Read 365 times

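    I have always liked reading on my Kindle. While I was scraping web pages recently, it occurred to me: why not send the articles I scrape to my own Kindle, so that I can read the good ones whenever I have time?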

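    There are plenty of articles online about how to set up the Kindle email address that receives the documents; a quick search will turn them up.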

    The project works in the following steps:

    1. Use requests and BeautifulSoup to scrape the articles and save them locally
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import requests
    from bs4 import BeautifulSoup as BS
    import os
    import codecs
    import smtplib
    import datetime
    
    from email.mime.text import MIMEText
    from email.mime.multipart import MIMEMultipart
    
    # Save every article under ./one
    sub_folder = os.path.join(os.getcwd(), "one")
    if not os.path.exists(sub_folder):
        os.mkdir(sub_folder)
    
    one_url = "http://wufazhuce.com/article/"
    
    # Try article IDs 1..2000; IDs that do not return HTTP 200 are skipped
    for i in range(2000):
        url = one_url + str(i + 1)
        try:
            r = requests.get(url, timeout=10)
        except requests.RequestException as e:
            # Network error or timeout: report it and move on to the next ID
            print(str(e))
            continue
        if r.status_code == 200:
            print(url)
            soup = BS(r.text, "lxml")
            title = soup.select('div.tab-content > div.one-articulo > h2')
            content = soup.select('div.articulo-contenido')
            if not title or not content:
                # The page does not match the expected layout; skip it
                continue
            print("Title: " + title[0].get_text().strip())
            file_name = str(i + 1) + ".txt"
            filename = os.path.join(sub_folder, file_name)
    
            print("start writing content into file " + file_name)
            # "w" rather than "a", so a re-run does not append a second copy
            with codecs.open(filename, "w", "utf-8") as f:
                f.write(content[0].get_text())
            print("finish writing into file \n")
    
    2. Use the smtplib module to send the articles to your Kindle email address in turn. Note that at most 25 attachments can be sent at a time.
    import getpass
    from email.mime.application import MIMEApplication
    
    sender = "youremailid@163.com"
    # getpass hides the password while it is typed (works on Python 2 and 3)
    sender_password = getpass.getpass("Please input your password: ")
    receiver = 'your_kindle_email_id@kindle.cn'
    # 'mixed' is the multipart subtype for a message that carries attachments
    msg = MIMEMultipart('mixed')
    msg['Subject'] = "convert" + str(datetime.datetime.now())
    msg['From'] = sender
    msg['To'] = "Your Kindle <" + receiver + ">"
    # file_name still holds the last article saved by the loop in step 1,
    # so this sends that one file
    with open(os.path.join(sub_folder, file_name), 'rb') as fp:
        att = MIMEApplication(fp.read(), 'octet-stream')
    att.add_header('Content-Disposition', 'attachment', filename=file_name)
    msg.attach(att)
    try:
        server = smtplib.SMTP()
        server.connect('smtp.163.com')
        print("start login")
        server.login(sender, sender_password)
        print("start sending email")
        server.sendmail(sender, receiver, msg.as_string())
        server.quit()
        print("success sending email \n")
    except Exception as e:
        print(str(e))
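The snippet above attaches a single file, while the step mentions a limit of 25 attachments per email. Here is a minimal sketch of how the saved files could be grouped into batches and turned into one message per batch. `chunks`, `build_message`, and `BATCH_SIZE` are hypothetical names, and actually sending each message over SMTP is left out.

```python
import datetime
import os
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart

BATCH_SIZE = 25  # attachment limit per email, as stated in the step above


def chunks(items, size=BATCH_SIZE):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_message(sender, receiver, paths):
    """Build one multipart email carrying every file in `paths`."""
    msg = MIMEMultipart('mixed')
    msg['Subject'] = "convert" + str(datetime.datetime.now())
    msg['From'] = sender
    msg['To'] = receiver
    for path in paths:
        with open(path, 'rb') as fp:
            att = MIMEApplication(fp.read(), 'octet-stream')
        att.add_header('Content-Disposition', 'attachment',
                       filename=os.path.basename(path))
        msg.attach(att)
    return msg
```

Given `files = sorted(os.path.join(sub_folder, n) for n in os.listdir(sub_folder))`, each `batch in chunks(files)` yields one `build_message(sender, receiver, batch)`. Each result can be passed to `server.sendmail(sender, receiver, msg.as_string())` over a single logged-in connection.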
    
    3. Next step: go on to scrape every chapter of the Four Great Classical Novels

    Article link: https://www.haomeiwen.com/subject/imqxyttx.html