Scraping the Douban Top 250 with a Crawler

Author: David5262 | Published: 2019-11-07 11:57

    A crawler that scrapes the Douban Top 250 movie list and saves each movie's title and link to a MongoDB database.

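    import time

    import pymongo
    import requests
    from lxml import etree

    # Assumption: Douban tends to reject the default requests User-Agent,
    # so a browser-like value is sent. Adjust it as needed.
    HEADERS = {'User-Agent': 'Mozilla/5.0'}


    class DouBan:
        def __init__(self, collection):
            # MongoDB collection that receives one document per movie
            self.collection = collection

        def getUrl(self, url):
            # The Top 250 list spans 10 pages of 25 movies each
            for page in range(10):
                page_url = url + '?start=' + str(page * 25) + '&filter='
                try:
                    r = requests.get(page_url, headers=HEADERS, timeout=10)
                    r.raise_for_status()
                    r.encoding = r.apparent_encoding
                    html = etree.HTML(r.text)
                    # The first <span class="title"> of each entry is the primary title
                    title = html.xpath('//div[@class="hd"]/a/span[@class="title"][1]//text()')
                    href = html.xpath('//div[@class="hd"]/a/@href')
                    # Keys: '影名' = movie title, '链接' = link
                    for t, h in zip(title, href):
                        self.collection.insert_one({'影名': t, '链接': h})
                except Exception as e:
                    # Report the failure and move on to the next page
                    print(e)
                # Pause between page requests to go easy on the server
                time.sleep(1)


    if __name__ == '__main__':
        # Connect to MongoDB
        client = pymongo.MongoClient('localhost', 27017)
        data = client['douban']
        data1 = data['db']
        url = 'https://movie.douban.com/top250'
        douban = DouBan(data1)
        douban.getUrl(url)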
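
The crawler walks the list by stepping the `start` query parameter in increments of 25. As a minimal sketch of that pagination on its own, the snippet below builds the ten page URLs with the standard library. The helper name `page_urls` and its `total` and `page_size` parameters are hypothetical and not part of the original script; no network request is made.

```python
from urllib.parse import urlencode

BASE_URL = 'https://movie.douban.com/top250'


def page_urls(base_url=BASE_URL, total=250, page_size=25):
    """Yield one listing URL per page (hypothetical helper).

    The start offset advances by page_size until total is reached.
    """
    for start in range(0, total, page_size):
        # urlencode keeps the empty filter parameter as 'filter='
        yield base_url + '?' + urlencode({'start': start, 'filter': ''})


urls = list(page_urls())
print(len(urls))
print(urls[0])
print(urls[-1])
```

Generating the URLs up front makes the offsets easy to check before any request is sent, and the same list could be fed to the fetch loop in place of the string concatenation.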
    

          Article link: https://www.haomeiwen.com/subject/gznmbctx.html