美文网首页
Python爬取豆瓣电影250保存到mongodb

Python爬取豆瓣电影250保存到mongodb

作者: 吴强_71b2 | 来源:发表于2017-10-24 22:42 被阅读0次

Python爬取豆瓣电影250的相关信息,然后保存到mongodb。

代码如下:

import requests

from bs4 importBeautifulSoup

import pymongo

from pymongo importMongoClient

conn =MongoClient('192.168.129.150', 27017)

movie=conn['movie']

top250=movie['top250']

urls=['https://movie.douban.com/top250?start={}&filter='.format(str(i*25)) fori in range(0,10)]

defget_movieinfo(urls,data = None):

web_data=requests.get(urls)

soup =BeautifulSoup(web_data.text,'lxml')

for data in soup.select('.item'):

rank = data.select('em')[0].text

name =data.select('.info')[0].select('a')[0].text.split('\n')[1]

score =data.select('.rating_num')[0].text

link = data.select('a')[0]['href']

#nation =data.select('.info')[0].select('.bd')[0].text

director_actor =data.select('.bd')[0].text.split('\n')[2].lstrip()

time_country =data.select('.bd')[0].text.split('\n')[3].lstrip()

print(rank, name, score,link,director_actor, time_country)

movie.top250.insert_one({'rank':rank,'name':name,'score':score,'link':link,'director_actor':director_actor,'time_country':time_country})

for a in urls:

get_movieinfo(a)

mogodb查询的结果如下:

相关文章

网友评论

      本文标题:Python爬取豆瓣电影250保存到mongodb

      本文链接:https://www.haomeiwen.com/subject/fthfpxtx.html