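Key points:
1. Saving CSV files
2. The difference between text and content in requests
3. Using XPath
Douban's most-watched books chart comes in two categories: fiction and non-fiction.
The code is copied, with a few simple modifications.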
import csv
from urllib.parse import urljoin

import requests
from lxml import etree

url = 'https://book.douban.com/'
headers = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.8',
    'Connection': 'keep-alive',
    'Origin': 'https://book.douban.com',
    'Referer': 'https://book.douban.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0'
}

# Fetch the home page as raw bytes and hand them to lxml.
html = requests.get(url, headers=headers).content
sel = etree.HTML(html)

# The popular-books section links to two charts: fiction first, non-fiction second.
links = sel.xpath('//div[@class="section popular-books"]/div/h2/span/a/@href')
fiction, non_fiction = links[0], links[1]
print(fiction, non_fiction)

# urljoin copes with relative and absolute hrefs and avoids a doubled slash.
colls = [urljoin(url, fiction), urljoin(url, non_fiction)]

# newline='' prevents blank rows on Windows; utf-8-sig lets Excel read the Chinese text.
with open('d:\\豆瓣.csv', 'wt', newline='', encoding='utf-8-sig') as fp:
    writer = csv.writer(fp)
    writer.writerow(('name', 'days', 'author', 'date', 'publisher', 'price', 'booktype', 'point', 'comment_num'))
    for coll in colls:
        # Send the same headers as for the home page.
        html1 = requests.get(coll, headers=headers).content
        sel = etree.HTML(html1)
        infos = sel.xpath('//ul[@class="chart-dashed-list"]/li/div[@class="media__body"]')
        print(len(infos))
        for info in infos:
            name = info.xpath('h2/a/text()')[0].strip()
            days = info.xpath('h2/span/text()')[0].strip()
            # Assumes five '/'-separated fields; a shorter entry raises IndexError.
            bookinfo = info.xpath('p[@class="subject-abstract color-gray"]/text()')[0].strip().split("/")
            author = bookinfo[0].strip()
            date = bookinfo[1].strip()
            publisher = bookinfo[2].strip()
            price = bookinfo[3].strip()
            booktype = bookinfo[4].strip()
            point = info.xpath('p[@class="clearfix w250"]/span[2]/text()')[0].strip()
            comment_num = info.xpath('p[@class="clearfix w250"]/span[3]/text()')[0].strip()
            print(name, days, author, date, publisher, price, booktype, point, comment_num)
            writer.writerow((name, days, author, date, publisher, price, booktype, point, comment_num))
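The loop above indexes bookinfo[0] through bookinfo[4] directly, so an entry with fewer than five fields would raise IndexError. Here is a minimal sketch of a more forgiving split, assuming the abstract line is a plain "/"-separated string. The helper name split_book_info and the sample strings are hypothetical and not taken from the Douban page.

```python
def split_book_info(abstract, expected=5):
    """Split a '/'-separated abstract into exactly `expected` stripped fields.

    Missing trailing fields are padded with empty strings; surplus fields
    are folded into the last slot so nothing is silently dropped.
    """
    parts = [p.strip() for p in abstract.split('/')]
    if len(parts) > expected:
        # Keep the first expected-1 fields and join the remainder into the last one.
        parts = parts[:expected - 1] + [' / '.join(parts[expected - 1:])]
    # Pad short rows so tuple unpacking never fails.
    parts += [''] * (expected - len(parts))
    return parts


# Hypothetical sample strings, for illustration only.
full = split_book_info('Author A / 2017-1 / Publisher B / 39.00 / Paperback')
short = split_book_info('Author A / 2017-1 / Publisher B')
author, date, publisher, price, booktype = short
print(full)
print(short)
```

With a helper like this, the five assignments in the loop could become a single unpacking line, and an incomplete entry would produce empty columns in the CSV rather than stopping the run.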
The main point here is writing and saving the CSV file.
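As a stand-alone illustration of the CSV part, here is a minimal sketch that writes a header and two rows and then reads them back. The file name demo_books.csv and the sample rows are made up for the example. The utf-8-sig encoding is chosen so that Excel recognises the file as UTF-8.

```python
import csv
import os
import tempfile

# Hypothetical sample data; the real script fills these from the scraped page.
header = ('name', 'author', 'price')
rows = [
    ('书名一', '作者甲', '39.00元'),
    ('Book Two', 'Author B', '45.00'),
]

path = os.path.join(tempfile.gettempdir(), 'demo_books.csv')

# newline='' stops the csv module's \r\n line endings from being translated
# again on Windows, which would otherwise show up as blank lines between rows.
with open(path, 'w', newline='', encoding='utf-8-sig') as fp:
    writer = csv.writer(fp)
    writer.writerow(header)
    writer.writerows(rows)

# Read the file back to confirm the round trip; utf-8-sig strips the BOM again.
with open(path, newline='', encoding='utf-8-sig') as fp:
    loaded = list(csv.reader(fp))

print(len(loaded), loaded[0])
```

Using a with block means the file is closed even if a row fails to write, so no explicit fp.close() is needed.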
Also note:
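r.text returns the response body as a Unicode string (str), decoded with the encoding requests determines for the response.
r.content returns the body as bytes, that is, the raw binary data.
In other words, use r.text when you want text.
Use r.content when you want an image or a file.
r.json() parses a JSON response body and returns the corresponding Python object, typically a dict or a list.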
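The difference can be reproduced without any network access. The sketch below uses only the standard library to mimic what happens to a response body: raw stands in for r.content, decoding it stands in for r.text, and json.loads stands in for r.json(). The payload is invented for the example.

```python
import json

# Stand-in for r.content: the raw bytes as they arrive over the wire.
raw = '{"title": "豆瓣读书", "rank": 1}'.encode('utf-8')

# Stand-in for r.text: the same bytes decoded to str with the response's encoding.
text = raw.decode('utf-8')

# Stand-in for r.json(): the text parsed into Python objects.
data = json.loads(text)

print(type(raw).__name__, len(raw))    # bytes; the length counts encoded bytes
print(type(text).__name__, len(text))  # str; the length counts characters
print(data['rank'])
```

The two lengths differ because each Chinese character takes several bytes in UTF-8. That is also why binary data such as images should be taken from the bytes form and never passed through a text decode.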
The code below saves an image, so it has to use content:
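import requests

jpg_url = 'https://img.haomeiwen.com/i2744623/55f59803c7aa7301.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240'
# content is the raw bytes of the image, so the file is opened in binary mode.
content = requests.get(jpg_url).content
# The URL points at a PNG, so the file is saved with a matching extension.
with open('demo.png', 'wb') as fp:
    fp.write(content)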