Python in six lessons: web scraping, lessons 1-3
https://www.jianshu.com/p/645c731c5422
Python in six lessons: web scraping, lessons 4-6
https://www.jianshu.com/p/b3003cbcdf92
The examples in the posts above use requests. Here I use urllib instead, and the approach is essentially the same:
https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/001432688314740a0aed473a39f47b09c8c7274c9ab6aee000
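To make the requests-to-urllib switch concrete, here is a minimal sketch of a reusable urllib fetch helper. The name `fetch_html`, the browser-like `User-Agent` value and the timeout are assumptions and do not come from the original post. With requests, the rough equivalent would be `requests.get(url, headers=...).text`.

```python
from urllib import request

# Hypothetical helper (not from the original post): fetch a page with urllib
# and return it as text, roughly what requests.get(url).text does.
def fetch_html(url, user_agent="Mozilla/5.0", timeout=10):
    # Some sites reject urllib's default User-Agent, so send a browser-like one.
    # The exact value is an assumption.
    req = request.Request(url, headers={"User-Agent": user_agent})
    with request.urlopen(req, timeout=timeout) as resp:
        # Decode with the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

The returned string can be passed directly to `etree.HTML`, as the script below does with `data`.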
The main goal is to practice XPath with lxml.
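```python
from urllib import request, parse
from lxml import etree

# Fetch page 2 of the search results and decode the HTML as UTF-8
with request.urlopen('http://oabt004.com/index/index/k/%E7%BA%B8%E7%89%8C%E5%B1%8B/p/2') as f:
    data = f.read().decode('utf-8')

# "//li/@data-ed2k" selects the data-ed2k attribute of every <li> element
# print(etree.HTML(data).xpath("//li/@data-ed2k"))
for element in etree.HTML(data).xpath("//li/@data-ed2k"):
    # The links are percent-encoded; unquote them to make them readable
    print(parse.unquote(element))
```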
Result: the script prints the decoded ed2k links, one per line.

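Because the site may change or go offline, here is a minimal sketch for practicing the same XPath query and `parse.unquote` step on an inline HTML string instead of a live page. The helper name `extract_ed2k_links` is hypothetical. The sample HTML and its shortened ed2k-style values are made up for illustration and are not real links.

```python
from urllib import parse
from lxml import etree

# Hypothetical helper: return every <li> data-ed2k attribute, percent-decoded.
def extract_ed2k_links(html):
    tree = etree.HTML(html)  # lenient HTML parser; returns the root element
    # "//li/@data-ed2k" yields attribute values (strings), not elements,
    # and silently skips <li> elements that lack the attribute.
    return [parse.unquote(value) for value in tree.xpath("//li/@data-ed2k")]

# Made-up sample standing in for a real results page
sample = """
<ul>
  <li data-ed2k="ed2k://%7Cfile%7Cexample.mkv%7C">first</li>
  <li>no link here</li>
  <li data-ed2k="ed2k://%7Cfile%7C%E7%BA%B8%E7%89%8C%E5%B1%8B.mkv%7C">second</li>
</ul>
"""

for link in extract_ed2k_links(sample):
    print(link)
```

Here `%7C` decodes to `|`, and the UTF-8 escapes decode to 纸牌屋, the same keyword that appears in the search URL above.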