Scraping the publisher names listed at https://read.douban.com/provider/all
In the page source, each publisher name is wrapped in a div element with the class "name": <div class="name">...</div>
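To see how such a pattern picks out the names, here is a minimal sketch run against a made-up fragment. The sample HTML and the publisher names are invented for illustration and are not copied from the Douban page; the only thing assumed about the real page is that each name sits inside <div class="name">...</div>.

```python
import re

# Hypothetical fragment imitating the assumed structure;
# the real page has more markup around each entry.
sample_html = (
    '<div class="name">Publisher A</div>'
    '<div class="name">Publisher B</div>'
)

# (.*?) is non-greedy, so each match stops at the first closing </div>
pattern = re.compile(r'<div class="name">(.*?)</div>')
names = pattern.findall(sample_html)
print(names)  # ['Publisher A', 'Publisher B']
```

With a greedy (.*) instead, the single match would run from the first opening tag to the last closing tag, which is why the non-greedy form is used.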
The Python program downloads the page, extracts the names with a regular expression, and writes them to a text file.
The result is the list of matched publisher names, which is also written to E:/py/data/da.txt, one name per line.
Copyable code:
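import re
import urllib.request

# Download the publisher listing page and decode the bytes to text (UTF-8 by default)
data = urllib.request.urlopen("https://read.douban.com/provider/all").read()
data = data.decode()
# Each publisher name sits inside <div class="name">...</div>
pat = '<div class="name">(.*?)</div>'
mydata = re.compile(pat).findall(data)
print(mydata)
# Write one publisher name per line
file = open("E:/py/data/da.txt", "w")
for i in range(0, len(mydata)):
    file.write(mydata[i] + "\n")
file.close()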
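As a possible variation, the same steps can be split into small functions so the extraction can be checked without touching the network. This is only a sketch under stated assumptions: the function names, the browser-like User-Agent header, the 10-second timeout, and the output file name publishers.txt are all hypothetical choices, not part of the original program. Whether Douban actually requires a custom User-Agent is also an assumption.

```python
import re
import urllib.request

URL = "https://read.douban.com/provider/all"
NAME_PATTERN = re.compile(r'<div class="name">(.*?)</div>')


def fetch_page(url=URL):
    """Download the page and return its text, assuming UTF-8 encoding."""
    # Assumption: some sites refuse urllib's default User-Agent,
    # so a browser-like one is sent here.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def extract_names(html):
    """Return every publisher name found in the HTML, with whitespace trimmed."""
    return [name.strip() for name in NAME_PATTERN.findall(html)]


def save_names(names, path):
    """Write one name per line; the with block closes the file even on error."""
    with open(path, "w", encoding="utf-8") as f:
        for name in names:
            f.write(name + "\n")


# Usage (performs a real network request, so it is left commented out):
# save_names(extract_names(fetch_page()), "publishers.txt")
```

Opening the file with an explicit encoding="utf-8" avoids depending on the platform default, which matters when the names contain Chinese characters.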