Steps for scraping meme images

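1. First, get the image links from the page

This step is mostly an XPath exercise: fetch a listing page and pull the image links out of its HTML.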

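# Get the image links on one listing page (p is the page number)
import requests
from lxml import etree

# Assumption: headers is not shown in the original; a browser-like User-Agent is a common choice
headers = {'User-Agent': 'Mozilla/5.0'}

def getHtmlText(p):
    url = 'https://www.doutula.com/photo/list/?page=' + str(p)
    res = requests.get(url, headers=headers)
    res.encoding = 'utf-8'
    html = res.text
    h = etree.HTML(html)
    # Read the data-original attribute of every img inside the content div
    right = h.xpath('//div[@class="page-content text-center"]/div/a/img/@data-original')
    return right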
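
To see what that XPath selects without hitting the site, here is a minimal sketch that runs a similar query on a small hand-written snippet. It makes two assumptions: the markup in SAMPLE is made up for illustration (it is not the real page), and it uses the standard library's xml.etree.ElementTree instead of lxml, so the attribute is read with .get() rather than with @data-original inside the path. The helper name extract_links is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a listing page (not real site markup).
SAMPLE = """
<html><body>
  <div class="page-content text-center">
    <div>
      <a href="/photo/1"><img data-original="https://example.com/img/a1.jpg"/></a>
      <a href="/photo/2"><img data-original="https://example.com/img/b2.gif"/></a>
    </div>
  </div>
  <div class="sidebar">
    <div><a href="/x"><img data-original="https://example.com/img/ad.png"/></a></div>
  </div>
</body></html>
"""

def extract_links(markup):
    """Return the data-original value of each img under the content div."""
    root = ET.fromstring(markup.strip())
    # Same shape as the XPath in getHtmlText: content div -> div -> a -> img
    imgs = root.findall(".//div[@class='page-content text-center']/div/a/img")
    return [img.get("data-original") for img in imgs]

links = extract_links(SAMPLE)
print(links)
```

Only the two images inside the "page-content text-center" div are returned; the one in the sidebar is skipped because its parent div has a different class.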

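2. Next, process the image links

To download the images I used to call urllib.urlretrieve() (urllib.request.urlretrieve() in Python 3), but later found requests easier to work with. One handy function here is os.path.splitext(), which splits a file name from its extension. I use it to give each saved image the same extension as its link.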

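# Download one image (url is the image link)
import os
import requests

def biaoqing(url):
    response = requests.get(url)
    p = response.content  # raw bytes of the image
    name = url[-9:-4]  # the five characters just before a 4-character suffix such as ".jpg"
    try:
        path = Docment()  # create the folder if needed
        # take the extension from the link
        file_suffix = os.path.splitext(url)[1]
        file_image = path + '/' + name + file_suffix  # file name of the saved image
        # 'wb' overwrites; append mode would corrupt the image if the script runs twice
        with open(file_image, 'wb') as f:
            f.write(p)
    except FileNotFoundError:
        print('Failed to save the file')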
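
The slice url[-9:-4] and a direct os.path.splitext(url) both depend on the link ending in a plain 4-character suffix. As a hedged alternative, the sketch below takes the name and extension from the path part of the URL. The helper filename_from_url and the example.com links are hypothetical and only serve as illustration.

```python
import os
from urllib.parse import urlparse

def filename_from_url(url):
    """Return (stem, suffix) of the last path component of url."""
    path = urlparse(url).path        # drops any ?query or #fragment
    base = os.path.basename(path)    # last path component, e.g. 'abc12345.jpg'
    return os.path.splitext(base)    # split at the last dot

plain = "https://example.com/img/abc12345.jpg"
with_query = "https://example.com/img/abc12345.jpg?size=large"

# splitext on the raw URL keeps the query string inside the "extension"
raw_suffix = os.path.splitext(with_query)[1]

stem, suffix = filename_from_url(with_query)
print(raw_suffix, stem, suffix)
```

With this approach the saved file could be named stem + suffix, which keeps the full original name instead of only its last five characters.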

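3. Save into a folder

The module used most here is os:

os.getcwd() returns the current working directory

os.path.join() joins a directory with a file name or another directory

os.path.isdir() checks whether name is a directory and returns False if it is not

os.makedirs() creates a directory, including any missing parent directories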

# Create the folder
def Docment():
    _path = os.getcwd()  # current working directory
    new_path = os.path.join(_path, 'image')  # <current working directory>/image
    if not os.path.isdir(new_path):  # create the directory if it does not exist yet
        os.makedirs(new_path)
    return new_path
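
The isdir check followed by makedirs can also be written as a single call, because os.makedirs() accepts exist_ok=True in Python 3. The sketch below shows that variant. The function name ensure_dir and its parent parameter are hypothetical, and a temporary directory stands in for the current working directory so the example leaves nothing behind.

```python
import os
import tempfile

def ensure_dir(parent, name="image"):
    """Create parent/name if it is missing and return its path."""
    new_path = os.path.join(parent, name)
    # exist_ok=True makes the call a no-op when the directory already exists
    os.makedirs(new_path, exist_ok=True)
    return new_path

with tempfile.TemporaryDirectory() as tmp:
    first = ensure_dir(tmp)    # creates <tmp>/image
    second = ensure_dir(tmp)   # already exists, no error raised
    created = os.path.isdir(first)

print(first == second, created)
```

Calling ensure_dir(os.getcwd()) would behave like Docment() above.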