Scraping Images from WeChat Official Account Articles

Author: zqlmmd | Source: published 2018-05-23 12:16, read 0 times


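I have been writing WeChat Official Account articles lately, and to collect image material I wanted a small tool: give it a URL, and it automatically downloads the images on that page.
It is built with Python, the requests library, and BeautifulSoup.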

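Outlining the approach

The tool works in three steps: analyze the article's source, collect the src of each img, and download the images to local disk.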
Analyzing the article source
Take the article https://mp.weixin.qq.com/s?__biz=MjM5NDA5NDcyMA==&mid=2651695447&idx=1&sn=d538d53d3ec52fea83b73564f5aaead0&chksm=bd7402b88a038bae930946b1afea11acccbd0d1cd9c5dbb9ed922fcc9d224d0560171679507b&scene=0#rd as an example.
Open the link in Chrome and press Option+Command+I (Alt+Command+I) on macOS to open the developer tools and inspect the page source.
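Getting each img's src
As the source shows, every image is an img tag, so BeautifulSoup's find_all() method can collect them. The image URL itself is stored in the data-src attribute rather than in src.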

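    import requests
    from bs4 import BeautifulSoup

    # get_img_urls is an illustrative name; the original shows only the body
    def get_img_urls(url):
        url_list = []
        r = requests.get(url)
        r.raise_for_status()  # raise on HTTP 4xx/5xx responses
        soup = BeautifulSoup(r.text, 'html.parser')
        # Keep only img tags whose data-copyright attribute is "0"
        imgs = soup.find_all('img', attrs={'data-copyright': '0'})
        for img in imgs:  # find_all() returns a list, never None
            src = img.get('data-src')  # None if the tag has no data-src
            if src:
                print(src)
                url_list.append(src)
        return url_list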
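To try the extraction step without a network request, the sketch below runs the same idea against a made-up HTML snippet. It swaps BeautifulSoup for the standard library's html.parser so it has no third-party dependencies, and it does not apply the data-copyright filter. The names DataSrcCollector, extract_data_src, and SAMPLE_HTML, as well as the example.com URLs, are assumptions for illustration and are not part of the original tool.

```python
from html.parser import HTMLParser


class DataSrcCollector(HTMLParser):
    """Collect the data-src value of every img tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag != "img":
            return
        src = dict(attrs).get("data-src")
        if src:  # skip img tags that carry no data-src
            self.urls.append(src)


def extract_data_src(html):
    """Return the data-src URLs found in an HTML string, in document order."""
    collector = DataSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


# Made-up sample markup, shaped like the img tags described above
SAMPLE_HTML = """
<div>
  <img data-copyright="0" data-src="https://example.com/a/640?fmt=png">
  <img src="https://example.com/no-data-src.png">
  <img data-src="https://example.com/b/640?fmt=jpeg" />
</div>
"""

if __name__ == "__main__":
    for u in extract_data_src(SAMPLE_HTML):
        print(u)
```

The second img in the sample has only a plain src, so it is skipped. This is the same case that img['data-src'] would fail on with a KeyError.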

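Downloading images to local disk
The download directory, path, comes from os.getcwd().
The image content comes from requests.get(pic_url).content.
The file name is derived from the image URL.
The file is saved with open() and write().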

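    import os
    import requests

    # download_pic is an illustrative name; the original shows only the body
    def download_pic(url):
        path = os.getcwd()  # save into the current working directory
        try:
            pic_name = url.split("/")[-2]  # second-to-last path segment
            fmt = url.split("=")[-1]       # text after the last "=" is the extension
            resp = requests.get(url)
            resp.raise_for_status()        # do not save an error page as an image
            with open(os.path.join(path, pic_name + '.' + fmt), "wb") as f:
                f.write(resp.content)
        except Exception as reason:
            print(str(reason))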
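Taking the extension from the text after the last "=" only works when the format is the final query parameter. As a hedged alternative, the sketch below derives the file name with urllib.parse instead of string splitting. The helper name build_filename, the query parameter name wx_fmt, the "jpg" fallback, and the example.com URLs are all assumptions for illustration. Check them against the real image URLs before relying on this.

```python
from urllib.parse import urlparse, parse_qs


def build_filename(url, default_fmt="jpg"):
    """Derive '<name>.<fmt>' from an image URL (hypothetical helper)."""
    parsed = urlparse(url)
    # Non-empty path segments, e.g. ['mmbiz_jpg', 'abc123', '640']
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) >= 2:
        name = segments[-2]  # same choice as the split("/")[-2] approach
    elif segments:
        name = segments[-1]
    else:
        name = "image"       # fallback when the URL has no path
    # Read the format from the query string; wx_fmt is an assumed parameter name
    fmt = parse_qs(parsed.query).get("wx_fmt", [default_fmt])[0]
    return name + "." + fmt


if __name__ == "__main__":
    print(build_filename("https://example.com/mmbiz_jpg/abc123/640?wx_fmt=jpeg&tp=webp"))
```

Because the format is looked up by parameter name, extra parameters after it do not change the result, and a URL with no such parameter falls back to default_fmt.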