美文网首页Python爬虫开发部落爬虫专题Python爬虫作业
python爬虫实战教程:下载某奶茶加盟品牌产品图片

python爬虫实战教程:下载某奶茶加盟品牌产品图片

作者: chengcxy | 来源:发表于2017-12-07 00:39 被阅读147次

    最近一直在写web的东西,想自己完成一个购物系统的demo,产品分类要一些图片,没有美学细胞的看到处理图片头就大,只好找个相关的网站爬一下,根据产品分类创建目录,以图片显示的alt属性命名图片,保存为jpg格式.

    一.代码

    import os
    import shutil
    import requests
    from lxml import etree
    
    
    BASEDIR = os.path.dirname(__file__)
    
    
    class AshuiSpider(object):
        #产品菜单 url
        def parse_product(self):
            url = 'http://www.ashui1998.com/product/'
            html = requests.get(url).text
            selector = etree.HTML(html)
            infos = selector.xpath('//div[@class="nr"]/h4/a')
            for info in infos:
                product_name = info.xpath('text()')[0]
                product_url = info.xpath('@href')[0]
                product_item = {}
                product_item['product_name'] = product_name
                product_item['product_url'] = product_url
                yield product_item
    
        #产品菜单下的图片 url 名称 要写入的路径
        def parse_pic(self, product_items):
            for product_item in product_items:
                # 创建存放路径
                picture_dir = product_item['product_name']
                os.chdir(BASEDIR)
                if os.path.exists(picture_dir):
                    #不论目录是否为空都删除
                    shutil.rmtree(picture_dir)
                os.mkdir(picture_dir)
                # 解析图片地址
                product_url = product_item['product_url']
                html = requests.get(product_url).text
                selector = etree.HTML(html)
                pics = selector.xpath('//div[@class="pro_main"]/dl[@class="pd_index_dl"]/dt/a')
                for pic in pics:
                    pic_url = pic.xpath('img/@src')[0]
                    pic_name = pic.xpath('img/@alt')[0]
                    pic_item = {}
                    pic_item['pic_url'] = pic_url
                    pic_item['pic_name'] = pic_name
                    pic_item['file'] = os.path.join(BASEDIR, picture_dir, pic_name+'.jpg')
                    pic_content = requests.get(pic_url).content
                    pic_item['pic_content'] = pic_content
                    yield pic_item
    
        def run(self):
            product_items = self.parse_product()
            pic_items = self.parse_pic(product_items)
            for pic_item in pic_items:
                with open(pic_item['file'],'wb') as pic:
                    pic.write(pic_item['pic_content'])
    
    if __name__ == '__main__':
        ashui = AshuiSpider()
        ashui.run()
    

    二.图片下载保存

    奶茶图.png

    三.号外

    新搭的微信公众号pythonfan,接入了图灵机器人,图片 语音 都支持回复...


    image.png

    本文git地址:https://github.com/chengcxy/spiders/blob/master/load_pics.py
    喜欢的给个star,土豪的来个微信打赏~~~

    image.png

    相关文章

      网友评论

        本文标题:python爬虫实战教程:下载某奶茶加盟品牌产品图片

        本文链接:https://www.haomeiwen.com/subject/ifyrixtx.html