Python crawler -- downloading images from a web page

Author: chcvn | Published 2017-10-23 21:13

    Source code:

        
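    import os
    import urllib.request
    
    # jandan.net may refuse requests that carry Python's default User-Agent,
    # so send a browser-like one
    HEADERS = {'User-Agent': 'Mozilla/5.0'}
    
    def url_open(url):
        # Build one Request with the header and open the URL only once
        req = urllib.request.Request(url, headers=HEADERS)
        with urllib.request.urlopen(req) as response:
            return response.read()
    
    def get_page(url):
        html = url_open(url).decode('utf-8')
    
        # The current page number appears as
        # <span class="current-comment-page">[N]</span>;
        # 'current-comment-page">[' is 23 characters long, so +23 lands on N
        a = html.find('current-comment-page') + 23
        b = html.find(']', a)
    
        return html[a:b]
    
    def find_imgs(url):
        html = url_open(url).decode('utf-8')
        img_addrs = []
    
        a = html.find('img src=')
    
        while a != -1:
            # The address starts after 'img src="' (9 characters)
            # and ends at the matching closing quote
            quote = html[a + 8:a + 9]
            end = html.find(quote, a + 9) if quote in ('"', "'") else -1
            if end != -1:
                src = html[a + 9:end]
                if src.endswith('.jpg'):  # keep JPEG images only
                    img_addrs.append(src)
            a = html.find('img src=', a + 8)
    
        return img_addrs
    
    def save_imgs(folder, img_addrs):
        for each in img_addrs:
            filename = os.path.join(folder, each.split('/')[-1])
            if os.path.exists(filename):
                continue  # already saved: skip it, but keep going
            # Addresses like '//host/path.jpg' have no scheme, so add one
            img_url = 'http:' + each if each.startswith('//') else each
            img = url_open(img_url)
            with open(filename, 'wb') as f:
                f.write(img)
    
    def download_mm(folder='ooxx', pages=10):
        os.makedirs(folder, exist_ok=True)  # create the folder if it is missing
    
        url = 'https://jandan.net/ooxx/'
    
        page_num = int(get_page(url))
    
        # Download from the newest page backwards, at most `pages` pages,
        # never going below page 1
        for i in range(min(pages, page_num)):
            page_url = url + 'page-' + str(page_num - i) + '#comments'
            img_addrs = find_imgs(page_url)
            save_imgs(folder, img_addrs)
    
    if __name__ == '__main__':
        download_mm()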
    

    I tried it out today, and it works pretty well!
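    The crawler finds the page number and the image addresses by plain string searches with fixed offsets, so it is worth checking those offsets offline. Note that searching for '.jpg' inside a fixed character window after 'img src=' can run past a non-JPEG tag into the next tag and produce a garbled address; the sketch below ends each address at its closing quote instead. It is a minimal illustration, not the author's code: SAMPLE_HTML, page_from_html and jpgs_from_html are hypothetical names, and the sample markup only mimics what the offsets imply about jandan.net's pages.

```python
# Offline check of the string scanning the crawler relies on.
# SAMPLE_HTML is a hypothetical snippet; real jandan.net markup may differ.
SAMPLE_HTML = (
    '<span class="current-comment-page">[42]</span>'
    '<img src="//img.example.com/large/abc123.jpg" />'
    '<img src="//img.example.com/thumb/logo.png" />'
    '<img src="//img.example.com/large/def456.jpg" />'
)


def page_from_html(html):
    # 'current-comment-page">[' is 23 characters, so +23 lands on the first digit
    a = html.find('current-comment-page') + 23
    b = html.find(']', a)
    return html[a:b]


def jpgs_from_html(html):
    img_addrs = []
    a = html.find('img src=')
    while a != -1:
        quote = html[a + 8:a + 9]  # the character right after 'img src='
        end = html.find(quote, a + 9) if quote in ('"', "'") else -1
        if end != -1:
            src = html[a + 9:end]  # the address between the quotes
            if src.endswith('.jpg'):  # keep JPEGs only; logo.png is skipped
                img_addrs.append(src)
        a = html.find('img src=', a + 8)
    return img_addrs


if __name__ == '__main__':
    print(page_from_html(SAMPLE_HTML))  # 42
    print(jpgs_from_html(SAMPLE_HTML))
```

    Bounding each address by its own quotes keeps one tag's contents from leaking into the next, which a fixed-length window cannot guarantee.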

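    Note: you need a Python 3 runtime to run this (the script uses urllib.request).
    Before running it, create a folder named ooxx in the directory you run the script from!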
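    Instead of creating the folder by hand, the script can create it itself. A minimal sketch, assuming you want the folder placed next to the script: ensure_folder is a hypothetical helper, and you could call it as ensure_folder(os.path.dirname(os.path.abspath(__file__))) and pass the returned path to download_mm(folder=...).

```python
import os
import tempfile


def ensure_folder(base_dir, name='ooxx'):
    """Create base_dir/name if it does not exist and return its absolute path."""
    path = os.path.abspath(os.path.join(base_dir, name))
    # exist_ok=True makes repeated runs safe instead of raising FileExistsError
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == '__main__':
    # Demonstrate in a throwaway directory so nothing is left behind
    with tempfile.TemporaryDirectory() as tmp:
        first = ensure_folder(tmp)
        second = ensure_folder(tmp)  # the second call does not raise
        print(os.path.isdir(first), first == second)  # True True
```

    Using an absolute path also avoids depending on which directory the script happens to be started from.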

        Article link: https://www.haomeiwen.com/subject/ykkzuxtx.html