美文网首页
python千与千寻下载器 JS动态加载分析

python千与千寻下载器 JS动态加载分析

作者: 笨鸡 | 来源:发表于2019-04-28 21:19 被阅读0次

    1.分析网站

    target_url:http://www.dilidili.name/watch3/32061/
    目标数据源:

    千与千寻.png

    2.开个多进程爬取资源

    from multiprocessing import Pool
    from functools import reduce
    import requests
    import os
    
    # 安装python, 安装requests,就可以愉快的开车啦!
    
    path = '千与千寻/'
    
    start = 0
    
    length = 1866
    
    speed = 20
    
    if not os.path.exists(path):
        os.mkdir(path)
    
    def get_content(iter):
        for i in iter:
            try:
                if i < 10:
                    url = 'http://kbzy.zxziyuan-yun.com/20180404/XVXMBPEV/800kb/hls/2Tb76g334100{}.ts'.format(i)
                elif i < 100:
                    url = 'http://kbzy.zxziyuan-yun.com/20180404/XVXMBPEV/800kb/hls/2Tb76g33410{}.ts'.format(i)
                else:
                    url = 'http://kbzy.zxziyuan-yun.com/20180404/XVXMBPEV/800kb/hls/2Tb76g3341{}.ts'.format(i)
                r = requests.get(url)
                if b'<h1>\xb7\xfe\xce\xf1\xc6\xf7\xb4\xed\xce\xf3</h1>' not in r.content:
                    with open(path + '千与千寻{}.ts'.format(i), 'wb') as f:
                        f.write(r.content)
                        print('加载完成', i)
            except:
                print('下载完成...')
    
    
    def open_file(x):
        with open(x, 'rb') as f:
            return f.read()
    
    
    def start_task():
        lst = list(range(start, length))
        result = [lst[x:x + int((len(lst) / speed))] for x in range(len(lst)) if x % (len(lst) / speed) == 0]
        pool = Pool(speed)
        for target in result:
            pool.apply_async(get_content, args=(target,))
        pool.close()
        pool.join()
        print('下载完成...')
        return True
    
    
    if __name__ == '__main__':
        if start_task():
            # 视频拼接
            lst = [path + '千与千寻{}.ts'.format(x) for x in range(length)]
            source = list(map(open_file, lst))
            z = reduce(lambda x, y: x + y, source)
            with open('千与千寻.ts', 'wb') as f:
                f.write(z)
    
    

    优化可以把地址做个缓存池避免重复爬取,但我懒。。。

    成果:http://59.110.157.193:8000/media/video/%E5%8D%83%E4%B8%8E%E5%8D%83%E5%AF%BB.ts

    相关文章

      网友评论

          本文标题:python千与千寻下载器 JS动态加载分析

          本文链接:https://www.haomeiwen.com/subject/nchsnqtx.html