
[request crawler] Batch-scraping a library of small transcriptome tools

Author: Geekero | Published 2020-06-28 13:57

Purpose:

Syncing the downloads the way the official site suggests is too slow, so instead I scraped all of the download links and handed them to Thunder (迅雷) for batch downloading.

import requests
from pyquery import PyQuery as pq

# Pretend to be an ordinary browser; some servers reject requests with no User-Agent
headers = {
    'Accept': 'application/json, text/plain, */*',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,ja;q=0.7',
    'Connection': 'keep-alive',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36'
}

# Directory listing of the UCSC pre-built Linux x86_64 binaries
base_url = 'http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/'
res = requests.get(url=base_url, headers=headers)
print(res.status_code)  # expect 200

# Every tool in the listing is an <a> tag inside the <pre> block
jpy = pq(res.text)
items = jpy('body > pre:nth-child(2) > a').items()
urls = []
for item in items:
    href = item.attr('href')
    url = base_url + href + '\n'
    print(url)
    urls.append(url)

# One absolute download link per line; this file can be imported into Thunder as a batch task
with open('./tools_urls.txt', 'w') as f:
    f.writelines(urls)
