grabseqs: batch-download SRA data and convert it directly to FASTQ files

Author: iBioinformatics | Published: 2022-10-04 07:45

    Original post: grabseqs, a tool for batch-downloading SRA data and converting it directly to FASTQ files - Jianshu (jianshu.com)

    Paper: grabseqs: simple downloading of reads and metadata from multiple next-generation sequencing data repositories | Bioinformatics | Oxford Academic (oup.com)
    GitHub: louiejtaylor/grabseqs: A utility for easy downloading of reads from next-gen sequencing repositories like NCBI SRA (github.com)

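    grabseqs is a tool for batch-downloading data from the NCBI SRA, MG-RAST, and iMicrobe databases. It was published in the journal Bioinformatics in 2020, and it can download SRA data and convert it directly to FASTQ files.
    The conversion relies on fasterq-dump or fastq-dump, so install sra-tools before installing grabseqs: conda install -c bioconda sra-tools
    The other dependencies are a Python 3 environment, sra-tools newer than version 2.9, pigz, and wget.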
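Before installing, it can help to confirm that the dependencies listed above are actually on the PATH. The following is a minimal sketch, not part of grabseqs: the helper names `find_missing` and `version_gt` are hypothetical, and the version comparison assumes GNU `sort -V` is available.

```shell
#!/usr/bin/env bash
# Sketch only: find_missing and version_gt are hypothetical helpers,
# not commands shipped with grabseqs or sra-tools.

# Print every tool from the argument list that cannot be found on PATH.
find_missing() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Succeed when version $1 is strictly newer than version $2.
# Assumes GNU coreutils `sort -V` for version-aware ordering.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Tools named in the dependency list (fasterq-dump comes with sra-tools).
missing=$(find_missing python3 fasterq-dump pigz wget)
if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing"
else
    echo "All required tools found."
fi
```

`version_gt` can be used for the "newer than 2.9" requirement once the installed sra-tools version is known, for example `version_gt "$installed" 2.9`. How to extract that version string from `fasterq-dump --version` is left open here, since its exact output format is not assumed.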

    1 Installation

    Install with conda:

    conda install grabseqs -c louiejtaylor -c bioconda -c conda-forge
    
    

    Or install with pip:

    pip install grabseqs
    
    

    2 Usage

    2.1 Full parameter list:

    grabseqs sra [-h] [-m METADATA] [-o OUTDIR] [-r RETRIES] [-t THREADS]
                 [-f] [-l] [--no_parsing] [--parse_run_ids]
                 [--use_fastq_dump]
                 id [id ...]
    
    positional arguments:
      id                One or more BioProject, ERR/SRR or ERP/SRP number(s)
    
    optional arguments:
      -h, --help        show this help message and exit
      -m METADATA       filename in which to save SRA metadata (.csv format,
                        relative to OUTDIR)
      -o OUTDIR         directory in which to save output. created if it doesn't
                        exist
      -r RETRIES        number of times to retry download
      -t THREADS        threads to use (for fasterq-dump/pigz)
      -f                force re-download of files
      -l                list (but do not download) samples to be grabbed
      --parse_run_ids   parse SRR/ERR identifers (do not pass straight to fasterq-
                        dump)
      --custom_fqdump_args CUSTOM_FQD_ARGS
                        "string" containing args to pass to fastq-dump
      --use_fastq_dump  use legacy fastq-dump instead of fasterq-dump (no
                        multithreaded downloading)
    
    

    2.2 Examples:

    • Use 10 threads, save metadata to proj/metadata.csv (the -m filename is relative to the output directory), download into the directory proj/, retry failed downloads 3 times, and fetch all samples from SRP#######:
    grabseqs sra -t 10 -m metadata.csv -o proj/ -r 3 SRP*********
    
    
    • To pass your own arguments to fasterq-dump and get the data in a slightly different format:
    grabseqs sra SRP*******  -r 0 --custom_fqdump_args="--split-spot --progress"
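When the same options are reused across several projects, the call can be wrapped in a small function. This is only a sketch under stated assumptions: `run_grabseqs_sra` and the `DRY_RUN` variable are hypothetical names invented for illustration, the accession is a placeholder, and the flags are the ones shown in the parameter list above.

```shell
#!/usr/bin/env bash
# Sketch only: run_grabseqs_sra and DRY_RUN are hypothetical names,
# not features of grabseqs itself.

# Usage: run_grabseqs_sra OUTDIR THREADS RETRIES ID [ID ...]
run_grabseqs_sra() {
    local outdir=$1 threads=$2 retries=$3
    shift 3
    # Rebuild the argument list as the full command; the remaining
    # arguments are the accessions to download.
    set -- grabseqs sra -t "$threads" -m metadata.csv -o "$outdir" -r "$retries" "$@"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Print the command instead of running it.
        echo "$*"
    else
        "$@"
    fi
}

# DRY_RUN defaults to 1 in this sketch, so this only prints the command.
# SRPXXXXXX is a placeholder, not a real accession.
run_grabseqs_sra proj/ 10 3 SRPXXXXXX
```

Setting `DRY_RUN=0` would execute the assembled command. Printing first is a cheap way to check the options before starting a long download.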
    
    

    Brief examples of other common commands:

    • Download all samples from a single SRA project:
    grabseqs sra SRP********
    
    
    • Or any combination of projects (S/ERP), runs (S/ERR), and BioProjects (PRJNA):
    grabseqs sra SRR******** ERP******** PRJNA******** ERR********
    
    
    • For a dry run that only lists the samples that would be downloaded, pass -l:
    grabseqs sra -l SRP********
    
    
    • Similar syntax works for MG-RAST:
    grabseqs mgrast mgp****** mgm*******
    
    • And for iMicrobe (prefix sample numbers with "s" and project numbers with "p"):
    grabseqs imicrobe p4 s3
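Since each repository has its own subcommand, a mixed list of identifiers has to be routed to `sra`, `mgrast`, or `imicrobe`. The sketch below shows one way to pick the subcommand from the identifier's prefix. `repo_for_id` is a hypothetical helper, and its patterns only mirror the prefixes that appear in the examples above; they are not an exhaustive or authoritative list of accepted accession formats.

```shell
#!/usr/bin/env bash
# Sketch only: repo_for_id is a hypothetical helper. The patterns cover
# just the prefixes used in the examples, nothing more.

repo_for_id() {
    case "$1" in
        SRR*|ERR*|SRP*|ERP*|PRJNA*) echo sra ;;       # runs, projects, BioProjects
        mgp*|mgm*)                  echo mgrast ;;    # MG-RAST identifiers
        p[0-9]*|s[0-9]*)            echo imicrobe ;;  # iMicrobe projects and samples
        *)                          echo unknown ;;
    esac
}

# Placeholder identifiers, shown only to illustrate the routing.
for id in SRPXXXXXX mgp0000 p4 s3 foo; do
    printf '%s -> %s\n' "$id" "$(repo_for_id "$id")"
done
```

The result could then be used to build a call such as `grabseqs "$(repo_for_id "$id")" "$id"`, after handling the `unknown` case.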
    
