Downloading data from NCBI, ENA and other public databases with Kingfisher


Author: 重拾生活信心 | Published 2023-11-14 09:44

    Kingfisher

    Kingfisher (the bird?) is a fast and flexible program for procurement of sequence files (and their metadata annotations) from public data sources, including the European Nucleotide Archive (ENA), NCBI SRA, Amazon AWS and Google Cloud. Its input is one or more "Run" accessions, e.g. DRR001970, or BioProject accessions, e.g. PRJNA621514 or SRP260223.

    Introduction

    Kingfisher has two main modes:

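    • get
      In the get subcommand, Kingfisher downloads data from a series of sources, trying each in order until one succeeds. The downloaded data is then converted, if needed, into the requested output format (SRA, FASTQ, FASTA, or their gzip-compressed forms). Both the download and the extraction stages can be faster than with the NCBI SRA Toolkit. In particular, downloading from ENA delivers FASTQ files directly, so no extraction step is needed.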

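    • annotate
      In the annotate subcommand, metadata about the runs is downloaded from NCBI and written in one of several formats: human-readable, CSV, TSV, JSON, feather or parquet. By default only a small set of metadata columns is output; use --all-columns for more detail.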
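
The "try each source in order until one succeeds" behaviour of get can be pictured as a simple loop. Kingfisher does this internally when several methods are passed to -m, so the sketch below only mimics the idea from the shell. The function name fetch_with_fallback and the KINGFISHER variable are hypothetical and not part of Kingfisher.

```shell
#!/usr/bin/env bash
# Hypothetical illustration only: Kingfisher already performs this fallback
# itself when several methods are given to -m.
# KINGFISHER may be overridden (e.g. KINGFISHER=echo) to preview the commands.
KINGFISHER="${KINGFISHER:-kingfisher}"

fetch_with_fallback() {
    local run="$1"
    shift
    local method
    for method in "$@"; do
        echo "trying method: $method" >&2
        # Stop at the first method whose download exits successfully.
        if $KINGFISHER get -r "$run" -m "$method"; then
            echo "succeeded with: $method" >&2
            return 0
        fi
    done
    echo "all methods failed for $run" >&2
    return 1
}
```

For example, `fetch_with_fallback SRR12118866 ena-ftp aws-http prefetch` would try the three methods one after another. In practice the single call `kingfisher get -r SRR12118866 -m ena-ftp aws-http prefetch` is the intended way to get the same effect.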

    Installation

    conda

    conda create -n kingfisher -c conda-forge -c bioconda kingfisher
    conda activate kingfisher
    kingfisher get -r SRR12118866 -m ena-ftp
    

    Optionally, to use the ena-ascp method, an Aspera Connect client is also required. See https://www.ibm.com/aspera/connect/ or https://www.biostars.org/p/325010/.

    Usage

    kingfisher get --full-help 
    kingfisher annotate --full-help
    
    kingfisher get -r ERR1739691 -m ena-ascp aws-http prefetch
    kingfisher extract --sra ERR1739691.sra -t 16 -f fastq.gz
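
When many runs are needed, the get synopsis lists a --run-identifiers-list option that takes a file of accessions. The following is a minimal sketch of a wrapper around it, assuming the file holds one run accession per line. The helper name download_run_list, the KINGFISHER variable and the comment-stripping step are hypothetical conveniences, and the chosen methods and thread count are only examples.

```shell
#!/usr/bin/env bash
# KINGFISHER may be overridden (e.g. KINGFISHER=echo) for a dry run.
KINGFISHER="${KINGFISHER:-kingfisher}"

# Hypothetical helper: download every run listed in a text file.
# Assumes one accession per line; blank lines and '#' comments are dropped.
download_run_list() {
    local list_file="$1"
    local threads="${2:-4}"
    local cleaned status
    cleaned="$(mktemp)"
    grep -v -E '^[[:space:]]*(#|$)' "$list_file" > "$cleaned" || true
    if [ ! -s "$cleaned" ]; then
        echo "no run accessions found in $list_file" >&2
        rm -f "$cleaned"
        return 1
    fi
    # All flags below appear in the 'kingfisher get' synopsis.
    if $KINGFISHER get --run-identifiers-list "$cleaned" \
        -m ena-ftp aws-http prefetch \
        -f fastq.gz -t "$threads" --check-md5sums; then
        status=0
    else
        status=$?
    fi
    rm -f "$cleaned"
    return "$status"
}
```

A call such as `download_run_list runs.txt 8` would then fetch every listed run as fastq.gz, falling back from ENA to AWS to prefetch as needed.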
    
    
    kingfisher annotate -r ERR1739691
    run        | bioproject | Gbp   | library_strategy | library_selection | model               | sample_name | taxon_name
    ---------- | ---------- | ----- | ---------------- | ----------------- | ------------------- | ----------- | ----------
    ERR1739691 | PRJEB15706 | 2.382 | WGS    
    
    get -m
    kingfisher(get)                                                kingfisher(get)
    
    NAME
           kingfisher get
    
    SYNOPSIS
           kingfisher  get [-h] [-r RUN_IDENTIFIERS [RUN_IDENTIFIERS ...]] [--run-
           identifiers-list  RUN_IDENTIFIERS_LIST]  [-p  BIOPROJECTS  [BIOPROJECTS
           ...]]   -m   {aws-http,prefetch,aws-cp,gcp-cp,ena-ascp,ena-ftp}  [{aws-
           http,prefetch,aws-cp,gcp-cp,ena-ascp,ena-ftp} ...]  [--download-threads
           DOWNLOAD_THREADS]       [--hide-download-progress]      [--ascp-ssh-key
           ASCP_SSH_KEY]  [--ascp-args  ASCP_ARGS]  [--allow-paid]  [--allow-paid-
           from-aws]  [--aws-user-key-id  AWS_USER_KEY_ID]  [--aws-user-key-secret
           AWS_USER_KEY_SECRET]   [--guess-aws-location]   [--allow-paid-from-gcp]
           [--gcp-project   GCP_PROJECT]  [--gcp-user-key-file  GCP_USER_KEY_FILE]
           [--prefetch-max-size    PREFETCH_MAX_SIZE]    [--check-md5sums]     [-f
           {sra,fastq,fastq.gz,fasta,fasta.gz}
           [{sra,fastq,fastq.gz,fasta,fasta.gz}   ...]]   [--force]   [--unsorted]
           [--stdout]  [-t  EXTRACTION_THREADS]  [--debug]  [--version]  [--quiet]
           [--full-help] [--full-help-roff]
    
    
    
