Reproducing a Paper: Whole-Exome Sequencing Data Analysis, Part 1: Downloading the Data

Author: jiarf | Published: 2022-04-18 15:37

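    This post follows the tutorial 肿瘤外显子数据处理系列教程(一)读文献并且下载测序数据 (Tumor Exome Data Processing Series, Part 1: Reading the Paper and Downloading the Sequencing Data), hosted on qq.com.
    The tutorial page also links to many of the follow-up analyses.

    The paper being reproduced is "Reliability of Whole-Exome Sequencing for Assessing Intratumor Genetic Heterogeneity" (ScienceDirect).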

    Data download

    The data are in NCBI's Sequence Read Archive (SRA). Every project page uses the same URL format, https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRPXXX, so for this project it is:
    https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP070662
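
Since the URL only differs in the accession, it can be built from the project ID. A minimal sketch; `sra_study_url` is a helper name made up for this example:

```shell
# Build the SRA study page URL for a project accession.
# sra_study_url is a hypothetical helper, not part of any NCBI tool.
sra_study_url() {
    printf 'https://www.ncbi.nlm.nih.gov/Traces/study/?acc=%s\n' "$1"
}

sra_study_url SRP070662
```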
    We only need to download the WES runs.
    I saved the two downloaded files in a directory on my own computer, D:\work\肿瘤外显子数据分析学习: the RunInfo Table (runtable) and the Accession List (acclist).

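    RunInfo Table: contains fairly detailed metadata and can be used to rename the files once the download is complete.
    Accession List: a single column of run accessions. prefetch accepts this file and downloads every sample listed in it.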
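
If the run table covers more than the exome runs, an accession list can also be derived from it on the command line. The sketch below is only an illustration under several assumptions: the table is comma-separated with a header row, it has columns named `Run` and `Assay Type`, exome runs are labelled `WXS`, and no field contains a comma. The three data rows are invented placeholders, not records from this project.

```shell
# Print the Run accession of every row whose "Assay Type" equals the given value.
# Assumes a simple comma-separated table with a header row and no commas inside fields.
select_runs() {
    table=$1
    assay=$2
    awk -F',' -v assay="$assay" '
        NR == 1 {
            # Locate the two columns by name instead of by position.
            for (i = 1; i <= NF; i++) {
                if ($i == "Run") run_col = i
                if ($i == "Assay Type") assay_col = i
            }
            next
        }
        run_col && assay_col && $assay_col == assay { print $run_col }
    ' "$table"
}

# Hypothetical three-row table standing in for the downloaded run table.
demo_table=$(mktemp)
printf '%s\n' \
    'Run,Assay Type,Sample Name' \
    'SRR0000001,WXS,demo_tumor_1' \
    'SRR0000002,RNA-Seq,demo_tumor_1' \
    'SRR0000003,WXS,demo_normal_1' > "$demo_table"

wes_runs=$(select_runs "$demo_table" WXS)
echo "$wes_runs"
rm -f "$demo_table"
```

Redirecting the output to a file gives a one-column list in the same shape as the Accession List.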
    The download tool is prefetch, which is part of the SRA Toolkit. With conda, the package to install is sra-tools, not prefetch.
    Start by checking whether the tool is already available:

    11:08:26 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES
    $
    prefetch  -help
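
The same check can be scripted with the shell's `command -v`, which reports whether a program is on `PATH` without running it. A small sketch; `check_tool` is a name made up for this example:

```shell
# Report whether a command is available on PATH.
# check_tool is a hypothetical helper; it returns non-zero when the tool is missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# For prefetch, a missing tool means the sra-tools package still has to be installed.
check_tool prefetch || echo "prefetch is missing: install sra-tools first"
```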
    

    Next, create a conda environment named wes:

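    ## Set up the project directories
    mkdir 0.sra log
    ## Install conda (skip if it is already installed)
    #wget https://repo.anaconda.com/miniconda/Miniconda2-latest-Linux-x86_64.sh 
    #bash Miniconda2-latest-Linux-x86_64.sh
    ## Add the conda-forge channel
    conda config --add channels conda-forge
    ## Create an environment named wes
    conda create -n wes python=2
    conda info --envs
    ## Activate the environment after creating it
    source activate wes
    ## Install software only while the environment is active; the same applies to every tool added later
    #conda install sra-tools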
    

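    Adapt this script to your own needs. I saved it as creat-wes-envs.sh and ran it with sh; the final activation step failed with "source: not found", so I activated the environment by hand afterwards: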
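
A note on the `source: not found` message at the end of the run below: `source` is a bash builtin, and a script started with `sh` may be executed by a shell that only provides the POSIX `.` command. The sketch below shows `.` doing the same job on a throwaway file. `demo_env_file` and `DEMO_ENV` are made-up names for illustration and have nothing to do with conda's own activation script.

```shell
#!/bin/sh
# Write a tiny file that sets one variable, standing in for an activation script.
demo_env_file=$(mktemp)
printf 'DEMO_ENV=wes\n' > "$demo_env_file"

# "." is the POSIX spelling of bash's "source": it runs the file in the current shell.
. "$demo_env_file"
echo "active environment: $DEMO_ENV"

rm -f "$demo_env_file"
```

Running the original script with `bash creat-wes-envs.sh` would also avoid the error, although an activation done inside a script does not carry over to the shell that called it either way.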

    11:15:48 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES
    $
    mkdir run
    11:15:58 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES
    $
    cd run
    11:16:01 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
    $
    vim creat-wes-envs.sh
    11:17:26 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
    $
    sh creat-wes-envs.sh
    Warning: 'conda-forge' already in 'channels' list, moving to the top
    Collecting package metadata (current_repodata.json): |
    Proceed ([y]/n)? y
    Downloading and Extracting Packages
    sqlite-3.37.1        | 1.5 MB    | ##################################### | 100%
    pip-20.0.2           | 1.9 MB    | ##################################### | 100%
    zlib-1.2.11          | 88 KB     | ##################################### | 100%
    ncurses-6.3          | 1012 KB   | ##################################### | 100%
    readline-8.1         | 295 KB    | ##################################### | 100%
    python_abi-2.7       | 4 KB      | ##################################### | 100%
    python-2.7.15        | 12.2 MB   | ##################################### | 100%
    setuptools-36.4.0    | 557 KB    | ##################################### | 100%
    libgcc-ng-11.2.0     | 906 KB    | ##################################### | 100%
    ca-certificates-2021 | 139 KB    | ##################################### | 100%
    libstdcxx-ng-11.2.0  | 4.2 MB    | ########################9             |  68%
    …
    certifi-2016.9.26    | 217 KB    | ##################################### | 100%
    Preparing transaction: done
    Verifying transaction: done
    Executing transaction: done
    #
    # To activate this environment, use
    #
    #     $ conda activate wes
    #
    # To deactivate an active environment, use
    #
    #     $ conda deactivate
    
    # conda environments:
    #
    bioinfo                  /data1/jiarongf/anaconda3/envs/bioinfo
    r-reticulate             /data1/jiarongf/anaconda3/envs/r-reticulate
    base                  *  /data1/jiarongf/anaconda_se/anaconda3
    jupyter_notebook         /home/jiarongf/.conda/envs/jupyter_notebook
    celltalk                 /home/jiarongf/my-envs/celltalk
    chipseq                  /home/jiarongf/my-envs/chipseq
    d2l                      /home/jiarongf/my-envs/d2l
    pyscenic                 /home/jiarongf/my-envs/pyscenic
    wes                      /home/jiarongf/my-envs/wes
    
    creat-wes-envs.sh: 12: creat-wes-envs.sh: source: not found
    11:22:15 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
    $
    source activate wes
    (/home/jiarongf/my-envs/wes) 11:23:05 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
    $
    prefetch -help
    
    Usage: prefetch [ options ] [ accessions(s)... ]
    
    Parameters:
    
      accessions(s)                    list of accessions to process
    
    
    Options:
    
      -T|--type <file-type>            Specify file type to download. Default: sra
      -N|--min-size <size>             Minimum file size to download in KB
                                         (inclusive).
      -X|--max-size <size>             Maximum file size to download in KB
                                         (exclusive). Default: 20G
      -f|--force <no|yes|all|ALL>      Force object download - one of: no, yes,
                                         all, ALL. no [default]: skip download if
                                         the object if found and complete; yes:
                                         download it even if it is found and is
                                         complete; all: ignore lock files (stale
                                         locks or it is being downloaded by
                                         another process - use at your own
                                         risk!); ALL: ignore lock files, restart
                                         download from beginning
      -p|--progress                    Show progress
      -r|--resume <yes|no>             Resume partial downloads - one of: no, yes
                                         [default]
      -C|--verify <yes|no>             Verify after download - one of: no, yes
                                         [default]
      -c|--check-all                   Double-check all refseqs
      -o|--output-file <file>          Write file to <file> when downloading
                                         single file
      -O|--output-directory <directory>
                                       Save files to <directory>/
         --ngc <path>                  <path> to ngc file
         --perm <path>                 <path> to permission file
         --location <location>         location in cloud
         --cart <path>                 <path> to cart file
      -V|--version                     Display the version of the program
      -v|--verbose                     Increase the verbosity of the program
                                         status messages. Use multiple times for
                                         more verbosity.
      -L|--log-level <level>           Logging level as number or enum string.
                                         One of
                                         (fatal|sys|int|err|warn|info|debug) or
                                         (0-6) Current/default is warn
         --option-file file            Read more options and parameters from the
                                         file.
      -h|--help                        print this message
    
    "prefetch" version 2.11.0
    
    (/home/jiarongf/my-envs/wes) 11:25:16 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
    $
    

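    By default prefetch downloads over HTTPS, and the speed is not great.
    Aspera is much faster, but it is not part of the SRA Toolkit and is not installed together with sra-tools; the tutorial installs it from its own installer script: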

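    wget https://d3gcli72yxqn2z.cloudfront.net/connect_latest/v4/bin/ibm-aspera-connect_4.1.3.93_linux.tar.gz
    tar -zxvf ibm-aspera-connect_4.1.3.93_linux.tar.gz
    bash ibm-aspera-connect_4.1.3.93_linux.sh
    ## Add the install directory to PATH by hand (double quotes, so that $HOME and $PATH are expanded);
    ## put this line in ~/.bashrc to make it permanent
    export PATH="$HOME/.aspera/connect/bin:$PATH"
    source ~/.bashrc
    ## The private key files are in $HOME/.aspera/connect/etc
    ## The tutorial expects this call to use aspera; -X 200G raises prefetch's default 20G size limit
    nohup prefetch --option-file ../data/SRR_Acc_List.txt -O ../0.sra -X 200G > ../log/0.download_sra.log 2>&1 &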
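
When the Connect directory is added to `PATH`, the value has to be written in double quotes, because single quotes keep `$HOME` and `$PATH` as literal text and leave the shell unable to find any command. A short sketch of the difference, using a made-up variable `demo_home` in place of `$HOME` so that the result is the same on every machine:

```shell
# demo_home is a hypothetical stand-in for $HOME.
demo_home=/home/user

# Single quotes: the variable name is stored literally, nothing is expanded.
single='$demo_home/.aspera/connect/bin'

# Double quotes: the variable is expanded to its value.
double="$demo_home/.aspera/connect/bin"

echo "single quotes give: $single"
echo "double quotes give: $double"
```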
    
    

    My own run:

    
    (/home/jiarongf/my-envs/wes) 11:45:36 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
    $
    vim download-aspera.sh
    (/home/jiarongf/my-envs/wes) 11:52:32 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
    $
    cat download-aspera.sh
    wget https://d3gcli72yxqn2z.cloudfront.net/connect_latest/v4/bin/ibm-aspera-connect_4.1.3.93_linux.tar.gz
    tar -zxvf ibm-aspera-connect_4.1.3.93_linux.tar.gz
    bash ibm-aspera-connect_4.1.3.93_linux.sh
    
    
    (/home/jiarongf/my-envs/wes) 11:50:16 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
    $
    sh download-aspera.sh 2> download-aspera.log
    ibm-aspera-connect_4.1.3.93_linux.sh
    
    Installing IBM Aspera Connect
    
    Deploying IBM Aspera Connect (/home/jiarongf/.aspera/connect) for the current user only.
    
    Install complete.
    (/home/jiarongf/my-envs/wes) 11:51:00 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
    $
    vim  ~/.bashrc
    (/home/jiarongf/my-envs/wes) 11:55:28 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
    
    export PATH=/data1/jiarongf/learning/cancer-WES/run/:$PATH
    (/home/jiarongf/my-envs/wes) 11:56:30 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
    $
    source ~/.bashrc
    11:57:27 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
    $
    source activate wes
    (/home/jiarongf/my-envs/wes) 11:58:09 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
    $
    vim prefetch.sh
    (/home/jiarongf/my-envs/wes) 11:59:18 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
    $
    cat prefetch.sh
    nohup prefetch --option-file ../data/SRR_Acc_List.txt -O ../0.sra -X 200G > ../log/0.download_sra.log 2>&1 &
    
    (/home/jiarongf/my-envs/wes) 11:59:27 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
    $
    sh prefetch.sh
    (/home/jiarongf/my-envs/wes) 11:59:34 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/run
    $
    
    
    
    (/home/jiarongf/my-envs/wes) 12:05:06 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/log
    $
    cat 0.download_sra.log
    
    2022-03-31T04:03:25 prefetch.2.11.0: 1) Downloading 'SRR3182418'...
    2022-03-31T04:03:25 prefetch.2.11.0:  Downloading via HTTPS...
    (/home/jiarongf/my-envs/wes) 12:05:11 jiarongf@172.16.10.223:/data1/jiarongf/learning/cancer-WES/log
    $
    

    A quick look at the end of the log once the download had finished:

    2022-04-02T00:09:59 prefetch.2.11.0: 49) Downloading 'SRR3182442.vdbcache'...
    2022-04-02T00:09:59 prefetch.2.11.0: 49) 'SRR3182442.vdbcache' was downloaded successfully
    
    2022-04-02T00:10:01 prefetch.2.11.0: 50) Downloading 'SRR3182443'...
    2022-04-02T00:10:01 prefetch.2.11.0:  Downloading via HTTPS...
    2022-04-02T00:59:08 prefetch.2.11.0:  HTTPS download succeed
    2022-04-02T00:59:08 prefetch.2.11.0: 50.2) Downloading 'SRR3182443.vdbcache'...
    2022-04-02T00:59:08 prefetch.2.11.0:  Downloading via HTTPS...
    2022-04-02T00:59:24 prefetch.2.11.0:  HTTPS download succeed
    2022-04-02T00:59:24 prefetch.2.11.0:  'SRR3182443.vdbcache' is valid
    2022-04-02T00:59:24 prefetch.2.11.0: 50.2) 'SRR3182443.vdbcache' was downloaded successfully
    2022-04-02T00:59:36 prefetch.2.11.0:  'SRR3182443' is valid
    2022-04-02T00:59:36 prefetch.2.11.0: 50) 'SRR3182443' was downloaded successfully
    2022-04-02T01:00:01 prefetch.2.11.0: 'SRR3182443' has 0 unresolved dependencies
    2022-04-02T01:00:01 prefetch.2.11.0: 50) Downloading 'SRR3182443.vdbcache'...
    2022-04-02T01:00:01 prefetch.2.11.0: 50) 'SRR3182443.vdbcache' was downloaded successfully
    
    2022-04-02T01:00:03 prefetch.2.11.0: 51) Downloading 'SRR3182444'...
    2022-04-02T01:00:03 prefetch.2.11.0:  Downloading via HTTPS...
    2022-04-02T02:03:30 prefetch.2.11.0:  HTTPS download succeed
    2022-04-02T02:03:30 prefetch.2.11.0: 51.2) Downloading 'SRR3182444.vdbcache'...
    2022-04-02T02:03:30 prefetch.2.11.0:  Downloading via HTTPS...
    2022-04-02T02:03:47 prefetch.2.11.0:  HTTPS download succeed
    2022-04-02T02:03:47 prefetch.2.11.0:  'SRR3182444.vdbcache' is valid
    2022-04-02T02:03:47 prefetch.2.11.0: 51.2) 'SRR3182444.vdbcache' was downloaded successfully
    2022-04-02T02:03:58 prefetch.2.11.0:  'SRR3182444' is valid
    2022-04-02T02:03:58 prefetch.2.11.0: 51) 'SRR3182444' was downloaded successfully
    2022-04-02T02:04:24 prefetch.2.11.0: 'SRR3182444' has 0 unresolved dependencies
    2022-04-02T02:04:24 prefetch.2.11.0: 51) Downloading 'SRR3182444.vdbcache'...
    2022-04-02T02:04:24 prefetch.2.11.0: 51) 'SRR3182444.vdbcache' was downloaded successfully
    
    2022-04-02T02:04:26 prefetch.2.11.0: 52) Downloading 'SRR3182445'...
    2022-04-02T02:04:26 prefetch.2.11.0:  Downloading via HTTPS...
    2022-04-02T02:56:41 prefetch.2.11.0:  HTTPS download succeed
    2022-04-02T02:56:41 prefetch.2.11.0: 52.2) Downloading 'SRR3182445.vdbcache'...
    2022-04-02T02:56:41 prefetch.2.11.0:  Downloading via HTTPS...
    2022-04-02T02:57:04 prefetch.2.11.0:  HTTPS download succeed
    2022-04-02T02:57:04 prefetch.2.11.0:  'SRR3182445.vdbcache' is valid
    2022-04-02T02:57:04 prefetch.2.11.0: 52.2) 'SRR3182445.vdbcache' was downloaded successfully
    2022-04-02T02:57:13 prefetch.2.11.0:  'SRR3182445' is valid
    2022-04-02T02:57:13 prefetch.2.11.0: 52) 'SRR3182445' was downloaded successfully
    2022-04-02T02:57:37 prefetch.2.11.0: 'SRR3182445' has 0 unresolved dependencies
    2022-04-02T02:57:37 prefetch.2.11.0: 52) Downloading 'SRR3182445.vdbcache'...
    2022-04-02T02:57:37 prefetch.2.11.0: 52) 'SRR3182445.vdbcache' was downloaded successfully
    
    2022-04-02T02:57:40 prefetch.2.11.0: 53) Downloading 'SRR3182446'...
    2022-04-02T02:57:40 prefetch.2.11.0:  Downloading via HTTPS...
    2022-04-02T20:49:37 prefetch.2.11.0:  HTTPS download succeed
    2022-04-02T20:49:37 prefetch.2.11.0: 53.2) Downloading 'SRR3182446.vdbcache'...
    2022-04-02T20:49:37 prefetch.2.11.0:  Downloading via HTTPS...
    2022-04-02T20:49:54 prefetch.2.11.0:  HTTPS download succeed
    2022-04-02T20:49:54 prefetch.2.11.0:  'SRR3182446.vdbcache' is valid
    2022-04-02T20:49:54 prefetch.2.11.0: 53.2) 'SRR3182446.vdbcache' was downloaded successfully
    2022-04-02T20:50:06 prefetch.2.11.0:  'SRR3182446' is valid
    2022-04-02T20:50:06 prefetch.2.11.0: 53) 'SRR3182446' was downloaded successfully
    2022-04-02T20:50:31 prefetch.2.11.0: 'SRR3182446' has 0 unresolved dependencies
    2022-04-02T20:50:31 prefetch.2.11.0: 53) Downloading 'SRR3182446.vdbcache'...
    2022-04-02T20:50:31 prefetch.2.11.0: 53) 'SRR3182446.vdbcache' was downloaded successfully
    
    2022-04-02T20:50:33 prefetch.2.11.0: 54) Downloading 'SRR3182447'...
    2022-04-02T20:50:33 prefetch.2.11.0:  Downloading via HTTPS...
    2022-04-02T21:03:02 prefetch.2.11.0:  HTTPS download succeed
    2022-04-02T21:03:02 prefetch.2.11.0: 54.2) Downloading 'SRR3182447.vdbcache'...
    2022-04-02T21:03:02 prefetch.2.11.0:  Downloading via HTTPS...
    2022-04-02T21:03:17 prefetch.2.11.0:  HTTPS download succeed
    2022-04-02T21:03:17 prefetch.2.11.0:  'SRR3182447.vdbcache' is valid
    2022-04-02T21:03:17 prefetch.2.11.0: 54.2) 'SRR3182447.vdbcache' was downloaded successfully
    2022-04-02T21:03:21 prefetch.2.11.0:  'SRR3182447' is valid
    2022-04-02T21:03:21 prefetch.2.11.0: 54) 'SRR3182447' was downloaded successfully
    2022-04-02T21:03:46 prefetch.2.11.0: 'SRR3182447' has 0 unresolved dependencies
    2022-04-02T21:03:46 prefetch.2.11.0: 54) Downloading 'SRR3182447.vdbcache'...
    2022-04-02T21:03:46 prefetch.2.11.0: 54) 'SRR3182447.vdbcache' was downloaded successfully
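
With this many runs it helps to let the shell compare the log against the accession list. The sketch below relies only on the `'<accession>' was downloaded successfully` wording visible in the log above; `downloaded_runs` and `missing_runs` are helper names made up here, and the demo log is a shortened excerpt that pretends the download was interrupted after SRR3182446.

```shell
# List accessions the log reports as fully downloaded (the .vdbcache lines are skipped).
downloaded_runs() {
    awk -F"'" '/was downloaded successfully/ && $2 !~ /\.vdbcache$/ { print $2 }' "$1" | sort -u
}

# Print accessions from the list that are not yet reported as downloaded.
missing_runs() {
    acc_list=$1
    log_file=$2
    done_list=$(mktemp)
    downloaded_runs "$log_file" > "$done_list"
    # grep exits non-zero when nothing is missing, which is not an error here.
    grep -vxFf "$done_list" "$acc_list" || true
    rm -f "$done_list"
}

# Demo input: three log lines in the format shown above and a two-entry list.
demo_log=$(mktemp)
demo_list=$(mktemp)
printf '%s\n' \
    "2022-04-02T20:49:54 prefetch.2.11.0: 53.2) 'SRR3182446.vdbcache' was downloaded successfully" \
    "2022-04-02T20:50:06 prefetch.2.11.0: 53) 'SRR3182446' was downloaded successfully" \
    "2022-04-02T20:50:33 prefetch.2.11.0: 54) Downloading 'SRR3182447'..." > "$demo_log"
printf '%s\n' SRR3182446 SRR3182447 > "$demo_list"

still_missing=$(missing_runs "$demo_list" "$demo_log")
echo "not yet downloaded: $still_missing"
rm -f "$demo_log" "$demo_list"
```

For the real run, the two arguments would be the accession list and ../log/0.download_sra.log.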
    
    

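    The download took quite a while: from 2022-03-31T04:03:25 to 2022-04-02T21:03:46, about 2 days and 17 hours.
    Note that every transfer shown in the log went over HTTPS, so aspera was not actually used.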
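
The elapsed time can be worked out from the first and last log timestamps. A minimal sketch, assuming a `date` that accepts `-d` with a "YYYY-MM-DD hh:mm:ss" string (GNU coreutils does; the BSD/macOS `date` does not); `elapsed_between` is a made-up helper name:

```shell
# Print the time between two timestamps as days, hours and minutes.
# Both timestamps are interpreted as UTC; leftover seconds are dropped.
elapsed_between() {
    start=$(date -u -d "$1" +%s)
    end=$(date -u -d "$2" +%s)
    secs=$((end - start))
    echo "$((secs / 86400))d $((secs % 86400 / 3600))h $((secs % 3600 / 60))m"
}

# First and last timestamps from the download log.
elapsed_between "2022-03-31 04:03:25" "2022-04-02 21:03:46"
```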
