Hands-on Genomics 02: Software Installation and GATK Data Download


Author: 生信探索 | Published 2024-05-16 18:29

download the GATK resource bundle (hg38 genomics data)

FTP

https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle

too slow from China (about 23.0 KB/s)

# install lftp

sudo apt -y install lftp

# log in to the FTP server anonymously; there is no password, just press Enter

lftp ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/

# download the whole hg38 directory

mirror hg38
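The interactive session above can also be scripted, which is convenient when a download this slow has to run unattended under `nohup` or `screen`. The sketch below assumes lftp is installed and that the anonymous account still accepts an empty password (lftp reads `-u user,` as an empty password); the function name `fetch_hg38_bundle_ftp` and the `DRY_RUN` switch are hypothetical, added only for illustration. `mirror -c` resumes partially downloaded files and `--parallel` fetches several files at once, which may help somewhat on a slow link.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): mirror the hg38 bundle
# without an interactive lftp prompt. Assumes lftp is installed and the
# anonymous account still accepts an empty password ("user," = empty password).
fetch_hg38_bundle_ftp() {
  local dest="${1:-$HOME/DataHub/Genomics/GATK}"  # target directory, no spaces
  local jobs="${2:-4}"                             # files to fetch in parallel
  local url="ftp://ftp.broadinstitute.org/bundle/"
  # mirror -c resumes partially downloaded files; bye exits lftp afterwards.
  local cmds="mirror -c --parallel=${jobs} hg38 ${dest}/hg38; bye"

  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    # Only print the command that would be run.
    printf 'lftp -u gsapubftp-anonymous, -e "%s" %s\n' "$cmds" "$url"
    return 0
  fi

  mkdir -p "$dest"
  lftp -u gsapubftp-anonymous, -e "$cmds" "$url"
}

# Example: DRY_RUN=1 fetch_hg38_bundle_ftp ~/DataHub/Genomics/GATK 8
```

Running it first with `DRY_RUN=1` prints the exact lftp command, so it can be checked before a long transfer is started.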

use Google Cloud Storage instead (about 35 MB/s)

https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/

micromamba create -n gsutil

micromamba activate gsutil

micromamba install -y -c conda-forge python=3.4 gsutil

mkdir -p ~/DataHub/Genomics/GATK

cd ~/DataHub/Genomics/GATK

gsutil -m cp -r \
  "gs://genomics-public-data/resources/broad/hg38/v0/1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf" \
  "gs://genomics-public-data/resources/broad/hg38/v0/1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf.idx" \
  "gs://genomics-public-data/resources/broad/hg38/v0/1000G_omni2.5.hg38.vcf.gz" \
  "gs://genomics-public-data/resources/broad/hg38/v0/1000G_omni2.5.hg38.vcf.gz.tbi" \
  "gs://genomics-public-data/resources/broad/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz" \
  "gs://genomics-public-data/resources/broad/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz.tbi" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dict" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.64.alt" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.64.amb" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.64.ann" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.64.bwt" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.64.pac" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.64.sa" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.fai" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz" \
  "gs://genomics-public-data/resources/broad/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi" \
  "gs://genomics-public-data/resources/broad/hg38/v0/hapmap_3.3.hg38.vcf.gz" \
  "gs://genomics-public-data/resources/broad/hg38/v0/hapmap_3.3.hg38.vcf.gz.tbi" \
  "gs://genomics-public-data/resources/broad/hg38/v0/scattered_calling_intervals" \
  "gs://genomics-public-data/resources/broad/hg38/v0/wgs_calling_regions.hg38.interval_list" \
  .
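Whichever route is used, it is worth confirming that every VCF arrived together with its index, since GATK generally expects a `.tbi` beside each `.vcf.gz` and an `.idx` beside each plain `.vcf`. A minimal sketch, assuming the files sit in the download directory as in the command above; `check_vcf_indexes` is a hypothetical helper, not part of gsutil or GATK.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): list VCFs in a directory
# that lack an index. .vcf.gz files need a tabix .tbi; plain .vcf files a .idx.
check_vcf_indexes() {
  local dir="${1:-.}" missing=0 vcf restore
  restore="$(shopt -p nullglob)" || true  # remember the caller's nullglob setting
  shopt -s nullglob                       # unmatched globs expand to nothing
  for vcf in "$dir"/*.vcf.gz; do
    if [[ ! -f "${vcf}.tbi" ]]; then
      echo "missing index: ${vcf}.tbi"
      missing=$((missing + 1))
    fi
  done
  for vcf in "$dir"/*.vcf; do
    if [[ ! -f "${vcf}.idx" ]]; then
      echo "missing index: ${vcf}.idx"
      missing=$((missing + 1))
    fi
  done
  eval "$restore"                         # restore the original nullglob state
  echo "${missing} VCF file(s) without an index"
  (( missing == 0 ))                      # status 0 only when everything is indexed
}

# Example, inside ~/DataHub/Genomics/GATK:  check_vcf_indexes .
```

If anything is reported missing, re-running the same `gsutil -m cp` command with `-n` (no-clobber) added should skip the files that already exist and fetch only the rest.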

BWA index files (plus the reference FASTA and its sequence dictionary)

Homo_sapiens_assembly38.fasta

Homo_sapiens_assembly38.fasta.64.amb

Homo_sapiens_assembly38.fasta.64.ann

Homo_sapiens_assembly38.fasta.64.bwt

Homo_sapiens_assembly38.fasta.64.pac

Homo_sapiens_assembly38.fasta.64.sa

Homo_sapiens_assembly38.dict
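The index files in the bundle carry an extra `.64` in their names, so the index prefix to pass to `bwa mem` is `Homo_sapiens_assembly38.fasta.64`, not the FASTA file name; with that prefix bwa also finds `Homo_sapiens_assembly38.fasta.64.alt` and performs ALT-aware alignment. The sketch below checks that all five index components exist for a given prefix before aligning; `check_bwa_index` and the read file names in the example are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): verify that every BWA
# index component exists for a prefix. The GATK bundle names them
# <fasta>.64.{amb,ann,bwt,pac,sa}, so the prefix ends in ".64".
check_bwa_index() {
  local prefix="$1" ext ok=0
  for ext in amb ann bwt pac sa; do
    if [[ ! -f "${prefix}.${ext}" ]]; then
      echo "missing: ${prefix}.${ext}"
      ok=1
    fi
  done
  return "$ok"
}

# Example (read file names are hypothetical):
#   ref=~/DataHub/Genomics/GATK/Homo_sapiens_assembly38.fasta
#   check_bwa_index "${ref}.64" && bwa mem -t 8 "${ref}.64" R1.fq.gz R2.fq.gz > sample.sam
```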

prepare the software environments

python 2

micromamba create -n dna2 -c conda-forge python=2

micromamba activate dna2

micromamba install -y -c conda-forge -c bioconda bwa samtools bcftools vcftools snpeff fastqc qualimap gatk4 tabix multiqc
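After the install, a quick check that every tool is on `PATH` inside the activated `dna2` environment catches failed or partial installs early. Some executables are named differently from their packages: `gatk4` provides `gatk` and `snpeff` provides `snpEff`. A minimal sketch; `check_tools` and `DNA2_TOOLS` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): report which of the given
# commands cannot be found on PATH; returns non-zero if any are missing.
check_tools() {
  local tool missing=()
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  if (( ${#missing[@]} > 0 )); then
    echo "not found: ${missing[*]}"
    return 1
  fi
  echo "all $# tools found"
}

# Executable names expected in dna2 (gatk4 -> gatk, snpeff -> snpEff).
DNA2_TOOLS=(bwa samtools bcftools vcftools snpEff fastqc qualimap gatk tabix multiqc)

# Usage, after "micromamba activate dna2":
#   check_tools "${DNA2_TOOLS[@]}"
```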

python 3

micromamba create -n dna3

micromamba activate dna3

micromamba install -y -c conda-forge python=3.10 python_abi xopen

micromamba install -y -c conda-forge -c bioconda cutadapt=4.3 trim-galore
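Trim Galore is a wrapper around cutadapt, so both must resolve inside `dna3`, and because cutadapt is pinned to 4.3 it is worth confirming the pin held. A minimal sketch; `check_pinned_version` is a hypothetical helper that does a simple string comparison against what `cutadapt --version` prints.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): compare an installed
# version string with a pinned one. "4.3" accepts "4.3" and "4.3.x",
# but rejects "4.30" or "4.4".
check_pinned_version() {
  local pinned="$1" actual="$2"
  if [[ "$actual" == "$pinned" || "$actual" == "$pinned".* ]]; then
    echo "ok: $actual"
    return 0
  fi
  echo "expected $pinned, found ${actual:-nothing}"
  return 1
}

# Usage, after "micromamba activate dna3":
#   command -v trim_galore cutadapt
#   check_pinned_version 4.3 "$(cutadapt --version)"
```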


Article link: https://www.haomeiwen.com/subject/fwqifjtx.html