HISAT2: building an index on a Tencent Cloud server rented by the hour (--snp

Author: gong0037 | Published: 2019-07-25 15:07

If you use --snp, --ss, and/or --exon, hisat2-build will need about 200GB RAM for the human genome size as index building involves a graph construction. Otherwise, you will be able to build an index on your desktop with 8GB RAM.

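Building this HISAT2 index needs about 200 GB of RAM. There is no way around it if the index has to keep pace with the other tools in the pipeline, so rent a server by the hour.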

Tencent Cloud is a good choice! Billing mode: pay-as-you-go.

Works great, so let's get going!

wget ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/downloads/hisat2-2.1.0-Linux_x86_64.zip   # download HISAT2

Unzip the archive and it is ready to use.

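Download the reference genome (ref), the GTF annotation, and the SNP table, then gunzip each one, since the commands in the steps use the uncompressed file names: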

ftp://ftp.ensembl.org/pub/release-96/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/snp151Common.txt.gz

ftp://ftp.ensembl.org/pub/release-96/gtf/homo_sapiens/Homo_sapiens.GRCh38.96.gtf.gz
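
The three downloads can be scripted so that an interrupted transfer resumes and files that are already unpacked are skipped. This is only a minimal sketch: the helper names unpacked_name and fetch_and_unpack are made up for illustration, and it assumes wget and gunzip are installed.

```shell
# Name of the file left behind after gunzip: last URL component minus ".gz"
unpacked_name() {
    name=${1##*/}
    printf '%s\n' "${name%.gz}"
}

# Download one URL (resuming a partial file) and decompress it,
# unless the decompressed file already exists and is non-empty
fetch_and_unpack() {
    url=$1
    out=$(unpacked_name "$url")
    [ -s "$out" ] && return 0
    wget -c "$url" && gunzip -f "${url##*/}"
}

# Usage (hypothetical helper, run once per URL listed above):
#   fetch_and_unpack http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/snp151Common.txt.gz
```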

Steps (following what ships with the indexes downloaded from the official site):

1. hisat2_extract_snps_haplotypes_UCSC.py Homo_sapiens.GRCh38.dna.primary_assembly.fa snp151Common.txt genome   # produces genome.haplotype and genome.snp

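Note that snp151Common.txt names chromosomes in the UCSC style (for example chr1), whereas the Ensembl GRCh38 FASTA uses plain names (1). The chromosome names in the SNP file therefore have to be rewritten before step 1 is run. The following shell script updates them: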

SNP_FILE=snp151Common.txt

awk 'BEGIN{OFS="\t"} {if($2 ~ /^chr/) {$2 = substr($2, 4)}; if($2 == "M") {$2 = "MT"} print}' ${SNP_FILE} > ${SNP_FILE}.tmp

mv ${SNP_FILE}.tmp ${SNP_FILE}
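
After the rename it can be worth confirming that the two files now agree. The sketch below lists chromosome names that occur in the SNP table but not in the FASTA headers. It assumes, as the awk command above does, that column 2 of the SNP table holds the chromosome name; missing_chroms is a hypothetical helper name.

```shell
# Print each chromosome name found in the SNP table (column 2)
# that has no matching header in the FASTA file.
# Usage: missing_chroms <fasta> <snp_table>
missing_chroms() {
    awk '
        # First file (FASTA): remember the name after ">" on header lines
        NR == FNR { if (/^>/) { sub(/^>/, "", $1); seen[$1] = 1 }; next }
        # Second file (SNP table): report each unmatched name once
        !($2 in seen) && !($2 in reported) { reported[$2] = 1; print $2 }
    ' "$1" "$2"
}

# Usage:
#   missing_chroms Homo_sapiens.GRCh38.dna.primary_assembly.fa snp151Common.txt
```

Any names printed are ones the simple chr-stripping did not reconcile. I would expect these to be mostly alternate or unplaced sequences, which the two sources may name differently, but that is an assumption to check against the actual output.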

2. hisat2_extract_splice_sites.py Homo_sapiens.GRCh38.96.gtf > genome.ss

3. hisat2_extract_exons.py Homo_sapiens.GRCh38.96.gtf > genome.exon

4. hisat2-build -p 20 Homo_sapiens.GRCh38.dna.primary_assembly.fa --snp genome.snp --haplotype genome.haplotype --ss genome.ss --exon genome.exon genome_snp_tran
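
Since step 4 is the one that needs roughly 200 GB of RAM, a quick check before launching it avoids paying for a run that dies partway through. This is a rough sketch for Linux: enough_ram is a hypothetical helper, and the threshold is whatever you pass in.

```shell
# Succeed if MemTotal in a meminfo-style file is at least <need_gb> GB.
# The file defaults to /proc/meminfo (Linux), where MemTotal is given in kB.
# Usage: enough_ram <need_gb> [meminfo_file]
enough_ram() {
    need_gb=$1
    meminfo=${2:-/proc/meminfo}
    awk -v need="$need_gb" '
        $1 == "MemTotal:" { total_kb = $2 }
        END { exit !(total_kb >= need * 1024 * 1024) }
    ' "$meminfo"
}

# Usage: only start the build when the machine is large enough
#   enough_ram 200 && echo "OK to run hisat2-build"
```

An instance sold as 200 GB may report slightly less than that in MemTotal, so a somewhat lower threshold may be the practical choice.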

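The first three steps can be done on a local machine. Then upload the four genome.* files (genome.snp, genome.haplotype, genome.ss, genome.exon) and the reference genome to the server, so that the rented time is spent only on step 4. After all, time is money.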
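
Because the reference FASTA is several gigabytes, it can help to confirm the upload arrived intact before starting the billed build. This sketch assumes GNU md5sum is available on both machines; the checksum file name genome_inputs.md5 and both function names are made up for illustration.

```shell
# Files that step 4 reads; names taken from the commands above
INPUTS="genome.snp genome.haplotype genome.ss genome.exon Homo_sapiens.GRCh38.dna.primary_assembly.fa"

# Run locally before uploading: record a checksum for every input
write_checksums() {
    # $INPUTS is intentionally unquoted so it splits into separate file names
    md5sum $INPUTS > genome_inputs.md5
}

# Run on the server in the upload directory:
# fails if any file is missing or differs from the local copy
verify_checksums() {
    md5sum -c --quiet genome_inputs.md5
}
```

Upload genome_inputs.md5 along with the five files, then run verify_checksums on the server before step 4.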

Done. [Figure: screenshot of the build process]