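With the rRNA reads removed, the next step is alignment. For this step the author used four aligners: Tophat2, STAR, HISAT2, and BWA.
Here is the corresponding code, pulled out of the author's scripts:
Tophat2:
[Screenshot: the author's Tophat2 code] Let's pull it out and run it~
Building the index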
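# Activate the conda environment
conda activate rna
# Create the index directory
mkdir Tophat2Index
# Save the following lines as Tophat2Index.sh (e.g. vim Tophat2Index.sh)
fasta=Homo_sapiens.GRCh38.dna.primary_assembly.fa
fasta_baseName=GRCh38
# Hard-link the genome FASTA into the index directory
ln $fasta Tophat2Index
# Build the Bowtie2 index that Tophat2 aligns against (-f: the input is FASTA)
bowtie2-build -f $fasta Tophat2Index/${fasta_baseName}
# Run in the background and keep a log
nohup sh Tophat2Index.sh >Tophat2Index.sh.log &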
The index directory then looks like this:
Tophat2Index
├── GRCh38.1.bt2
├── GRCh38.2.bt2
├── GRCh38.3.bt2
├── GRCh38.4.bt2
├── GRCh38.rev.1.bt2
├── GRCh38.rev.2.bt2
└── Homo_sapiens.GRCh38.dna.primary_assembly.fa
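Before moving on, it can be worth confirming that the index build actually finished, since a killed `nohup` job can leave a partial index behind. The sketch below checks for the six `.bt2` files shown in the listing above. The function name `check_bt2_index` is made up for illustration, and it assumes a small-format index (`.bt2`, as in the listing) rather than the large-format `.bt2l` files that Bowtie2 writes for very large genomes.

```shell
# Hypothetical helper: succeed only if all six Bowtie2 index files
# for the given base name exist and are non-empty.
check_bt2_index() {
    bt2_base=$1
    bt2_missing=0
    for bt2_suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        if [ ! -s "${bt2_base}.${bt2_suffix}" ]; then
            # Report each missing or empty file on stderr
            echo "missing: ${bt2_base}.${bt2_suffix}" >&2
            bt2_missing=1
        fi
    done
    return $bt2_missing
}
```

For the index built here, the call would be `check_bt2_index Tophat2Index/GRCh38 && echo "index looks complete"`.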
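Alignment
Hmm, the author's script seems to call tophat rather than tophat2.
- --no-novel-juncs: only align reads across junctions given in the annotation supplied with -G; do not report novel junctions
- --library-type: the library preparation type: fr-unstranded (no strand specificity), fr-firststrand (strand-specific, first strand), or fr-secondstrand (strand-specific, second strand)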
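Because a typo in the library type only shows up once tophat starts, a small guard in front of the script generation can save a wasted run. This is a minimal sketch: `validate_library_type` is a hypothetical helper, and it accepts only the three values TopHat documents for `--library-type` (`fr-unstranded`, `fr-firststrand`, `fr-secondstrand`).

```shell
# Hypothetical helper: accept only the three --library-type values
# documented by TopHat; print a message and fail on anything else.
validate_library_type() {
    case "$1" in
        fr-unstranded|fr-firststrand|fr-secondstrand)
            return 0 ;;
        *)
            echo "unknown --library-type value: $1" >&2
            return 1 ;;
    esac
}
```

In the alignment script it could be used as `validate_library_type "$strand_info" || exit 1` right after `strand_info` is set.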
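# Activate the conda environment
conda activate rna
# Create the output directory, plus one subdirectory per SRR accession
mkdir -p alignment/tophat2
ls *gz | perl -ne 'chomp;/(SRR\d+)/;print"mkdir alignment/tophat2/$1\n";' |sh
# Annotation, library type, index base name and output directory
gtf=../GRCh38/Homo_sapiens.GRCh38.105.chr.gtf
strand_info=fr-unstranded
index_base=../GRCh38/Tophat2Index/GRCh38
outdir=alignment/tophat2
# Write one tophat command per input file in alignment/rRNA_dup into tophat2.sh
ls alignment/rRNA_dup/SRR10352*gz |while read id
do
sample_name=${id##*/}           # drop the directory part of the path
sample_name=${sample_name%%.*}  # drop everything from the first "." onward
echo "tophat -p 12 -G $gtf -o ${outdir}/$sample_name --no-novel-juncs --library-type $strand_info $index_base ${id} > ${outdir}/${sample_name}_log.txt && mv ${outdir}/$sample_name/accepted_hits.bam ${outdir}/${sample_name}_tophat2.bam "
done >tophat2.sh
# Run in the background and keep a log
nohup sh tophat2.sh >tophat2.sh.log &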
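The sample name in the loop above comes from two shell parameter expansions, which are easy to misread. The sketch below wraps them in a function so they can be tried on a path by hand. The function name `sample_name_from_path` and the file name in the usage note are hypothetical, not taken from the author's data.

```shell
# Hypothetical helper: derive a sample name from a FASTQ path
# using the same two expansions as the loop above.
sample_name_from_path() {
    sn=${1##*/}    # remove the longest prefix ending in "/": drops the directory
    sn=${sn%%.*}   # remove the longest suffix starting with ".": drops all extensions
    printf '%s\n' "$sn"
}
```

With an assumed input such as `alignment/rRNA_dup/SRR10352001.fq.gz`, the function prints `SRR10352001`. Note that `%%` cuts at the first dot, so any dot inside the sample name itself would truncate it.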
Contents of tophat2.sh:
[Screenshot: contents of tophat2.sh] Once the run finishes, each sample has a *_tophat2.bam file in the output directory.
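Since each command only renames `accepted_hits.bam` when tophat exits successfully, a missing `*_tophat2.bam` is a quick sign that a sample failed. Here is a rough sketch that lists such samples. `missing_bams` is a hypothetical helper, and it assumes the input and output naming used in the script above.

```shell
# Hypothetical helper: print the name of every input sample that has
# no non-empty <sample>_tophat2.bam in the output directory.
missing_bams() {
    mb_indir=$1
    mb_outdir=$2
    for mb_fq in "$mb_indir"/*gz; do
        # Skip the unexpanded pattern when no input files exist
        [ -e "$mb_fq" ] || continue
        mb_name=${mb_fq##*/}
        mb_name=${mb_name%%.*}
        [ -s "$mb_outdir/${mb_name}_tophat2.bam" ] || echo "$mb_name"
    done
}
```

For this run the call would be `missing_bams alignment/rRNA_dup alignment/tophat2`. Empty output means every sample produced a BAM, and any name that is printed points to a `${sample_name}_log.txt` worth reading.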