Smart-seq2上游处理(Hisat2版)

作者: Insc | 来源:发表于2022-05-09 12:21 被阅读0次

最近试了很多Smart-seq2上游处理的方法,参照了Jimmy老师的教程,以及阿则的生信小站公众号文章。对代码进行了整理。但是过程中仍然有一些问题尚待解决。

1. 数据、软件准备

  • SRR文件下载到SRRfile文件下
mkdir SRRfile
prefetch --option-file SRR_Acc_List.txt
  • hisat2下载
conda install hisat2
  • hisat2比对索引下载
axel https://genome-idx.s3.amazonaws.com/hisat/grch38_tran.tar.gz
# wget https://genome-idx.s3.amazonaws.com/hisat/grch38_tran.tar.gz
  • conda安装其他软件
conda install fastqc multiqc samtools sambamba featureCounts

2. 数据处理

  • 解压缩
gunzip *.gz
  • SRA文件转fastq
mkdir fastq
ls SRRfile/SRR* | while read id;do \
    (fastq-dump --gzip --split-3 -A `basename $id` -O fastq/$id &);done
  • 质控
mkdir fastqc
fastqc -o ./fastqc -t 20 ./fastq/*.fastq &&multiqc ./fastqc -o ./fastqc
  • 去接头
# trim_galore
mkdir clean
cat SRR_Acc_List.txt | while read id;do\
  trim_galore --quality 20 --phred33 \
  --length 20 -j 20 \
  -o ../clean \
  --paried ./fastq/${id}_1.fastq ./fastq/${id}_2.fastq
done
gunzip ./clean/*.gz

3. 序列比对

  • hisat2
mkdir aligned
cat SRR_Acc_List.txt | while read id;do
  hisat2 -p 20 -x ~/ref/genome_tran/genome_tran \
  -S ./aligned/${id}.sam \
 -1./clean/${id}_1_val_1.fq -2 ./clean/${id}_2_val_2.fq
done

  • sam转bam
mkdir bam
cat SRR_Acc_List.txt | while read id;do
  samtools sort -n -@ 20 \
  -o ./bam/${id}.bam ./aligned/${id}.sam
done
  • bam文件排序
cat SRR_Acc_List.txt | while read id;do
  sambamba sort -t 20 -o ./bam/${id}.bam ./bam/${id}.bam
done
  • 输出为count
mkdir count
cat SRR_Acc_List.txt | while read id;do
  featureCounts -T 20 -p -t exon -g gene_name \
  -a ~/ref/gencode.v40.annotation.gtf \
  -o count/${id}.all_feature.txt \
  bam/${id}.sorted.bam
done

4.批处理文件

根据以上,写成批处理文件,批量处理SRR格式文件。

### srr2fastq
mkdir fastq
ls SRRfile/SRR* | while read id;do \
    (fastq-dump --gzip --split-3 -A `basename $id` -O fastq/$id &);done

### fastqc
mkdir fastqc
fastqc -o ./fastqc -t 20 ./fastq/*.fastq &&multiqc ./fastqc -o ./fastqc

### trim_galore
mkdir clean
cat SRR_Acc_List.txt | while read id;do\
  trim_galore --quality 20 --phred33 \
  --length 20 -j 20 \
  -o ../clean \
  --paried ./fastq/${id}_1.fastq ./fastq/${id}_2.fastq
done

### hisat2
mkdir aligned
cat SRR_Acc_List.txt | while read id;do
  hisat2 -p 20 -x ~/ref/genome_tran/genome_tran \
  -S ./aligned/${id}.sam \
 -1./clean/${id}_1_val_1.fq -2 ./clean/${id}_2_val_2.fq
done

### sam转bam
mkdir bam
cat SRR_Acc_List.txt | while read id;do
  samtools sort -n -@ 20 \
  -o ./bam/${id}.bam ./aligned/${id}.sam
done

### bam文件排序
cat SRR_Acc_List.txt | while read id;do
  sambamba sort -t 20 -o ./bam/${id}.bam ./bam/${id}.bam
done

### featurecounts 
mkdir count
cat SRR_Acc_List.txt | while read id;do
  featureCounts -T 20 -p -t exon -g gene_name \
  -a ~/ref/gencode.v40.annotation.gtf \
  -o count/${id}.all_feature.txt \
  bam/${id}.sorted.bam
done


### 
echo "------- end -------"

参考资料

  1. Jimmy老师的教程
  2. 阿则的生信小站公众号文章

相关文章

网友评论

    本文标题:Smart-seq2上游处理(Hisat2版)

    本文链接:https://www.haomeiwen.com/subject/jmucurtx.html