[Block] Bioinformatics Upstream Analysis 5: StringTie

Author: JamesMori | Published 2022-10-25 23:44
  • highly efficient assembler of RNA-Seq alignments into potential transcripts

1. Basic usage

stringtie [-o <output.gtf>] [other_options] <read_alignments.bam>
Input: a SAM, BAM or CRAM file with RNA-Seq read alignments sorted by their genomic location
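
As a minimal sketch of the basic call, assuming an unsorted alignment file and a reference annotation are at hand, the sort-then-assemble steps might be wrapped as below. The function name `assemble_sample` and all file names are placeholders for illustration, not part of StringTie.

```shell
# Sketch: coordinate-sort an alignment file, then assemble transcripts.
# assemble_sample and all file names are hypothetical placeholders.
assemble_sample() {
    local in_bam="$1"    # unsorted alignments
    local ref_gtf="$2"   # reference annotation, passed to -G
    local out_gtf="$3"   # assembled transcripts, written by -o
    local sorted_bam="${in_bam%.*}.sorted.bam"

    # StringTie needs alignments sorted by genomic coordinate.
    samtools sort -o "$sorted_bam" "$in_bam" || return 1

    # -G guides the assembly with the reference annotation.
    stringtie -o "$out_gtf" -G "$ref_gtf" "$sorted_bam"
}
```

It could then be called as `assemble_sample sample1.bam genes.gtf sample1.gtf`.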

2. Main options:

-h/--help: prints the help message and exits
--version: prints the version and exits
-L: long-read processing mode
--mix: mixed-read processing mode (short-read and long-read alignments together)
-e: expression estimation mode; only estimates the abundance of the reference transcripts given with -G
-v: verbose mode
-a: junctions are filtered out unless spliced reads align across them with at least this many bases on both sides (default: 10)
--conservative: conservative assembly mode, equivalent to -t -c 1.5 -f 0.05
-B: outputs Ballgown input table files (*.ctab) containing coverage data for the reference transcripts given with the -G option
--merge: transcript merge mode, distinct from the assembly mode above. It takes GTF/GFF files as input and produces a uniform set of transcripts for all samples. Output is a merged GTF file with all merged gene models, but without any numeric results on coverage, FPKM, and TPM. Then, with this merged GTF, StringTie can re-estimate abundances by running it again with the -e option on the original set of alignment files. A reference annotation can optionally be supplied with -G.
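
The assemble, merge, re-estimate cycle described for --merge can be sketched roughly as follows. This is only an illustration under assumptions: the function name `merge_and_quantify`, the `mergelist.txt` file, and the `ballgown/<sample>/` layout are hypothetical choices, and the inputs are taken to be coordinate-sorted BAM files.

```shell
# Sketch of the assemble -> merge -> re-estimate workflow.
# merge_and_quantify, mergelist.txt and the ballgown/ layout are hypothetical.
merge_and_quantify() {
    local ref_gtf="$1"   # reference annotation for -G
    shift                # remaining arguments: sorted BAM files
    local bam sample

    # 1) Assemble each sample separately and record the resulting GTF.
    : > mergelist.txt
    for bam in "$@"; do
        sample="$(basename "$bam" .bam)"
        stringtie -G "$ref_gtf" -o "$sample.gtf" "$bam" || return 1
        printf '%s\n' "$sample.gtf" >> mergelist.txt
    done

    # 2) Merge the per-sample assemblies into one transcript set.
    stringtie --merge -G "$ref_gtf" -o merged.gtf mergelist.txt || return 1

    # 3) Re-estimate abundances against the merged set (-e) and
    #    write Ballgown *.ctab tables (-B) into one directory per sample.
    for bam in "$@"; do
        sample="$(basename "$bam" .bam)"
        mkdir -p "ballgown/$sample" || return 1
        stringtie -e -B -G merged.gtf -o "ballgown/$sample/$sample.gtf" "$bam" || return 1
    done
}
```

A call such as `merge_and_quantify genes.gtf s1.bam s2.bam` would run all three steps. One directory per sample is used so that the table files of different samples do not overwrite each other.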

3. Input

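3.1. The input must be sorted by genomic coordinate. TopHat output is already sorted; alignments from other aligners should be sorted first, for example with samtools sort
3.2. Every spliced alignment must carry a tag (XS) indicating the genomic strand of the transcript it came from. TopHat adds it automatically, and so does HISAT2 when run with --dta; STAR needs --outSAMstrandField intronMotif. For long reads aligned with minimap2, use -ax splice
3.3. The main input-related options are -L, --mix, -G and -e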
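
Before assembling, it can be worth checking whether spliced alignments actually carry a strand tag. The sketch below is one possible check, not a StringTie feature: `count_untagged_spliced` is a hypothetical helper that reads SAM text on stdin, treats any alignment whose CIGAR contains `N` as spliced, and accepts either `XS:A:` or `ts:A:` (the tag minimap2 is assumed to write in splice mode) as the strand tag.

```shell
# Sketch: count spliced alignments that lack a strand tag.
# Reads SAM text on stdin; count_untagged_spliced is a hypothetical helper.
count_untagged_spliced() {
    awk -F '\t' '
        /^@/ { next }                       # skip SAM header lines
        $6 ~ /N/ {                          # CIGAR with N: spliced alignment
            spliced++
            tagged = 0
            for (i = 12; i <= NF; i++)      # optional fields start at column 12
                if ($i ~ /^(XS|ts):A:[+-]$/) tagged = 1
            if (!tagged) missing++
        }
        END { printf "spliced=%d missing_strand_tag=%d\n", spliced, missing }
    '
}
```

It could be used as `samtools view -h sample.sorted.bam | count_untagged_spliced`; a non-zero `missing_strand_tag` count suggests the aligner options above were not set.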

4. Output

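4.1. GTF file: mainly contains the assembled transcripts, in 9 columns (seqname, source, feature, start, end, score, strand, frame, attributes)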
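
To make the 9-column layout concrete, here is a small sketch that pulls per-transcript values out of the attributes column. It assumes the transcript lines carry `transcript_id`, `cov`, `FPKM` and `TPM` attributes whose values contain no spaces; `transcript_table` is a hypothetical helper name.

```shell
# Sketch: list transcript_id, coverage, FPKM and TPM from a StringTie GTF.
# transcript_table is a hypothetical helper; it reads GTF text on stdin.
transcript_table() {
    awk -F '\t' -v OFS='\t' '
        /^#/ { next }                         # skip comment lines
        $3 == "transcript" {
            split("", val)                    # clear values from the previous record
            n = split($9, attrs, /; */)       # column 9 holds key "value"; pairs
            for (i = 1; i <= n; i++) {
                if (split(attrs[i], kv, " ") < 2) continue
                gsub(/"/, "", kv[2])          # strip the surrounding quotes
                val[kv[1]] = kv[2]
            }
            print val["transcript_id"], val["cov"], val["FPKM"], val["TPM"]
        }
    '
}
```

For example, `transcript_table < sample1.gtf > sample1.transcripts.tsv` would give one tab-separated row per assembled transcript.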


4.2. Gene abundance file: written when the -A option is given

4.3. Fully covered transcripts: written when the -C option is given (requires -G)
A GTF file with all the transcripts in the reference annotation that are fully covered, end to end, by reads
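
As a rough sketch, the gene abundance file (-A) and the fully covered transcripts file (-C, which requires -G) can be requested in the same run as the main GTF. The wrapper name `assemble_with_reports` and the output file suffixes are hypothetical.

```shell
# Sketch: one run that also writes the optional -A and -C outputs.
# assemble_with_reports and the derived file names are hypothetical.
assemble_with_reports() {
    local bam="$1"       # coordinate-sorted alignments
    local ref_gtf="$2"   # reference annotation (-C requires -G)
    local prefix="$3"    # prefix for all output files

    stringtie -G "$ref_gtf" \
        -o "$prefix.gtf" \
        -A "$prefix.gene_abund.tab" \
        -C "$prefix.cov_refs.gtf" \
        "$bam"
}
```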

4.4. Ballgown Input Table Files
4.5. Merged GTF

5. The assembly results can be analyzed downstream with the gffcompare program



