美文网首页linux
200904 Circ之旅4.1-Circexplorer2

200904 Circ之旅4.1-Circexplorer2

作者: dicklim | 来源:发表于2020-09-04 17:38 被阅读0次

Circexplorer2

参考文档:

CIRCexplorer2手册

tophat的安装与使用

这个主要记录一下使用Circexplorer2的流程

目前已经通过之前的步骤获得了trimmed的序列,并且qc过了,然后bowtie2,bwa,hisat,等等的索引都已经做好。

下面参考文档CIRCexplorer2手册,里面提供了单端和双端的align。这儿我使用的是GSE99531

这个是一个双端测序的rna文库,参数如下:

Instrument: Illumina HiSeq 4000
Strategy: RNA-Seq
Source: TRANSCRIPTOMIC
Selection: cDNA
Layout: PAIRED

但是因为这个软件本身默认是用tophat处理单端测序所以就记录一下使用。

Align

单端

From index files (bowtie1_index is the prefix for bowtie1 index files, and bowtie2_index is the prefix for bowtie2 index files):

CIRCexplorer2 align -G hg19_kg.gtf -i bowtie1_index -j bowtie2_index -f RNA_seq.fastq > CIRCexplorer2_align.log

-i -j 应该是拿来指定索引文件的,用一个就行了,gtf应该用的是knowgene那个gtf。

双端测序

tophat 2.1.0

tophat利用bowtie2建立的索引进行比对。这个来自circexplorer2的手册。

tophat2 -o tophat_fusion -p 15 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search index fq1 fq2

-o | --output < string > default: ./tophat_out 输出的文件夹路径
-p 15 线程数,老生常谈
--fusion-search 开启融合转录子的比对
--keep-fasta-order 对比对结果按基因组fasta文件进行排序。该参数会使输出的SAM/BAM文件和tophat的1.41或以前版本不兼容
--bowtie1 default: bowtie2使用Bowtie1来代替Bowtie2进行比对。特别是使用colorspace reads时,因为只有Bowtie1支持,而Bowtie2不支持。
--no-coverage-search取消以覆盖度为基础来搜寻junctions,和下一个参数对立,该参数为默认参数。

我的命令如下:

tophat2 -o tophat_fusion -p 30 --fusion-search --keep-fasta-order --no-coverage-search ~/Circ/index/bowtie2/hg19/hg19 ~/Circ/fqc/SRR5635537_1_val_1.fq.gz ~/Circ/fqc/SRR5635537_2_val_2.fq.gz

tophat2 -o tophat_fusion -p 30 --fusion-search --keep-fasta-order --no-coverage-search ~/Circ/index/bowtie2/hg19/hg19 ~/Circ/fqc/SRR5635538_1_val_1.fq.gz ~/Circ/fqc/SRR5635538_2_val_2.fq.gz

脚本写法:

ls *.gz |cut -d "_" -f 1 | sort -u |while read id;do
ls lh ${id}_1_val_1.fq.gz  ${id}_2_val_2.fq.gz
tophat2 -o tophat_fusion -p 30 --fusion-search --keep-fasta-order --no-coverage-search ~/Circ/index/bowtie2/hg19/hg19 ${id}_1_val_1.fq.gz  ${id}_2_val_2.fq.gz
done

任务丢后台留一些参考吧

linux后台执行命令:&和nohup

linux后台运行&符号、nohup命令、输出重定向等使用方法

nohup   conmmand   1>nohup37.out 2>error37   &

输出:

[2020-07-07 18:28:18] Beginning TopHat run (v2.1.0)
-----------------------------------------------
[2020-07-07 18:28:18] Checking for Bowtie
          Bowtie version:    2.3.5.1
[2020-07-07 18:28:18] Checking for Bowtie index files (genome)..
[2020-07-07 18:28:18] Checking for reference FASTA file
[2020-07-07 18:28:18] Generating SAM header for /home/dick/Circ/index/bowtie2/hg19/hg19
[2020-07-07 18:28:20] Preparing reads
     left reads: min. length=25, max. length=51, 65118503 kept reads (29794 discarded)
    right reads: min. length=25, max. length=51, 65111529 kept reads (36768 discarded)
[2020-07-07 18:48:44] Mapping left_kept_reads to genome hg19 with Bowtie2 
[2020-07-07 19:23:03] Mapping left_kept_reads_seg1 to genome hg19 with Bowtie2 (1/2)
[2020-07-07 19:27:28] Mapping left_kept_reads_seg2 to genome hg19 with Bowtie2 (2/2)
[2020-07-07 19:34:15] Mapping right_kept_reads to genome hg19 with Bowtie2 
[2020-07-07 20:07:28] Mapping right_kept_reads_seg1 to genome hg19 with Bowtie2 (1/2)
[2020-07-07 20:12:43] Mapping right_kept_reads_seg2 to genome hg19 with Bowtie2 (2/2)
[2020-07-07 20:18:47] Searching for junctions via segment mapping
[2020-07-07 21:20:20] Retrieving sequences for splices
[2020-07-07 21:27:10] Indexing splices
Building a LARGE index
[2020-07-08 00:11:23] Mapping left_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/2)
[2020-07-08 00:39:47] Mapping left_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/2)
[2020-07-08 01:01:28] Joining segment hits
[2020-07-08 01:25:55] Mapping right_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/2)
[2020-07-08 01:54:26] Mapping right_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/2)
[2020-07-08 02:18:06] Joining segment hits
[2020-07-08 02:43:22] Reporting output tracks
-----------------------------------------------
[2020-07-08 03:45:32] A summary of the alignment counts can be found in tophat_fusion/37/align_summary.txt
[2020-07-08 03:45:32] Run complete: 09:17:14 elapsed

###########################################################################################
-rw-rw-r-- 1 dick dick 7.0G 7月   8 03:44 accepted_hits.bam
-rw-rw-r-- 1 dick dick  569 7月   8 03:12 align_summary.txt
-rw-rw-r-- 1 dick dick 5.7M 7月   8 03:12 deletions.bed
-rw-rw-r-- 1 dick dick  42M 7月   8 03:12 fusions.out
-rw-rw-r-- 1 dick dick 3.0M 7月   8 03:12 insertions.bed
-rw-rw-r-- 1 dick dick  12M 7月   8 03:12 junctions.bed
drwxrwxr-x 2 dick dick 4.0K 7月   8 03:44 logs
-rw-rw-r-- 1 dick dick  184 7月   7 18:48 prep_reads.info
-rw-rw-r-- 1 dick dick 243M 7月   8 03:45 unmapped.bam

报错踩坑记录:

[2020-07-07 09:45:52] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2020-07-07 09:45:52] Checking for Bowtie
          Bowtie version:    2.3.5.1
[2020-07-07 09:45:52] Checking for Bowtie index files (genome)..
[2020-07-07 09:45:52] Checking for reference FASTA file
[2020-07-07 09:45:52] Generating SAM header for /home/dick/Circ/index/bowtie2/hg19/hg19
[2020-07-07 09:45:54] Preparing reads
     left reads: min. length=25, max. length=51, 65118503 kept reads (29794 discarded)
    right reads: min. length=25, max. length=51, 65111529 kept reads (36768 discarded)
[2020-07-07 10:06:11] Mapping left_kept_reads to genome hg19 with Bowtie2 
[2020-07-07 10:52:28] Mapping left_kept_reads_seg1 to genome hg19 with Bowtie2 (1/2)
[2020-07-07 10:57:41] Mapping left_kept_reads_seg2 to genome hg19 with Bowtie2 (2/2)
[2020-07-07 11:05:37] Mapping right_kept_reads to genome hg19 with Bowtie2 
[2020-07-07 11:46:10] Mapping right_kept_reads_seg1 to genome hg19 with Bowtie2 (1/2)
[2020-07-07 11:50:11] Mapping right_kept_reads_seg2 to genome hg19 with Bowtie2 (2/2)
[2020-07-07 11:55:02] Searching for junctions via segment mapping
[2020-07-07 12:49:01] Retrieving sequences for splices
[2020-07-07 12:55:46] Indexing splices
Building a LARGE index
[2020-07-07 15:24:33] Mapping left_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/2)
[2020-07-07 15:46:39] Mapping left_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/2)
[2020-07-07 16:10:08] Joining segment hits
[2020-07-07 16:36:08] Mapping right_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/2)
[2020-07-07 17:00:17] Mapping right_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/2)
[2020-07-07 17:22:40] Joining segment hits
[2020-07-07 17:48:31] Reporting output tracks
    [FAILED]
Error running /home/dick/anaconda3/envs/rna/bin/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir tophat_fusion/37/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 --fusion-search --fusion-anchor-length 20 --fusion-min-dist 10000000 --fusion-read-mismatches 2 --fusion-multireads 2 --fusion-multipairs 2 -z gzip -p30 --inner-dist-mean 50 --inner-dist-std-dev 20 --no-closure-search --no-coverage-search --no-microexon-search --sam-header tophat_fusion/37/tmp/hg19_genome.bwt.samheader.sam --report-discordant-pair-alignments --report-mixed-alignments --samtools=/home/dick/anaconda3/envs/rna/bin/samtools_0.1.18 --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 /home/dick/Circ/index/bowtie2/hg19/hg19.fa tophat_fusion/37/junctions.bed tophat_fusion/37/insertions.bed tophat_fusion/37/deletions.bed tophat_fusion/37/fusions.out tophat_fusion/37/tmp/accepted_hits tophat_fusion/37/tmp/left_kept_reads.mapped.bam,tophat_fusion/37/tmp/left_kept_reads.candidates tophat_fusion/37/tmp/left_kept_reads.bam tophat_fusion/37/tmp/right_kept_reads.mapped.bam,tophat_fusion/37/tmp/right_kept_reads.candidates tophat_fusion/37/tmp/right_kept_reads.bam
./SeqAn-1.4.2/seqan/basic/basic_exception.h:236 FAILED!  (Uncaught exception of type St12out_of_range: basic_string::substr)

这个好像是版本原因导致,把tophat换成2.1.0就可以了。

STAR

STAR --chimSegmentMin 10 \
     --runThreadN 30 \
     --genomeDir ~/Circ/index/STAR/hg19 \
     --readFilesCommand zcat \
     --readFilesIn ~/Circ/fqc/SRR5635537_1_val_1.fq ~/Circ/fqc/SRR5635537_2_val_2.fq \
     --outFileNamePrefix ~/Circ/STAR_map/37/SRR5635537_

这儿readFileCommand我一开始写的gzip后来发现不行,就还是用它给的参数吧

回头记录一下输出,这个速度真的快

STAR --chimSegmentMin 10 \
     --runThreadN 30 \
     --genomeDir ~/Circ/index/STAR/hg38 \
     --readFilesCommand zcat \
     --readFilesIn ~/Circ/fqc/SRR5635537_1_val_1.fq ~/Circ/fqc/SRR5635537_2_val_2.fq \
     --outFileNamePrefix ~/Circ/STAR_map/37/SRR5635537_

parse

CIRCexplorer2 parse -t STAR SRR5635537_Chimeric.out.junction > CIRCexplorer2_parse.log

annotation

CIRCexplorer2 annotate -r hg19_ref_all.txt -g hg19.fa -b back_spliced_junction.bed -o circularRNA_known.txt > CIRCexplorer2_annotate.log

这个 hg19_ref_all.txt

fetch_ucsc.py hg19 ref hg19_ref.txt
fetch_ucsc.py hg19 kg hg19_kg.txt
fetch_ucsc.py hg19 ens hg19_ens.txt
cat hg19_ref.txt hg19_kg.txt hg19_ens.txt > hg19_ref_all.txt

转gtf

cut -f2-11 hg19_ref.txt|/home/dick/Circ/index/ref/genePredToGtf file stdin hg19_ref.gtf
# or
cut -f2-11 hg19_kg.txt|/home/dick/Circ/index/ref/genePredToGtf file stdin hg19_kg.gtf
# or
cut -f2-11 hg19_ens.txt|/home/dick/Circ/index/ref/genePredToGtf file stdin hg19_ens.gtf
cut -f2-11 hg19_ref_all.txt|/home/dick/Circ/index/ref/genePredToGtf file stdin hg19_ref_all.gtf

annotation

Annotating for Circular RNAs
CIRCexplorer2 annotate -r ~/Circ/index/ref/hg19_ref_all.txt -g ~/Circ/index/raw/hg19/hg19.fa -b back_spliced_junction.bed -o circularRNA_known.txt > CIRCexplorer2_annotate.log

相关文章

网友评论

    本文标题:200904 Circ之旅4.1-Circexplorer2

    本文链接:https://www.haomeiwen.com/subject/jtxtektx.html