能实现同时用mate和特征序列过滤,需要更严格的过滤,同时对分三段比对上的circRNA警惕。
发现:twopair和onepair重合,其中一个有的另一个大概率也有。
下一步:采用更严格的过滤条件+去接头。
区间树的查找方式只能随机的查到完全在区间内的,不能输出全部的,只能输出近似全部的,由于过于复杂,暂时先选用bedtools跑完流程
之后希望改造bedtools源码
使用bedtools+取intron的python脚本
####################
人类:
python full_human_1_2.py -i /mnt/x110/wus/BP_new/BWA_mapping/dingh/SRR6999003_mapped.sam -f /mnt/x110/guosy/Database/hg19/samtools-index/hg19.fa -o /mnt/x110/wus/BP_new/BWA_mapping/dingh/te
由于用的是hg19的fa,不太清楚
gff采用:
/mnt/S30/database/hg38/hg38_gencodev24.gff
进行尝试,不行
/mnt/T30/zhengy/book/database/hg38/old-gff/ncbi-refseq-gene.gff
也不行
重新下了hg19的也不行,使用
/mnt/T30/zhengy/book/database/hg38/intron/hg38-refseq-intron.gff的intron做bedtools
命令:beedtools_test]$ bedtools intersect -abam /mnt/x110/wus/BP_new/BWA_mapping/dingh/te/new_all/one_pair_filter_result_sort.bam -b /mnt/T30/zhengy/book/database/hg38/intron/hg38-refseq-intron.gff -wo > one_pair_filter_result_sort_in_intron.gff
################
(python3) [wus@genome beedtools_test]$ bedtools intersect
Tool: bedtools intersect (aka intersectBed)
Version: v2.17.0
Summary: Report overlaps between two feature files.
Usage: bedtools intersect [OPTIONS] -a <bed/gff/vcf> -b <bed/gff/vcf>
Options:
-abam The A input file is in BAM format. Output will be BAM as well.
-ubam Write uncompressed BAM output. Default writes compressed BAM.
-bed When using BAM input (-abam), write output as BED. The default
is to write output in BAM when using -abam.
-wa Write the original entry in A for each overlap.
-wb Write the original entry in B for each overlap.
- Useful for knowing _what_ A overlaps. Restricted by -f and -r.
-loj Perform a "left outer join". That is, for each feature in A
report each overlap with B. If no overlaps are found,
report a NULL feature for B.
-wo Write the original A and B entries plus the number of base
pairs of overlap between the two features.
- Overlaps restricted by -f and -r.
Only A features with overlap are reported.
-wao Write the original A and B entries plus the number of base
pairs of overlap between the two features.
- Overlapping features restricted by -f and -r.
However, A features w/o overlap are also reported
with a NULL B feature and overlap = 0.
-u Write the original A entry _once_ if _any_ overlaps found in B.
- In other words, just report the fact >=1 hit was found.
- Overlaps restricted by -f and -r.
-c For each entry in A, report the number of overlaps with B.
- Reports 0 for A entries that have no overlap with B.
- Overlaps restricted by -f and -r.
-v Only report those entries in A that have _no overlaps_ with B.
- Similar to "grep -v" (an homage).
-f Minimum overlap required as a fraction of A.
- Default is 1E-9 (i.e., 1bp).
- FLOAT (e.g. 0.50)
-r Require that the fraction overlap be reciprocal for A and B.
- In other words, if -f is 0.90 and -r is used, this requires
that B overlap 90% of A and A _also_ overlaps 90% of B.
-s Require same strandedness. That is, only report hits in B
that overlap A on the _same_ strand.
- By default, overlaps are reported without respect to strand.
-S Require different strandedness. That is, only report hits in B
that overlap A on the _opposite_ strand.
- By default, overlaps are reported without respect to strand.
-split Treat "split" BAM or BED12 entries as distinct BED intervals.
-sorted Use the "chromsweep" algorithm for sorted (-k1,1 -k2,2n) input
-header Print the header from the A file prior to results.
Notes:
(1) When a BAM file is used for the A file, the alignment is retained if overlaps exist,
and exlcuded if an overlap cannot be found. If multiple overlaps exist, they are not
reported, as we are only testing for one or more overlaps.
1证明:1 和1.0是一样的效果,这样可以直接过滤出子集,原本只能取交集,使用参数限定之后可以取子集
bedtools intersect -abam /mnt/x110/wus/BP_new/BWA_mapping/dingh/te/new_all/one_pair_filter_result_sort.bam -b /mnt/T30/zhengy/book/database/hg38/intron/hg38-refseq-intron.gff -f 1.0 > one_pair_filter_result_sort_in_intron_v5.bam
1 1 1python filter_for_cirregion_v7.py > nohup
v7版本的结果显示了forward和query_alignment_sequemce的结果不同,
query_alignment_sequemce不含S的部分,但是是反向互补的,并非原始。
forward显示含S的部分,但是是原始的。
因此,应使用query_alignment_sequemce,并对reverse的转换。
V8实现了此功能:
1可见:转换的都是开头的,没转换的都是结尾的。
至此。取bp实现。
网友评论