findOverlaps
Find the regions where two GRanges objects overlap.
https://kasperdanielhansen.github.io/genbioconductor/html/GenomicRanges_GRanges_Usage.html
## S4 method for signature 'GInteractions,Vector'
findOverlaps(query, subject, maxgap=0L, minoverlap=1L,
type=c("any", "start", "end", "within", "equal"),
select=c("all", "first", "last", "arbitrary"),
ignore.strand=FALSE, use.region="both")
Arguments
query, subject
A Vector, GInteractions or InteractionSet object, depending on the specified method. At least one of these must be a GInteractions or InteractionSet object. Also, subject can be missing if query is a GInteractions or InteractionSet object.
maxgap, minoverlap, type
See ?findOverlaps in the GenomicRanges package.
select, ignore.strand
See ?findOverlaps in the GenomicRanges package.
use.region
A string specifying the regions to be used to identify overlaps.
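The default is type = "any": any overlap between a query range and a subject range counts as a hit. To find query ranges that lie entirely inside a subject range, use type = "within".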
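The difference between type = "any" and type = "within" can be sketched with a small example. The ranges below are hypothetical toy data (not the gr/gr1 objects used later in this note), and the snippet assumes the GenomicRanges package is installed.

```r
library(GenomicRanges)

# Hypothetical toy ranges, all on chr1 with unspecified strand ("*")
query   <- GRanges("chr1", IRanges(start = c(5, 12, 18), end = c(8, 22, 25)))
subject <- GRanges("chr1", IRanges(start = 1, end = 20))

# type = "any" (default): sharing at least one position counts as a hit
hits_any <- findOverlaps(query, subject, type = "any")

# type = "within": the query range must lie entirely inside the subject range
hits_within <- findOverlaps(query, subject, type = "within")

queryHits(hits_any)     # all three queries touch 1-20
queryHits(hits_within)  # only 5-8 lies entirely inside 1-20
```

Queries 12-22 and 18-25 extend past position 20, so they are hits under "any" but not under "within".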
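The select argument:
It decides which hit to report when a single query overlaps several ranges in subject.
all (default): report every hit
first: report the first overlapping range in subject
last: report the last overlapping range in subject
arbitrary: report one of the overlapping ranges, chosen arbitrarily (not necessarily at random)
When select is not "all", a query with no overlap is reported as NA.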
For findOverlaps, a Hits object is returned if select="all", and an integer vector of subject indices otherwise.
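A minimal sketch of how select changes the return value, using hypothetical toy ranges (the names query and subject are illustrative only) and assuming GenomicRanges is installed:

```r
library(GenomicRanges)

# Hypothetical toy data: query 1 overlaps three subjects, query 2 overlaps one,
# query 3 overlaps none
query   <- GRanges("chr1", IRanges(start = c(1, 50, 100), end = c(30, 60, 110)))
subject <- GRanges("chr1", IRanges(start = c(5, 10, 20, 55), end = c(8, 15, 25, 58)))

# select = "all" (default): a Hits object with one row per overlapping pair
all_hits <- findOverlaps(query, subject)

# select = "first" / "last": an integer vector, one subject index per query,
# with NA for queries that have no overlap
first_hit <- findOverlaps(query, subject, select = "first")
last_hit  <- findOverlaps(query, subject, select = "last")

length(all_hits)  # 4 overlapping pairs
first_hit         # 1 4 NA
last_hit          # 3 4 NA
```

Because the non-"all" modes return a plain vector parallel to query, the result can be used directly for indexing, for example subject[first_hit[!is.na(first_hit)]].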
Reference:
https://www.imsbio.co.jp/RGM/R_rdfile?f=InteractionSet/man/overlaps.Rd&d=R_BC
library(GenomicRanges)
gr <- GRanges(
seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
ranges = IRanges(101:110, end = 111:120, names = head(letters, 10)),
strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
score = 1:10,
GC = seq(1, 0, length=10))
gr1 <- GRanges(
seqnames = "chr1",
ranges = IRanges(100, end = 110),
strand = Rle(strand(c("-"))),
score = 1)
> gr
GRanges object with 10 ranges and 2 metadata columns:
seqnames ranges strand | score GC
<Rle> <IRanges> <Rle> | <integer> <numeric>
a chr1 101-111 - | 1 1.000000
b chr2 102-112 + | 2 0.888889
c chr2 103-113 + | 3 0.777778
d chr2 104-114 * | 4 0.666667
e chr1 105-115 * | 5 0.555556
f chr1 106-116 + | 6 0.444444
g chr3 107-117 + | 7 0.333333
h chr3 108-118 + | 8 0.222222
i chr3 109-119 - | 9 0.111111
j chr3 110-120 - | 10 0.000000
-------
seqinfo: 3 sequences from an unspecified genome; no seqlengths
> gr1
GRanges object with 1 range and 1 metadata column:
seqnames ranges strand | score
<Rle> <IRanges> <Rle> | <numeric>
[1] chr1 100-110 - | 1
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
> findOverlaps(gr,gr1, ignore.strand = T)
Hits object with 3 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 1
[2] 5 1
[3] 6 1
-------
queryLength: 10 / subjectLength: 1
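The call above passes ignore.strand = T. As a rough illustration of what that argument changes, the sketch below uses hypothetical toy ranges that mirror the three chr1 ranges of gr (strands "-", "*", "+") against a single "-" subject. It assumes GenomicRanges is installed.

```r
library(GenomicRanges)

# Hypothetical toy data: same coordinates and strands as the chr1 ranges a, e, f
query   <- GRanges("chr1",
                   IRanges(start = c(101, 105, 106), end = c(111, 115, 116)),
                   strand = c("-", "*", "+"))
subject <- GRanges("chr1", IRanges(start = 100, end = 110), strand = "-")

# Strand-aware (default): "+" never matches "-", while "*" matches either strand
hits_stranded <- findOverlaps(query, subject, ignore.strand = FALSE)

# Strand ignored: only seqnames and coordinates are compared
hits_unstranded <- findOverlaps(query, subject, ignore.strand = TRUE)

queryHits(hits_stranded)    # 1 2
queryHits(hits_unstranded)  # 1 2 3
```

Writing TRUE instead of T is the safer habit, since T is an ordinary variable that can be reassigned.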
Trying the type argument
> findOverlaps(gr,gr1, ignore.strand = T, type = "any")
Hits object with 3 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 1
[2] 5 1
[3] 6 1
-------
queryLength: 10 / subjectLength: 1
> findOverlaps(gr,gr1, ignore.strand = T, type = "start")
Hits object with 0 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
-------
queryLength: 10 / subjectLength: 1
> findOverlaps(gr,gr1, ignore.strand = T, type = "end")
Hits object with 0 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
-------
queryLength: 10 / subjectLength: 1
> gr2 <- GRanges(
+ seqnames = "chr1",
+ ranges = IRanges(c(100,103), end = c(102,110)),
+ strand = Rle(strand(c("-","+"))),
+ score = c(1,2))
> gr2
GRanges object with 2 ranges and 1 metadata column:
seqnames ranges strand | score
<Rle> <IRanges> <Rle> | <numeric>
[1] chr1 100-102 - | 1
[2] chr1 103-110 + | 2
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
> findOverlaps(gr2,gr, ignore.strand = T, type = "within")
Hits object with 1 hit and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 2 1
-------
queryLength: 2 / subjectLength: 10
Use queryHits() and subjectHits() to extract the query and subject indices from the Hits object.
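A minimal sketch of that extraction step, assuming GenomicRanges is installed. The ranges and the metadata columns id and gene are hypothetical and only serve the illustration.

```r
library(GenomicRanges)

# Hypothetical toy data with one metadata column each
query   <- GRanges("chr1",
                   IRanges(start = c(1, 50, 100), end = c(30, 60, 110)),
                   id = c("q1", "q2", "q3"))
subject <- GRanges("chr1",
                   IRanges(start = c(5, 55), end = c(8, 58)),
                   gene = c("geneA", "geneB"))

hits <- findOverlaps(query, subject)

# Integer indices into query and subject, one entry per overlapping pair
q_idx <- queryHits(hits)
s_idx <- subjectHits(hits)

# Pair up metadata from both sides of each hit
pairs <- data.frame(query_id = query$id[q_idx],
                    gene     = subject$gene[s_idx])

# Subset the query to the ranges that overlap at least one subject
overlapping_queries <- query[unique(q_idx)]
```

Here pairs lists q1 with geneA and q2 with geneB, and q3 is dropped because it has no overlap.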