findOverlaps

Author: 日月其除 | Posted on 2021-12-07 19:30

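    findOverlaps
    Finds which ranges in two GRanges objects overlap each other. It returns pairs of indices (query, subject), not the overlapping regions themselves.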
    https://kasperdanielhansen.github.io/genbioconductor/html/GenomicRanges_GRanges_Usage.html

    ## S4 method for signature 'GInteractions,Vector'
    findOverlaps(query, subject, maxgap=0L, minoverlap=1L,
        type=c("any", "start", "end", "within", "equal"),
        select=c("all", "first", "last", "arbitrary"),
        ignore.strand=FALSE, use.region="both")
    
    Arguments
    query, subject  
    A Vector, GInteractions or InteractionSet object, depending on the specified method. At least one of these must be a GInteractions or InteractionSet object. Also, subject can be missing if query is a GInteractions or InteractionSet object.
    
    maxgap, minoverlap, type    
    See ?findOverlaps in the GenomicRanges package.
    
    select, ignore.strand   
    See ?findOverlaps in the GenomicRanges package.
    
    use.region  
    A string specifying the regions to be used to identify overlaps.
    

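    The default type is "any": a query range and a subject range count as a hit as soon as they share at least one position. To find query ranges that lie entirely inside a subject range, use type = "within".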
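    To make the type values concrete, here is a plain base-R sketch of the test each one applies to two closed intervals [start, end], assumed to be on the same chromosome and strand. overlap_type() is a hypothetical helper, not part of GenomicRanges, and it does not model maxgap or minoverlap.

```r
# Hypothetical helper (not part of GenomicRanges): does query [q_start, q_end]
# match subject [s_start, s_end] under the given overlap type?
# Intervals are closed and assumed to share chromosome and strand;
# maxgap and minoverlap are not modelled.
overlap_type <- function(q_start, q_end, s_start, s_end,
                         type = c("any", "start", "end", "within", "equal")) {
  type <- match.arg(type)
  switch(type,
    any    = q_start <= s_end && s_start <= q_end,   # share at least one position
    start  = q_start == s_start,                     # identical start coordinates
    end    = q_end == s_end,                         # identical end coordinates
    within = q_start >= s_start && q_end <= s_end,   # query lies inside subject
    equal  = q_start == s_start && q_end == s_end)   # identical intervals
}

overlap_type(101, 111, 100, 110, "any")     # TRUE: both contain 101-110
overlap_type(101, 111, 100, 110, "start")   # FALSE: 101 != 100
overlap_type(103, 110, 101, 111, "within")  # TRUE: 103-110 sits inside 101-111
```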
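    The select argument:
    It decides what to report when one query range overlaps several subject ranges.
    all (default): report every hit
    first: report the hit with the lowest subject index
    last: report the hit with the highest subject index
    arbitrary: report any one of the hits (an arbitrary choice, not a random one)
    If a query range has no overlap, NA is returned for it (this applies to first, last and arbitrary).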
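    The select rules can be mimicked in base R, assuming the hits are given as two parallel integer vectors of query and subject indices (the information a Hits object holds). select_hits() is a hypothetical helper for illustration only; for "arbitrary" it simply takes the first listed hit, which is one valid arbitrary choice.

```r
# Hypothetical helper: reduce all hits to one subject index per query,
# mimicking select = "first" / "last" / "arbitrary".
select_hits <- function(query_hits, subject_hits, n_query,
                        select = c("first", "last", "arbitrary")) {
  select <- match.arg(select)
  out <- rep(NA_integer_, n_query)           # queries without a hit stay NA
  for (q in seq_len(n_query)) {
    s <- subject_hits[query_hits == q]       # all subjects hit by query q
    if (length(s) == 0) next
    out[q] <- switch(select,
      first     = min(s),                    # lowest subject index
      last      = max(s),                    # highest subject index
      arbitrary = s[1])                      # any one hit; here the first listed
  }
  out
}

qh <- c(1L, 1L, 3L)   # query 1 hits subjects 2 and 5; query 3 hits subject 4
sh <- c(2L, 5L, 4L)
select_hits(qh, sh, n_query = 3L, select = "first")   # 2 NA 4
select_hits(qh, sh, n_query = 3L, select = "last")    # 5 NA 4
```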

    For findOverlaps, a Hits object is returned if select="all", and an integer vector of subject indices otherwise.
    Reference:
    https://www.imsbio.co.jp/RGM/R_rdfile?f=InteractionSet/man/overlaps.Rd&d=R_BC

    library(GenomicRanges)  # attaches GRanges(), IRanges(), Rle(), strand() and findOverlaps()
    
    gr <- GRanges(
           seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
           ranges = IRanges(101:110, end = 111:120, names = head(letters, 10)),
           strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
           score = 1:10,
           GC = seq(1, 0, length.out = 10))
    
    
    gr1 <- GRanges(
           seqnames = "chr1",
           ranges = IRanges(100, end = 110),
           strand = Rle(strand(c("-"))),
           score = 1)
    
    > gr
    GRanges object with 10 ranges and 2 metadata columns:
        seqnames    ranges strand |     score        GC
           <Rle> <IRanges>  <Rle> | <integer> <numeric>
      a     chr1   101-111      - |         1  1.000000
      b     chr2   102-112      + |         2  0.888889
      c     chr2   103-113      + |         3  0.777778
      d     chr2   104-114      * |         4  0.666667
      e     chr1   105-115      * |         5  0.555556
      f     chr1   106-116      + |         6  0.444444
      g     chr3   107-117      + |         7  0.333333
      h     chr3   108-118      + |         8  0.222222
      i     chr3   109-119      - |         9  0.111111
      j     chr3   110-120      - |        10  0.000000
      -------
      seqinfo: 3 sequences from an unspecified genome; no seqlengths
    > gr1
    GRanges object with 1 range and 1 metadata column:
          seqnames    ranges strand |     score
             <Rle> <IRanges>  <Rle> | <numeric>
      [1]     chr1   100-110      - |         1
      -------
      seqinfo: 1 sequence from an unspecified genome; no seqlengths
    
    > findOverlaps(gr, gr1, ignore.strand = TRUE)
    Hits object with 3 hits and 0 metadata columns:
          queryHits subjectHits
          <integer>   <integer>
      [1]         1           1
      [2]         5           1
      [3]         6           1
      -------
      queryLength: 10 / subjectLength: 1
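    The three hits can be reproduced with plain arithmetic: only a, e and f are on chr1, and each of them shares at least one position with 100-110. A base-R check, with the coordinates copied from the gr and gr1 printouts:

```r
# Coordinates copied from the printout of gr (strand ignored, as in the call above)
gr_df <- data.frame(
  name  = letters[1:10],
  chr   = rep(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
  start = 101:110,
  end   = 111:120
)
# gr1 is the single range chr1:100-110; "any" means at least one shared position
hit <- gr_df$chr == "chr1" & gr_df$start <= 110 & gr_df$end >= 100
which(hit)         # 1 5 6, the queryHits reported by findOverlaps
gr_df$name[hit]    # "a" "e" "f"
```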
    
    

    Trying different values of type

    > findOverlaps(gr, gr1, ignore.strand = TRUE, type = "any")
    Hits object with 3 hits and 0 metadata columns:
          queryHits subjectHits
          <integer>   <integer>
      [1]         1           1
      [2]         5           1
      [3]         6           1
      -------
      queryLength: 10 / subjectLength: 1
    
    > findOverlaps(gr, gr1, ignore.strand = TRUE, type = "start")
    Hits object with 0 hits and 0 metadata columns:
       queryHits subjectHits
       <integer>   <integer>
      -------
      queryLength: 10 / subjectLength: 1
    
    > findOverlaps(gr, gr1, ignore.strand = TRUE, type = "end")
    Hits object with 0 hits and 0 metadata columns:
       queryHits subjectHits
       <integer>   <integer>
      -------
      queryLength: 10 / subjectLength: 1
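    Both results are empty because, with the default maxgap, type = "start" requires query and subject to begin at the same position and type = "end" requires them to end at the same position. gr1 runs from 100 to 110, while the chr1 ranges of gr start at 101, 105, 106 and end at 111, 115, 116. A base-R check with those values copied from the printouts (ranges on other chromosomes cannot match at all):

```r
# chr1 ranges of gr (a, e, f) and the single range of gr1
q_start <- c(a = 101, e = 105, f = 106)
q_end   <- c(a = 111, e = 115, f = 116)
s_start <- 100
s_end   <- 110

start_match <- q_start == s_start   # type = "start": same first position
end_match   <- q_end == s_end       # type = "end": same last position
sum(start_match)   # 0 hits
sum(end_match)     # 0 hits
```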
    
    > gr2 <- GRanges(
    +        seqnames = "chr1",
    +        ranges = IRanges(c(100,103), end = c(102,110)),
    +        strand = Rle(strand(c("-","+"))),
    +        score = c(1,2))
    > gr2
    GRanges object with 2 ranges and 1 metadata column:
          seqnames    ranges strand |     score
             <Rle> <IRanges>  <Rle> | <numeric>
      [1]     chr1   100-102      - |         1
      [2]     chr1   103-110      + |         2
      -------
      seqinfo: 1 sequence from an unspecified genome; no seqlengths
    
    > findOverlaps(gr2, gr, ignore.strand = TRUE, type = "within")
    Hits object with 1 hit and 0 metadata columns:
          queryHits subjectHits
          <integer>   <integer>
      [1]         2           1
      -------
      queryLength: 2 / subjectLength: 10
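    Only gr2[2] (103-110) is reported: it lies inside a (101-111), whereas gr2[1] (100-102) starts before a, and e and f start after 103. The same containment test in base R over all 2 x 10 query/subject pairs, with coordinates copied from the printouts:

```r
gr2_start <- c(100, 103); gr2_end <- c(102, 110)          # query gr2, both on chr1
gr_chr    <- rep(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4))
gr_start  <- 101:110;     gr_end  <- 111:120               # subject gr

# is_within[i, j] is TRUE when gr2[i] lies completely inside gr[j]
is_within <- outer(seq_along(gr2_start), seq_along(gr_start),
                   function(i, j) gr_chr[j] == "chr1" &
                     gr2_start[i] >= gr_start[j] & gr2_end[i] <= gr_end[j])
which(is_within, arr.ind = TRUE)   # row = queryHits, col = subjectHits
```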
    

    Use queryHits() and subjectHits() to extract the query and subject indices from a Hits object.
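    A minimal sketch of that extraction, assuming the Bioconductor package GenomicRanges is installed. It uses hypothetical toy objects gr_small and target so the example stands alone; pintersect() is added to recover the actual overlapping region of each hit pair, which findOverlaps itself does not return.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Hypothetical toy data: three query ranges and one subject range (strand "*")
gr_small <- GRanges(c("chr1", "chr2", "chr1"),
                    IRanges(c(101, 102, 105), end = c(111, 112, 115)))
target <- GRanges("chr1", IRanges(100, 110))

hits  <- findOverlaps(gr_small, target)
q_idx <- queryHits(hits)      # indices into gr_small
s_idx <- subjectHits(hits)    # indices into target

gr_small[q_idx]               # the query ranges that overlap target
# Pairwise intersection of each hit pair: the overlapping region itself
overlap_regions <- pintersect(gr_small[q_idx], target[s_idx])
overlap_regions
```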


    Source: https://www.haomeiwen.com/subject/zjyaxrtx.html