美文网首页生物信息学R语言源码
pyclone的输入格式:通过GenomicRanges准备输入

pyclone的输入格式:通过GenomicRanges准备输入

作者: 果蝇的小翅膀 | 来源:发表于2020-07-29 09:56 被阅读0次

    问题:

    在准备pyclone的输入的时候,需要点突变和对应的拷贝数信息。点突变的信息可以通过varscan获得,而拷贝数的信息可以通过sequenza获得,但是如何整合点突变和拷贝数的信息?

    附pyclone的输入格式:

    Tsv files for pyclone:
    -mutation_id - A unique ID to identify the mutation.
    -ref_counts - The number of reads covering the mutation which contain the reference (genome) allele.
    -var_counts - The number of reads covering the mutation which contain the variant allele.
    -normal_cn - The copy number of the cells in the normal population. For autosomal chromosomes this will be 2 and for sex chromosomes it could be either 1 or 2.
    -minor_cn - The minor copy number of the cancer cells. Usually
    -major_cn - The major copy number of the cancer cells.

    实例:

    在Bioconductor中的 GenomicRanges这个包主要用来处理这样的问题,可以参考其中的说明。

    1. 首先准备实例数据
    library(GenomicRanges)
    library(magrittr)
    #准备实例数据
    con<- textConnection("chr     start    end   rs   ref_counts  var_counts
    1           100          100      rs01   100      10
    1           200          200      rs02   200      20
    1           300          300      rs03   300      30")
    snps <- read.delim(con, head=TRUE, sep="")
    
    con <- textConnection("chr     start    end      minor_cn  major_cn
    1           10          150      0     2
    1           250          450      1     2
    1           400          600      2     2")  
    cnvs <- read.delim(con, head=TRUE, sep="")
    
    准备数据
    1. 通过GRanges函数将数据读取到GRanges中
    gsnps = GRanges(seqnames = snps$chr ,
                   ranges = IRanges(snps$start , snps$end ),
                   strand = "+" )
    #metadata columns can be added to a GRanges object,将表格中的其他信息添加到gsnps中
    mcols(gsnps) = snps
    
    gcnvs = GRanges(seqnames = cnvs$chr,
                   ranges = IRanges(cnvs$start , cnvs$end ),
                   strand = "+" )
    #metadata columns can be added to a GRanges object
    mcols(gcnvs) = cnvs
    
    1. 通过findOverlaps寻找它们之间的交集。根据overlaps可以看到它们的交集情况,最后得到的merge就是pyclone的输入了。
    overlaps =findOverlaps(gsnps, gcnvs )
    overlaps
    #Hits object with 2 hits and 0 metadata columns:
    #      queryHits subjectHits
    #     <integer>   <integer>
    # [1]         1           1
    # [2]         3           2
      -------
    #queryLength: 3 / subjectLength: 3
    
    #合并信息。
    merge = cbind( mcols(gsnps[queryHits(overlaps), ]) , mcols(gcnvs[subjectHits(overlaps) ,]) )
    
    #整合信息
    merge = as.data.frame(merge) %>%
         dplyr::mutate(mutation_id = rs,
                    normal_cn = 2,
                    ) %>%
      dplyr::select(mutation_id, ref_counts, var_counts, normal_cn, minor_cn, major_cn)
    
    merge
    # mutation_id ref_counts var_counts normal_cn minor_cn major_cn
    #1        rs01        100         10         2        0        2
    #2        rs03        300         30         2        1        2
    

    参考文献:

    1. pyclone的使用帮助
    2. GenomicRanges的帮助文档,主要参见 part4: Interval overlaps involving GRanges and GRangesList objects
    3. https://stackoverflow.com/questions/11892241/merge-by-range-in-r-applying-loops

    相关文章

      网友评论

        本文标题:pyclone的输入格式:通过GenomicRanges准备输入

        本文链接:https://www.haomeiwen.com/subject/tuncrktx.html