PyClone input format: preparing the input with GenomicRanges

Author: 果蝇的小翅膀 | Published: 2020-07-29 09:56

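Problem:

When preparing PyClone input, you need both the point mutations and the copy-number information at each mutated position. Point mutations can be called with VarScan and copy numbers can be estimated with Sequenza, but how do you combine the two?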

For reference, the PyClone input format:

TSV files for PyClone:
-mutation_id - A unique ID to identify the mutation.
-ref_counts - The number of reads covering the mutation which contain the reference (genome) allele.
-var_counts - The number of reads covering the mutation which contain the variant allele.
-normal_cn - The copy number of the cells in the normal population. For autosomal chromosomes this will be 2 and for sex chromosomes it could be either 1 or 2.
-minor_cn - The minor copy number of the cancer cells. Usually this value is predicted from whole-genome sequencing or array data.
-major_cn - The major copy number of the cancer cells.
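PyClone reads these columns from a tab-separated file with a header row. As a minimal sketch (not part of the original post), the base-R helper below checks a data frame against the column list above and writes it as TSV. The name `write_pyclone_tsv()` is hypothetical, and its sanity checks (unique IDs, non-negative counts, minor copy number not exceeding major) are assumptions for illustration rather than PyClone's own validation. The example values are illustrative only; the coordinate-style IDs such as `1:100` are one way to keep `mutation_id` unique.

```r
# Hypothetical helper (not part of PyClone): check a data frame against the
# six PyClone columns and write it as a tab-separated file with a header row.
write_pyclone_tsv <- function(df, path) {
  required <- c("mutation_id", "ref_counts", "var_counts",
                "normal_cn", "minor_cn", "major_cn")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  df <- df[, required]  # keep only the PyClone columns, in this order
  # Assumed sanity checks: unique IDs, non-negative read counts,
  # and a minor copy number that does not exceed the major copy number.
  stopifnot(!anyDuplicated(df$mutation_id),
            all(df$ref_counts >= 0), all(df$var_counts >= 0),
            all(df$minor_cn <= df$major_cn))
  write.table(df, path, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(df)
}

# Illustrative values only
example <- data.frame(mutation_id = c("1:100", "1:300"),
                      ref_counts = c(100, 300), var_counts = c(10, 30),
                      normal_cn = 2, minor_cn = c(0, 1), major_cn = 2)
out <- tempfile(fileext = ".tsv")
write_pyclone_tsv(example, out)
readLines(out)
```

Writing with `quote = FALSE` and `row.names = FALSE` keeps the file a plain header-plus-rows TSV, without quoted strings or an extra row-name column.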

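Example:

The Bioconductor package GenomicRanges is designed for exactly this kind of problem, namely finding which genomic intervals overlap; see its documentation for details.

First, prepare some example data: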

library(GenomicRanges)
library(magrittr)
# Prepare example data: SNPs with read counts, and copy-number segments
snps <- read.table(text = "chr  start  end  rs    ref_counts  var_counts
1    100    100  rs01  100         10
1    200    200  rs02  200         20
1    300    300  rs03  300         30", header = TRUE)

cnvs <- read.table(text = "chr  start  end  minor_cn  major_cn
1    10     150  0         2
1    250    450  1         2
1    400    600  2         2", header = TRUE)

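Preparing the data

Load both tables into GRanges objects with the GRanges() constructor. Each SNP becomes a single-base range (start = end), and each copy-number segment spans its start to end coordinates: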

gsnps = GRanges(seqnames = snps$chr ,
               ranges = IRanges(snps$start , snps$end ),
               strand = "+" )
#Metadata columns can be added to a GRanges object; copy the remaining table columns into gsnps
mcols(gsnps) = snps

gcnvs = GRanges(seqnames = cnvs$chr,
               ranges = IRanges(cnvs$start , cnvs$end ),
               strand = "+" )
#metadata columns can be added to a GRanges object
mcols(gcnvs) = cnvs

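Use findOverlaps() to find where the two sets intersect. The overlaps object lists each hit: queryHits indexes the SNPs in gsnps and subjectHits indexes the segments in gcnvs. A SNP that falls in no segment (rs02 at position 200 here) gets no hit and is dropped. The final merge table is the PyClone input.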

overlaps =findOverlaps(gsnps, gcnvs )
overlaps
#Hits object with 2 hits and 0 metadata columns:
#      queryHits subjectHits
#     <integer>   <integer>
# [1]         1           1
# [2]         3           2
#  -------
#queryLength: 3 / subjectLength: 3

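# Pair the metadata of each overlapping SNP and copy-number segment
merge = cbind( mcols(gsnps[queryHits(overlaps), ]) , mcols(gcnvs[subjectHits(overlaps) ,]) )

# Reshape into the PyClone column layout
# normal_cn = 2 assumes autosomes (sex chromosomes may need 1)
merge = as.data.frame(merge) %>%
  dplyr::mutate(mutation_id = rs,
                normal_cn = 2) %>%
  dplyr::select(mutation_id, ref_counts, var_counts, normal_cn, minor_cn, major_cn)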

merge
# mutation_id ref_counts var_counts normal_cn minor_cn major_cn
#1        rs01        100         10         2        0        2
#2        rs03        300         30         2        1        2
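For single-base SNPs, the overlap test here reduces to asking whether each SNP position lies inside a segment's closed interval [start, end]. The base-R sketch below reproduces the same hits on the example data without Bioconductor. It also shows one way to handle a SNP that falls in two segments, since segments 2 (250-450) and 3 (400-600) overlap between 400 and 450. The helper `point_in_segment_hits()` is hypothetical. Its nested scan is only for illustration (GenomicRanges uses an indexed algorithm), and keeping the first hit per SNP is an assumed policy, similar in spirit to calling findOverlaps() with `select = "first"`.

```r
# Hypothetical base-R helper (not from GenomicRanges): for each single-base SNP,
# return the indices of all segments whose closed interval contains it.
point_in_segment_hits <- function(snp_chr, snp_pos, seg_chr, seg_start, seg_end) {
  hits <- data.frame(queryHits = integer(0), subjectHits = integer(0))
  for (i in seq_along(snp_pos)) {
    j <- which(seg_chr == snp_chr[i] &
               seg_start <= snp_pos[i] & snp_pos[i] <= seg_end)
    if (length(j) > 0) {
      hits <- rbind(hits, data.frame(queryHits = i, subjectHits = j))
    }
  }
  hits
}

# Same positions as the example above (new names, so snps/cnvs are not overwritten)
snp_df <- data.frame(chr = 1, pos = c(100, 200, 300),
                     rs = c("rs01", "rs02", "rs03"))
seg_df <- data.frame(chr = 1, start = c(10, 250, 400), end = c(150, 450, 600))

hits <- point_in_segment_hits(snp_df$chr, snp_df$pos,
                              seg_df$chr, seg_df$start, seg_df$end)
hits  # rs02 (position 200) falls in no segment, so it has no row

# A SNP at 420 lies in both segment 2 and segment 3.
hits2 <- point_in_segment_hits(1, 420, seg_df$chr, seg_df$start, seg_df$end)
# Assumed policy: keep only the first (lowest-index) segment per SNP.
hits2[!duplicated(hits2$queryHits), ]
```

Without some deduplication, a SNP covered by two segments would appear twice in the merged table with the same mutation_id.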

Source: https://www.haomeiwen.com/subject/tuncrktx.html