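Bioconductor has a rich ecosystem of metadata around packages, usage, and build status. This package is a simple collection of functions for accessing that metadata from R in a tidy data format. The goal is to expose the metadata for data mining and value-added functionality such as package searching, text mining, and analytics on packages.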
Functionality includes access to:
- Download statistics
- General package listing
- Build reports
- Package dependency graphs
- Vignettes
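The Bioconductor build reports are available online as HTML pages. However, they are not very computationally friendly. The biocBuildReport function parses the HTML and produces a tidy data.frame (a tibble). This makes it convenient to analyze Bioconductor R packages and helps users find and explore existing packages.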
library(BiocPkgTools)
head(biocBuildReport())
## # A tibble: 6 x 9
## pkg version author commit last_changed_date node stage result
## <chr> <chr> <chr> <chr> <dttm> <chr> <chr> <chr>
## 1 a4 1.31.0 Tobia… a53c… 2018-10-30 00:00:00 malb… inst… OK
## 2 a4 1.31.0 Tobia… a53c… 2018-10-30 00:00:00 malb… buil… OK
## 3 a4 1.31.0 Tobia… a53c… 2018-10-30 00:00:00 malb… chec… OK
## 4 a4 1.31.0 Tobia… a53c… 2018-10-30 00:00:00 toka… inst… OK
## 5 a4 1.31.0 Tobia… a53c… 2018-10-30 00:00:00 toka… buil… OK
## 6 a4 1.31.0 Tobia… a53c… 2018-10-30 00:00:00 toka… chec… OK
## # … with 1 more variable: bioc_version <chr>
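Because the report is an ordinary data frame, finding problem builds is plain subsetting. The sketch below uses a small hypothetical table with a few of the columns shown above (the package names, node names, stage names, and result values are invented); in a live session you would start from `biocBuildReport()` instead.

```r
# Hypothetical toy rows mimicking a few columns of biocBuildReport() output.
# In a live session: br <- BiocPkgTools::biocBuildReport()
br <- data.frame(
  pkg    = c("pkgA", "pkgA", "pkgA", "pkgB", "pkgB", "pkgB"),
  node   = c("nodeA", "nodeA", "nodeA", "nodeA", "nodeA", "nodeB"),
  stage  = c("install", "build", "check", "install", "build", "check"),
  result = c("OK", "OK", "OK", "OK", "ERROR", "WARNINGS"),
  stringsAsFactors = FALSE
)

# Keep only rows whose stage did not finish cleanly
problems <- br[br$result != "OK", ]

# Cross-tabulate problem rows by package and result type
problem_table <- table(problems$pkg, problems$result)
print(problem_table)
```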
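Because developers may want a quick view of their own packages, there is a simple function, problemPage, that generates an HTML report of the build status of packages matching a given author regex. By default, only "problem" build statuses (errors, warnings) are reported.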
problemPage()
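The idea behind problemPage() (match authors by regular expression, then keep only non-OK results) can be sketched in base R. This is not the package's implementation: `author_problems()` is a hypothetical helper, and the toy rows below are invented, although they reuse the `pkg`, `author`, `stage`, and `result` column names from the biocBuildReport() output above.

```r
# Hypothetical toy build-report rows (invented values)
br <- data.frame(
  pkg    = c("pkgA", "pkgA", "pkgB", "pkgC"),
  author = c("Jane Doe", "Jane Doe", "John Roe", "Jane Doe"),
  stage  = c("build", "check", "check", "check"),
  result = c("OK", "WARNINGS", "ERROR", "OK"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rows whose author matches a regex,
# restricted to non-OK ("problem") results unless include_ok = TRUE
author_problems <- function(report, author_pattern, include_ok = FALSE) {
  mine <- report[grepl(author_pattern, report$author), ]
  if (!include_ok) {
    mine <- mine[mine$result != "OK", ]
  }
  mine
}

author_problems(br, "J.*Doe")
```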
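Bioconductor provides download statistics for all packages. The biocDownloadStats function fetches all available download statistics for every package in the Software, Experiment Data, and Annotation Data repositories. The results are returned as a tidy data.frame that serves as a framework for further analysis.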
head(biocDownloadStats())
## # A tibble: 6 x 7
## Package Year Month Nb_of_distinct_IPs Nb_of_downloads repo Date
## <chr> <int> <chr> <int> <int> <chr> <date>
## 1 ABarray 2019 Jan 104 210 Software 2019-01-01
## 2 ABarray 2019 Feb 80 164 Software 2019-02-01
## 3 ABarray 2019 Mar 144 192 Software 2019-03-01
## 4 ABarray 2019 Apr 140 259 Software 2019-04-01
## 5 ABarray 2019 May 0 0 Software 2019-05-01
## 6 ABarray 2019 Jun 0 0 Software 2019-06-01
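A typical next step is to roll the monthly rows up to yearly totals. The sketch below copies the six ABarray rows printed above into a data frame and sums downloads with base R's `aggregate()`. Only `Nb_of_downloads` is summed: distinct-IP counts are not additive across months, since the same IP address can appear in several months.

```r
# Rows copied from the head(biocDownloadStats()) output above (ABarray, 2019)
dl <- data.frame(
  Package            = "ABarray",
  Year               = 2019L,
  Month              = c("Jan", "Feb", "Mar", "Apr", "May", "Jun"),
  Nb_of_distinct_IPs = c(104L, 80L, 144L, 140L, 0L, 0L),
  Nb_of_downloads    = c(210L, 164L, 192L, 259L, 0L, 0L),
  repo               = "Software",
  stringsAsFactors   = FALSE
)

# Total downloads per package, repository and year
yearly <- aggregate(Nb_of_downloads ~ Package + repo + Year, data = dl, FUN = sum)
print(yearly)
```

For these rows the yearly total is 825 downloads.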
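The DESCRIPTION file of every R package contains a wealth of information about the package's authors, dependencies, version, and so on. In a repository such as Bioconductor, these details are available for every package it contains. biocPkgList returns a data.frame (a tibble) with one row per package. A large amount of information is available, as the column names of the result show.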
bpi = biocPkgList()
colnames(bpi)
## [1] "Package" "Version"
## [3] "Depends" "Suggests"
## [5] "License" "MD5sum"
## [7] "NeedsCompilation" "Title"
## [9] "Description" "biocViews"
## [11] "Author" "Maintainer"
## [13] "git_url" "git_branch"
## [15] "git_last_commit" "git_last_commit_date"
## [17] "Date/Publication" "source.ver"
## [19] "win.binary.ver" "mac.binary.el-capitan.ver"
## [21] "vignettes" "vignetteTitles"
## [23] "hasREADME" "hasNEWS"
## [25] "hasINSTALL" "hasLICENSE"
## [27] "Rfiles" "Enhances"
## [29] "dependsOnMe" "Imports"
## [31] "importsMe" "suggestsMe"
## [33] "LinkingTo" "Archs"
## [35] "VignetteBuilder" "URL"
## [37] "SystemRequirements" "BugReports"
## [39] "Video" "linksToMe"
## [41] "OS_type" "License_restricts_use"
## [43] "PackageStatus" "License_is_FOSS"
## [45] "organism"
head(bpi)
## # A tibble: 6 x 45
## Package Version Depends Suggests License MD5sum NeedsCompilation Title
## <chr> <chr> <list> <list> <chr> <chr> <chr> <chr>
## 1 a4 1.31.0 <chr [… <chr [4… GPL-3 31072… no Auto…
## 2 a4Base 1.31.0 <chr [… <chr [2… GPL-3 2dec7… no Auto…
## 3 a4Clas… 1.31.0 <chr [… <chr [1… GPL-3 4bbcd… no Auto…
## 4 a4Core 1.31.0 <chr [… <chr [1… GPL-3 a2c0c… no Auto…
## 5 a4Prep… 1.31.0 <chr [… <chr [2… GPL-3 087b7… no Auto…
## 6 a4Repo… 1.31.0 <chr [… <chr [1… GPL-3 1635a… no Auto…
## # … with 37 more variables: Description <chr>, biocViews <list>,
## # Author <list>, Maintainer <list>, git_url <chr>, git_branch <chr>,
## # git_last_commit <chr>, git_last_commit_date <chr>,
## # `Date/Publication` <chr>, source.ver <chr>, win.binary.ver <chr>,
## # `mac.binary.el-capitan.ver` <chr>, vignettes <list>,
## # vignetteTitles <list>, hasREADME <chr>, hasNEWS <chr>, hasINSTALL <chr>,
## # hasLICENSE <chr>, Rfiles <list>, Enhances <list>, dependsOnMe <list>,
## # Imports <list>, importsMe <list>, suggestsMe <list>, LinkingTo <list>,
## # Archs <list>, VignetteBuilder <chr>, URL <chr>,
## # SystemRequirements <chr>, BugReports <chr>, Video <chr>,
## # linksToMe <list>, OS_type <chr>, License_restricts_use <chr>,
## # PackageStatus <chr>, License_is_FOSS <chr>, organism <chr>
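Several columns (Depends, Imports, importsMe, and others) are list-columns holding one character vector per package, so `lengths()` gives per-package counts directly. The sketch below uses a hypothetical three-package table with invented names; on the real tibble the same call would be `lengths(bpi$Imports)`, though how packages without any Imports are represented there is an assumption worth checking.

```r
# Hypothetical toy version of biocPkgList() output with an Imports list-column
pkgs <- data.frame(Package = c("pkgA", "pkgB", "pkgC"), stringsAsFactors = FALSE)
pkgs$Imports <- list(c("methods", "stats"), character(0), c("pkgA", "methods", "utils"))

# lengths() returns the number of elements in each list entry
pkgs$n_imports <- lengths(pkgs$Imports)
print(pkgs[, c("Package", "n_imports")])
```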
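As a simple example of how these columns can be used, extract the importsMe column to find the packages that import the GEOquery package.
library(dplyr)  # provides filter(), pull() and %>%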
bpi = biocPkgList()
bpi %>%
filter(Package=="GEOquery") %>%
pull(importsMe) %>%
unlist()
## [1] "bigmelon" "ChIPXpress" "coexnet" "crossmeta"
## [5] "EGAD" "GAPGOM" "GSEABenchmarkeR" "MACPET"
## [9] "minfi" "MoonlightR" "phantasus" "recount"
## [13] "SRAdb"
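The importsMe column is the reverse view of Imports, so the same question can also be answered by scanning every package's Imports vector. The base-R sketch below does this on a hypothetical toy table (package names other than GEOquery and the dependency lists are invented).

```r
# Hypothetical toy package table with an Imports list-column (invented values)
pkgs <- data.frame(Package = c("pkgA", "pkgB", "pkgC", "GEOquery"),
                   stringsAsFactors = FALSE)
pkgs$Imports <- list(c("GEOquery", "limma"), "Biobase", "GEOquery", c("Biobase", "xml2"))

# Which packages list GEOquery among their Imports?
imports_geoquery <- pkgs$Package[vapply(pkgs$Imports,
                                        function(deps) "GEOquery" %in% deps,
                                        logical(1))]
print(imports_geoquery)
```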
Package Explorer
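For Bioconductor end users, an analysis usually begins by finding a package or set of packages that performs the required task or is tailored to a particular operation or data type. The biocExplore() function implements an interactive bubble visualization with filtering based on biocViews terms. Bubble sizes are determined by download statistics. Tooltips and click-for-details functionality are also included. To launch a local session:
biocExplore()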
Dependency graphs
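The Bioconductor ecosystem is built around the concepts of interoperability and dependencies. These interdependencies are available as part of the biocPkgList() output. BiocPkgTools provides some convenience functions for converting package dependencies into R graphs.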
- Create a data.frame of dependencies using buildPkgDependencyDataFrame.
- Create an igraph object from the dependency data frame using buildPkgDependencyIgraph.
- Use native igraph functionality to perform arbitrary network operations. Convenience functions, inducedSubgraphByPkgs and subgraphByDegree, are available.
- Visualize with packages such as visNetwork.
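The first two steps amount to turning an edge list into a directed graph. The sketch below does this with igraph's `graph_from_data_frame()` on a hypothetical edge table; the column names `Package`, `dependency`, and `edgetype` and all rows are assumptions for illustration, not a description of what buildPkgDependencyDataFrame() returns.

```r
library(igraph)

# Hypothetical edge list: one row per (package, dependency, edge type)
dep_df <- data.frame(
  Package    = c("pkgA", "pkgA", "pkgB", "pkgB", "pkgC"),
  dependency = c("pkgB", "pkgC", "methods", "limma", "methods"),
  edgetype   = c("Depends", "Imports", "Imports", "Imports", "Suggests"),
  stringsAsFactors = FALSE
)

# The first two columns become edge endpoints (package -> dependency);
# the remaining column (edgetype) becomes an edge attribute.
g <- graph_from_data_frame(dep_df, directed = TRUE)
print(g)
```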
Working with dependency graphs
library(BiocPkgTools)
dep_df = buildPkgDependencyDataFrame()
g = buildPkgDependencyIgraph(dep_df)
g
## IGRAPH a244fce DN-- 3113 25939 --
## + attr: name (v/c), edgetype (e/c)
## + edges from a244fce (vertex names):
## [1] a4 ->a4Base a4 ->a4Preproc
## [3] a4 ->a4Classif a4 ->a4Core
## [5] a4 ->a4Reporting a4Base ->methods
## [7] a4Base ->graphics a4Base ->grid
## [9] a4Base ->Biobase a4Base ->AnnotationDbi
## [11] a4Base ->annaffy a4Base ->mpm
## [13] a4Base ->genefilter a4Base ->limma
## [15] a4Base ->multtest a4Base ->glmnet
## + ... omitted several edges
library(igraph)
head(V(g))
## + 6/3113 vertices, named, from a244fce:
## [1] a4 a4Base a4Classif a4Core a4Preproc a4Reporting
head(E(g))
## + 6/25939 edges from a244fce (vertex names):
## [1] a4 ->a4Base a4 ->a4Preproc a4 ->a4Classif
## [4] a4 ->a4Core a4 ->a4Reporting a4Base->methods
See the igraph documentation for more details on graph analysis, setting vertex and edge attributes, and advanced subsetting.
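One common analysis is ranking packages by how many others depend on them. In the graph above, edges point from a package to what it uses (for example a4 -> a4Base), so a vertex's in-degree counts its dependents. The sketch below shows this on a hypothetical toy graph; on the real graph the call would be `degree(g, mode = "in")`.

```r
library(igraph)

# Hypothetical toy dependency graph; edges point from a package to its dependency
g <- graph_from_literal(pkgA -+ methods, pkgB -+ methods, pkgC -+ methods,
                        pkgA -+ limma, pkgB -+ limma, pkgC -+ pkgA)

# In-degree = number of packages that depend on each vertex
in_deg <- degree(g, mode = "in")
print(sort(in_deg, decreasing = TRUE))
```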
Graph visualization
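The visNetwork package is a nice interactive visualization tool that draws graphs in a browser. It can be integrated into Shiny applications, and interactive graphs can also be included in R Markdown documents (see the vignette).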
igraph_network = buildPkgDependencyIgraph(buildPkgDependencyDataFrame())
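While plotting the full dependency graph is possible, it is not very informative. A common use case is to "focus" the dependency graph on a package of interest. In this example, I focus on the GEOquery package.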
igraph_geoquery_network = subgraphByDegree(igraph_network, "GEOquery")
The subgraphByDegree() function returns all nodes and connections within the given degree (number of steps) of the named package; the default degree is 1.
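A comparable neighborhood subgraph can be built with igraph's `make_ego_graph()`, which keeps every vertex within `order` steps of a given node plus the edges among them. This is a swapped-in technique, not necessarily how subgraphByDegree() is implemented; in particular, ignoring edge direction (`mode = "all"`) is an assumption. The toy graph is hypothetical.

```r
library(igraph)

# Hypothetical toy graph: edges point from a package to its dependency
g <- graph_from_literal(pkgA -+ GEOquery, pkgB -+ GEOquery, GEOquery -+ Biobase,
                        Biobase -+ methods, pkgC -+ methods)

# All vertices within 1 step of GEOquery (in either direction) and the edges among them
ego1 <- make_ego_graph(g, order = 1, nodes = "GEOquery", mode = "all")[[1]]
print(V(ego1)$name)
```

Raising `order` to 2 would also pull in methods, the dependency of Biobase.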
The visNetwork package can plot igraph objects directly, but converting the graph to visNetwork's format first provides more flexibility.
library(visNetwork)
data <- toVisNetworkData(igraph_geoquery_network)
visNetwork(nodes = data$nodes, edges = data$edges, height = "500px")
It is interesting to watch the graph stabilize as it is drawn; this is best viewed interactively.
visNetwork(nodes = data$nodes, edges = data$edges, height = "500px") %>%
visPhysics(stabilization=FALSE)
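# Color edges by dependency type: red = Imports, green = Depends, lightblue = everything else (Suggests)
data$edges$color='lightblue'
data$edges[data$edges$edgetype=='Imports','color']= 'red'
data$edges[data$edges$edgetype=='Depends','color']= 'green'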
visNetwork(nodes = data$nodes, edges = data$edges, height = "500px") %>%
visEdges(arrows='from')
ledges <- data.frame(color = c("green", "lightblue", "red"),
label = c("Depends", "Suggests", "Imports"), arrows =c("from", "from", "from"))
visNetwork(nodes = data$nodes, edges = data$edges, height = "500px") %>%
visEdges(arrows='from') %>%
visLegend(addEdges=ledges)
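In the code above, the edge colors and the legend colors are written out separately, so they can drift apart if one is edited. A small alternative is a single named palette that drives both. The edge table below is hypothetical; with the real data the lookup line would read `data$edges$color <- unname(edge_palette[data$edges$edgetype])`. Edge types missing from the palette would get `NA` and need a default.

```r
# One named palette for both edge colors and legend entries
edge_palette <- c(Depends = "green", Suggests = "lightblue", Imports = "red")

# Hypothetical toy edge table (invented rows)
edges <- data.frame(from     = c("pkgA", "pkgB", "pkgC"),
                    to       = c("GEOquery", "GEOquery", "GEOquery"),
                    edgetype = c("Imports", "Depends", "Suggests"),
                    stringsAsFactors = FALSE)

# Look up each edge's color by its edgetype
edges$color <- unname(edge_palette[edges$edgetype])

# Legend rows built from the same palette, in the same shape as ledges
legend_edges <- data.frame(color  = unname(edge_palette),
                           label  = names(edge_palette),
                           arrows = "from",
                           stringsAsFactors = FALSE)
print(edges)
```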
Integration with BiocViews
library(biocViews)
data(biocViewsVocab)
biocViewsVocab
## A graphNEL graph with directed edges
## Number of Nodes = 476
## Number of Edges = 475
library(igraph)
g = igraph.from.graphNEL(biocViewsVocab)
library(visNetwork)
gv = toVisNetworkData(g)
visNetwork(gv$nodes, gv$edges, width="100%") %>%
visIgraphLayout(layout = "layout_as_tree", circular=TRUE) %>%
visNodes(size=20) %>%
visPhysics(stabilization=FALSE)
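The biocViewsVocab summary reports 476 nodes and 475 edges, exactly one fewer edge than nodes, which is what a tree looks like if the graph is connected. The sketch below checks that property and finds the root term (the vertex with no incoming edges) on a hypothetical miniature vocabulary; the assumption that edges point from a broader term to a narrower one is labeled in the comments.

```r
library(igraph)

# Hypothetical miniature of the biocViews vocabulary;
# edges assumed to point from a broader term to a narrower one
vocab <- graph_from_literal(BiocViews -+ Software, BiocViews -+ AnnotationData,
                            BiocViews -+ ExperimentData, Software -+ Sequencing,
                            Software -+ Microarray)

# A connected graph with exactly n - 1 edges is a tree
is_tree_like <- is_connected(vocab, mode = "weak") && ecount(vocab) == vcount(vocab) - 1

# The root term has no incoming edges
root <- V(vocab)$name[degree(vocab, mode = "in") == 0]
print(root)
```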