Network Analysis in R (Part 1): Basic Operations


Author: drlee_fc74 | Published: 2020-04-11 07:28

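These are study notes based on the DataCamp course <network analysis in R> and the tutorial Network visualization with R. There is now a second package for handling network data, tidygraph. Because tidygraph stores its data as tbl objects, it works seamlessly with the tidyverse, so these notes also cover how to use that package.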

library(igraph)
library(tidygraph)
library(tidyverse)

Basic elements of a network

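Network data consists of two kinds of elements: vertices (also called nodes) and the edges that connect them. When we supply data for a network, we supply it in terms of these two elements, typically as one table of nodes and one table of edges.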
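To make the two elements concrete, here is a minimal sketch of a node table and an edge table in base R. The names nodes_toy and links_toy and all of the values are hypothetical and are not taken from the dataset used in these notes.

```r
# Hypothetical toy data: three nodes and three directed links.
nodes_toy <- data.frame(
  id    = c("a", "b", "c"),
  label = c("Alpha", "Beta", "Gamma"),
  stringsAsFactors = FALSE
)
links_toy <- data.frame(
  from   = c("a", "a", "b"),
  to     = c("b", "c", "c"),
  weight = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Every endpoint named in the edge table should also appear in the node table.
endpoints_ok <- all(c(links_toy$from, links_toy$to) %in% nodes_toy$id)
endpoints_ok
```

Checking that the edge endpoints match the node identifiers before building a graph is a cheap way to catch mismatched files early.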



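Building a network object

To work with network data, we first need to create a network object. igraph and tidygraph each provide their own functions for converting data into one.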

igraph

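igraph builds a network object with the graph_from_data_frame function. It takes the edge information and the node information of the network, and its directed argument controls whether the network is directed.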

nodes <- read.csv("./Data/Dataset1-Media-Example-NODES.csv", header = TRUE, as.is = TRUE)
links <- read.csv("./Data/Dataset1-Media-Example-EDGES.csv", header = TRUE, as.is = TRUE)
net <- graph_from_data_frame(d = links, vertices = nodes, directed = TRUE)
net

## IGRAPH c3731a0 DNW- 17 49 -- 
## + attr: name (v/c), media (v/c), media.type (v/n), type.label (v/c),
## | audience.size (v/n), type (e/c), weight (e/n)
## + edges from c3731a0 (vertex names):
##  [1] s01->s02 s01->s03 s01->s04 s01->s15 s02->s01 s02->s03 s02->s09 s02->s10
##  [9] s03->s01 s03->s04 s03->s05 s03->s08 s03->s10 s03->s11 s03->s12 s04->s03
## [17] s04->s06 s04->s11 s04->s12 s04->s17 s05->s01 s05->s02 s05->s09 s05->s15
## [25] s06->s06 s06->s16 s06->s17 s07->s03 s07->s08 s07->s10 s07->s14 s08->s03
## [33] s08->s07 s08->s09 s09->s10 s10->s03 s12->s06 s12->s13 s12->s14 s13->s12
## [41] s13->s17 s14->s11 s14->s13 s15->s01 s15->s04 s15->s06 s16->s06 s16->s17
## [49] s17->s04
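The same call can be tried without the CSV files. The sketch below uses a hypothetical three-node dataset (nodes_toy and links_toy are invented for illustration) and builds both a directed and an undirected graph from it.

```r
library(igraph)

# Hypothetical toy data; the first column of the node table holds the vertex names.
nodes_toy <- data.frame(name = c("a", "b", "c"),
                        group = c("x", "x", "y"),
                        stringsAsFactors = FALSE)
# The first two columns of the edge table are the endpoints; the rest become edge attributes.
links_toy <- data.frame(from = c("a", "a", "b"),
                        to = c("b", "c", "c"),
                        weight = c(1, 2, 3),
                        stringsAsFactors = FALSE)

g_dir   <- graph_from_data_frame(d = links_toy, vertices = nodes_toy, directed = TRUE)
g_undir <- graph_from_data_frame(d = links_toy, vertices = nodes_toy, directed = FALSE)

is_directed(g_dir)
is_directed(g_undir)
vcount(g_dir)  # number of vertices
ecount(g_dir)  # number of edges
```

In the printed summary of net above, the code DNW- after the graph id reads as Directed, Named, Weighted, and not bipartite, and the two numbers that follow are the vertex and edge counts.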

tidygraph

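The tidygraph package provides tbl_graph, a function that builds a network object directly from a node table and an edge table. Data frames, matrices, and igraph network objects can also be converted with as_tbl_graph.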

### Build the network object directly
net1 <- tbl_graph(nodes = nodes, edges = links, directed = TRUE)
### Convert an existing igraph object
net2 <- as_tbl_graph(net)
net2

## # A tbl_graph: 17 nodes and 49 edges
## #
## # A directed multigraph with 1 component
## #
## # Node Data: 17 x 5 (active)
##   name  media               media.type type.label audience.size
##   <chr> <chr>                    <int> <chr>              <int>
## 1 s01   NY Times                     1 Newspaper             20
## 2 s02   Washington Post              1 Newspaper             25
## 3 s03   Wall Street Journal          1 Newspaper             30
## 4 s04   USA Today                    1 Newspaper             32
## 5 s05   LA Times                     1 Newspaper             20
## 6 s06   New York Post                1 Newspaper             50
## # … with 11 more rows
## #
## # Edge Data: 49 x 4
##    from    to type      weight
##   <int> <int> <chr>      <int>
## 1     1     2 hyperlink     22
## 2     1     3 hyperlink     22
## 3     1     4 hyperlink     21
## # … with 46 more rows
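As a self-contained sketch of both routes, the following builds a tbl_graph directly and by conversion from igraph. The toy tables are hypothetical, and the node table deliberately has a name column so that the character endpoints in the edge table can be matched against it.

```r
library(igraph)
library(tidygraph)

# Hypothetical toy data, not the media dataset used in these notes.
nodes_toy <- data.frame(name = c("a", "b", "c"),
                        group = c("x", "x", "y"),
                        stringsAsFactors = FALSE)
links_toy <- data.frame(from = c("a", "a", "b"),
                        to = c("b", "c", "c"),
                        weight = c(1, 2, 3),
                        stringsAsFactors = FALSE)

# Route 1: build the tbl_graph directly from the two tables.
tg1 <- tbl_graph(nodes = nodes_toy, edges = links_toy, directed = TRUE)

# Route 2: build an igraph object first, then convert it.
g   <- graph_from_data_frame(d = links_toy, vertices = nodes_toy, directed = TRUE)
tg2 <- as_tbl_graph(g)

class(tg1)  # a tbl_graph is also an igraph object
```

Because a tbl_graph inherits from igraph, either result can still be passed to igraph functions such as vcount or ecount.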

Inspecting a network object

Once the network object has been built, we can inspect the information it holds.

igraph

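In igraph, the V function shows the node labels, and vertex_attr shows all attributes attached to the nodes.
The E function shows the edge connections, and edge_attr shows all attributes attached to the edges.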

## Show all node attributes
vertex_attr(net)

## $name
##  [1] "s01" "s02" "s03" "s04" "s05" "s06" "s07" "s08" "s09" "s10" "s11" "s12"
## [13] "s13" "s14" "s15" "s16" "s17"
## 
## $media
##  [1] "NY Times"            "Washington Post"     "Wall Street Journal"
##  [4] "USA Today"           "LA Times"            "New York Post"      
##  [7] "CNN"                 "MSNBC"               "FOX News"           
## [10] "ABC"                 "BBC"                 "Yahoo News"         
## [13] "Google News"         "Reuters.com"         "NYTimes.com"        
## [16] "WashingtonPost.com"  "AOL.com"            
## 
## $media.type
##  [1] 1 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3
## 
## $type.label
##  [1] "Newspaper" "Newspaper" "Newspaper" "Newspaper" "Newspaper" "Newspaper"
##  [7] "TV"        "TV"        "TV"        "TV"        "TV"        "Online"   
## [13] "Online"    "Online"    "Online"    "Online"    "Online"   
## 
## $audience.size
##  [1] 20 25 30 32 20 50 56 34 60 23 34 33 23 12 24 28 33

## Show the node labels
V(net)

## + 17/17 vertices, named, from c3731a0:
##  [1] s01 s02 s03 s04 s05 s06 s07 s08 s09 s10 s11 s12 s13 s14 s15 s16 s17

### Show all edge attributes
edge_attr(net)

## $type
##  [1] "hyperlink" "hyperlink" "hyperlink" "mention"   "hyperlink" "hyperlink"
##  [7] "hyperlink" "hyperlink" "hyperlink" "hyperlink" "hyperlink" "hyperlink"
## [13] "mention"   "hyperlink" "hyperlink" "hyperlink" "mention"   "mention"  
## [19] "hyperlink" "mention"   "mention"   "hyperlink" "hyperlink" "mention"  
## [25] "hyperlink" "hyperlink" "mention"   "mention"   "mention"   "hyperlink"
## [31] "mention"   "hyperlink" "mention"   "mention"   "mention"   "hyperlink"
## [37] "mention"   "hyperlink" "mention"   "hyperlink" "mention"   "mention"  
## [43] "mention"   "hyperlink" "hyperlink" "hyperlink" "hyperlink" "mention"  
## [49] "hyperlink"
## 
## $weight
##  [1] 22 22 21 20 23 21  1  5 21 22  1  4  2  1  1 23  1 22  3  2  1 21  2 21  1
## [26] 21 21  1 22 21  4  2 21 23 21  2  2 22 22 21  1  1 21 22  1  4 23 21  4

### Show the edge connections
E(net)

## + 49/49 edges from c3731a0 (vertex names):
##  [1] s01->s02 s01->s03 s01->s04 s01->s15 s02->s01 s02->s03 s02->s09 s02->s10
##  [9] s03->s01 s03->s04 s03->s05 s03->s08 s03->s10 s03->s11 s03->s12 s04->s03
## [17] s04->s06 s04->s11 s04->s12 s04->s17 s05->s01 s05->s02 s05->s09 s05->s15
## [25] s06->s06 s06->s16 s06->s17 s07->s03 s07->s08 s07->s10 s07->s14 s08->s03
## [33] s08->s07 s08->s09 s09->s10 s10->s03 s12->s06 s12->s13 s12->s14 s13->s12
## [41] s13->s17 s14->s11 s14->s13 s15->s01 s15->s04 s15->s06 s16->s06 s16->s17
## [49] s17->s04
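Beyond printing everything at once, individual attributes can be read with $, and the whole node or edge table can be pulled back out as a data frame. The sketch below uses a hypothetical toy graph; igraph::as_data_frame is called with its namespace so that it is not confused with the function of the same name in dplyr when the tidyverse is loaded.

```r
library(igraph)

# Hypothetical toy graph.
nodes_toy <- data.frame(name = c("a", "b", "c"),
                        group = c("x", "x", "y"),
                        stringsAsFactors = FALSE)
links_toy <- data.frame(from = c("a", "a", "b"),
                        to = c("b", "c", "c"),
                        weight = c(1, 2, 3),
                        stringsAsFactors = FALSE)
g <- graph_from_data_frame(d = links_toy, vertices = nodes_toy, directed = TRUE)

# Which attributes exist?
v_names <- vertex_attr_names(g)
e_names <- edge_attr_names(g)

# Read a single attribute as a plain vector.
V(g)$group
E(g)$weight

# Recover the node and edge tables as data frames.
node_df <- igraph::as_data_frame(g, what = "vertices")
edge_df <- igraph::as_data_frame(g, what = "edges")
```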

tidygraph

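A tidygraph object works seamlessly with igraph functions, so all of the functions shown above can be used on it as well. In addition, tidygraph provides an activate function for choosing which part of the graph to work on; it accepts either nodes or edges. The selected part can then be converted to a table with as_tibble() or as.data.frame() for viewing.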

### Show the node table
net2 %>% activate(nodes) %>% as_tibble() %>% head()

## # A tibble: 6 x 5
##   name  media               media.type type.label audience.size
##   <chr> <chr>                    <int> <chr>              <int>
## 1 s01   NY Times                     1 Newspaper             20
## 2 s02   Washington Post              1 Newspaper             25
## 3 s03   Wall Street Journal          1 Newspaper             30
## 4 s04   USA Today                    1 Newspaper             32
## 5 s05   LA Times                     1 Newspaper             20
## 6 s06   New York Post                1 Newspaper             50

### Show the edge table
net2 %>% activate(edges) %>% as.data.frame() %>% head()

##   from to      type weight
## 1    1  2 hyperlink     22
## 2    1  3 hyperlink     22
## 3    1  4 hyperlink     21
## 4    1 15   mention     20
## 5    2  1 hyperlink     23
## 6    2  3 hyperlink     21
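A small sketch of the same extraction on hypothetical toy data. dplyr is loaded explicitly here only to be safe about the pipe and as_tibble being available.

```r
library(tidygraph)
library(dplyr)

# Hypothetical toy data.
nodes_toy <- data.frame(name = c("a", "b", "c"),
                        group = c("x", "x", "y"),
                        stringsAsFactors = FALSE)
links_toy <- data.frame(from = c("a", "a", "b"),
                        to = c("b", "c", "c"),
                        weight = c(1, 2, 3),
                        stringsAsFactors = FALSE)
tg <- tbl_graph(nodes = nodes_toy, edges = links_toy, directed = TRUE)

# Pull each component out as an ordinary tibble.
node_tbl <- tg %>% activate(nodes) %>% as_tibble()
edge_tbl <- tg %>% activate(edges) %>% as_tibble()

node_tbl
edge_tbl
```

In the edge table, from and to are integer row positions in the node table rather than node names, which matches the edge output shown above for net2.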

Filtering network information

igraph

igraph can select and display subsets of nodes or edges. To visualize the selected subset as a network, however, a new network object has to be built from it.

## Select nodes by a node attribute
V(net)[type.label == "TV"]

## + 5/17 vertices, named, from c3731a0:
## [1] s07 s08 s09 s10 s11

## Show the edges incident to a given node
E(net)[[inc("s01")]]

## + 8/49 edges from c3731a0 (vertex names):
##    tail head tid hid      type weight
## 1   s01  s02   1   2 hyperlink     22
## 2   s01  s03   1   3 hyperlink     22
## 3   s01  s04   1   4 hyperlink     21
## 4   s01  s15   1  15   mention     20
## 5   s02  s01   2   1 hyperlink     23
## 9   s03  s01   3   1 hyperlink     21
## 21  s05  s01   5   1   mention      1
## 44  s15  s01  15   1 hyperlink     22

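## Select edges by a condition
E(net)[[type == "hyperlink"]]

This selects the 29 edges whose type is "hyperlink"; the remaining 20 of the 49 edges are "mention" edges, as the edge_attr(net) output above shows. Note that a misspelled value simply matches nothing and returns 0 of 49 edges without raising an error.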
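One possible way to turn a selection into a new network object is sketched below on a hypothetical toy graph. induced_subgraph and delete_edges are standard igraph functions, but choosing them here is an assumption about what is wanted, not the method used in the original notes.

```r
library(igraph)

# Hypothetical toy graph.
nodes_toy <- data.frame(name = c("a", "b", "c"),
                        group = c("x", "x", "y"),
                        stringsAsFactors = FALSE)
links_toy <- data.frame(from = c("a", "a", "b"),
                        to = c("b", "c", "c"),
                        weight = c(1, 2, 3),
                        stringsAsFactors = FALSE)
g <- graph_from_data_frame(d = links_toy, vertices = nodes_toy, directed = TRUE)

# Keep only the nodes in group "x", together with the edges among them.
keep_v <- V(g)[group == "x"]
sub_v  <- induced_subgraph(g, keep_v)

# Keep all nodes but drop the edges whose weight is below 2.
sub_e <- delete_edges(g, E(g)[weight < 2])

sub_v
sub_e
```

Both results are ordinary igraph objects, so they can be handed to plotting functions in the same way as the original graph.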

tidygraph

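With activate we can pick out the node or edge table and then filter or modify it with dplyr verbs. The result is still a network object, so it can be passed straight on to visualization.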

net2 %>% activate(nodes) %>% filter(type.label == "TV") %>% 
    activate(edges) %>% filter(type == "mention")

## # A tbl_graph: 5 nodes and 4 edges
## #
## # A directed simple graph with 2 components
## #
## # Edge Data: 4 x 4 (active)
##    from    to type    weight
##   <int> <int> <chr>    <int>
## 1     1     2 mention     22
## 2     2     1 mention     21
## 3     2     3 mention     23
## 4     3     4 mention     21
## #
## # Node Data: 5 x 5
##   name  media    media.type type.label audience.size
##   <chr> <chr>         <int> <chr>              <int>
## 1 s07   CNN               2 TV                    56
## 2 s08   MSNBC             2 TV                    34
## 3 s09   FOX News          2 TV                    60
## # … with 2 more rows
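The chained filter can be tried on a hypothetical toy graph to see how node and edge filters interact. In this sketch, filtering the nodes also removes every edge that touched a dropped node, and the remaining edges are renumbered against the new node table.

```r
library(tidygraph)
library(dplyr)

# Hypothetical toy data.
nodes_toy <- data.frame(name = c("a", "b", "c"),
                        group = c("x", "x", "y"),
                        stringsAsFactors = FALSE)
links_toy <- data.frame(from = c("a", "a", "b"),
                        to = c("b", "c", "c"),
                        weight = c(1, 2, 3),
                        stringsAsFactors = FALSE)
tg <- tbl_graph(nodes = nodes_toy, edges = links_toy, directed = TRUE)

# Keep the nodes in group "x", then keep the surviving edges with weight >= 1.
sub <- tg %>%
  activate(nodes) %>%
  filter(group == "x") %>%
  activate(edges) %>%
  filter(weight >= 1)

sub_edges <- sub %>% activate(edges) %>% as_tibble()
sub_edges
```

Only the a to b edge survives, because the other two edges pointed at node c, which the node filter removed.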

Adding/removing attributes

igraph

igraph keeps attribute data in list form, so extra attributes can be added with $. Use V to add node attributes and E to add edge attributes.

## Add a color attribute to the nodes
V(net)$color <- ifelse(V(net)$type.label == "TV", "red", "blue")
vertex_attr(net)

## $name
##  [1] "s01" "s02" "s03" "s04" "s05" "s06" "s07" "s08" "s09" "s10" "s11" "s12"
## [13] "s13" "s14" "s15" "s16" "s17"
## 
## $media
##  [1] "NY Times"            "Washington Post"     "Wall Street Journal"
##  [4] "USA Today"           "LA Times"            "New York Post"      
##  [7] "CNN"                 "MSNBC"               "FOX News"           
## [10] "ABC"                 "BBC"                 "Yahoo News"         
## [13] "Google News"         "Reuters.com"         "NYTimes.com"        
## [16] "WashingtonPost.com"  "AOL.com"            
## 
## $media.type
##  [1] 1 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3
## 
## $type.label
##  [1] "Newspaper" "Newspaper" "Newspaper" "Newspaper" "Newspaper" "Newspaper"
##  [7] "TV"        "TV"        "TV"        "TV"        "TV"        "Online"   
## [13] "Online"    "Online"    "Online"    "Online"    "Online"   
## 
## $audience.size
##  [1] 20 25 30 32 20 50 56 34 60 23 34 33 23 12 24 28 33
## 
## $color
##  [1] "blue" "blue" "blue" "blue" "blue" "blue" "red"  "red"  "red"  "red" 
## [11] "red"  "blue" "blue" "blue" "blue" "blue" "blue"
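The section title also mentions removing attributes. A minimal sketch on a hypothetical toy graph follows, using igraph's delete_vertex_attr and delete_edge_attr; the width attribute is invented purely for illustration.

```r
library(igraph)

# Hypothetical toy graph.
nodes_toy <- data.frame(name = c("a", "b", "c"),
                        group = c("x", "x", "y"),
                        stringsAsFactors = FALSE)
links_toy <- data.frame(from = c("a", "a", "b"),
                        to = c("b", "c", "c"),
                        weight = c(1, 2, 3),
                        stringsAsFactors = FALSE)
g <- graph_from_data_frame(d = links_toy, vertices = nodes_toy, directed = TRUE)

# Add one node attribute and one edge attribute.
g_added <- g
V(g_added)$color <- ifelse(V(g_added)$group == "x", "red", "blue")
E(g_added)$width <- E(g_added)$weight / 2

# Remove them again; both functions return a modified copy of the graph.
g_removed <- delete_vertex_attr(g_added, "color")
g_removed <- delete_edge_attr(g_removed, "width")

vertex_attr_names(g_removed)
edge_attr_names(g_removed)
```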

tidygraph

After activate has selected the relevant table, mutate can be used to add further attributes to it.

net2 %>% activate(nodes) %>% 
    mutate(color = if_else(type.label == "TV", "red", "blue")) %>% 
    pull(color)

##  [1] "blue" "blue" "blue" "blue" "blue" "blue" "red"  "red"  "red"  "red" 
## [11] "red"  "blue" "blue" "blue" "blue" "blue" "blue"
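Note that the pipeline above only prints the new column; net2 itself is unchanged unless the result is assigned. The sketch below, on hypothetical toy data, keeps the new attribute by assignment and then removes it again with select, which is one plausible way to delete a column in tidygraph.

```r
library(tidygraph)
library(dplyr)

# Hypothetical toy data.
nodes_toy <- data.frame(name = c("a", "b", "c"),
                        group = c("x", "x", "y"),
                        stringsAsFactors = FALSE)
links_toy <- data.frame(from = c("a", "a", "b"),
                        to = c("b", "c", "c"),
                        weight = c(1, 2, 3),
                        stringsAsFactors = FALSE)
tg <- tbl_graph(nodes = nodes_toy, edges = links_toy, directed = TRUE)

# Assign the result so that the new node attribute is kept.
tg_col <- tg %>%
  activate(nodes) %>%
  mutate(color = if_else(group == "x", "red", "blue"))

# Drop the attribute again.
tg_nocol <- tg_col %>%
  activate(nodes) %>%
  select(-color)

cols_with    <- names(tg_col %>% activate(nodes) %>% as_tibble())
cols_without <- names(tg_nocol %>% activate(nodes) %>% as_tibble())
```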

Article link: https://www.haomeiwen.com/subject/txkemhtx.html