Network Analysis in R (Part 1): Basic Operations

Author: drlee_fc74 | Published 2020-04-11 07:28

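    These are study notes based on the DataCamp course "Network Analysis in R" and the tutorial "Network visualization with R". There is also another package for working with network data, tidygraph. Because tidygraph stores its data as tbl objects, it plugs seamlessly into tidyverse workflows, so these notes also cover how to use that package.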

    library(igraph)
    library(tidygraph)
    library(tidyverse)
    

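    Basic elements of a network

    Network data consists of two main elements: vertices (also called nodes) and edges (the links between vertices). The input data is supplied in terms of these same two elements.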
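
    As a minimal sketch of those two elements, the base R snippet below stores a network as one node table and one edge table and checks that every edge endpoint refers to a known node. The data values and the names toy_nodes, toy_edges, and endpoints are hypothetical and only serve as an illustration.

```r
# Hypothetical toy data: one row per vertex, one row per edge.
toy_nodes <- data.frame(
  id    = c("s01", "s02", "s03"),
  label = c("Paper A", "Paper B", "Channel C")
)
toy_edges <- data.frame(
  from   = c("s01", "s01", "s02"),
  to     = c("s02", "s03", "s03"),
  weight = c(10, 5, 2)
)

# Every endpoint used in the edge table should appear in the node table.
endpoints <- unique(c(toy_edges$from, toy_edges$to))
all_known <- all(endpoints %in% toy_nodes$id)

# Nodes that no edge touches (none in this toy example).
isolated <- setdiff(toy_nodes$id, endpoints)
```

    Running a check like this before building the graph object helps catch a mismatch between the two tables early.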


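    Building a network object

    To work with network data, we first need to create a network object. igraph and tidygraph each provide their own functions for converting data into such an object.

    igraph

    igraph builds a network with the graph_from_data_frame function. It takes the edge information and the node information of the network, and it lets you choose whether the network is directed.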

    nodes <- read.csv("./Data/Dataset1-Media-Example-NODES.csv", header=TRUE, as.is=TRUE)
    links <- read.csv("./Data/Dataset1-Media-Example-EDGES.csv", header=TRUE, as.is=TRUE)
    net <- graph_from_data_frame(d=links, vertices=nodes, directed=TRUE)
    net
    
    ## IGRAPH c3731a0 DNW- 17 49 -- 
    ## + attr: name (v/c), media (v/c), media.type (v/n), type.label (v/c),
    ## | audience.size (v/n), type (e/c), weight (e/n)
    ## + edges from c3731a0 (vertex names):
    ##  [1] s01->s02 s01->s03 s01->s04 s01->s15 s02->s01 s02->s03 s02->s09 s02->s10
    ##  [9] s03->s01 s03->s04 s03->s05 s03->s08 s03->s10 s03->s11 s03->s12 s04->s03
    ## [17] s04->s06 s04->s11 s04->s12 s04->s17 s05->s01 s05->s02 s05->s09 s05->s15
    ## [25] s06->s06 s06->s16 s06->s17 s07->s03 s07->s08 s07->s10 s07->s14 s08->s03
    ## [33] s08->s07 s08->s09 s09->s10 s10->s03 s12->s06 s12->s13 s12->s14 s13->s12
    ## [41] s13->s17 s14->s11 s14->s13 s15->s01 s15->s04 s15->s06 s16->s06 s16->s17
    ## [49] s17->s04
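
    The first line of the printed summary encodes the graph's properties: in "DNW- 17 49", D means directed, N means named, W means weighted, the fourth slot would show B for a bipartite graph, and the two numbers are the vertex and edge counts. In the attribute list, (v/c) marks a character vertex attribute and (e/n) a numeric edge attribute. The sketch below checks the same properties programmatically on a small hypothetical graph (toy_nodes, toy_edges, and toy_net are made-up names, not the media dataset).

```r
library(igraph)

# Hypothetical toy data; the first column of the vertex table becomes the
# vertex attribute "name", and a "weight" edge column makes the graph weighted.
toy_nodes <- data.frame(id = c("a", "b", "c"), size = c(10, 20, 30))
toy_edges <- data.frame(from = c("a", "a", "b"),
                        to = c("b", "c", "c"),
                        weight = c(5, 1, 3))
toy_net <- graph_from_data_frame(d = toy_edges, vertices = toy_nodes, directed = TRUE)

# Each letter in the summary header corresponds to one predicate.
flags <- c(directed = is_directed(toy_net),
           named = is_named(toy_net),
           weighted = is_weighted(toy_net))

# The two numbers in the header are the vertex and edge counts.
counts <- c(vertices = vcount(toy_net), edges = ecount(toy_net))

vertex_names <- V(toy_net)$name
```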
    

    tidygraph

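    The tidygraph package provides tbl_graph, the basic function for building a network object. Data frames, matrices, and igraph network objects can also be converted with as_tbl_graph.

    ### Build the network object directly
    net1 <- tbl_graph(nodes = nodes, edges = links, directed = TRUE)
    ### Convert the igraph object
    net2 <- as_tbl_graph(net)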
    net2
    
    ## # A tbl_graph: 17 nodes and 49 edges
    ## #
    ## # A directed multigraph with 1 component
    ## #
    ## # Node Data: 17 x 5 (active)
    ##   name  media               media.type type.label audience.size
    ##   <chr> <chr>                    <int> <chr>              <int>
    ## 1 s01   NY Times                     1 Newspaper             20
    ## 2 s02   Washington Post              1 Newspaper             25
    ## 3 s03   Wall Street Journal          1 Newspaper             30
    ## 4 s04   USA Today                    1 Newspaper             32
    ## 5 s05   LA Times                     1 Newspaper             20
    ## 6 s06   New York Post                1 Newspaper             50
    ## # … with 11 more rows
    ## #
    ## # Edge Data: 49 x 4
    ##    from    to type      weight
    ##   <int> <int> <chr>      <int>
    ## 1     1     2 hyperlink     22
    ## 2     1     3 hyperlink     22
    ## 3     1     4 hyperlink     21
    ## # … with 46 more rows
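
    Note that the edge table in the printout stores from and to as integer row positions in the node table rather than as names. A self-contained sketch of both construction routes follows; the toy data and the names g1 and g2 are hypothetical, and make_ring is used only to have an igraph object to convert.

```r
library(igraph)
library(tidygraph)

# Hypothetical toy data; from/to are row positions in the node table.
toy_nodes <- data.frame(name = c("a", "b", "c"))
toy_edges <- data.frame(from = c(1L, 1L, 2L), to = c(2L, 3L, 3L))

# Route 1: build directly from a node table and an edge table.
g1 <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = TRUE)

# Route 2: convert an existing igraph object (here a 4-vertex ring).
g2 <- as_tbl_graph(make_ring(4))
```

    A tbl_graph also inherits from the igraph class, which is why igraph functions such as vcount() accept it.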
    

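    Inspecting a network object

    Once the network object is built, we can inspect its contents.

    igraph

    In igraph, V() shows the node labels and vertex_attr() shows all node attributes.
    E() shows the edge connections and edge_attr() shows all edge attributes.

    ## View the node attributes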
    vertex_attr(net)
    
    ## $name
    ##  [1] "s01" "s02" "s03" "s04" "s05" "s06" "s07" "s08" "s09" "s10" "s11" "s12"
    ## [13] "s13" "s14" "s15" "s16" "s17"
    ## 
    ## $media
    ##  [1] "NY Times"            "Washington Post"     "Wall Street Journal"
    ##  [4] "USA Today"           "LA Times"            "New York Post"      
    ##  [7] "CNN"                 "MSNBC"               "FOX News"           
    ## [10] "ABC"                 "BBC"                 "Yahoo News"         
    ## [13] "Google News"         "Reuters.com"         "NYTimes.com"        
    ## [16] "WashingtonPost.com"  "AOL.com"            
    ## 
    ## $media.type
    ##  [1] 1 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3
    ## 
    ## $type.label
    ##  [1] "Newspaper" "Newspaper" "Newspaper" "Newspaper" "Newspaper" "Newspaper"
    ##  [7] "TV"        "TV"        "TV"        "TV"        "TV"        "Online"   
    ## [13] "Online"    "Online"    "Online"    "Online"    "Online"   
    ## 
    ## $audience.size
    ##  [1] 20 25 30 32 20 50 56 34 60 23 34 33 23 12 24 28 33
    
    ## View the node labels
    V(net)
    
    ## + 17/17 vertices, named, from c3731a0:
    ##  [1] s01 s02 s03 s04 s05 s06 s07 s08 s09 s10 s11 s12 s13 s14 s15 s16 s17
    
    ### View the edge attributes
    edge_attr(net)
    
    ## $type
    ##  [1] "hyperlink" "hyperlink" "hyperlink" "mention"   "hyperlink" "hyperlink"
    ##  [7] "hyperlink" "hyperlink" "hyperlink" "hyperlink" "hyperlink" "hyperlink"
    ## [13] "mention"   "hyperlink" "hyperlink" "hyperlink" "mention"   "mention"  
    ## [19] "hyperlink" "mention"   "mention"   "hyperlink" "hyperlink" "mention"  
    ## [25] "hyperlink" "hyperlink" "mention"   "mention"   "mention"   "hyperlink"
    ## [31] "mention"   "hyperlink" "mention"   "mention"   "mention"   "hyperlink"
    ## [37] "mention"   "hyperlink" "mention"   "hyperlink" "mention"   "mention"  
    ## [43] "mention"   "hyperlink" "hyperlink" "hyperlink" "hyperlink" "mention"  
    ## [49] "hyperlink"
    ## 
    ## $weight
    ##  [1] 22 22 21 20 23 21  1  5 21 22  1  4  2  1  1 23  1 22  3  2  1 21  2 21  1
    ## [26] 21 21  1 22 21  4  2 21 23 21  2  2 22 22 21  1  1 21 22  1  4 23 21  4
    
    ### View the edge connections
    E(net)
    
    ## + 49/49 edges from c3731a0 (vertex names):
    ##  [1] s01->s02 s01->s03 s01->s04 s01->s15 s02->s01 s02->s03 s02->s09 s02->s10
    ##  [9] s03->s01 s03->s04 s03->s05 s03->s08 s03->s10 s03->s11 s03->s12 s04->s03
    ## [17] s04->s06 s04->s11 s04->s12 s04->s17 s05->s01 s05->s02 s05->s09 s05->s15
    ## [25] s06->s06 s06->s16 s06->s17 s07->s03 s07->s08 s07->s10 s07->s14 s08->s03
    ## [33] s08->s07 s08->s09 s09->s10 s10->s03 s12->s06 s12->s13 s12->s14 s13->s12
    ## [41] s13->s17 s14->s11 s14->s13 s15->s01 s15->s04 s15->s06 s16->s06 s16->s17
    ## [49] s17->s04
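
    Besides the list-style output above, a single attribute can be read with $, and the whole graph can be turned back into data frames. The sketch below uses a small hypothetical graph; igraph::as_data_frame is written with its namespace because dplyr and tibble have exported a function of the same name.

```r
library(igraph)

# Hypothetical toy graph, not the media dataset.
toy_nodes <- data.frame(id = c("a", "b", "c"), size = c(10, 20, 30))
toy_edges <- data.frame(from = c("a", "b"), to = c("b", "c"), weight = c(5, 1))
toy_net <- graph_from_data_frame(d = toy_edges, vertices = toy_nodes, directed = TRUE)

# Names of the available attributes.
v_names <- vertex_attr_names(toy_net)
e_names <- edge_attr_names(toy_net)

# A single attribute as a plain vector.
sizes <- V(toy_net)$size
weights <- E(toy_net)$weight

# Back to data frames: one for vertices, one for edges.
node_df <- igraph::as_data_frame(toy_net, what = "vertices")
edge_df <- igraph::as_data_frame(toy_net, what = "edges")
```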
    

    tidygraph

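    A tidygraph object works seamlessly with igraph functions, so all of the functions above can be used on it as well. In addition, tidygraph has an activate function for selecting the corresponding part of the graph; it accepts either nodes or edges. The active part can then be converted to a data frame for viewing, for example with as_tibble() or as.data.frame().

    ### View the node data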
    net2 %>% activate(nodes) %>% as_tibble() %>% head()
    
    ## # A tibble: 6 x 5
    ##   name  media               media.type type.label audience.size
    ##   <chr> <chr>                    <int> <chr>              <int>
    ## 1 s01   NY Times                     1 Newspaper             20
    ## 2 s02   Washington Post              1 Newspaper             25
    ## 3 s03   Wall Street Journal          1 Newspaper             30
    ## 4 s04   USA Today                    1 Newspaper             32
    ## 5 s05   LA Times                     1 Newspaper             20
    ## 6 s06   New York Post                1 Newspaper             50
    
    ### View the edge data
    net2 %>% activate(edges) %>% as.data.frame() %>% head()
    
    ##   from to      type weight
    ## 1    1  2 hyperlink     22
    ## 2    1  3 hyperlink     22
    ## 3    1  4 hyperlink     21
    ## 4    1 15   mention     20
    ## 5    2  1 hyperlink     23
    ## 6    2  3 hyperlink     21
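
    The idea behind activate is that a tbl_graph holds two tables and exactly one of them is active at a time; the "(active)" marker in the printout shows which. A minimal sketch on hypothetical toy data (g, node_tbl, and edge_tbl are made-up names):

```r
library(tidygraph)
library(dplyr)

# Hypothetical toy graph with two edges.
g <- tbl_graph(
  nodes = data.frame(name = c("a", "b", "c")),
  edges = data.frame(from = c(1L, 2L), to = c(2L, 3L), weight = c(0.5, 1.5)),
  directed = TRUE
)

# Nodes are the active table right after construction.
active_now <- active(g)

# Switch the active table, then extract it as a tibble.
node_tbl <- g %>% activate(nodes) %>% as_tibble()
edge_tbl <- g %>% activate(edges) %>% as_tibble()
```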
    

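    Filtering network information

    igraph

    igraph can filter vertices and edges for viewing. However, to visualize the filtered data, the network object has to be defined again from the selection.

    ## Filter nodes by a node attribute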
    V(net)[type.label == "TV"]
    
    ## + 5/17 vertices, named, from c3731a0:
    ## [1] s07 s08 s09 s10 s11
    
    ## View the edges incident to a given node
    E(net)[[inc("s01")]]
    
    ## + 8/49 edges from c3731a0 (vertex names):
    ##    tail head tid hid      type weight
    ## 1   s01  s02   1   2 hyperlink     22
    ## 2   s01  s03   1   3 hyperlink     22
    ## 3   s01  s04   1   4 hyperlink     21
    ## 4   s01  s15   1  15   mention     20
    ## 5   s02  s01   2   1 hyperlink     23
    ## 9   s03  s01   3   1 hyperlink     21
    ## 21  s05  s01   5   1   mention      1
    ## 44  s15  s01  15   1 hyperlink     22
    
    ## Filter edges by an edge attribute. The value must match exactly:
    ## a misspelled value such as "heyperlink" selects 0 of the 49 edges.
    E(net)[[type == "hyperlink"]]
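
    To turn a selection back into a network object that can be plotted, one possible approach is induced_subgraph() for a vertex selection and delete_edges() for an edge selection. This is a sketch on hypothetical toy data (the attribute names kind and rel are made up), not necessarily how the original notes proceed.

```r
library(igraph)

# Hypothetical toy graph: a directed 4-cycle a -> b -> c -> d -> a.
toy_nodes <- data.frame(id = c("a", "b", "c", "d"),
                        kind = c("TV", "TV", "Online", "TV"))
toy_edges <- data.frame(from = c("a", "b", "c", "d"),
                        to = c("b", "c", "d", "a"),
                        rel = c("mention", "hyperlink", "mention", "hyperlink"))
toy_net <- graph_from_data_frame(d = toy_edges, vertices = toy_nodes, directed = TRUE)

# Keep only the TV vertices and the edges running between them.
tv_net <- induced_subgraph(toy_net, V(toy_net)[kind == "TV"])

# Keep all vertices but drop every edge that is not a hyperlink.
hyper_net <- delete_edges(toy_net, E(toy_net)[rel != "hyperlink"])
```

    Both results are ordinary igraph objects, so they can be passed on to plotting functions like the original graph.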
    

    tidygraph

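    With activate we can select the node or edge data and then filter or modify it with the usual dplyr verbs. The filtered result is still a network object, so it can go straight on to visualization.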

    net2 %>% activate(nodes) %>% filter(type.label == "TV") %>% 
        activate(edges) %>% filter(type == "mention")
    
    ## # A tbl_graph: 5 nodes and 4 edges
    ## #
    ## # A directed simple graph with 2 components
    ## #
    ## # Edge Data: 4 x 4 (active)
    ##    from    to type    weight
    ##   <int> <int> <chr>    <int>
    ## 1     1     2 mention     22
    ## 2     2     1 mention     21
    ## 3     2     3 mention     23
    ## 4     3     4 mention     21
    ## #
    ## # Node Data: 5 x 5
    ##   name  media    media.type type.label audience.size
    ##   <chr> <chr>         <int> <chr>              <int>
    ## 1 s07   CNN               2 TV                    56
    ## 2 s08   MSNBC             2 TV                    34
    ## 3 s09   FOX News          2 TV                    60
    ## # … with 2 more rows
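
    Two things are visible in the output above: filtering nodes also drops every edge that touched a removed node, and the remaining from/to values are renumbered to the new node positions. The sketch below reproduces this on hypothetical toy data (kind and rel are made-up attribute names).

```r
library(tidygraph)
library(dplyr)

# Hypothetical toy graph: a directed 4-cycle a -> b -> c -> d -> a.
g <- tbl_graph(
  nodes = data.frame(name = c("a", "b", "c", "d"),
                     kind = c("TV", "TV", "Online", "TV")),
  edges = data.frame(from = c(1L, 2L, 3L, 4L),
                     to = c(2L, 3L, 4L, 1L),
                     rel = c("mention", "hyperlink", "mention", "hyperlink")),
  directed = TRUE
)

# Filter nodes first, then edges; the result is still a tbl_graph.
tv_mentions <- g %>%
  activate(nodes) %>% filter(kind == "TV") %>%
  activate(edges) %>% filter(rel == "mention")

# The surviving edge a -> b, expressed in the new node numbering.
mention_edges <- tv_mentions %>% activate(edges) %>% as_tibble()
```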
    

    Adding and removing attributes

    igraph

    igraph stores attributes in list form, so extra attributes can be added with $. Use V() to add node attributes and E() to add edge attributes.

    ## Add a color attribute
    V(net)$color <- ifelse(V(net)$type.label == "TV", "red", "blue")
    vertex_attr(net)
    
    ## $name
    ##  [1] "s01" "s02" "s03" "s04" "s05" "s06" "s07" "s08" "s09" "s10" "s11" "s12"
    ## [13] "s13" "s14" "s15" "s16" "s17"
    ## 
    ## $media
    ##  [1] "NY Times"            "Washington Post"     "Wall Street Journal"
    ##  [4] "USA Today"           "LA Times"            "New York Post"      
    ##  [7] "CNN"                 "MSNBC"               "FOX News"           
    ## [10] "ABC"                 "BBC"                 "Yahoo News"         
    ## [13] "Google News"         "Reuters.com"         "NYTimes.com"        
    ## [16] "WashingtonPost.com"  "AOL.com"            
    ## 
    ## $media.type
    ##  [1] 1 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3
    ## 
    ## $type.label
    ##  [1] "Newspaper" "Newspaper" "Newspaper" "Newspaper" "Newspaper" "Newspaper"
    ##  [7] "TV"        "TV"        "TV"        "TV"        "TV"        "Online"   
    ## [13] "Online"    "Online"    "Online"    "Online"    "Online"   
    ## 
    ## $audience.size
    ##  [1] 20 25 30 32 20 50 56 34 60 23 34 33 23 12 24 28 33
    ## 
    ## $color
    ##  [1] "blue" "blue" "blue" "blue" "blue" "blue" "red"  "red"  "red"  "red" 
    ## [11] "red"  "blue" "blue" "blue" "blue" "blue" "blue"
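
    The same $ assignment works for edges through E(), and attributes can be removed again with delete_vertex_attr() and delete_edge_attr(). A small sketch on a hypothetical graph (the attribute names color and width are chosen only for illustration):

```r
library(igraph)

# Hypothetical toy graph built from an edge table only;
# vertices a, b, c are created from the endpoints.
toy_net <- graph_from_data_frame(
  data.frame(from = c("a", "b"), to = c("b", "c"), weight = c(1, 5)),
  directed = TRUE
)

# Add one vertex attribute and one edge attribute.
V(toy_net)$color <- c("red", "blue", "blue")
E(toy_net)$width <- E(toy_net)$weight / 2

v_before <- vertex_attr_names(toy_net)
e_before <- edge_attr_names(toy_net)

# Remove them again; both functions return a modified copy of the graph.
toy_net <- delete_vertex_attr(toy_net, "color")
toy_net <- delete_edge_attr(toy_net, "width")

v_after <- vertex_attr_names(toy_net)
e_after <- edge_attr_names(toy_net)
```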
    

    tidygraph

    After activate selects the corresponding data, mutate can add further attributes.

    net2 %>% activate(nodes) %>% 
        mutate(color = if_else(type.label == "TV", "red", "blue")) %>% 
        pull(color)
    
    ##  [1] "blue" "blue" "blue" "blue" "blue" "blue" "red"  "red"  "red"  "red" 
    ## [11] "red"  "blue" "blue" "blue" "blue" "blue" "blue"
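
    Unlike the $ assignment in igraph, a pipeline like the one above does not change net2 unless its result is assigned. The sketch below keeps the result and then removes the column again with select(); the data and the names g, g_colored, and g_plain are hypothetical.

```r
library(tidygraph)
library(dplyr)

# Hypothetical toy graph.
g <- tbl_graph(
  nodes = data.frame(name = c("a", "b", "c"), kind = c("TV", "Online", "TV")),
  edges = data.frame(from = c(1L, 2L), to = c(2L, 3L)),
  directed = TRUE
)

# Add a node attribute and keep the modified graph.
g_colored <- g %>%
  activate(nodes) %>%
  mutate(color = if_else(kind == "TV", "red", "blue"))

# Remove the attribute again by dropping the column.
g_plain <- g_colored %>%
  activate(nodes) %>%
  select(-color)

colors <- pull(g_colored, color)
plain_cols <- names(as_tibble(g_plain))
orig_cols <- names(as_tibble(g))
```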
