R for data science || Using tibble to implement simple…

Author: 周运来就是我 | Published 2019-07-16 07:45

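A tibble is an extended data frame in R, intended as a replacement for data.frame. It inherits from data.frame and is loosely typed, in the sense that a column can hold any kind of R object. Because it shares data.frame's syntax, existing habits carry over, while several rough edges are smoothed out. The tibble package is another R package developed by Hadley Wickham.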

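  • tibble does not care about input types: a column can store any type, including list
  • tibble does not set row names (row.names)
  • tibble supports arbitrary column names
  • tibble adds column names automatically
  • tibble only recycles inputs of length 1
  • tibble evaluates its arguments lazily and in order
  • tibble objects have class tbl_df
Creating tibbles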
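
Several of the properties listed above can be checked directly. The following is a minimal sketch that assumes only that the tibble package is installed; the object names t1 and bad are made up for illustration.

```r
library(tibble)

t1 <- tibble(
  x = 1:3,
  y = 0,                     # length-1 input, recycled to length 3
  z = x * 2 + y,             # arguments are evaluated in order, so x and y are visible here
  l = list(1:2, "a", TRUE)   # a list column holding values of different types
)

class(t1)                    # includes "tbl_df" as well as "data.frame"
inherits(t1, "data.frame")   # a tibble is still a data.frame underneath

# Inputs longer than 1 are not recycled: mixing lengths 4 and 2 is an error.
bad <- try(tibble(x = 1:4, y = 1:2), silent = TRUE)
inherits(bad, "try-error")

# Character input stays character; it is not converted to a factor.
class(tibble(s = c("a", "b"))$s)
```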
library(tidyverse)
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> as_tibble(iris)
# A tibble: 150 x 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1          5.1         3.5          1.4         0.2 setosa 
 2          4.9         3            1.4         0.2 setosa 
 3          4.7         3.2          1.3         0.2 setosa 
 4          4.6         3.1          1.5         0.2 setosa 
 5          5           3.6          1.4         0.2 setosa 
 6          5.4         3.9          1.7         0.4 setosa 
 7          4.6         3.4          1.4         0.3 setosa 
 8          5           3.4          1.5         0.2 setosa 
 9          4.4         2.9          1.4         0.2 setosa 
10          4.9         3.1          1.5         0.1 setosa 
# ... with 140 more rows
tibble(
  x = 1:5, 
  y = 1, 
  z = x ^ 2 + y
)
# A tibble: 5 x 3
      x     y     z
  <int> <dbl> <dbl>
1     1     1     2
2     2     1     5
3     3     1    10
4     4     1    17
5     5     1    26

Support for special characters in column names

tb <- tibble(
  `:)` = "smile", 
  ` ` = "space",
  `2000` = "number"
)
tb
# A tibble: 1 x 3
  `:)`  ` `   `2000`
  <chr> <chr> <chr> 
1 smile space number
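
To work with such non-syntactic names afterwards, the names have to be quoted. The sketch below is illustrative only; tb_names and df_names are names invented for this example, and the data.frame() call is included purely for contrast with its default check.names = TRUE behaviour.

```r
library(tibble)

tb_names <- tibble(`:)` = "smile", ` ` = "space", `2000` = "number")

# With $ the name needs backticks; with [[ it is an ordinary string.
tb_names$`:)`
tb_names[["2000"]]

# data.frame() rewrites non-syntactic names by default, tibble() keeps them.
df_names <- data.frame(`:)` = "smile", `2000` = "number")
names(df_names)
names(tb_names)
```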

Row-by-row creation with tribble()

tribble(
  ~x, ~y, ~z,
  #--|--|----
  "a", 2, 3.6,
  "b", 1, 8.5
)

# A tibble: 2 x 3
  x         y     z
  <chr> <dbl> <dbl>
1 a         2   3.6
2 b         1   8.5
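
tribble() only changes how the data is typed in: column headings are given as formulas (starting with ~) and values are listed row by row. As a small sketch, assuming nothing beyond the tibble package, the same table can be built column by column with tibble() and the columns compared; by_row and by_col are illustrative names.

```r
library(tibble)

# Row-wise layout: easy to read for small hand-typed tables.
by_row <- tribble(
  ~x,  ~y, ~z,
  "a",  2, 3.6,
  "b",  1, 8.5
)

# Column-wise layout of the same data.
by_col <- tibble(
  x = c("a", "b"),
  y = c(2, 1),
  z = c(3.6, 8.5)
)

# Compare column by column.
identical(by_row$x, by_col$x)
identical(by_row$y, by_col$y)
identical(by_row$z, by_col$z)
```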
Comparing tibble and data.frame

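User-friendly printing: by default a tibble prints only its first few rows and only the columns that fit on screen, together with each column's type.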


tibble(
 a = lubridate::now() + runif(1e3) * 86400,
 b = lubridate::today() + runif(1e3) * 30,
 c = 1:1e3,
 d = runif(1e3),
 e = sample(letters, 1e3, replace = TRUE)
)
# A tibble: 1,000 x 5
   a                   b              c      d e    
   <dttm>              <date>     <int>  <dbl> <chr>
 1 2019-07-16 19:27:28 2019-07-31     1 0.906  v    
 2 2019-07-17 03:46:36 2019-08-12     2 0.271  k    
 3 2019-07-17 06:38:54 2019-08-10     3 0.0282 x    
 4 2019-07-16 13:02:51 2019-08-07     4 0.938  v    
 5 2019-07-17 07:18:28 2019-08-14     5 0.759  t    
 6 2019-07-16 17:11:20 2019-08-09     6 0.275  f    
 7 2019-07-16 10:54:57 2019-08-11     7 0.0217 u    
 8 2019-07-17 03:35:19 2019-07-20     8 0.110  b    
 9 2019-07-16 14:57:29 2019-07-27     9 0.436  g    
10 2019-07-17 05:01:51 2019-08-01    10 0.401  s    
# ... with 990 more rows

nycflights13::flights %>% 
  print(n = 10, width = Inf)

# A tibble: 336,776 x 19
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier flight tailnum origin dest  air_time distance
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl> <chr>    <int> <chr>   <chr>  <chr>    <dbl>    <dbl>
 1  2013     1     1      517            515         2      830            819        11 UA        1545 N14228  EWR    IAH        227     1400
 2  2013     1     1      533            529         4      850            830        20 UA        1714 N24211  LGA    IAH        227     1416
 3  2013     1     1      542            540         2      923            850        33 AA        1141 N619AA  JFK    MIA        160     1089
 4  2013     1     1      544            545        -1     1004           1022       -18 B6         725 N804JB  JFK    BQN        183     1576
 5  2013     1     1      554            600        -6      812            837       -25 DL         461 N668DN  LGA    ATL        116      762
 6  2013     1     1      554            558        -4      740            728        12 UA        1696 N39463  EWR    ORD        150      719
 7  2013     1     1      555            600        -5      913            854        19 B6         507 N516JB  EWR    FLL        158     1065
 8  2013     1     1      557            600        -3      709            723       -14 EV        5708 N829AS  LGA    IAD         53      229
 9  2013     1     1      557            600        -3      838            846        -8 B6          79 N593JB  JFK    MCO        140      944
10  2013     1     1      558            600        -2      753            745         8 AA         301 N3ALAA  LGA    ORD        138      733
    hour minute time_hour          
   <dbl>  <dbl> <dttm>             
 1     5     15 2013-01-01 05:00:00
 2     5     29 2013-01-01 05:00:00
 3     5     40 2013-01-01 05:00:00
 4     5     45 2013-01-01 05:00:00
 5     6      0 2013-01-01 06:00:00
 6     5     58 2013-01-01 05:00:00
 7     6      0 2013-01-01 06:00:00
 8     6      0 2013-01-01 06:00:00
 9     6      0 2013-01-01 06:00:00
10     6      0 2013-01-01 06:00:00
# ... with 3.368e+05 more rows

You can also control the default print behaviour by setting options:

  • options(tibble.print_max = n, tibble.print_min = m): if more than n rows, print only m rows.

  • options(tibble.print_min = Inf) to always show all rows.

  • options(tibble.width = Inf) to always print all columns, regardless of the width of the screen.
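
As a rough sketch of how these options might be used in a session, the snippet below sets them, prints a tibble, and then restores the previous values. It assumes the tibble.* option names listed above are honoured by the installed version (recent releases also provide pillar.* equivalents), and uses the built-in mtcars data set purely as an example.

```r
library(tibble)

# options() returns the previous values, so they can be restored later.
old <- options(
  tibble.print_max = 20,   # n: threshold on the number of rows
  tibble.print_min = 5,    # m: rows shown when the threshold is exceeded
  tibble.width = Inf       # always print all columns
)

# mtcars has 32 rows, which is more than print_max, so only print_min rows are shown.
print(as_tibble(mtcars))

# Restore whatever was set before.
options(old)
```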

Subsetting.

df <- tibble(
  x = runif(5),
  y = rnorm(5)
)
> df
# A tibble: 5 x 2
       x      y
   <dbl>  <dbl>
1 0.140   0.492
2 0.0541 -0.307
3 0.366  -0.395
4 0.616   0.441
5 0.203  -2.16 
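# x and y are random, so the values shown below come from a different draw
# than the tibble printed above.

# Extract by name
df$x
#> [1] 0.434 0.395 0.548 0.762 0.254
df[["x"]]
#> [1] 0.434 0.395 0.548 0.762 0.254

# Extract by position
df[[1]]
#> [1] 0.434 0.395 0.548 0.762 0.254

# In a pipe, use the special placeholder .
df %>% .$x
#> [1] 0.434 0.395 0.548 0.762 0.254
df %>% .[["x"]]
#> [1] 0.434 0.395 0.548 0.762 0.254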
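
Subsetting is also where tibbles are stricter than data frames. The sketch below illustrates two well-known differences, with df_base and tb2 as made-up example objects: $ on a tibble never does partial name matching, and single-bracket subsetting always returns a tibble.

```r
library(tibble)

df_base <- data.frame(abc = 1:3, xyz = c("a", "b", "c"))
tb2 <- as_tibble(df_base)

# data.frame: $ partially matches "a" to the column abc.
df_base$a

# tibble: no partial matching; the result is NULL and a warning is raised.
suppressWarnings(tb2$a)

# data.frame: selecting one column with [ drops to a plain vector.
class(df_base[, "abc"])

# tibble: [ always returns another tibble.
class(tb2[, "abc"])
```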

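Interacting with older code

Some older functions do not work with tibbles. In that case, convert the tibble back to a data.frame with as.data.frame():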

class(as.data.frame(tb))
#> [1] "data.frame"
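
One common reason older code trips over tibbles is the single-bracket behaviour: code written for data frames may assume that d[, 1] is a plain vector. The following sketch uses a hypothetical helper, first_col_mean(), invented here only to illustrate the problem and the as.data.frame() workaround.

```r
library(tibble)

# Hypothetical legacy helper: assumes d[, 1] is a numeric vector.
first_col_mean <- function(d) mean(d[, 1])

tb_iris <- as_tibble(iris)
df_iris <- as.data.frame(tb_iris)

is_tibble(tb_iris)   # TRUE
is_tibble(df_iris)   # FALSE

# Works on the data.frame, because d[, 1] is a numeric vector.
first_col_mean(df_iris)

# On the tibble, d[, 1] is still a one-column tibble, so mean() gives NA with a warning.
suppressWarnings(first_col_mean(tb_iris))
```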

r4ds
tibble, a new data type for data science in R
