> daily <- group_by(flights, year, month, day)
> daily
# A tibble: 336,776 x 19
# After running only group_by(), the output shows the data grouped by year, month, day
# Groups: year, month, day [365]
year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
<int> <int> <int> <int> <int> <dbl> <int> <int>
1 2013 1 1 517 515 2 830 819
2 2013 1 1 533 529 4 850 830
3 2013 1 1 542 540 2 923 850
4 2013 1 1 544 545 -1 1004 1022
5 2013 1 1 554 600 -6 812 837
6 2013 1 1 554 558 -4 740 728
7 2013 1 1 555 600 -5 913 854
8 2013 1 1 557 600 -3 709 723
9 2013 1 1 557 600 -3 838 846
10 2013 1 1 558 600 -2 753 745
# ... with 336,766 more rows, and 11 more variables: arr_delay <dbl>,
# carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
# air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>,
# time_hour <dttm>
> (per_day <- summarise(daily, flights = n()))
`summarise()` has grouped output by 'year', 'month'. You can override using the `.groups` argument.
# A tibble: 365 x 4
# Groups: year, month [12]
year month day flights
<int> <int> <int> <int>
1 2013 1 1 842
2 2013 1 2 943
3 2013 1 3 914
4 2013 1 4 915
5 2013 1 5 720
6 2013 1 6 832
7 2013 1 7 933
8 2013 1 8 899
9 2013 1 9 902
10 2013 1 10 932
# ... with 355 more rows
> (per_month <- summarise(per_day, flights = sum(flights)))
`summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
# A tibble: 12 x 3
# Groups: year [1]
year month flights
<int> <int> <int>
1 2013 1 27004
2 2013 2 24951
3 2013 3 28834
4 2013 4 28330
5 2013 5 28796
6 2013 6 28243
7 2013 7 29425
8 2013 8 29327
9 2013 9 27574
10 2013 10 28889
11 2013 11 27268
12 2013 12 28135
daily %>%
ungroup() %>%
summarise(flights = n())
2. Come up with another approach that gives the same output as not_cancelled %>% count(dest) and not_cancelled %>% count(tailnum, wt = distance), without using count().
not_cancelled %>% group_by(dest) %>% summarise(n = n())
# Name the summary column n so it matches the column that count() produces
not_cancelled %>% group_by(tailnum) %>% summarise(n = sum(distance))
3. Our definition of cancelled flights, (is.na(dep_delay) | is.na(arr_delay)), is slightly suboptimal. Why? Which is the most important column? (Not sure.)
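One way to probe this question, as a sketch rather than a definitive answer: assuming the nycflights13 flights table used above, look for flights that have a departure delay recorded but no arrival delay. If such rows exist, those flights took off but never produced an arrival record (a diversion, for example), which would suggest arr_delay is the more informative column. The names departed_no_arrival and no_departure_but_arrival are only illustrative.

```r
library(dplyr)
library(nycflights13)

# Flights that departed (dep_delay is recorded) but have no arrival delay.
departed_no_arrival <- flights %>%
  filter(!is.na(dep_delay), is.na(arr_delay)) %>%
  select(dep_time, arr_time, sched_arr_time, dep_delay, arr_delay)

# The reverse case: no departure delay recorded, yet an arrival delay exists.
no_departure_but_arrival <- flights %>%
  filter(is.na(dep_delay), !is.na(arr_delay))

nrow(departed_no_arrival)
nrow(no_departure_but_arrival)
```

If the second count is zero, every flight with a missing dep_delay also has a missing arr_delay, so is.na(arr_delay) on its own would flag the same rows as the combined condition.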
5. Which carrier has the worst delays? Challenge: can you disentangle the effects of bad airports vs. bad carriers? Why/why not? (Hint: think about flights %>% group_by(carrier, dest) %>% summarise(n()))
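A rough starting point for this exercise, not a full answer: the sketch below assumes that "worst" means the highest mean arrival delay, which is only one possible definition. It first ranks carriers on that measure, then follows the hint and counts flights per carrier and destination. The names carrier_delay and carrier_dest are illustrative.

```r
library(dplyr)
library(nycflights13)

# Rank carriers by mean arrival delay (one possible definition of "worst").
carrier_delay <- flights %>%
  group_by(carrier) %>%
  summarise(mean_arr_delay = mean(arr_delay, na.rm = TRUE)) %>%
  arrange(desc(mean_arr_delay))

# Following the hint: how many flights each carrier operates to each destination.
carrier_dest <- flights %>%
  group_by(carrier, dest) %>%
  summarise(n = n(), .groups = "drop")

carrier_delay
carrier_dest
```

The second table shows how unevenly carriers are spread across destinations. Where a carrier serves only a few airports, or an airport is served by only one carrier, the two effects are hard to tell apart from averages alone.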
6. What does the sort argument to count() do? When might you use it?
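As a small illustration of the sort argument: with sort = TRUE, count() returns the groups ordered from largest to smallest n, which is handy when only the most common values are of interest. The sketch below uses the nycflights13 flights table and compares it with an explicit arrange(); the names by_dest_sorted and by_dest_manual are illustrative.

```r
library(dplyr)
library(nycflights13)

# sort = TRUE puts the largest groups first.
by_dest_sorted <- flights %>% count(dest, sort = TRUE)

# The same ordering written out with an explicit arrange().
by_dest_manual <- flights %>% count(dest) %>% arrange(desc(n))

head(by_dest_sorted)
```

Using sort = TRUE saves the separate arrange(desc(n)) step whenever a count is immediately followed by sorting.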