1. Setup

These notes use two packages, nycflights13 and tidyverse; most of the functions come from dplyr, which is loaded as part of tidyverse:

library(nycflights13)
library(tidyverse)

The flights data object in nycflights13 contains records for 336,776 flights that departed from New York City in 2013:

flights.png
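
As a quick check of the row count quoted above, the following sketch prints the dimensions of flights and a compact column overview. It assumes nycflights13 and dplyr are already installed.

```r
library(nycflights13)
library(dplyr)

# Number of rows and columns in the flights tibble
dim(flights)

# Compact overview: one line per column, with its type abbreviation
glimpse(flights)
```

glimpse() is used here only because it lists every column with its type; printing flights directly shows the same abbreviations under the column names.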

Note the type abbreviations shown under the column names:

int stands for integers.

dbl stands for doubles, or real numbers.

chr stands for character vectors, or strings.

dttm stands for date-times (a date + a time).

lgl stands for logical, vectors that contain only TRUE or FALSE.

fctr stands for factors, which R uses to represent categorical variables with fixed possible values.

date stands for dates.

dplyr basics

filter() : Pick observations by their values.

arrange() : Reorder the rows.

select() : Pick variables by their names.

mutate() : Create new variables with functions of existing variables.

summarise() : Collapse many values down to a single summary.
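
The five verbs share the same calling pattern: the first argument is a data frame, the remaining arguments say what to do with it, and the result is a new data frame. A minimal sketch with one call per verb follows; the result names (jan1, gained, and so on) and the gain column are illustrative choices, not part of the data set.

```r
library(nycflights13)
library(dplyr)

# One call per verb; each returns a new data frame and leaves flights unchanged
jan1    <- filter(flights, month == 1, day == 1)          # pick rows by value
sorted  <- arrange(flights, year, month, day)             # reorder rows
dates   <- select(flights, year, month, day)              # pick columns by name
gained  <- mutate(flights, gain = dep_delay - arr_delay)  # add a derived column
avg_dep <- summarise(flights, delay = mean(dep_delay, na.rm = TRUE))  # one summary row
```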

2. `filter()` picks observations (rows)

Select rows that match specific values:

filter1.png
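
For readers without the screenshot, a sketch of this kind of call: it keeps the flights from January 1st by combining two conditions, which filter() joins with "and". The name jan1 is an illustrative choice.

```r
library(nycflights13)
library(dplyr)

# Multiple conditions separated by commas must all be TRUE for a row to be kept
jan1 <- filter(flights, month == 1, day == 1)
jan1
```

filter() does not modify flights; assign the result, as here, if you want to reuse it.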
2.1 near() checks whether two numbers are equal within a small tolerance, which is safer than == for floating-point values:

near().png
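
The reason near() exists is that computers store most decimal numbers approximately, so exact comparison with == can fail on results that are mathematically equal. A short sketch:

```r
library(dplyr)

sqrt(2)^2 == 2        # FALSE: finite-precision arithmetic leaves a tiny error
near(sqrt(2)^2, 2)    # TRUE: equal within near()'s default tolerance

1 / 49 * 49 == 1      # FALSE
near(1 / 49 * 49, 1)  # TRUE
```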
2.2 Logical operators:

& is "and", | is "or", and ! is "not".
x %in% y: This will select every row where x is one of the values in y.
!(x & y) is the same as !x | !y, and !(x | y) is the same as !x & !y.

logical operators.png

filter(flights, month == 11 | month == 12)
filter(flights, month %in% c(11, 12))
filter(flights, !(arr_delay > 120 | dep_delay > 120))
filter(flights, arr_delay <= 120, dep_delay <= 120)

filter2.png
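
The four calls above form two equivalent pairs. The sketch below stores each result and compares them, so the %in% shorthand and De Morgan's law can be checked directly; each pair should select the same rows. The result names are illustrative.

```r
library(nycflights13)
library(dplyr)

# %in% is shorthand for a chain of == comparisons joined by |
nov_dec_or <- filter(flights, month == 11 | month == 12)
nov_dec_in <- filter(flights, month %in% c(11, 12))
identical(nov_dec_or, nov_dec_in)

# De Morgan's law: !(x | y) is the same as !x & !y
not_late_a <- filter(flights, !(arr_delay > 120 | dep_delay > 120))
not_late_b <- filter(flights, arr_delay <= 120, dep_delay <= 120)
identical(not_late_a, not_late_b)
```

Note that `month == 11 | 12` is not a valid substitute for the first form: it compares month with 11 and then ORs the result with the number 12, which R treats as TRUE.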

2.3 Missing values:

NA represents an unknown value so missing values are “contagious”: almost any operation involving an unknown value will also be unknown.
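
A few base R expressions illustrate the point, including two of the exceptions behind the word "almost":

```r
NA > 5      # NA
10 == NA    # NA
NA + 10     # NA
NA == NA    # NA: two unknown values may or may not be equal

# Exceptions: the result is the same whatever the unknown value is
NA^0        # 1
NA | TRUE   # TRUE
```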

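To test whether a value is NA, use is.na().
filter() keeps only the rows where the condition is TRUE, so rows where it evaluates to NA are dropped. To keep them, ask for them explicitly: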

filter(df, is.na(x) | x > 1)
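
The call above leaves df undefined. A self-contained sketch, assuming a hypothetical three-row tibble with one missing value:

```r
library(dplyr)

# Hypothetical example data: one column, one missing value
df <- tibble(x = c(1, NA, 3))

filter(df, x > 1)              # keeps only the row where x is 3
filter(df, is.na(x) | x > 1)   # keeps the NA row and the row where x is 3
```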

3. `arrange()` reorders rows

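Rows are sorted in ascending order by default. Use desc() to sort in descending order. NA values are always placed at the end, whichever direction is used: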

desc.png
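
A sketch of both behaviours: sorting flights in each direction, and a hypothetical toy tibble that shows where NA ends up.

```r
library(nycflights13)
library(dplyr)

arrange(flights, year, month, day)   # ascending; later columns break ties
arrange(flights, desc(dep_delay))    # largest departure delay first

# Hypothetical toy data to show the position of NA
df <- tibble(x = c(5, 2, NA))
arrange(df, x)         # 2, 5, NA
arrange(df, desc(x))   # 5, 2, NA
```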

4. `select()` picks variables (columns)

The flights object has 19 variables; you can select just the ones you need for later analysis:

select.png
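
A sketch of the three most common forms of selection, by name, by inclusive range, and by exclusion:

```r
library(nycflights13)
library(dplyr)

select(flights, year, month, day)   # columns listed by name
select(flights, year:day)           # every column from year to day, inclusive
select(flights, -(year:day))        # every column except year to day
```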

There are a number of helper functions you can use within select():

starts_with("abc"): matches names that begin with “abc”.

ends_with("xyz"): matches names that end with “xyz”.

contains("ijk"): matches names that contain “ijk”.

matches("(.)\\1"): selects variables that match a regular expression. This one matches any variables that contain repeated characters.

num_range("x", 1:3): matches x1, x2 and x3.
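
The helpers can be tried on flights directly; names() is wrapped around each call here only to keep the output short. Since flights has no numbered columns, num_range() is shown on a hypothetical toy tibble.

```r
library(nycflights13)
library(dplyr)

names(select(flights, starts_with("dep")))   # dep_time, dep_delay
names(select(flights, ends_with("delay")))   # dep_delay, arr_delay
names(select(flights, contains("time")))     # every column with "time" in its name
names(select(flights, matches("(.)\\1")))    # names with a doubled character, e.g. "rr"

# Hypothetical toy data with numbered columns
df <- tibble(x1 = 1, x2 = 2, x3 = 3, y = 4)
names(select(df, num_range("x", 1:3)))       # x1, x2, x3
```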