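The previous sections covered the three core dplyr functions: select, filter, and mutate. This section introduces functions in dplyr and tidyr that perform more specific tasks.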
arrange() sorts rows by the selected columns; ascending order is the default
library(dplyr)
arrange(mtcars, mpg) %>% as_tibble()
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 10.4 8 472 205 2.93 5.25 18.0 0 0 3 4
2 10.4 8 460 215 3 5.42 17.8 0 0 3 4
3 13.3 8 350 245 3.73 3.84 15.4 0 0 3 4
Combine arrange() with desc() to sort in descending order
arrange(mtcars, desc(mpg)) %>% as_tibble()
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 33.9 4 71.1 65 4.22 1.84 19.9 1 1 4 1
2 32.4 4 78.7 66 4.08 2.2 19.5 1 1 4 1
3 30.4 4 75.7 52 4.93 1.62 18.5 1 1 4 2
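arrange() also accepts several sort keys, and desc() can be applied to any one of them. The following is a minimal sketch on a small made-up tibble (the `scores` data and its column names are illustrative assumptions, not part of the original examples):

```r
library(dplyr)

# Hypothetical toy data, used only to illustrate the sort order
scores <- tibble(
  name  = c("a", "b", "c", "d"),
  group = c(2, 1, 2, 1),
  score = c(70, 90, 85, 60)
)

# Ascending by group, then descending by score within each group
sorted <- scores %>% arrange(group, desc(score))
sorted$name
# "b" "d" "c" "a"
```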
distinct() removes duplicate rows from a data frame; unique() is typically used to remove duplicates from a vector
df <- tibble(
  x = sample(10, 100, replace = TRUE),
  y = sample(10, 100, replace = TRUE))
df %>% distinct()
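Because the sample() call above is random, the result differs from run to run. As a deterministic sketch of the same idea, the hypothetical tibble `dup` below (an assumption made for illustration) shows how distinct() behaves on whole rows, on a single column, and with `.keep_all = TRUE`, next to unique() on a vector:

```r
library(dplyr)

# Hypothetical toy data with one fully duplicated row
dup <- tibble(x = c(1, 1, 2, 2), y = c("a", "a", "b", "c"))

all_rows  <- dup %>% distinct()                      # 3 unique rows
only_x    <- dup %>% distinct(x)                     # 2 rows, only column x is kept
first_per <- dup %>% distinct(x, .keep_all = TRUE)   # first row for each value of x
x_values  <- unique(dup$x)                           # plain vector: 1 2
```

Without `.keep_all = TRUE`, distinct(x) drops every column other than x.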
rename() changes column names, using the form new_name = old_name
rename(iris, petal_length = Petal.Length) %>% as_tibble()
Sepal.Length Sepal.Width petal_length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
relocate() changes the column order
iris %>% as_tibble() %>% relocate(Species)
# select() can achieve the same result, but is more cumbersome
iris %>% as_tibble() %>% select(Species, everything())
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
<fct> <dbl> <dbl> <dbl> <dbl>
1 setosa 5.1 3.5 1.4 0.2
2 setosa 4.9 3 1.4 0.2
3 setosa 4.7 3.2 1.3 0.2
df <- tibble(a = 1, b = 1, c = 1, d = "a", e = "a", f = "a")
df
a b c d e f
<dbl> <dbl> <dbl> <chr> <chr> <chr>
1 1 1 1 a a a
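df %>% relocate(a, .after = c) # move a to just after c
df %>% relocate(f, .before = b) # move f to just before b
df %>% relocate(a, .after = last_col()) # move a to the last position
df %>% relocate(ff = f) # move f to the front and rename it to ff
df %>% relocate(where(is.character)) # move all character columns to the front
df %>% relocate(where(is.numeric), .after = last_col()) # move all numeric columns to the end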
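The relocate() calls above print no output, so the resulting column order can be hard to picture. A minimal sketch that inspects it with names(), using a copy of the same one-row tibble under the assumed name `cols` so that `df` is left untouched:

```r
library(dplyr)

# Same structure as df above; `cols` is just an illustrative name
cols <- tibble(a = 1, b = 1, c = 1, d = "a", e = "a", f = "a")

after_c  <- names(cols %>% relocate(a, .after = c))           # b c a d e f
before_b <- names(cols %>% relocate(f, .before = b))          # a f b c d e
to_end   <- names(cols %>% relocate(a, .after = last_col()))  # b c d e f a
renamed  <- names(cols %>% relocate(ff = f))                  # ff a b c d e
chr_1st  <- names(cols %>% relocate(where(is.character)))     # d e f a b c
```

When neither `.before` nor `.after` is given, relocate() moves the selected columns to the front.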
drop_na() (from tidyr) removes rows that contain missing values
library(tidyr)
df <- tibble(x = c(1, 2, NA), y = c("a", NA, "b"))
df
x y
<dbl> <chr>
1 1 a
2 2 NA
3 NA b
df %>% drop_na()
# A tibble: 1 x 2
x y
<dbl> <chr>
1 1 a
df %>% drop_na(x)
x y
<dbl> <chr>
1 1 a
2 2 NA
pull() extracts a single column as a vector
pull() is similar to $, but reads more cleanly inside a pipeline
iris %>% as_tibble() %>%
mutate(mean = rowMeans(across(where(is.numeric)))) %>%
pull(mean)
Without pull(), the same column can be extracted with the pipe's dot placeholder and $
iris %>% as_tibble() %>%
mutate(mean = rowMeans(across(where(is.numeric)))) %>%
.$mean
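To check that the two approaches return the same vector, a short sketch (the intermediate names `iris_means`, `via_pull`, and `via_dollar` are introduced here only for illustration):

```r
library(dplyr)

iris_means <- iris %>%
  as_tibble() %>%
  mutate(mean = rowMeans(across(where(is.numeric))))

via_pull   <- iris_means %>% pull(mean)   # column extracted with pull()
via_dollar <- iris_means$mean             # column extracted with $

identical(via_pull, via_dollar)

# pull() also accepts a position; -1 (the default) is the last column
last_col_values <- iris_means %>% pull(-1)
```

Both forms return a plain numeric vector rather than a one-column tibble.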