Making R Work Like Excel, Part 1

Author: 肖ano | Published: 2020-12-27 20:41

Preparation

install.packages('tidyverse')

The tidyverse package bundles several packages that are useful in data analysis, such as ggplot2, dplyr, and tidyr. This article uses dplyr to manipulate the data.

Workflow

# load the tidyverse package
library(tidyverse)

filter(): filter rows

The filter() function is used to subset a data frame, retaining all rows that satisfy your conditions. To be retained, the row must produce a value of TRUE for all conditions. Note that when a condition evaluates to NA the row will be dropped, unlike base subsetting with [.

# filter(.data, ..., .preserve = FALSE)
# using the iris data
> data(iris)
# display the first six rows of the iris data
> head(iris)
# keep only the rows where Sepal.Length equals 5
> filter(iris, Sepal.Length == 5)

   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1             5         3.6          1.4         0.2     setosa
2             5         3.4          1.5         0.2     setosa
3             5         3.0          1.6         0.2     setosa
4             5         3.4          1.6         0.4     setosa
5             5         3.2          1.2         0.2     setosa
6             5         3.5          1.3         0.3     setosa
7             5         3.5          1.6         0.6     setosa
8             5         3.3          1.4         0.2     setosa
9             5         2.0          3.5         1.0 versicolor
10            5         2.3          3.3         1.0 versicolor

> filter(iris, Sepal.Length == 5 & Sepal.Width == 3)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1            5           3          1.6         0.2  setosa

Useful filter functions

There are many functions and operators that are useful when constructing the expressions used to filter the data:

  • ==, >, >= etc

  • &, |, !, xor()

  • is.na()

  • between(), near()
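
The operators and helpers listed above can be combined freely inside filter(). The following is a minimal sketch on the built-in iris data; the variable names (mid_sepal, near_five, and so on) and the cut-off values are chosen only for illustration.

```r
library(dplyr)

# between(x, left, right) is shorthand for x >= left & x <= right
mid_sepal <- filter(iris, between(Sepal.Length, 5, 5.5))

# near() compares doubles with a small tolerance, which is safer than ==
# for values produced by floating-point arithmetic
near_five <- filter(iris, near(Sepal.Length, 5))

# combine conditions with & (and), | (or), ! (not)
not_setosa_wide <- filter(iris, Species != "setosa" & Sepal.Width > 3.5)

# xor() keeps rows where exactly one of the two conditions is TRUE
one_only <- filter(iris, xor(Sepal.Length > 7, Petal.Width > 2))

nrow(near_five)   # 10, the same rows as Sepal.Length == 5
```

Separating conditions with commas, as in filter(iris, Species != "setosa", Sepal.Width > 3.5), is equivalent to joining them with &.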

Attention:
filter() drops every row for which the condition evaluates to NA. To keep those rows, add an explicit is.na() test to the condition.

> flower <- iris
> flower[1,1] <- NA
> filter(flower, is.na(Sepal.Length) | Sepal.Length == 5)
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            NA         3.5          1.4         0.2     setosa
2             5         3.6          1.4         0.2     setosa
3             5         3.4          1.5         0.2     setosa
4             5         3.0          1.6         0.2     setosa
5             5         3.4          1.6         0.4     setosa
6             5         3.2          1.2         0.2     setosa
7             5         3.5          1.3         0.3     setosa
8             5         3.5          1.6         0.6     setosa
9             5         3.3          1.4         0.2     setosa
10            5         2.0          3.5         1.0 versicolor
11            5         2.3          3.3         1.0 versicolor
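
To make the difference from base subsetting concrete, here is a small sketch that runs the same comparison both ways on the flower data frame built above. The names base_result and dplyr_result are illustrative only.

```r
library(dplyr)

flower <- iris
flower[1, 1] <- NA   # Sepal.Length of the first row becomes NA

# Base subsetting with [ turns the NA comparison into a row full of NA
base_result <- flower[flower$Sepal.Length == 5, ]

# filter() silently drops the row whose condition evaluates to NA
dplyr_result <- filter(flower, Sepal.Length == 5)

nrow(base_result)    # 11: ten matches plus one all-NA row
nrow(dplyr_result)   # 10
```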

arrange(): sort rows

arrange() orders the rows of a data frame by the values of selected columns.
Unlike other dplyr verbs, arrange() largely ignores grouping; you need to explicitly mention grouping variables (or use .by_group = TRUE) in order to group by them, and functions of variables are evaluated once per data frame, not once per group.

# sort by the Petal.Width column, then by the Species column
> arrange(iris, Petal.Width, Species)
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            4.9         3.1          1.5         0.1     setosa
2            4.8         3.0          1.4         0.1     setosa
3            4.3         3.0          1.1         0.1     setosa
4            5.2         4.1          1.5         0.1     setosa
5            4.9         3.6          1.4         0.1     setosa
...
47           5.4         3.4          1.5         0.4     setosa
48           5.1         3.8          1.9         0.4     setosa
49           5.1         3.3          1.7         0.5     setosa
50           5.0         3.5          1.6         0.6     setosa
51           4.9         2.4          3.3         1.0 versicolor
52           5.0         2.0          3.5         1.0 versicolor
53           6.0         2.2          4.0         1.0 versicolor
...
# Wrap a column in desc() to sort it in descending order.
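
The two behaviours described in this section, descending order and the handling of groups, can be sketched as follows. The variable names are made up for the example.

```r
library(dplyr)

# Largest Petal.Width first; ties are broken by Species in ascending order
widest_first <- arrange(iris, desc(Petal.Width), Species)
head(widest_first, 3)

# arrange() ignores groups by default, so this sorts the whole table
by_species <- group_by(iris, Species)
ignores_groups <- arrange(by_species, Sepal.Length)

# With .by_group = TRUE the grouping variable becomes the first sort key
within_groups <- arrange(by_species, Sepal.Length, .by_group = TRUE)
```

In within_groups all setosa rows come first, each species sorted by Sepal.Length; in ignores_groups the species are interleaved.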

select(): select columns

Select (and optionally rename) variables in a data frame, using a concise mini-language that makes it easy to refer to variables based on their name (e.g. a:f selects all columns from a on the left to f on the right). You can also use predicate functions like is.numeric to select variables based on their properties.

# select the Petal.Width and Species columns
> select(iris, Petal.Width, Species)
# select every column from Petal.Width through Species
> select(iris, Petal.Width:Species)
# select every column except Petal.Width through Species
> select(iris, -c(Petal.Width:Species))

Useful selection techniques

Overview of selection features
Tidyverse selections implement a dialect of R where operators make it easy to select variables:

  • : for selecting a range of consecutive variables.

  • ! for taking the complement of a set of variables.

  • & and | for selecting the intersection or the union of two sets of variables.

  • c() for combining selections.

In addition, you can use selection helpers. Some helpers select specific columns:

  • everything(): Matches all variables.

  • last_col(): Select last variable, possibly with an offset.

These helpers select variables by matching patterns in their names:

  • starts_with(): Starts with a prefix.

  • ends_with(): Ends with a suffix.

  • contains(): Contains a literal string.

  • matches(): Matches a regular expression.

  • num_range(): Matches a numerical range like x01, x02, x03.

These helpers select variables from a character vector:

  • all_of(): Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.

  • any_of(): Same as all_of(), except that no error is thrown for names that don't exist.

This helper selects variables with a function:

  • where(): Applies a function to all variables and selects those for which the function returns TRUE.
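
A few of the operators and helpers above, sketched on the iris data. The variable names and the made-up column name "Not.A.Column" are assumptions for the example; the comments list the columns each call returns.

```r
library(dplyr)

# Name-based helpers
sepal_cols <- select(iris, starts_with("Sepal"))   # Sepal.Length, Sepal.Width
width_cols <- select(iris, ends_with("Width"))     # Sepal.Width, Petal.Width

# Property-based selection with where()
numeric_cols <- select(iris, where(is.numeric))    # the four measurement columns

# Complement and intersection
not_numeric <- select(iris, !where(is.numeric))    # Species
petal_width <- select(iris, starts_with("Petal") & ends_with("Width"))

# Character vectors: all_of() is strict, any_of() tolerates missing names
wanted <- c("Species", "Petal.Length", "Not.A.Column")   # last name does not exist
lenient <- select(iris, any_of(wanted))            # Species, Petal.Length
# select(iris, all_of(wanted))                     # would raise an error

# Renaming while selecting: new_name = old_name
renamed <- select(iris, species = Species, everything())
```

all_of() is the safer choice when a missing column should stop the script; any_of() suits optional columns.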

mutate(): create new variables

mutate() adds new variables and preserves existing ones; transmute() adds new variables and drops existing ones. New variables overwrite existing variables of the same name. Variables can be removed by setting their value to NULL.

iris_part <- mutate(iris, Sepal.Area = Sepal.Length * Sepal.Width)

Attention: If you only want to preserve the new variables, you can use the transmute() function.
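
The rules stated above (adding, overwriting, removing with NULL, and transmute()) can be sketched as follows. The new column and variable names are illustrative; the conversion to millimetres relies on the iris measurements being recorded in centimetres.

```r
library(dplyr)

# Add columns; a later expression can use a column created earlier in the same call
with_area <- mutate(iris,
                    Sepal.Area = Sepal.Length * Sepal.Width,
                    Sepal.Area.Rounded = round(Sepal.Area, 1))

# Reusing an existing name overwrites that column (cm converted to mm here)
in_mm <- mutate(iris, Sepal.Length = Sepal.Length * 10)

# Setting a column to NULL removes it
no_species <- mutate(iris, Species = NULL)

# transmute() keeps only the columns named in the call
area_only <- transmute(iris, Species, Sepal.Area = Sepal.Length * Sepal.Width)
```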

Reference

https://dplyr.tidyverse.org/
