DAY 5

Author: Peng_001 | Published: 2020-05-04 21:07

data-manipulation-with-dplyr

Use glimpse() to get a quick overview of a table: its columns, their types, and the first few values.
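As a minimal sketch of what glimpse() does, the snippet below builds a tiny stand-in table. The name `counties_demo` and all of its values are made up for illustration; the real counties dataset is not reproduced here.

```r
library(dplyr)

# Hypothetical stand-in for the counties table (made-up values)
counties_demo <- data.frame(
  state      = c("Alabama", "Alaska", "Arizona"),
  county     = c("Autauga", "Anchorage", "Apache"),
  population = c(55000, 299000, 72000)
)

# Prints the row and column counts, then one line per column:
# name, type, and the first few values
glimpse(counties_demo)
```

Compared with printing the table directly, glimpse() lays the columns out vertically, which is easier to read when a table has many columns.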

  1. select
    Use select to pick specific columns from a table.
# Select the columns 
counties %>%
  select(state, county, population, poverty)

Review of mutate, arrange, and filter from earlier:

counties %>%
  # Select the five columns 
  select(state, county, population, men, women) %>%
  # Add the proportion_men variable
  mutate(proportion_men = men / population) %>%
  # Filter for population of at least 10,000
  filter(population >= 10000) %>%
  # Arrange proportion of men in descending order 
  arrange(desc(proportion_men))

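Selecting several columns at once with select:
select(state:drive) selects every column from state through drive
contains("work") selects every column whose name contains "work"
starts_with("income") selects every column whose name starts with "income"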
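The three selection styles can be tried side by side on a small table. This is a sketch only: `counties_demo` and its column names are hypothetical, loosely modeled on the counties table.

```r
library(dplyr)

# Hypothetical columns loosely modeled on the counties table (made-up values)
counties_demo <- data.frame(
  state        = c("Texas", "Ohio"),
  county       = c("Travis", "Franklin"),
  income       = c(61000, 54000),
  income_err   = c(900, 800),
  drive        = c(74.2, 80.1),
  work_at_home = c(7.5, 3.9),
  mean_commute = c(25.1, 21.7)
)

# Range selection: every column from state through drive, in table order
range_cols <- counties_demo %>% select(state:drive)

# contains(): columns whose name includes "work"
work_cols <- counties_demo %>% select(contains("work"))

# starts_with(): columns whose name begins with "income"
income_cols <- counties_demo %>% select(starts_with("income"))

names(range_cols)   # "state" "county" "income" "income_err" "drive"
names(work_cols)    # "work_at_home"
names(income_cols)  # "income" "income_err"
```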

  1. count
    Counts the number of rows in each group.
    count() also accepts sort = TRUE, which orders the result by count, largest first.
# Use count to find the number of counties in each region
counties_selected %>%
  count(region, sort = TRUE)

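The count can also be weighted. Pass a column to the wt argument and count() sums that column within each group instead of counting rows.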

# Find number of counties per state, weighted by citizens
counties_selected %>%
  count(state, wt = citizens, sort = TRUE)
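
To make the effect of wt concrete, here is a small sketch comparing an unweighted and a weighted count. The table `demo` and its values are made up for illustration.

```r
library(dplyr)

# Hypothetical data: two counties in state A, one in state B
demo <- data.frame(
  state    = c("A", "A", "B"),
  citizens = c(100, 200, 1000)
)

# Unweighted: n is the number of rows (counties) per state
by_rows <- demo %>% count(state, sort = TRUE)

# Weighted: n is the sum of citizens per state
by_citizens <- demo %>% count(state, wt = citizens, sort = TRUE)

by_rows      # A: 2, B: 1
by_citizens  # B: 1000, A: 300
```

State A comes first by number of counties, while state B comes first once each county is weighted by its citizens.
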
  1. top_n
    top_n(2, population) keeps the 2 rows with the largest population values. On grouped data it keeps the top rows within each group.
    Example:
# Group by region and find the greatest number of citizens who walk to work
counties_selected %>%
  group_by(region) %>%
  top_n(1, walk)
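
The grouped behavior can be checked on a toy table, sketched below with made-up values. The second pipeline uses slice_max(), which superseded top_n() in dplyr 1.0.0; it assumes that version or later is installed.

```r
library(dplyr)

# Hypothetical data: two counties in each of two regions
demo <- data.frame(
  region = c("South", "South", "West", "West"),
  county = c("a", "b", "c", "d"),
  walk   = c(1.2, 3.4, 5.0, 2.1)
)

# One row per region: the county with the highest walk value
top_walk <- demo %>%
  group_by(region) %>%
  top_n(1, walk) %>%
  ungroup()

# Same selection with slice_max() (assumes dplyr >= 1.0.0)
top_walk_new <- demo %>%
  group_by(region) %>%
  slice_max(walk, n = 1) %>%
  ungroup()

top_walk$county      # "b" "c"
top_walk_new$county  # "b" "c"
```
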
  1. rename
    rename() changes column names.
    Renaming can also be done directly inside select(), using the same new_name = old_name form.
    Note that the new name goes on the left.
# Rename the n column to num_counties
counties %>%
  count(state) %>%
  rename(num_counties = n)
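
A short sketch of the difference between renaming with rename() and renaming inside select(). The table `demo` and its `extra` column are hypothetical.

```r
library(dplyr)

# Hypothetical result of a count, plus one unrelated column
demo <- data.frame(
  state = c("A", "B"),
  n     = c(3, 5),
  extra = c(TRUE, FALSE)
)

# rename() keeps every column and only changes the name
renamed <- demo %>% rename(num_counties = n)

# select() renames too, but drops any column that is not listed
selected <- demo %>% select(state, num_counties = n)

names(renamed)   # "state" "num_counties" "extra"
names(selected)  # "state" "num_counties"
```
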
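  1. transmute
    transmute() selects columns and computes new ones in a single step, renaming along the way if needed, and keeps only the columns listed. It behaves like a combination of select and mutate.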
counties %>%
  # Keep the state, county, and populations columns, and add a density column
  transmute(state, county, population, density = population / land_area) %>%
  # Filter for counties with a population greater than one million 
  filter(population > 1000000) %>%
  # Sort density in ascending order 
  arrange(density)

Differences between select, rename, mutate, and transmute

# Change the name of the unemployment column
counties %>%
  rename(unemployment_rate = unemployment)

# Keep the state and county columns, and the columns containing poverty
counties %>%
  select(state, county, contains("poverty"))

# Calculate the fraction_women column without dropping the other columns
counties %>%
  mutate(fraction_women = women / population)

# Keep only the state, county, and employment_rate columns
counties %>%
  transmute(state, county, employment_rate = employed / population)
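
One way to keep the four verbs apart is to look at which columns survive each one. The sketch below applies all four to the same toy table; `demo`, `pop`, and the values are made up for illustration.

```r
library(dplyr)

# Hypothetical table with three columns
demo <- data.frame(
  state      = c("A", "B"),
  women      = c(60, 45),
  population = c(100, 100)
)

# select: keeps only the listed columns, values unchanged
sel <- demo %>% select(state, population)

# rename: keeps all columns, changes one name
ren <- demo %>% rename(pop = population)

# mutate: keeps all columns and adds a computed one
mut <- demo %>% mutate(fraction_women = women / population)

# transmute: keeps only the listed columns, computed ones included
tra <- demo %>% transmute(state, fraction_women = women / population)

names(sel)  # "state" "population"
names(ren)  # "state" "women" "pop"
names(mut)  # "state" "women" "population" "fraction_women"
names(tra)  # "state" "fraction_women"
```
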
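  1. lag()
    lag() shifts the values of a vector one position later, so each element is replaced by the one before it and the first element becomes NA.
v <- c(1, 2, 4, 5)
lag(v)  # NA 1 2 4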

This makes it possible to compute the difference between consecutive elements.

difference <- v - lag(v)  # NA 1 2 1
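
As a worked check of the shift, the sketch below computes the lagged vector and the change from each element to the next, taken here as current minus previous. It calls dplyr::lag() explicitly because base R also has stats::lag(), which behaves differently.

```r
library(dplyr)

v <- c(1, 2, 4, 5)

# Each element is replaced by its predecessor; the first has none, so it is NA
lagged <- dplyr::lag(v)

# Current value minus previous value
difference <- v - lagged

lagged      # NA 1 2 4
difference  # NA 1 2 1
```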

Example

babynames_fraction %>%
  # Arrange the data in order of name, then year 
  arrange(name, year) %>%
  # Group the data by name
  group_by(name) %>%
  # Add a ratio column that contains the ratio between each year 
  mutate(ratio = fraction / lag(fraction))

Summary

