Filtering a data set with subset()

Author: 冬之心 | Published: 2019-12-07 15:52

subset() makes it easy to extract variables (columns) and values (rows) from a data frame. For example:

subset(airquality, Temp > 80, select = c(Ozone, Temp))
subset(airquality, Day == 1, select = -Temp)
subset(airquality, select = Ozone:Wind)
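
As a rough sketch of what these calls do, the first selection can also be written with plain bracket indexing on the built-in airquality data. The names hot_subset and hot_bracket are illustrative only and are not part of the original post. One difference worth knowing: subset() treats rows where the condition evaluates to NA as non-matching, whereas logical bracket indexing returns NA rows for them. Temp has no missing values, so the two forms agree here.

```r
# Same selection written two ways on the built-in airquality data set.
hot_subset  <- subset(airquality, Temp > 80, select = c(Ozone, Temp))
hot_bracket <- airquality[airquality$Temp > 80, c("Ozone", "Temp")]

# Both keep only the two requested columns and only rows with Temp > 80.
names(hot_subset)
all(hot_subset$Temp > 80)
isTRUE(all.equal(hot_subset, hot_bracket))

# select = -Temp drops one column; select = Ozone:Wind keeps a range of columns.
names(subset(airquality, Day == 1, select = -Temp))
names(subset(airquality, select = Ozone:Wind))
```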

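Problem

I want to filter a subset out of a data set based on several values of a single variable.

Taking the gapminder data set as an example, I want to extract the data for three countries: China, Japan, and the United States.

The mistake

Result of selecting China only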

 china <- subset(gapminder, country=="China")
country continent year lifeExp pop gdpPercap
China Asia 1952 44.00000 556263527 400.4486
China Asia 1957 50.54896 637408000 575.9870
China Asia 1962 44.50136 665770000 487.6740
China Asia 1967 58.38112 754550000 612.7057
China Asia 1972 63.11888 862030000 676.9001
China Asia 1977 63.96736 943455000 741.2375
China Asia 1982 65.52500 1000281000 962.4214
China Asia 1987 67.27400 1084035000 1378.9040
China Asia 1992 68.69000 1164970000 1655.7842
China Asia 1997 70.42600 1230075000 2289.2341
China Asia 2002 72.02800 1280400000 3119.2809
China Asia 2007 72.96100 1318683096 4959.1149

The wrong way to select all three countries at once

three <-subset(gapminder, country==c("China","Japan", "United States"))
country continent year lifeExp pop gdpPercap
China Asia 1952 44.00000 556263527 400.4486
China Asia 1967 58.38112 754550000 612.7057
China Asia 1982 65.52500 1000281000 962.4214
China Asia 1997 70.42600 1230075000 2289.2341
Japan Asia 1957 65.50000 91563009 4317.6944
Japan Asia 1972 73.42000 107188273 14778.7864
Japan Asia 1987 78.67000 122091325 22375.9419
Japan Asia 2002 82.00000 127065841 28604.5919
United States Americas 1962 70.21000 186538000 16173.1459
United States Americas 1977 73.38000 220239000 24072.6321
United States Americas 1992 76.09000 256894189 32003.9322
United States Americas 2007 78.24200 301139947 42951.6531

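A lot of data is missing. The == operator compares element by element and recycles the three-element vector along the rows, so each row is tested against only one of the three names. In effect, the 12 years are dealt out in turn to the three countries, and each country keeps only 4 of its years instead of the 12 we want.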
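
The recycling behaviour is easier to see on a small character vector. The sketch below is a toy example, not the gapminder data: country, wanted, and recycled are hypothetical names, and recycled simply spells out the vector that == effectively compares against.

```r
# Toy stand-in for the country column: three rows per country.
country <- rep(c("China", "Japan", "United States"), each = 3)
wanted  <- c("China", "Japan", "United States")

# What == actually compares against: `wanted` repeated to the length of `country`.
recycled <- rep(wanted, length.out = length(country))

elementwise <- country == wanted     # position-by-position comparison
membership  <- country %in% wanted   # "is this value one of the wanted ones?"

# Side-by-side view: only rows where country happens to line up with the
# recycled value are TRUE for ==, while %in% is TRUE for every row.
data.frame(country, compared_to = recycled, elementwise, membership)
```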

The correct way

Use "|" (logical OR)

three <-subset(gapminder, country=="China"|country=="Japan"|country=="United States")

Use "%in%" to match against several values

three <-subset(gapminder, country %in% c("China","Japan","United States"))
country continent year lifeExp pop gdpPercap
China Asia 1952 44.00000 556263527 400.4486
China Asia 1957 50.54896 637408000 575.9870
China Asia 1962 44.50136 665770000 487.6740
China Asia 1967 58.38112 754550000 612.7057
China Asia 1972 63.11888 862030000 676.9001
China Asia 1977 63.96736 943455000 741.2375
China Asia 1982 65.52500 1000281000 962.4214
China Asia 1987 67.27400 1084035000 1378.9040
China Asia 1992 68.69000 1164970000 1655.7842
China Asia 1997 70.42600 1230075000 2289.2341
China Asia 2002 72.02800 1280400000 3119.2809
China Asia 2007 72.96100 1318683096 4959.1149
Japan Asia 1952 63.03000 86459025 3216.9563
Japan Asia 1957 65.50000 91563009 4317.6944
Japan Asia 1962 68.73000 95831757 6576.6495
Japan Asia 1967 71.43000 100825279 9847.7886
Japan Asia 1972 73.42000 107188273 14778.7864
Japan Asia 1977 75.38000 113872473 16610.3770
Japan Asia 1982 77.11000 118454974 19384.1057
Japan Asia 1987 78.67000 122091325 22375.9419
Japan Asia 1992 79.36000 124329269 26824.8951
Japan Asia 1997 80.69000 125956499 28816.5850
Japan Asia 2002 82.00000 127065841 28604.5919
Japan Asia 2007 82.60300 127467972 31656.0681
United States Americas 1952 68.44000 157553000 13990.4821
United States Americas 1957 69.49000 171984000 14847.1271
United States Americas 1962 70.21000 186538000 16173.1459
United States Americas 1967 70.76000 198712000 19530.3656
United States Americas 1972 71.34000 209896000 21806.0359
United States Americas 1977 73.38000 220239000 24072.6321
United States Americas 1982 74.65000 232187835 25009.5591
United States Americas 1987 75.02000 242803533 29884.3504
United States Americas 1992 76.09000 256894189 32003.9322
United States Americas 1997 76.81000 272911760 35767.4330
United States Americas 2002 77.31000 287675526 39097.0995
United States Americas 2007 78.24200 301139947 42951.6531
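
A quick way to confirm that a filter kept every row is to count rows per country. The sketch below uses a small hypothetical data frame (toy) in place of gapminder so that it runs without extra packages; like gapminder, its country column is a factor. It also shows that subset() keeps unused factor levels, which droplevels() removes.

```r
# Hypothetical stand-in for gapminder: four countries, three years each.
toy <- data.frame(
  country = factor(rep(c("Brazil", "China", "Japan", "United States"), each = 3)),
  year    = rep(c(1997, 2002, 2007), times = 4)
)
wanted <- c("China", "Japan", "United States")

wrong   <- subset(toy, country == wanted)   # recycled comparison, loses rows
with_or <- subset(toy, country == "China" | country == "Japan" |
                       country == "United States")
with_in <- subset(toy, country %in% wanted)

nrow(wrong)                   # fewer rows than expected
identical(with_or, with_in)   # the two correct forms give the same result

# Brazil is still listed as a level (with a count of 0) after filtering.
table(with_in$country)

# droplevels() removes the unused level, which keeps legends and tables tidy.
tidy <- droplevels(with_in)
table(tidy$country)
```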

Appendix: tadpole plot

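library(ggplot2)
library(gganimate)
# Bubble chart: GDP per capita (log scale) vs. life expectancy, bubble size = population
ggplot(three, aes(x=gdpPercap, y=lifeExp, size=pop, color=country)) + 
  geom_point(alpha=0.8) + 
  scale_size(range=c(2,12)) + 
  scale_x_log10() + 
  labs(title = "Year: {frame_time}") + 
  # Animate over the years and leave a short trail (the "tadpole tail") behind each point
  transition_time(year) +
  shadow_wake(wake_length = 0.1, alpha = FALSE)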

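The animation adds a small tail to each point for decoration.
See also: gganimate | Creating animated visualizations that make your charts speak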

Article link: https://www.haomeiwen.com/subject/dpeugctx.html