subset() makes it easy to extract variables (columns) and values (rows). For example:
subset(airquality, Temp > 80, select = c(Ozone, Temp))
subset(airquality, Day == 1, select = -Temp)
subset(airquality, select = Ozone:Wind)
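To make explicit what each of the three calls returns, here is a small sketch that stores the results and inspects them. The object names (hot, first_days, first_cols) are made up for illustration; airquality ships with base R.

```r
# Rows where Temp > 80, keeping only the Ozone and Temp columns
hot <- subset(airquality, Temp > 80, select = c(Ozone, Temp))

# Rows for the first day of each month, keeping every column except Temp
first_days <- subset(airquality, Day == 1, select = -Temp)

# All rows, keeping the contiguous range of columns from Ozone to Wind
first_cols <- subset(airquality, select = Ozone:Wind)

print(names(hot))
print(names(first_days))
print(names(first_cols))
```

The second argument filters rows, while select picks columns by name, by negation, or by a range of names.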
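Problem
I want to filter a subset of a dataset based on several values of a single variable.
Taking the gapminder dataset as an example, I want to select the data for three countries: China, Japan, and the United States.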
What goes wrong
Result of selecting China only
library(gapminder)  # provides the gapminder data frame; assumes the package is installed
china <- subset(gapminder, country == "China")
country | continent | year | lifeExp | pop | gdpPercap |
---|---|---|---|---|---|
China | Asia | 1952 | 44.00000 | 556263527 | 400.4486 |
China | Asia | 1957 | 50.54896 | 637408000 | 575.9870 |
China | Asia | 1962 | 44.50136 | 665770000 | 487.6740 |
China | Asia | 1967 | 58.38112 | 754550000 | 612.7057 |
China | Asia | 1972 | 63.11888 | 862030000 | 676.9001 |
China | Asia | 1977 | 63.96736 | 943455000 | 741.2375 |
China | Asia | 1982 | 65.52500 | 1000281000 | 962.4214 |
China | Asia | 1987 | 67.27400 | 1084035000 | 1378.9040 |
China | Asia | 1992 | 68.69000 | 1164970000 | 1655.7842 |
China | Asia | 1997 | 70.42600 | 1230075000 | 2289.2341 |
China | Asia | 2002 | 72.02800 | 1280400000 | 3119.2809 |
China | Asia | 2007 | 72.96100 | 1318683096 | 4959.1149 |
The wrong way to select all three countries at once
three <- subset(gapminder, country == c("China", "Japan", "United States"))
country | continent | year | lifeExp | pop | gdpPercap |
---|---|---|---|---|---|
China | Asia | 1952 | 44.00000 | 556263527 | 400.4486 |
China | Asia | 1967 | 58.38112 | 754550000 | 612.7057 |
China | Asia | 1982 | 65.52500 | 1000281000 | 962.4214 |
China | Asia | 1997 | 70.42600 | 1230075000 | 2289.2341 |
Japan | Asia | 1957 | 65.50000 | 91563009 | 4317.6944 |
Japan | Asia | 1972 | 73.42000 | 107188273 | 14778.7864 |
Japan | Asia | 1987 | 78.67000 | 122091325 | 22375.9419 |
Japan | Asia | 2002 | 82.00000 | 127065841 | 28604.5919 |
United States | Americas | 1962 | 70.21000 | 186538000 | 16173.1459 |
United States | Americas | 1977 | 73.38000 | 220239000 | 24072.6321 |
United States | Americas | 1992 | 76.09000 | 256894189 | 32003.9322 |
United States | Americas | 2007 | 78.24200 | 301139947 | 42951.6531 |
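A lot of data is missing. What actually happens is that "==" compares element by element and recycles the three country names down the rows, so each row is checked against only one name in turn. The 12 years end up dealt out in rotation among the three countries (4 years each), instead of each country keeping all 12 years as we wanted.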
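The recycling is easier to see on a toy vector. The sketch below uses a made-up nine-element country vector (not the real gapminder column) and lines up each element with the value it was actually compared against.

```r
# Hypothetical toy column: three rows per country, sorted like gapminder
country <- rep(c("China", "Japan", "United States"), each = 3)
targets <- c("China", "Japan", "United States")

# What R effectively does: stretch `targets` to the length of `country`
recycled <- rep(targets, length.out = length(country))

# Element-by-element comparison against the recycled vector
mask <- country == targets

# Side-by-side view: each row is tested against a single name only
print(data.frame(country, compared_with = recycled, kept = mask))
```

Only one row per country survives here, for the same reason only every third year survives above. Note that R warns about recycling only when the longer length is not a multiple of the shorter one, so this mistake can pass silently.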
The right way
Use "|" (logical OR)
three <- subset(gapminder, country == "China" | country == "Japan" | country == "United States")
Use "%in%" to match against multiple values
three <- subset(gapminder, country %in% c("China", "Japan", "United States"))
country | continent | year | lifeExp | pop | gdpPercap |
---|---|---|---|---|---|
China | Asia | 1952 | 44.00000 | 556263527 | 400.4486 |
China | Asia | 1957 | 50.54896 | 637408000 | 575.9870 |
China | Asia | 1962 | 44.50136 | 665770000 | 487.6740 |
China | Asia | 1967 | 58.38112 | 754550000 | 612.7057 |
China | Asia | 1972 | 63.11888 | 862030000 | 676.9001 |
China | Asia | 1977 | 63.96736 | 943455000 | 741.2375 |
China | Asia | 1982 | 65.52500 | 1000281000 | 962.4214 |
China | Asia | 1987 | 67.27400 | 1084035000 | 1378.9040 |
China | Asia | 1992 | 68.69000 | 1164970000 | 1655.7842 |
China | Asia | 1997 | 70.42600 | 1230075000 | 2289.2341 |
China | Asia | 2002 | 72.02800 | 1280400000 | 3119.2809 |
China | Asia | 2007 | 72.96100 | 1318683096 | 4959.1149 |
Japan | Asia | 1952 | 63.03000 | 86459025 | 3216.9563 |
Japan | Asia | 1957 | 65.50000 | 91563009 | 4317.6944 |
Japan | Asia | 1962 | 68.73000 | 95831757 | 6576.6495 |
Japan | Asia | 1967 | 71.43000 | 100825279 | 9847.7886 |
Japan | Asia | 1972 | 73.42000 | 107188273 | 14778.7864 |
Japan | Asia | 1977 | 75.38000 | 113872473 | 16610.3770 |
Japan | Asia | 1982 | 77.11000 | 118454974 | 19384.1057 |
Japan | Asia | 1987 | 78.67000 | 122091325 | 22375.9419 |
Japan | Asia | 1992 | 79.36000 | 124329269 | 26824.8951 |
Japan | Asia | 1997 | 80.69000 | 125956499 | 28816.5850 |
Japan | Asia | 2002 | 82.00000 | 127065841 | 28604.5919 |
Japan | Asia | 2007 | 82.60300 | 127467972 | 31656.0681 |
United States | Americas | 1952 | 68.44000 | 157553000 | 13990.4821 |
United States | Americas | 1957 | 69.49000 | 171984000 | 14847.1271 |
United States | Americas | 1962 | 70.21000 | 186538000 | 16173.1459 |
United States | Americas | 1967 | 70.76000 | 198712000 | 19530.3656 |
United States | Americas | 1972 | 71.34000 | 209896000 | 21806.0359 |
United States | Americas | 1977 | 73.38000 | 220239000 | 24072.6321 |
United States | Americas | 1982 | 74.65000 | 232187835 | 25009.5591 |
United States | Americas | 1987 | 75.02000 | 242803533 | 29884.3504 |
United States | Americas | 1992 | 76.09000 | 256894189 | 32003.9322 |
United States | Americas | 1997 | 76.81000 | 272911760 | 35767.4330 |
United States | Americas | 2002 | 77.31000 | 287675526 | 39097.0995 |
United States | Americas | 2007 | 78.24200 | 301139947 | 42951.6531 |
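As a quick sanity check that does not need the gapminder package, the same comparison can be run on the built-in airquality data. This is only a sketch: the month values 5, 7, 9 and the object names are arbitrary choices for illustration.

```r
months <- c(5, 7, 9)  # arbitrary example values

# Chained "|" and "%in%" should select exactly the same rows
by_or <- subset(airquality, Month == 5 | Month == 7 | Month == 9)
by_in <- subset(airquality, Month %in% months)
stopifnot(identical(by_or, by_in))

# The recycled "==" comparison silently keeps only part of those rows
by_eq <- subset(airquality, Month == months)

print(nrow(by_in))
print(nrow(by_eq))
```

With more than two or three values, "%in%" stays readable, while the chained "|" version grows with every value added.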
Appendix: tadpole plot
library(ggplot2)
library(gganimate)
ggplot(three, aes(x = gdpPercap, y = lifeExp, size = pop, color = country)) +
  geom_point(alpha = 0.8) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  labs(title = "Year: {frame_time}") +
  transition_time(year) +
  shadow_wake(wake_length = 0.1, alpha = FALSE)

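The animation adds a short trailing tail to each point as decoration.
See also: gganimate | Create animated visualizations that let your charts speak.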