Data Science with R in 4 Weeks -

Author: 慢思考快思考 | Published: 2016-01-13 20:35

Day 3: Summaries of data - two-dimensional summaries


Example 1: multiple boxplots. How does the winning percentage differ between leagues?

> temp <- read.csv("basketball_teams.csv")

> teamdata <- as.data.frame(temp)

> teamdata$new_column <- ifelse(teamdata$games == 0, NA, teamdata$won / teamdata$games)

> stats <- teamdata[, c("name","lgID", "year","new_column")]
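The `ifelse()` guard above matters because a team with zero games would otherwise produce `0 / 0`, which is `NaN` in R. A minimal sketch on a made-up data frame (the rows in `toy` and the column name `win_pct` are invented for illustration and are not taken from `basketball_teams.csv`):

```r
# Hypothetical rows; only the column names won and games mirror the real file.
toy <- data.frame(
  name  = c("A", "B", "C"),
  won   = c(30, 0, 41),
  games = c(60, 0, 82)
)

# Without the guard, 0 / 0 yields NaN for team B.
raw <- toy$won / toy$games

# With the guard, a team that played no games gets NA instead.
toy$win_pct <- ifelse(toy$games == 0, NA, toy$won / toy$games)

print(raw)
print(toy$win_pct)
```

Plotting functions such as `boxplot()` and `hist()` drop `NA` values, so guarded rows simply do not appear in the figures.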

> boxplot(new_column ~ lgID, data = stats, col = "red")

The result is one box of winning percentages per league, drawn side by side.
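The numbers behind such a boxplot can also be inspected directly. The sketch below assumes a small invented data frame `toy`, where `win_pct` plays the role of `new_column`, and computes the five-number summary per league with `fivenum()`, the same hinges that `boxplot()` uses for the box and the median line. Whiskers can stop short of the minimum and maximum when there are outliers.

```r
# Hypothetical data, not from basketball_teams.csv.
toy <- data.frame(
  lgID    = c("NBA", "NBA", "NBA", "ABA", "ABA", "ABA"),
  win_pct = c(0.2, 0.5, 0.8, 0.4, 0.5, 0.6)
)

# Five-number summary per league:
# minimum, lower hinge, median, upper hinge, maximum.
five <- tapply(toy$win_pct, toy$lgID, fivenum)
print(five)

# Median winning percentage per league.
med <- tapply(toy$win_pct, toy$lgID, median)
print(med)
```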


We can also use histograms:

> par(mfrow = c(2, 1), mar = c(4, 4, 2, 1))

> hist(subset(stats, lgID == "ABA")$new_column, col = "green", main = "ABA", xlab = "Winning percentage")

> hist(subset(stats, lgID == "NBA")$new_column, col = "green", main = "NBA", xlab = "Winning percentage")
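Stacked histograms are easier to compare when both panels share the same bins. The following is a sketch under that assumption, again on invented data (`toy` and `win_pct` are hypothetical names). It passes one common `breaks` vector to both `hist()` calls:

```r
# Hypothetical data, not from basketball_teams.csv.
toy <- data.frame(
  lgID    = c("ABA", "ABA", "ABA", "NBA", "NBA", "NBA", "NBA"),
  win_pct = c(0.25, 0.45, 0.55, 0.15, 0.55, 0.65, 0.85)
)

# A winning percentage always lies in [0, 1], so fixed bins of width 0.1 work.
breaks <- seq(0, 1, by = 0.1)

par(mfrow = c(2, 1), mar = c(4, 4, 2, 1))
aba <- hist(toy$win_pct[toy$lgID == "ABA"], breaks = breaks,
            col = "green", main = "ABA", xlab = "Winning percentage")
nba <- hist(toy$win_pct[toy$lgID == "NBA"], breaks = breaks,
            col = "green", main = "NBA", xlab = "Winning percentage")
```

`hist()` returns its bin counts invisibly, so `aba$counts` and `nba$counts` can be compared bin by bin.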

Scatterplot

> with(stats, plot(year, new_column))

> # dashed horizontal reference line at a winning percentage of 0.7
> abline(h = 0.7, lwd = 2, lty = 2)
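The reference line separates unusually strong seasons from the rest, and the same threshold can be applied to the data itself. A minimal sketch, assuming an invented data frame `toy` with a hypothetical `win_pct` column:

```r
# Hypothetical data, not from basketball_teams.csv.
toy <- data.frame(
  name    = c("A", "B", "C", "D"),
  year    = c(1990, 1991, 1992, 1993),
  win_pct = c(0.75, 0.40, NA, 0.80)
)

threshold <- 0.7

# subset() keeps rows where the condition is TRUE and drops NA rows.
strong <- subset(toy, win_pct > threshold)
print(strong)
```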

Adding color to the scatterplot

> # factor() is needed because read.csv() returns lgID as character in R >= 4.0
> with(stats, plot(year, new_column, col = factor(lgID)))

In this plot each league (ABA, NBA) has its own color, so we can see what the winning percentages of its teams look like.
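A colored scatterplot is hard to read without knowing which color belongs to which league. One possible way to add a legend is sketched below on invented data (`toy` and `win_pct` are hypothetical). Because a factor passed to `col` is used through its integer codes, the legend colors are `1, 2, ...` in the order of `levels()`:

```r
# Hypothetical data, not from basketball_teams.csv.
toy <- data.frame(
  lgID    = c("NBA", "NBA", "ABA", "ABA"),
  year    = c(1970, 1971, 1970, 1971),
  win_pct = c(0.60, 0.55, 0.45, 0.70)
)

league <- factor(toy$lgID)

# Each point is colored by the integer code of its league.
plot(toy$year, toy$win_pct, col = league, pch = 19,
     xlab = "Year", ylab = "Winning percentage")

# Legend entries follow the same level order as the color codes.
legend("topright", legend = levels(league),
       col = seq_along(levels(league)), pch = 19)
```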

Alternatively, we can draw several scatterplots and look at the winning percentages of the NBA and the NBL separately:

> with(subset(stats, lgID == "NBA"), plot(year, new_column, main = "NBA"))

> with(subset(stats, lgID == "NBL"), plot(year, new_column, main = "NBL"))
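With more than two leagues, repeating the call by hand gets tedious. The following sketch draws one panel per league in a loop, on invented data (`toy` and `win_pct` are hypothetical). It uses `split()` to cut the data frame by league, and keeps the axes identical across panels so they can be compared:

```r
# Hypothetical data, not from basketball_teams.csv.
toy <- data.frame(
  lgID    = c("NBA", "NBA", "NBA", "NBL", "NBL"),
  year    = c(1947, 1948, 1949, 1947, 1948),
  win_pct = c(0.50, 0.65, 0.40, 0.30, 0.70)
)

# One data frame per league.
per_league <- split(toy, toy$lgID)

par(mfrow = c(length(per_league), 1), mar = c(4, 4, 2, 1))
for (lg in names(per_league)) {
  one <- per_league[[lg]]
  # Shared xlim and ylim make the panels directly comparable.
  plot(one$year, one$win_pct, main = lg,
       xlim = range(toy$year), ylim = c(0, 1),
       xlab = "Year", ylab = "Winning percentage")
}
```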
