Reproducing a Kaggle case study: Kobe Bryant's shot selection, part 2


Author: 小明的数据分析笔记本 | Published: 2019-04-30 22:30

Today I continue reproducing the Kaggle case study on Kobe Bryant's shot selection. Original notebook: https://www.kaggle.com/xvivancos/kobe-bryant-shot-selection/report

Reading the data and loading the required packages
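# Point the working directory at the folder that holds data.csv (adjust to your own path)
setwd("../Desktop/Data_analysis_practice/Kaggle/Kobe_shot_selection/")
shots<-read.csv("data.csv")
dim(shots)
# Drop every row that contains a missing value, then check the size again
shots<-na.omit(shots)
dim(shots)
library(ggplot2)
library(tidyverse)  # also attaches dplyr and forcats, used below
library(gridExtra)  # grid.arrange() for the multi-panel figure at the end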
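To make the na.omit() step concrete, here is a minimal sketch on a made-up data frame (the toy columns and values are assumptions for illustration, not the Kaggle data). na.omit() drops every row that has an NA in any column, which is why the second dim() call reports fewer rows.

```r
# Toy data frame (hypothetical values) with two missing flags
toy <- data.frame(shot_id = 1:5,
                  shot_made_flag = c(1, NA, 0, NA, 1))

# Count the missing values in each column before dropping anything
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# na.omit() removes every row that contains at least one NA
toy_clean <- na.omit(toy)
print(dim(toy))        # 5 rows, 2 columns
print(dim(toy_clean))  # 3 rows, 2 columns
```

Running colSums(is.na(shots)) before na.omit(shots) in the same way would show which columns are responsible for the dropped rows.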
Shooting accuracy by action type

This step uses the group_by() and summarise() functions. A small example shows how the two work together:

df<-data.frame(First=c("A","A","A","B","B","B"),
               Second=c(1,2,1,4,5,6))
df%>%
  group_by(First)%>%
  summarise(Accuracy=mean(Second),
            counts=n())

# A tibble: 2 x 3
  First Accuracy counts
  <fct>    <dbl>  <int>
1 A         1.33      3
2 B         5.00      3
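The same pattern gives a hit rate when the summarised column is a 0/1 flag, because the mean of a 0/1 vector is the share of ones. A minimal sketch, assuming shot_made_flag is coded 1 for a made shot and 0 for a miss (the rows below are made up for illustration):

```r
library(dplyr)

# Hypothetical shot log: 1 = made, 0 = missed
toy_shots <- data.frame(
  action_type    = c("Jump Shot", "Jump Shot", "Jump Shot", "Jump Shot",
                     "Layup", "Layup"),
  shot_made_flag = c(1, 0, 0, 1, 1, 1)
)

toy_accuracy <- toy_shots %>%
  group_by(action_type) %>%
  summarise(Accuracy = mean(shot_made_flag),  # share of made shots
            counts   = n())                   # number of attempts

print(toy_accuracy)
```

The chart code for the real data follows this pattern, with filter(counts>20) dropping action types that were attempted too rarely to give a stable rate.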
shots%>%
  group_by(action_type)%>%
  summarise(Accuracy=mean(shot_made_flag),counts=n())%>%
  filter(counts>20)%>%
  ggplot(aes(x=reorder(action_type,Accuracy),y=Accuracy))+
  geom_point(aes(colour=Accuracy),size=3)+
  scale_colour_gradient(low="orangered",high="chartreuse3")+
  labs(title="Accuracy by shot type")+theme_bw()+
  theme(axis.title.y=element_blank(),
        legend.position="none",
        plot.title=element_text(hjust=0.5))+
  coord_flip()
Rplot14.png
This brings up another small point: reorder() sorts the categories in ascending order of another variable. A small example:
df<-data.frame(First=LETTERS[1:5],
               Second=c(1,4,5,3,2))
p1<-ggplot(df,aes(x=First,y=Second))+
  geom_bar(stat="identity",fill="darkgreen")
p2<-ggplot(df,aes(x=reorder(First,Second),y=Second))+
  geom_bar(stat="identity",fill="orange")

ggpubr::ggarrange(p1,p2,ncol=1,nrow=2,labels=c("p1","p2"))
Rplot15.png
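To see what reorder() changes, it helps to look at the factor levels directly, since ggplot2 draws a discrete axis in level order. A small sketch in base R using the same toy data as the bar charts:

```r
# Same toy data as in the bar chart example
df <- data.frame(First = LETTERS[1:5],
                 Second = c(1, 4, 5, 3, 2))

# Without reorder(), the levels are alphabetical
default_levels <- levels(factor(df$First))
print(default_levels)

# reorder() sorts the levels by Second, smallest first
reordered_levels <- levels(reorder(factor(df$First), df$Second))
print(reordered_levels)
```

The second vector is the left-to-right order of the bars in p2.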

What about descending order? One workaround I have come up with for now:

df1<-df[order(df$Second,decreasing=TRUE),]
df1$First<-fct_inorder(df1$First)
ggplot(df1,aes(x=First,y=Second))+
  geom_bar(stat="identity",fill="orangered")
Rplot16.png
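A shorter alternative, offered as a sketch rather than as the original notebook's approach: when the sort key is numeric, negating it inside reorder() reverses the order, so no pre-sorted copy of the data frame is needed. The object names desc_levels and p3 are introduced here only for illustration.

```r
library(ggplot2)

df <- data.frame(First = LETTERS[1:5],
                 Second = c(1, 4, 5, 3, 2))

# Negating the sort key makes reorder() put the largest value first
desc_levels <- levels(reorder(factor(df$First), -df$Second))
print(desc_levels)

# The same trick used directly inside aes()
p3 <- ggplot(df, aes(x = reorder(First, -Second), y = Second)) +
  geom_bar(stat = "identity", fill = "orangered") +
  labs(x = "First")
```

Printing p3 should give the same bar order as the fct_inorder() version.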
Accuracy by season
shots%>%
  group_by(season)%>%
  summarise(Accuracy=mean(shot_made_flag))%>%
  ggplot(aes(x=season,y=Accuracy,group=1))+
  geom_line(aes(colour=Accuracy))+
  geom_point(aes(colour=Accuracy),size=3)+
  scale_colour_gradient(low="orangered",high="chartreuse3")+
  labs(title="Accuracy by season",x="Season")+theme_bw()+
  theme(legend.position="none",
        plot.title=element_text(hjust=0.5),
        axis.text.x=element_text(angle=45,hjust=1))
Rplot17.png

The figure above shows that Kobe's accuracy fell off a cliff over his last three seasons. In the words of the original author: As we see, the accuracy begins to decrease badly from the 2013-14 season. Why didn't you retire before, Kobe?

Regular season vs. playoff accuracy
shots%>%
  group_by(season)%>%
  summarise(Playoff=mean(shot_made_flag[playoffs==1]),
            RegularSeason=mean(shot_made_flag[playoffs==0]))%>%
  ggplot(aes(x=season,group=1))+
  geom_line(aes(y=Playoff,color="Playoff"))+
  geom_line(aes(y=RegularSeason,colour="RegularSeason"))+
  geom_point(aes(y=Playoff,color="Playoff"),size=3)+
  geom_point(aes(y=RegularSeason,color="RegularSeason"),size=3)+
  labs(title="Accuracy by season",
       subtitle="Playoff and Regular Season",
       x="Season",y="Accuracy")+theme_bw()+
  theme(legend.title=element_blank(),
        plot.title=element_text(hjust=0.5),
        plot.subtitle=element_text(hjust=0.5),
        axis.text.x=element_text(angle=45,hjust=1))

Rplot18.png
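The summarise() call above subsets shot_made_flag with a logical condition inside each season. One consequence worth knowing: a season with no playoff shots gives an empty subset, and mean() of an empty vector is NaN, so that season typically shows up as a gap in the playoff line. A sketch with made-up data (season labels and values are hypothetical):

```r
library(dplyr)

toy <- data.frame(
  season         = c("S1", "S1", "S1", "S1", "S2", "S2"),
  playoffs       = c(0, 0, 1, 1, 0, 0),   # S2 has no playoff shots
  shot_made_flag = c(1, 0, 1, 1, 0, 1)
)

by_phase <- toy %>%
  group_by(season) %>%
  summarise(Playoff       = mean(shot_made_flag[playoffs == 1]),
            RegularSeason = mean(shot_made_flag[playoffs == 0]))

# S2 has NaN in the Playoff column because its subset is empty
print(by_phase)
```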
Two-point and three-point accuracy
shots %>%
  group_by(season) %>%
  summarise(TwoPoint=mean(shot_made_flag[shot_type=="2PT Field Goal"]),
            ThreePoint=mean(shot_made_flag[shot_type=="3PT Field Goal"])) %>%
  ggplot(aes(x=season, group=1)) +
  geom_line(aes(y=TwoPoint, colour="TwoPoint")) +
  geom_line(aes(y=ThreePoint, colour="ThreePoint")) +
  geom_point(aes(y=TwoPoint, colour="TwoPoint"), size=3) +
  geom_point(aes(y=ThreePoint, colour="ThreePoint"), size=3) +
  labs(title="Accuracy by season", 
       subtitle="2PT Field Goal and 3PT Field Goal",
       x="Season", y="Accuracy") +
  theme_bw() +
  theme(legend.title=element_blank(),
        plot.title=element_text(hjust=0.5),
        plot.subtitle=element_text(hjust=0.5),
        axis.text.x=element_text(angle=45, hjust=1)) 
Rplot19.png

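The figure above shows that Kobe's three-point accuracy was extremely low in the 2013-14 season. Do any loyal fans remember what was going on with Kobe that season?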
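An alternative layout for this kind of chart, offered as a sketch rather than as the original notebook's method: group by both season and shot_type so that each row is one point, then let ggplot2 map shot_type to colour. This avoids writing one geom per series. The toy data is made up, and the .groups argument assumes dplyr 1.0 or later.

```r
library(dplyr)
library(ggplot2)

toy <- data.frame(
  season         = c("S1", "S1", "S1", "S1", "S2", "S2", "S2", "S2"),
  shot_type      = rep(c("2PT Field Goal", "2PT Field Goal",
                         "3PT Field Goal", "3PT Field Goal"), 2),
  shot_made_flag = c(1, 0, 0, 0, 1, 1, 1, 0)
)

# One row per season and shot type (long format)
long_accuracy <- toy %>%
  group_by(season, shot_type) %>%
  summarise(Accuracy = mean(shot_made_flag), .groups = "drop")

print(long_accuracy)

# One line per shot type, coloured automatically
p_long <- ggplot(long_accuracy,
                 aes(x = season, y = Accuracy,
                     colour = shot_type, group = shot_type)) +
  geom_line() +
  geom_point(size = 3) +
  theme_bw()
```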

Two-point and three-point accuracy by opponent
shots %>%
  group_by(opponent) %>%
  summarise(TwoPoint=mean(shot_made_flag[shot_type=="2PT Field Goal"]),
            ThreePoint=mean(shot_made_flag[shot_type=="3PT Field Goal"])) %>%
  ggplot(aes(x=opponent, group=1)) +
  geom_line(aes(y=TwoPoint, colour="TwoPoint")) +
  geom_line(aes(y=ThreePoint, colour="ThreePoint")) +
  geom_point(aes(y=TwoPoint, colour="TwoPoint"), size=3) +
  geom_point(aes(y=ThreePoint, colour="ThreePoint"), size=3) +
  labs(title="Accuracy by opponent", 
       subtitle="2PT Field Goal and 3PT Field Goal",
       x="Opponent", y="Accuracy") +
  theme_bw() +
  theme(legend.title=element_blank(),
        plot.title=element_text(hjust=0.5),
        plot.subtitle=element_text(hjust=0.5),
        axis.text.x=element_text(angle=45, hjust=1)) 
Rplot20.png
Accuracy by shot distance
shots %>%
  group_by(shot_distance) %>%
  summarise(Accuracy=mean(shot_made_flag)) %>%
  ggplot(aes(x=shot_distance, y=Accuracy)) + 
  geom_line(aes(colour=Accuracy)) +
  geom_point(aes(colour=Accuracy), size=2) +
  scale_colour_gradient(low="orangered", high="chartreuse3") +
  labs(title="Accuracy by shot distance", x="Shot distance (ft.)") +
  xlim(c(0,45)) +
  theme_bw() +
  theme(legend.position="none",
        plot.title=element_text(hjust=0.5)) 
Rplot21.png
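The xlim(c(0,45)) call above hides the longest attempts. One plausible reason is that very long shots are rare, so their accuracy is based on only a handful of attempts. The sketch below keeps the attempt count next to each accuracy and filters on it, the same idea as filter(counts>20) in the action type chart. The data and the min_attempts threshold are hypothetical.

```r
library(dplyr)

# Made-up shots: several attempts up close, a single long heave
toy <- data.frame(
  shot_distance  = c(0, 0, 0, 0, 15, 15, 15, 15, 60),
  shot_made_flag = c(1, 1, 1, 0, 1, 0, 0, 1, 1)
)

min_attempts <- 2  # hypothetical threshold, chosen only for this toy data

by_distance <- toy %>%
  group_by(shot_distance) %>%
  summarise(Accuracy = mean(shot_made_flag),
            counts   = n())

# Keep only distances with enough attempts to trust the rate
reliable <- by_distance %>% filter(counts >= min_attempts)

print(by_distance)  # the 60 ft row has accuracy 1 from a single attempt
print(reliable)     # that row is gone after filtering on counts
```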
Accuracy by shot zone
p7 <- shots %>%
  select(lat, lon, shot_zone_range, shot_made_flag) %>%
  group_by(shot_zone_range) %>%
  mutate(Accuracy=mean(shot_made_flag)) %>%
  ggplot(aes(x=lon, y=lat)) +
  geom_point(aes(colour=Accuracy)) +
  scale_colour_gradient(low="red", high="lightgreen") +
  labs(title="Accuracy by shot zone range") +
  ylim(c(33.7, 34.0883)) +
  theme_void() +
  theme(plot.title=element_text(hjust=0.5))
p8 <- shots %>%
  select(lat, lon, shot_zone_area, shot_made_flag) %>%
  group_by(shot_zone_area) %>%
  mutate(Accuracy=mean(shot_made_flag)) %>%
  ggplot(aes(x=lon, y=lat)) +
  geom_point(aes(colour=Accuracy)) +
  scale_colour_gradient(low="red", high="lightgreen") +
  labs(title="Accuracy by shot zone area") +
  ylim(c(33.7, 34.0883)) +
  theme_void() +
  theme(legend.position="none",
        plot.title=element_text(hjust=0.5))
p9 <- shots %>%
  select(lat, lon, shot_zone_basic, shot_made_flag) %>%
  group_by(shot_zone_basic) %>%
  mutate(Accuracy=mean(shot_made_flag)) %>%
  ggplot(aes(x=lon, y=lat)) +
  geom_point(aes(colour=Accuracy)) +
  scale_colour_gradient(low="red", high="lightgreen") +
  labs(title="Accuracy by shot zone basic") +
  ylim(c(33.7, 34.0883)) +
  theme_void() +
  theme(legend.position="none",
        plot.title=element_text(hjust=0.5))
grid.arrange(p7, p8, p9, layout_matrix=cbind(c(1,2), c(1,3)))
Rplot22.png
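The three panels use group_by() followed by mutate() instead of summarise(). The difference matters here: summarise() collapses each zone to one row, while mutate() keeps every shot and attaches its zone's accuracy to it, which is what lets each point be drawn at its own lon/lat. A minimal sketch on made-up data (the shot_zone column and its values are hypothetical):

```r
library(dplyr)

toy <- data.frame(
  shot_zone      = c("A", "A", "B", "B", "B"),
  shot_made_flag = c(1, 0, 1, 1, 0)
)

# summarise(): one row per zone
per_zone <- toy %>%
  group_by(shot_zone) %>%
  summarise(Accuracy = mean(shot_made_flag))

# mutate(): every shot keeps its row and carries its zone's accuracy
per_shot <- toy %>%
  group_by(shot_zone) %>%
  mutate(Accuracy = mean(shot_made_flag)) %>%
  ungroup()

print(nrow(per_zone))  # 2
print(nrow(per_shot))  # 5
print(per_shot)
```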
R beginners who love basketball are welcome to follow my WeChat official account, 小明的数据分析笔记本.