Final Project: MATH 185 (Winter 2019)

Salaries of Baseball Players

The data in the file baseball.txt are a subset of data collected on professional American baseball players. Focus on the following variables:

salary: The yearly salary of each player (there are 59 NAs in salary; remove these rows)
homeruns_career: The total number of home runs a player has hit over the course of his career
division: The division a player is in (either W or E)

Part I: Rank sum test

(a) Are the salaries similar in the two divisions? Examine whether the distributions of player salaries are the same in both divisions. Use the Wilcoxon test with the normal approximation and calculate the value of the test statistic, its expected value and variance under the null hypothesis of no difference, and give the test result.

(b) The data contain ties. Explain how to simulate the distribution of the test statistic under the null. Implement this procedure and check that the significance levels you computed in the previous question are robust to the effect of ties.

(c) Check your result by implementing the Wilcoxon test in R, and give a point estimate (using the Hodges-Lehmann estimator) and a confidence interval for the difference in median salary between the two divisions.

Part II: Rank sign test

(d) Consider the possibility that walks and runs have the same median. Find a confidence interval for the difference in medians and use it to test for equal medians at the 95% confidence level. What do you conclude?

Part III: Smoothing

(e) Examine the relationship between the log of the outcome variable salary and the explanatory variables homeruns_career and division using exploratory plots.

(f) Can we predict salaries with the predictor variable homeruns_career? Perform a regression of the response variable log_salary on the variable homeruns_career. Plot the regression lines onto the data and comment on the fit. Give an interpretation of the model, keeping in mind that the salaries are reported on a log scale.
Comment on the effect of extreme observations in homeruns_career.

(g) Try a smoothing procedure, for example a kernel smoother (ksmooth()) or local polynomial regression (locpoly()). Plot the fit for various smoothing parameters and discuss what you find.

(h) Use LOO-CV to estimate the bandwidth in a local polynomial fit of degree 1 (see the lecture code for something nearly identical). Calculate the smoothing matrix in this case (the expression was given in lecture 9 for the local linear case).
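
For part (h), one possible way to organise the computation is sketched below in R. It is not the lecture code: the Gaussian kernel, the helper names local_linear_matrix and loocv_score, the bandwidth grid, and the simulated x and y (stand-ins for homeruns_career and log_salary) are all assumptions made for illustration. The sketch relies on the standard leave-one-out identity for linear smoothers, under which the deleted residual equals the ordinary residual divided by 1 - L_ii, where L is the smoothing matrix.

```r
# Hypothetical helper: smoothing matrix L of a local linear fit.
# Row i holds the weights l_j(x_i), so the fitted values are L %*% y.
local_linear_matrix <- function(x, h) {
  n <- length(x)
  L <- matrix(0, n, n)
  for (i in seq_len(n)) {
    d  <- x - x[i]                  # design points centred at x[i]
    w  <- dnorm(d / h)              # Gaussian kernel weights (an assumption)
    s1 <- sum(w * d)
    s2 <- sum(w * d^2)
    l  <- w * (s2 - d * s1)         # unnormalised local linear weights
    L[i, ] <- l / sum(l)            # normalise so that each row sums to 1
  }
  L
}

# Hypothetical helper: leave-one-out CV score via the shortcut formula
# mean(((y_i - fit_i) / (1 - L_ii))^2), which avoids n separate refits.
loocv_score <- function(x, y, h) {
  L   <- local_linear_matrix(x, h)
  fit <- as.vector(L %*% y)
  mean(((y - fit) / (1 - diag(L)))^2)
}

# Simulated stand-in data (NOT the baseball data).
set.seed(1)
x <- sort(runif(100, 0, 10))
y <- sin(x) + rnorm(100, sd = 0.3)

# Evaluate the CV score on an illustrative grid and keep the minimiser.
h_grid <- seq(0.2, 2, by = 0.1)
cv     <- sapply(h_grid, function(h) loocv_score(x, y, h))
h_best <- h_grid[which.min(cv)]
```

With the real data, x and y would be replaced by homeruns_career and log_salary after dropping the rows with missing salary, and the grid would need to be adapted to the scale of homeruns_career.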