AMS 315 Data Analysis, Spring 2019
First Computing Assignment

The first report is due on Tuesday, March 26, but can be submitted without penalty by April 2. This report is worth 60 examination points. Please remember that a second project is coming, so you should finish the first project as soon as possible. Please submit your project via e-mail as instructed on the class Blackboard. Detailed submission information is online.

Project 1 has two parts. There are three files for this project: two files for Part A and one file for Part B. The files are labeled with the last four digits of your Stony Brook ID number.

Part A

Part A is worth 20 points. The two files for Part A each contain a column for subject ID and a column for either the dependent variable value or the independent variable value. Your first task is to sort the two files by subject ID and merge them. You should not just use “cut and paste” to merge your data. You are also expected to account for missing data. Your report should include the count of subject IDs that had at least one independent variable value or dependent variable value, the count of subject IDs that had an independent variable value, the count of subject IDs that had a dependent variable value, and the count of subject IDs that had both an independent and a dependent variable value.

Your second task is to impute the missing values. There are a number of missing data procedures, and statistical packages often have imputation algorithms built in. For example, R has 5 different algorithms available. You may choose any algorithm except listwise deletion. Specify your choice in your report. Often, the choice of imputation method has little effect on the results if the fraction of missing data is 30% or less.
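The merge, the missing-data counts, and the imputation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, a missing value is represented by None, and mean imputation is used simply because it is short to write, not because it is the recommended method. Any algorithm other than listwise deletion is allowed.

```python
from statistics import mean


def merge_by_id(iv_rows, dv_rows):
    """Outer-join two lists of (subject_id, value) pairs on subject_id.

    None marks a missing value. Returns (subject_id, iv, dv) tuples
    sorted by subject_id. An ID absent from one file gets None there.
    """
    iv = dict(iv_rows)
    dv = dict(dv_rows)
    ids = sorted(set(iv) | set(dv))
    return [(i, iv.get(i), dv.get(i)) for i in ids]


def missing_data_counts(merged):
    """Return the four counts that the report must contain."""
    return {
        "at_least_one": sum(1 for _, x, y in merged
                            if x is not None or y is not None),
        "has_iv": sum(1 for _, x, _ in merged if x is not None),
        "has_dv": sum(1 for _, _, y in merged if y is not None),
        "has_both": sum(1 for _, x, y in merged
                        if x is not None and y is not None),
    }


def impute_mean(merged):
    """Mean imputation (an illustrative choice, assumed for this sketch).

    Rows with at least one observed value are kept, and each missing
    value is replaced by the mean of the observed values in its column.
    """
    iv_mean = mean(x for _, x, _ in merged if x is not None)
    dv_mean = mean(y for _, _, y in merged if y is not None)
    return [
        (i, iv_mean if x is None else x, dv_mean if y is None else y)
        for i, x, y in merged
        if x is not None or y is not None
    ]


# Made-up example data, deliberately unsorted and with gaps.
iv_rows = [(3, 3.0), (1, 1.0), (2, None), (5, 5.0)]
dv_rows = [(1, 10.0), (2, 20.0), (3, None), (4, 40.0)]

merged = merge_by_id(iv_rows, dv_rows)
print(missing_data_counts(merged))
print(impute_mean(merged))
```

The counts are taken before imputation, so the report can state how much data was missing and then name the imputation method separately.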
Then, you should use the statistical package of your choice to find the fitted linear model.

Part B

Part B is worth 40 points. The data file for Part B contains one line for each subject ID. Each line contains the subject ID, the value of the independent variable (IV), and the value of the dependent variable (DV). A transformation of either the IV or the DV or both may be required. You should read the text for suggestions on fitting a model. A lack of fit (LOF) test should be applied. It is your responsibility to find repeated (or nearly repeated) independent variable values. That is, you should bin nearly repeated data into one level. For example, suppose that two observations have x values that are close but not exactly equal. While there are no exactly repeated x values, you could bin these points into one group of nearly repeated points. That is, choose the average x-value as the value of x after binning, so that both points in your binned data have the same x value. Now perform a LOF test on the data set after binning all nearly repeated values.

You must submit a one-page report on Problem A and a one-page report on Problem B. Each report should have four sections.

The introduction should contain a statement of the problem and the objective of the paper. This part is easy: your problem is to recover the function that was used to generate the dependent variable value based on the value of the independent variable. The data you receive will be generated by a simulation program.

The second section should describe your methodology. Specifically, describe how the files were merged, the program used to perform the statistical analysis, whether you used linear regression and additional procedures such as a lack of fit test, how much missing data was present, and the procedure used for dealing with missing data.

The third section should contain your results: the fraction of the variation of the dependent variable that was explained, the analysis of variance table, the fitted function, confidence intervals for the slope, and the test of the null hypothesis that the slope is zero.
The fourth section should be conclusions and discussion. This section should focus on “big picture” issues. Was there an association between the variables? How important was it? That is, what was the r-squared value? What is your fitted function? You may submit a longer appendix of computer work and programs.

Important note: Simply submitting your computer output is not acceptable and will receive a grade of 0. You must submit a formal report to begin to get non-zero credit.

Grading Comments from Last Semester

Make sure that you focus on “big picture” issues in your reports. For example, what is the function that you chose to fit the data? Have you accounted for all of the observations that you were given? Do your results make sense? Make sure that you edit your findings to focus on your final model. A useful example of reporting other models in Part B is reporting the r-squared for a model with no transformations to establish a base of comparison. Avoid putting large amounts of code in your report. Code can be placed in an appendix. Even there, it is valuable to edit the code that you report down to the code that you actually used to get your final results.

Part A Deductions
-5 incorrect interpretation of r-squared
-5 incorrect or incomplete accounting of missing data
-5 no report of missing DV values
-5 Incomplete and hurried report.
-5 listwise deletion with missing data report
-5 incomplete report of imputation method
-10 No missing data report
-10 listwise deletion with incomplete missing data report
-10 no report of imputation method
-10 incorrect number of observations
-10 no conclusion
-10 no report of fitted function
-15 wrong handling of missing data

Part B Deductions
-5 more report and less code.
-5 wrong transformation
-5 confusing statement of fitted function and missing anova table
-5 presenting results for models other than the final model.
-5 unsatisfactory residual plot in final transformation model
-5 did not consider transforming IV.
-5 no report of specifics from ANOVA table.
-10 No statement of fitted function
-10 transformations used in fitted function not clear
-10 no statement of classes of transformations considered
-10 only one transformation considered
-10 Incomplete and hurried report.
-10 incomplete methods section
-10 no report of lack of fit test
-10 incorrect conclusion about lack of fit test
-10 incorrect use of regression test as a lack of fit test
-10 no discussion of transformation results
-10 no discussion of model adequacy
-20 Anova table wrong
-20 reversed IV and DV
-20 No transformations considered when need apparent from scatterplot
-30 incorrect hypothesis decision
-40 no results reported
-40 combine DV and IV into a final IV

Example Report

Here is a sample report. Keep in mind that this is just a general idea of what the first project should look like. You must not copy and paste it and submit it as your report with the numbers changed. Such activity is plagiarism and you will receive a grade of 0.

Introduction
The objective is to find the model describing the data in Problem A. A simulation program using an unknown linear function was used to generate the data.

Methodology
In order to solve Problem A, we used the statistics package SPSS and the Microsoft Excel spreadsheet program. The original data files were supplied as two data sheets in Excel. One data sheet had the ID of an observation and its associated independent variable value, and the other had the ID and associated dependent variable value. The independent variable data file had a total of 710 independent variable values with ID# ranging from 1 to 729. The dependent variable data file had a total of 690 dependent variable values with ID# ranging from 1 to 730. We first sorted the data in both files in ascending ID# order and then used Excel to merge the files.
We next used listwise deletion to remove 40 entries that were missing either the independent variable value or the dependent variable value. Finally, we merged the two files into one file with three columns: ID, IV and DV. There were 670 entries with both values, with ID# ranging from 1 to 729. The data was then imported into SPSS. We assumed linear regression for our data, but in order to find a better fit, we also transformed the dependent variable into DV^2 and Sqrt(DV) and the independent variable into IV^2, Sqrt(IV), 1/IV, and ln(IV).

Results
The fitted function for the model Y = B0 + B1 X was DV = 20.966 IV + 2123.719, with 99.9% of the variance explained. The 95% confidence interval for the slope was [20.914, 21.019]. The 95% confidence interval for the intercept was [2068.988, 2178.450]. The analysis of variance table is shown below, and the association between the independent variable and the dependent variable was highly significant (p=0.000).

Table 1
Analysis of Variance Table
DV regressed on IV (n=670)

ANOVA (a)
Model          Sum of Squares     Df    Mean Square        F            Sig.
1 Regression   25021381100.435    1     25021381100.435    617186.738   .000 (b)
  Residual     27081402.664       668   40541.022
  Total        25048462503.099    669
a. Dependent Variable: DV
b. Predictors: (Constant), IV

Conclusion
For Problem A, the association between the independent variable and the dependent variable was highly significant (p=0.000), with 99.9% of the dependent variable variation explained. The plot of residuals versus predicted values confirmed the validity of this model.

End of Report

Note: For Part B, please report the transformations you have performed and the model using the transformations that you have decided upon.
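For Part B, the binning of nearly repeated x values and the lack of fit test described earlier can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed method: the function names and the tolerance parameter `tol` are hypothetical, the example data are made up, and the sketch stops at the F statistic and its degrees of freedom. The statistic would still need to be compared with an F distribution in a statistical package or table.

```python
from collections import defaultdict


def bin_near_repeats(points, tol):
    """Replace nearly repeated x values by their group average.

    Points are sorted by x; a new group starts whenever the gap to the
    previous x exceeds tol (a hypothetical tolerance chosen by the analyst).
    """
    ordered = sorted(points)
    groups = [[ordered[0]]]
    for p in ordered[1:]:
        if p[0] - groups[-1][-1][0] <= tol:
            groups[-1].append(p)
        else:
            groups.append([p])
    binned = []
    for g in groups:
        x_bar = sum(x for x, _ in g) / len(g)
        binned.extend((x_bar, y) for _, y in g)
    return binned


def fit_line(points):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def lack_of_fit(points):
    """Return (F, df_lack_of_fit, df_pure_error) for a straight-line model."""
    b0, b1 = fit_line(points)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in points)
    levels = defaultdict(list)
    for x, y in points:
        levels[x].append(y)
    # Pure error: variation of y around its own level mean.
    ss_pe = sum(
        sum((y - sum(ys) / len(ys)) ** 2 for y in ys) for ys in levels.values()
    )
    n, c = len(points), len(levels)
    df_lof, df_pe = c - 2, n - c
    if df_lof <= 0 or df_pe <= 0:
        raise ValueError("need more than 2 levels and at least one repeated level")
    ss_lof = sse - ss_pe
    return (ss_lof / df_lof) / (ss_pe / df_pe), df_lof, df_pe


# Made-up data with three pairs of nearly repeated x values.
data = [(0.99, 1.0), (1.01, 3.0), (1.98, 2.0),
        (2.02, 4.0), (2.99, 9.0), (3.01, 11.0)]
binned = bin_near_repeats(data, tol=0.05)
f_stat, df_lof, df_pe = lack_of_fit(binned)
print(f_stat, df_lof, df_pe)  # F is about 6.0 with (1, 3) degrees of freedom
```

A large F relative to the F(df_lof, df_pe) reference distribution suggests that a straight line is inadequate, which is one signal that a transformation of the IV or the DV should be considered.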