STAT UN2102 Homework 4 [100 pts]
Due 11:59pm Monday, May 6th on Canvas

Your homework should be submitted on Canvas as an R Markdown file. Please submit the knitted .pdf or .html file along with the .Rmd file. We will not (and cannot) accept any other formats. Please clearly label the questions in your responses and support your answers with textual explanations and the code you use to produce the result. We may print out your homework. Please do not waste paper by printing the dataset or any vector over, say, length 20.

Goals: Simulating probability distributions using the accept-reject method, simulating a sampling distribution related to the linear regression model.

1 Reject-Accept Method

Let random variable X denote the temperature at which a certain chemical reaction takes place. Suppose that X has probability density function

(1)  [the density f(x) is missing from the source]

Perform the following tasks:

1. Determine the maximum of f(x). Find an envelope function e(x) by using a uniform distribution for g(x) and setting $e(x) = \max_x \{f(x)\}$.
2. Using the Accept-Reject Algorithm, write a program that simulates 1000 draws from the probability density function f(x) from Equation (1).
3. Plot a histogram of your simulated data with the density function f overlaid on the graph. Label your plot appropriately.

2 Regression and Empirical Size

2.1 Regression

We work with the grocery retailer dataset from Canvas. The description follows:

A large national grocery retailer tracks productivity and costs of its facilities closely. Consider a data set obtained from a single distribution center for a one-year period. Each data point for each variable represents one week of activity. The variables included are number of cases shipped in thousands (X1), the indirect costs of labor as a percentage of total costs (X2), a qualitative predictor called holiday that is coded 1 if the week has a holiday and 0 otherwise (X3), and total labor hours (Y). Consider the multiple linear regression model

(2)  $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \epsilon_i, \quad i = 1, 2, \ldots, 52,$

and $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$.

Perform the following tasks:

4. Read in the grocery retailer dataset. Name the dataset grocery.
5. Use the least squares equation $\hat{\beta} = (X^T X)^{-1} X^T Y$ to estimate regression model (2). To estimate the model, use the linear model function in R, i.e., use lm().
6. Use R to estimate $\sigma^2$, i.e., compute $MSE = \frac{1}{n-4} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$. To perform this task, use the residuals function.

2.2 Test for Slope

Now consider investigating whether the number of cases shipped (X1) is statistically related to total labor hours (Y). To investigate the research question, we run a t-test on the coefficient corresponding to X1, i.e., we test the null/alternative pair

(3)  $H_0 : \beta_1 = 0$ versus $H_A : \beta_1 \neq 0$.

To run the hypothesis testing procedure, we use the t-statistic

(4)  $t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)},$

where $\hat{\beta}_1$ is the second element of the least squares estimator $\hat{\beta} = (X^T X)^{-1} X^T Y$ and $SE(\hat{\beta}_1)$ is the standard error of $\hat{\beta}_1$. The least squares estimates, estimated standard errors, t-statistics and p-values for all coefficients $\beta_0, \beta_1, \beta_2, \beta_3$ are organized in the standard linear regression output displayed in Table 1. To get this output in R, use the summary() function on your model.

Test the manager's claim in (3) using the R functions lm() and summary().

Table 1: Standard Multiple Linear Regression Output

              Estimate          Std. Error   t value   Pr(>|t|) or Sig
(Intercept)   $\hat{\beta}_0$   ...

2.3 Sampling Distribution

Under model (2) and under the null hypothesis $H_0 : \beta_1 = 0$, the test statistic (4) has a Student's t-distribution with $n - 4$ degrees of freedom, i.e.,

$\frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} \sim t(df = n - 4).$

The goal of this section is to simulate the sampling distribution of the t-statistic.

Perform the following tasks:

5. Write a loop that simulates the sampling distribution of the t-statistic under null hypothesis (3) with the multiple linear regression model (2). To accomplish this task:
   i. Assume the true model relating Y with X1, X2, X3 is

      (5)  $Y_i = 4200 + \beta_1 X_{i1} - 15 X_{i2} + 620 X_{i3} + \epsilon_i, \quad i = 1, 2, \ldots, 52,$

      with $\epsilon_i \overset{iid}{\sim} N(0, 20500)$.
   ii. Assuming $H_0 : \beta_1 = 0$ is true, simulate 10,000 draws from model (5) using the fixed covariates X2, X3.
   iii. For each iteration of the loop, fit the full model

      $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \epsilon_i$

      using the simulated Y and fixed covariates X1, X2, X3.
   iv. For each iteration of the loop, also compute the t-statistic from equation (4). Store these values in a vector t.stat. Hint: Use the summary function in R and extract the actual summary table using the code summary(model)[[4]]. Then extract the relevant t-statistic from the table.
   v. Display the first six elements of your simulated t-values.
7. Plot a histogram of the simulated sampling distribution. Overlay the correct t-density on this histogram, i.e., overlay the density $t(df = 52 - 4)$. Plot the density in green and set breaks=40 in the histogram. Make sure to label the plot appropriately. You can use base R or ggplot.
8. Recall that the significance level of a testing procedure is defined as

   $P(\text{Type I error}) = P(\text{Rejecting } H_0 \text{ when } H_0 \text{ is true}) = \alpha.$

   The significance level is often called the size of the testing procedure. Based on significance levels $\alpha = 0.10, 0.05, 0.01$, compute the sample proportion of simulated t-values that fell in the rejection region. The proportion of simulated rejected t-values under the null is called the empirical size of a test. The three values should be close to the actual $\alpha$ levels.
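The simulation loop and the empirical size calculation described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the grocery data are not reproduced here, so X1, X2 and X3 are hypothetical stand-in covariates generated at random, and the number of replications (n.sims) is reduced from 10,000 to 2,000 to keep the sketch fast. The names n.sims, crit and emp.size are introduced for this example and do not come from the assignment.

```r
# Sketch only: X1, X2, X3 are hypothetical stand-ins for the grocery covariates.
set.seed(1)
n  <- 52
X1 <- runif(n, 200, 500)   # hypothetical: cases shipped (thousands)
X2 <- runif(n, 5, 10)      # hypothetical: indirect labor cost (%)
X3 <- rbinom(n, 1, 0.1)    # hypothetical: holiday indicator (0/1)

n.sims <- 2000             # assumption: fewer than the 10,000 draws requested
t.stat <- numeric(n.sims)

for (b in seq_len(n.sims)) {
  # Model (5) with beta1 = 0 and error variance 20500
  Y   <- 4200 - 15 * X2 + 620 * X3 + rnorm(n, mean = 0, sd = sqrt(20500))
  fit <- lm(Y ~ X1 + X2 + X3)
  # summary(fit)[[4]] is the coefficient table; pick the t value for X1
  t.stat[b] <- summary(fit)[[4]]["X1", "t value"]
}

# Two-sided rejection region: |t| > t_{1 - alpha/2, n - 4}
alpha    <- c(0.10, 0.05, 0.01)
crit     <- qt(1 - alpha / 2, df = n - 4)
emp.size <- sapply(crit, function(cv) mean(abs(t.stat) > cv))
names(emp.size) <- alpha
print(emp.size)
```

With the real data, the stand-in covariates would be replaced by the corresponding columns of grocery; the rest of the loop would stay the same in this sketch.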