GU4206-GR5206 Sample Final Exam
Student

The STAT Fall 2018 GU4206-GR5206 Final Exam is open notes, open book(s), and open computer, and online resources are allowed. Students are not allowed to communicate with any other people regarding the final, with the exception of the instructor and TA. This includes emailing fellow students, using WeChat, and other similar forms of communication. If there is any suspicion of students cheating, further investigation will take place. If students do not follow the guidelines, they will receive a zero on the exam and potentially face more severe consequences.

Part 1: Warm Up [15 pts]

Recall the strikes dataset from the Week 6 lecture notes. This data set, compiled by Bruce Western, has information on 18 countries over 35 years.

strikes
dim(strikes)
## [1] 625 8
head(strikes, 3)
##     country year strike.volume unemployment inflation left.parliament
## 1 Australia 1951           296          1.3      19.8              43
## 2 Australia 1952           397          2.2      17.2              43
## 3 Australia 1953           360          2.5       4.3              43
##   centralization density
## 1      0.3748588      NA
## 2      0.3751829      NA
## 3      0.3745076      NA

In this problem, you will use the variable unemployment. The goal of this exercise is to take the following inelegant nested loop and condense it to as few lines of code as possible.
For full credit, perform the same task in no more than two lines of code.

countries
unemployment_statistic
counter
for (c in countries) {
  one_country
  unemployment_temp
  stat
  for (i in 1:length(unemployment_temp)) {
    stat
  }
  unemployment_statistic[counter]
  counter
}
data.frame(country=countries,unemployment=unemployment_statistic)
##        country unemployment
## 1    Australia    3.5057143
## 2      Austria    2.5400000
## 3      Belgium    3.6466667
## 4       Canada    6.0428571
## 5      Denmark    5.7114286
## 6      Finland    2.5714286
## 7       France    3.1828571
## 8      Germany    3.1171429
## 9      Ireland    7.7714286
## 10       Italy    6.7257143
## 11       Japan    1.6028571
## 12 Netherlands    3.6914286
## 13 New.Zealand    1.0028571
## 14      Norway    1.4285714
## 15      Sweden    2.1371429
## 16 Switzerland    0.3285714
## 17          UK    3.4514286
## 18         USA    5.5428571

Write your solution below.

## Solution goes here -------

Part 2: Simulation [50 pts]

The goal of this section is to study the random variable
X = U1 + U2,
where U1 and U2 are independent uniform random variables, each over the unit interval. This section of the final exam has five major components: simulate X = U1 + U2 directly, plot the pdf of X = U1 + U2, use the inverse-transform method to simulate X, use the accept-reject method to simulate X, and use the simulated random variable X to compute a Monte-Carlo integral of the function b(x) = sin(sin(x)) over [0, 2].

Perform the following tasks:

Part 2.i) [5 pts]
Simulate 10,000 cases of the random variable X = U1 + U2 directly by using runif() for U1 and U2 and then adding the resulting vectors. Display the first 10 simulated values of X and use ggplot to construct a histogram of the distribution of X. Use a Base R plot for partial credit.

## Solution goes here -------

Part 2.ii) [10 pts]
Using ggplot, plot the pdf of X over the range [-1, 3]. The pdf is given by the following piecewise function:

f(x) = x        for 0 ≤ x ≤ 1,
f(x) = 2 - x    for 1 < x ≤ 2,
f(x) = 0        otherwise.

For guidance on defining and plotting a piecewise function, please see Part 2.v.

## Solution goes here -------

Part 2.iii) [15 pts]
Use the inverse-transform method to simulate 10,000 draws of X.
For full credit, perform the following tasks:
a) Define an R function called F.cdf.inv which is the inverse of the cdf F(x). Test your function using the points .1, .45, .7. More specifically, run the code F.cdf.inv(F.cdf(c(.1,.45,.7))), where F.cdf() is the cdf.
b) Use the inverse-transform method to simulate 10,000 draws of X and display the first 10 simulated draws.
c) Plot the 10,000 simulated cases using ggplot, i.e., plot a histogram. Also overlay the pdf f(x) on the histogram. For partial credit, use Base R graphics.

Note that the cumulative distribution function (cdf) of f(x) is

F(x) = 0                 for x < 0,
F(x) = x^2/2             for 0 ≤ x < 1,
F(x) = 2x - x^2/2 - 1    for 1 ≤ x < 2,
F(x) = 1                 for x ≥ 2.

Note that the function F(x) appears to be non-invertible because of the quadratic terms. However, the function is invertible over the domain of each piecewise interval.

Part 2.iii.a)
## Solution goes here -------
Part 2.iii.b)
## Solution goes here -------
Part 2.iii.c)
## Solution goes here -------

Part 2.iv) [10 pts]
Use the accept-reject method to simulate 10,000 draws of X. For full credit, perform the following tasks:
a) Clearly identify an easy to simulate distribution g(x).
b) Identify a suitable envelope function e(x) that satisfies
f(x) ≤ e(x) = g(x)/α, where 0 < α < 1.
c) Simulate 10,000 draws from the target distribution f(x) using the accept-reject algorithm. Display the first 10 simulated values.
d) Using ggplot or Base R, construct a histogram of the simulated distribution with the pdf f(x) overlaid on the plot.

Part 2.iv.a)
Part 2.iv.b)
## Solution goes here -------
Part 2.iv.c)
## Solution goes here -------
Part 2.iv.d)
## Solution goes here -------

Part 2.v) [10 pts]
The goal of this section is to use the simulated cases coming from the target distribution f(x) to numerically integrate b(x) via Monte-Carlo. You must draw directly from f(x) by using any of the previous three methods.
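As a hedged illustration of how draws from f(x) turn an integral into a sample average, the sketch below uses the identity ∫ h(x) dx = E[h(X)/f(X)] for X drawn from f. The integrand h(x) = x(2 - x) is a stand-in chosen only because its integral over [0, 2] is known to be 4/3. The names f.pdf, h, x.sim and mc.estimate are illustrative assumptions and do not come from the course materials.

```r
# Sketch only: f.pdf, h, and the sample size are illustrative assumptions.
set.seed(1)
n <- 10000

# Triangular pdf of X = U1 + U2, supported on [0, 2]
f.pdf <- function(x) {
  ifelse(x < 0 | x > 2, 0, ifelse(x <= 1, x, 2 - x))
}

# Stand-in integrand whose integral over [0, 2] equals 4/3
h <- function(x) {
  ifelse(x < 0 | x > 2, 0, x * (2 - x))
}

# Draw from f directly as the sum of two independent uniforms
x.sim <- runif(n) + runif(n)

# Monte-Carlo estimate of the integral of h: average of h(X) / f(X)
mc.estimate <- mean(h(x.sim) / f.pdf(x.sim))
mc.estimate
```

Because the ratio h(x)/f(x) stays between 1 and 2 on the support, the estimate has small variance. An integrand whose ratio to f is unbounded would call for a larger sample or a different sampling density.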
The integral of interest is:
∫ b(x) dx, where b(x) = sin(sin(x)) for 0 < x < 2 and b(x) = 0 otherwise.
To visualize b(x), see the code below.

x
n.x
b <- function(x) {
  out <- ifelse((x < 0 | x > 2), 0, sin(sin(x)))
  return(out)
}
plot_data
library(ggplot2)
ggplot(data = plot_data) +
  geom_abline(slope = 0, intercept = 0, linetype = "dashed") +
  geom_line(mapping = aes(x = x, y = b), color = "red") +
  labs(title = expression(sin(sin(x))), y = "b(x)")

## Solution goes here -----

Part 3: Cost of Gradient Descent vs. Newton's Method [35 pts]

In this section, we will build on the gradient descent and Newton's method algorithms to assess how long each method takes overall and how long each method takes per iteration.

To investigate the two methods, consider the simulated dataset:

n
d
set.seed(0)
x
x
b
y

To begin, we will use a traditional least squares scenario as our objective function. It (obviously) isn't necessary to use gradient descent or Newton's method here. However, this nice objective function guarantees fewer problems with convergence. The objective function (squared loss) is easily defined by:

square.loss <- function(beta) {
  return(sum((y - x %*% beta)^2))
}

A very useful function in R is proc.time(). The third element of the output of proc.time() can be used to estimate how long a program takes to run. A simple example follows:

start_time
min_procedure
end_time
# Elapsed time
end_time
## elapsed
##   0.007
# Look at solution
rbind(b,min_procedure$estimate)
##        [,1]      [,2]      [,3]     [,4]      [,5]     [,6]      [,7]
## b -4.460000 -8.770000 -2.890000 1.540000 0.7000000 2.090000 -0.280000
##   -4.851119 -8.670076 -4.439985 1.648804 0.3286596 2.407551  1.122194
##        [,8]     [,9]    [,10]
## b -7.370000  9.29000 4.850000
##   -7.510709 10.49301 4.103895

Perform the following tasks:

3.i) Overall runtime [15 pts]
In this section you will use the Gradient Descent and Newton's Method functions from Homework 7 and Lab 6 respectively. In all methods, use defaults: x0 = rep(0,d), max.iter=200, step.size=0.001, stopping.deriv=0.1

Run a simulation that performs the following tasks:
I) Create two vectors of length 29. Call the vectors runtime_gd and runtime_nm.
II) For d = 2, 3, ..., 30:
  i) Simulate Y using the linear model
     Y = β_0 + β_1 x_1 + ... + β_{d-1} x_{d-1} + ε, ε iid ~ N(0, σ^2 = 25)
     Note this is already solved!
  ii) Using the simulated dataset, apply both the gradient descent and Newton's method algorithms to estimate the parameters β_0, ..., β_{d-1}. During the estimation procedure, store the amount of elapsed time it took to run for each d.
III) You should have two vectors of length 29, one for gradient descent and one for Newton's method. Plot these runtime vectors (on the same graph) as a function of the dimension of the model d. Briefly comment on the plot.

Note: I generally discourage nesting loops but it is fine for this assignment.

## Solution goes here -------------

3.ii) Runtime per iteration [20 pts]
In this section you will modify the Gradient Descent and Newton's Method functions from Homework 7 and Lab 6 respectively. In all methods, use defaults: x0 = rep(0,d), max.iter=200, step.size=0.001, stopping.deriv=0.1

In this problem, you will slightly change the Gradient Descent and Newton's Method functions from class so that the algorithms also store the run time of each iteration. Each function should output the average runtime for all iterations.

Run a simulation that performs the following tasks:
I) Create two vectors of length 29. Call the vectors iter_time_gd and iter_time_nm.
II) For d = 2, 3, ..., 30:
  i) Simulate Y using the linear model
     Y = β_0 + β_1 x_1 + ... + β_{d-1} x_{d-1} + ε, ε iid ~ N(0, σ^2 = 25)
     Note this is already solved!
  ii) Using the simulated dataset, apply both the modified gradient descent and modified Newton's method algorithms to estimate the parameters β_0, ..., β_{d-1}. For each d during the estimation procedure, store the average elapsed time per iteration.
III) You should have two vectors of length 29, one for gradient descent and one for Newton's method. Plot these iteration runtimes (on the same graph) as a function of the dimension of the model d.
Briefly comment on the plot.

## Solution goes here -------------
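The idea of storing the elapsed time of each iteration, as Part 3 asks, can be sketched on a toy problem. Everything below is an assumption made for illustration: the data, the hand-written gradient step, and the names beta.true, iter.time and loss.start are not the Homework 7 or Lab 6 functions. Only proc.time() and the squared-loss objective come from the exam text.

```r
# Sketch only: toy data and a hand-written update, not the course functions.
set.seed(0)
n <- 100
d <- 5
x <- matrix(rnorm(n * d), nrow = n)
beta.true <- runif(d, -1, 1)
y <- x %*% beta.true + rnorm(n)

max.iter <- 200
step.size <- 0.001
beta <- rep(0, d)
iter.time <- rep(NA, max.iter)      # elapsed seconds for each iteration
loss.start <- sum((y - x %*% beta)^2)

for (k in 1:max.iter) {
  start_time <- proc.time()[3]      # third element is elapsed time
  # Gradient of sum((y - x %*% beta)^2) with respect to beta
  grad <- -2 * t(x) %*% (y - x %*% beta)
  beta <- beta - step.size * grad
  iter.time[k] <- proc.time()[3] - start_time
}

loss.end <- sum((y - x %*% beta)^2)
mean(iter.time)                     # average elapsed time per iteration
```

Each iteration here is very cheap, so individual entries of iter.time may round to zero at the clock's resolution. Averaging over all iterations, as the task describes, is what makes the figure usable.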