STA355H1S Assignment Help, R Programming Assignment Writing, R Lab Assignment Writing


Author: dunqiuwei | Posted 2019-02-17 14:33

    Assignment #2 STA355H1S

    due Friday February 15, 2019

    Instructions: Solutions to problems 1 and 2 are to be submitted on Quercus (PDF files only); the deadline is 11:59pm on February 15. You are strongly encouraged to do problems 3 through 6, but these are not to be submitted for grading.

    1. On Quercus, there is a file containing data on the lengths (in minutes) of 272 eruptions of the Old Faithful geyser in Yellowstone National Park. Using R and some of the methods discussed in class, answer the following questions.

    (a) Do the data appear to be normal? If not, do they appear to be unimodal?

    (b) Use the density function in R to estimate the density. Choose a variety of bandwidths (the parameter bw) and describe how the estimates change as the bandwidth changes.
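    As a rough illustration of what varying bw does, the Python sketch below hand-rolls a Gaussian-kernel estimate (R's density() also uses a Gaussian kernel by default, with bw equal to the kernel's standard deviation) and counts its local maxima on a grid. The helper names kde_gauss and count_modes and the toy sample are hypothetical stand-ins, not the Old Faithful data; small h typically gives a wiggly estimate with many bumps, while large h can merge separate clusters into a single hump.

```python
import math


def kde_gauss(data, h):
    """Return x -> fhat_h(x) = (1/(n h)) * sum_i phi((x - X_i) / h), phi = N(0,1) density."""
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def fhat(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return fhat


def count_modes(f, lo, hi, steps=400):
    """Count strict local maxima of f on an equally spaced grid over [lo, hi]."""
    dx = (hi - lo) / steps
    ys = [f(lo + k * dx) for k in range(steps + 1)]
    return sum(1 for k in range(1, steps) if ys[k - 1] < ys[k] > ys[k + 1])


# Hypothetical two-cluster toy sample (NOT the Old Faithful eruption data).
sample = [1.8, 1.9, 2.0, 2.1, 2.2, 4.1, 4.3, 4.4, 4.5, 4.7]
for h in (0.05, 0.3, 1.5):
    print(f"bw = {h}: {count_modes(kde_gauss(sample, h), 0.0, 7.0)} local maxima")
```

    Counting modes is only one crude summary; plotting the estimates for each bw is the more natural way to answer part (b).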

    (c) One automated approach to selecting the bandwidth parameter h is leave-one-out cross-validation. This is a fairly general procedure that is useful for selecting tuning parameters in a variety of statistical problems.

    If f and g are density functions, then we can define the Kullback-Leibler divergence

    $$D_{KL}(f\|g) = \int_{-\infty}^{\infty} f(x) \ln\left(\frac{f(x)}{g(x)}\right) dx.$$
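    To make the definition concrete, the hypothetical Python sketch below approximates $D_{KL}(f\|g)$ with a midpoint rule and compares it with the standard closed form for two normal densities, $\ln(\sigma_2/\sigma_1) + (\sigma_1^2 + (\mu_1-\mu_2)^2)/(2\sigma_2^2) - 1/2$. The integration range [-12, 12] and the example parameters are assumptions chosen so that the truncated tails are negligible.

```python
import math


def normal_pdf(x, mu, sigma):
    """N(mu, sigma^2) density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def kl_numeric(f, g, lo=-12.0, hi=12.0, steps=20_000):
    """Midpoint-rule approximation of D_KL(f||g) = int f(x) ln(f(x)/g(x)) dx over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        fx = f(x)
        if fx > 0.0:  # points where f(x) = 0 contribute nothing
            total += fx * math.log(fx / g(x)) * dx
    return total


def kl_normal(mu1, s1, mu2, s2):
    """Closed form of D_KL(N(mu1, s1^2) || N(mu2, s2^2))."""
    return math.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * s2 ** 2) - 0.5


def f(x):
    return normal_pdf(x, 0.0, 1.0)


def g(x):
    return normal_pdf(x, 1.0, 1.5)


print(kl_numeric(f, g), kl_normal(0.0, 1.0, 1.0, 1.5))
print(kl_numeric(f, f))  # prints 0.0, since ln(f/f) = 0 at every grid point
```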

    For a given density f, $D_{KL}(f\|g)$ is minimized over densities g when g = f (and $D_{KL}(f\|f) = 0$). In the context of bandwidth selection, define $\hat f_h(x)$ to be a density estimator with bandwidth h and f(x) to be the true (but unknown) density that produces the data. Ideally, we would like to minimize $D_{KL}(f\|\hat f_h)$ with respect to h, but since f is unknown, the best we can do is to minimize an estimate of $D_{KL}(f\|\hat f_h)$. Noting that

    $$D_{KL}(f\|\hat f_h) = -\int_{-\infty}^{\infty} \ln(\hat f_h(x))\,f(x)\,dx + \int_{-\infty}^{\infty} \ln(f(x))\,f(x)\,dx = -E_f[\ln(\hat f_h(X))] + \text{constant},$$

    this suggests that we should maximize an estimate of $E_f[\ln(\hat f_h(X))]$. For a given h, this expectation can be estimated by the following (leave-one-out) substitution principle estimator:

    $$CV(h) = \frac{1}{n}\sum_{i=1}^{n} \ln\left(\hat f_h^{(-i)}(X_i)\right)$$

    where $\hat f_h^{(-i)}$ is the density estimate with bandwidth h using all the observations except $X_i$. (Note that this has the flavour of maximum likelihood estimation for the bandwidth h. Note also that if we replaced $\hat f_h^{(-i)}(X_i)$ by $\hat f_h(X_i)$ in the formula for CV(h), then it would always be maximized at h = 0; the “leave-one-out” approach avoids this.)

    On Quercus, there is a function kde.cv (in a file kde.cv.txt) that computes CV(h) for various bandwidth parameters h. Use this function to estimate the density of the Old Faithful data.
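    The kde.cv function on Quercus is not reproduced here. As a hedged stand-in, the Python sketch below computes the leave-one-out criterion $CV(h) = \frac{1}{n}\sum_i \ln(\hat f_h^{(-i)}(X_i))$ for a Gaussian-kernel estimate and picks the h with the largest value on a grid. The function names loo_cv and best_bandwidth, the grid, and the simulated N(0, 1) sample are all hypothetical.

```python
import math
import random


def loo_cv(data, h):
    """CV(h) = (1/n) sum_i ln(fhat_h^{(-i)}(X_i)) for a Gaussian-kernel estimate."""
    n = len(data)
    norm = 1.0 / ((n - 1) * h * math.sqrt(2.0 * math.pi))  # X_i itself is left out
    total = 0.0
    for i, xi in enumerate(data):
        dens = norm * sum(math.exp(-0.5 * ((xi - xj) / h) ** 2)
                          for j, xj in enumerate(data) if j != i)
        if dens <= 0.0:  # underflow for tiny h: treat ln(0) as -infinity
            return float("-inf")
        total += math.log(dens)
    return total / n


def best_bandwidth(data, grid):
    """Return the bandwidth in grid that maximizes CV(h)."""
    return max(grid, key=lambda h: loo_cv(data, h))


rng = random.Random(2019)
sample = [rng.gauss(0.0, 1.0) for _ in range(100)]  # simulated, not Old Faithful
grid = [0.05 * k for k in range(1, 41)]  # h = 0.05, 0.10, ..., 2.00
h_cv = best_bandwidth(sample, grid)
print("CV-selected bandwidth:", h_cv)
```

    If the maximizer lands on an end of the grid, the grid should be widened before trusting the selected h.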

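    (d) Now assume that the data come from a mixture of two normal distributions so that the density has the form

    $$f(x) = \frac{p}{\sigma_1}\phi\left(\frac{x-\mu_1}{\sigma_1}\right) + \frac{1-p}{\sigma_2}\phi\left(\frac{x-\mu_2}{\sigma_2}\right)$$

    where $\phi$ is the N(0, 1) density and $0 < p < 1$, $\mu_1$, $\mu_2$, $\sigma_1 > 0$ and $\sigma_2 > 0$ are unknown parameters. Use the density estimates from parts (b) and (c) and any other appropriate methods to come up with educated guesses of the values of these parameters. (Don’t worry too much about your final answers; the process is much more important here.)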
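    One possible "other appropriate method" is the EM algorithm for a two-component normal mixture $p\,N(\mu_1, \sigma_1^2) + (1-p)\,N(\mu_2, \sigma_2^2)$ (these parameter names are assumptions of the sketch), started from rough guesses read off the density estimates, for example the locations of the two modes. The Python sketch below is a minimal, hypothetical implementation: it runs a fixed number of iterations rather than testing convergence, and it is demonstrated on simulated data, not on the Old Faithful file. EM can stop at a local maximum, so trying several starting values is sensible.

```python
import math
import random


def npdf(x, mu, sigma):
    """N(mu, sigma^2) density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_normals(data, p, mu1, s1, mu2, s2, iters=200):
    """Fixed-iteration EM for f(x) = p N(mu1, s1^2) + (1 - p) N(mu2, s2^2)."""
    n = len(data)
    for _ in range(iters):
        # E-step: posterior probability that each observation came from component 1.
        w = []
        for x in data:
            a = p * npdf(x, mu1, s1)
            b = (1.0 - p) * npdf(x, mu2, s2)
            w.append(a / (a + b))
        # M-step: weighted mixing proportion, means and standard deviations.
        w1 = sum(w)
        w2 = n - w1
        p = w1 / n
        mu1 = sum(wi * x for wi, x in zip(w, data)) / w1
        mu2 = sum((1.0 - wi) * x for wi, x in zip(w, data)) / w2
        s1 = math.sqrt(sum(wi * (x - mu1) ** 2 for wi, x in zip(w, data)) / w1)
        s2 = math.sqrt(sum((1.0 - wi) * (x - mu2) ** 2 for wi, x in zip(w, data)) / w2)
    return p, mu1, s1, mu2, s2


# Simulated two-component data (hypothetical parameters, not Old Faithful estimates).
rng = random.Random(355)
data = [rng.gauss(2.0, 0.3) if rng.random() < 0.35 else rng.gauss(4.4, 0.4)
        for _ in range(300)]
# Starting values such as one might eyeball from a density estimate.
print(em_two_normals(data, p=0.5, mu1=1.5, s1=1.0, mu2=5.0, s2=1.0))
```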

    2. Suppose that F is a distribution concentrated on the positive real line (i.e. F(x) = 0 for x < 0). If $\mu(F) = E_F(X)$, then the mean population share of the distribution F is defined as $MPS(F) = F(\mu(F)-) = P_F(X < \mu(F))$. (When F is a continuous distribution, $F(\mu(F)-) = F(\mu(F))$.) For most income distributions, MPS(F) > 1/2, with MPS(F) = 0 if (and only if) all incomes are equal and MPS(F) → 1 as Gini(F) → 1.

    (a) Suppose that F is a continuous distribution function with Lorenz curve

    $$L_F(t) = \frac{1}{\mu(F)}\int_0^t F^{-1}(s)\,ds \quad \text{where} \quad \mu(F) = \int_0^1 F^{-1}(s)\,ds.$$

    Show that MPS(F) satisfies the condition

    $$L_F'(MPS(F)) = 1$$

    where $L_F'(t)$ is the derivative (with respect to t) of the Lorenz curve.

    (b) Given observations $X_1, \ldots, X_n$ from a distribution F, a substitution principle estimate of MPS(F) is

    $$MPS(\hat F) = \frac{1}{n}\sum_{i=1}^{n} I(X_i < \bar X).$$

    A sample of 200 incomes is given on Quercus in a file incomes.txt. Using these data, compute an estimate of MPS(F) and use the jackknife to give an estimate of its standard error.
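    A minimal Python sketch of the substitution-principle estimate and its jackknife standard error follows. The function names are hypothetical and the toy list stands in for the 200 incomes in incomes.txt (not reproduced here); the jackknife formula used is the usual $\sqrt{\frac{n-1}{n}\sum_i (T_{-i} - \bar T)^2}$, where $T_{-i}$ is the estimate recomputed with $X_i$ deleted and $\bar T$ is the average of the $T_{-i}$.

```python
import math


def mps_hat(xs):
    """Fraction of observations strictly below the sample mean."""
    xbar = sum(xs) / len(xs)
    return sum(1 for x in xs if x < xbar) / len(xs)


def jackknife_se(xs, stat):
    """Jackknife SE: sqrt((n-1)/n * sum_i (T_{-i} - mean of T_{-i})^2)."""
    n = len(xs)
    loo = [stat(xs[:i] + xs[i + 1:]) for i in range(n)]
    tbar = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((t - tbar) ** 2 for t in loo))


incomes = [1.0, 2.0, 3.0, 10.0]  # toy values, not the incomes.txt data
print(mps_hat(incomes), jackknife_se(incomes, mps_hat))
```

    For this toy list, three of the four values lie below the mean 4, so the point estimate is 0.75. Because the statistic is a step function of the data, its jackknife standard error can be somewhat unstable; that is worth keeping in mind when comparing with part (d).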

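    (c) Suppose that you know (or are willing to assume) that the data come from a log-normal distribution, that is, $\ln(X_1), \ldots, \ln(X_n)$ are independent $N(\mu, \sigma^2)$ random variables. Show that

    $$E_F(X_i) = \exp(\mu)\exp(\sigma^2/2)$$

    and so

    $$MPS(F) = P_F(X_i < E_F(X_i)) = P_F(\ln(X_i) < \ln(E_F(X_i))) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\sigma/2} \exp(-t^2/2)\,dt = \Phi(\sigma/2)$$

    where Φ is the N(0, 1) distribution function. (Hint: Evaluate $E_F(X_i)$ as $E[\exp(\mu + \sigma Y)]$ where $Y \sim N(0, 1)$.)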
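    Both identities can be sanity-checked by simulation before proving them. The hypothetical Python sketch below draws log-normal samples with the standard library and compares the Monte Carlo mean and the Monte Carlo estimate of $P(X < E_F(X))$ with $\exp(\mu + \sigma^2/2)$ and $\Phi(\sigma/2)$. It is a numerical check under arbitrarily chosen values of μ and σ, not a proof.

```python
import math
import random
from statistics import NormalDist


def lognormal_check(mu, sigma, n=200_000, seed=1):
    """Monte Carlo estimates of E(X) and P(X < E(X)) for ln(X) ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    xs = [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]
    true_mean = math.exp(mu + sigma ** 2 / 2.0)
    mc_mean = sum(xs) / n
    mc_mps = sum(1 for x in xs if x < true_mean) / n
    return mc_mean, true_mean, mc_mps, NormalDist().cdf(sigma / 2.0)


print(lognormal_check(mu=0.0, sigma=0.8))
```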

    (d) Using the data from part (b), compute an estimate of MPS(F) using the log-normal assumption and give an estimate of its standard error. How do these compare to the estimates in part (b)? Does the log-normal assumption seem to be valid for these data? (Hint: An estimate of $\sigma^2$ is simply the sample variance of $\ln(x_1), \ldots, \ln(x_n)$. For the standard error, you can use the Delta Method or the jackknife or both!)
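    A hedged Python sketch of this computation follows: the plug-in estimate $\Phi(\hat\sigma/2)$ with $\hat\sigma^2$ the sample variance of the logged data, a delta-method standard error that assumes $Var(\hat\sigma^2) \approx 2\sigma^4/(n-1)$ (exact for normal log-data), and a jackknife standard error for comparison. The function names and the simulated sample are illustrative assumptions; the incomes.txt data are not reproduced.

```python
import math
import random
from statistics import NormalDist, variance

STD_NORMAL = NormalDist()


def mps_lognormal(xs):
    """Plug-in estimate Phi(sigma_hat / 2), sigma_hat^2 = sample variance of ln(x)."""
    return STD_NORMAL.cdf(math.sqrt(variance([math.log(x) for x in xs])) / 2.0)


def delta_se(xs):
    """Delta-method SE of Phi(sqrt(v)/2) at v = sigma_hat^2, using Var(v) ~ 2 v^2 / (n - 1)."""
    n = len(xs)
    v = variance([math.log(x) for x in xs])
    s = math.sqrt(v)
    deriv = STD_NORMAL.pdf(s / 2.0) / (4.0 * s)  # d/dv Phi(sqrt(v)/2)
    return deriv * v * math.sqrt(2.0 / (n - 1))


def jackknife_se(xs, stat):
    """Jackknife SE: sqrt((n-1)/n * sum_i (T_{-i} - mean of T_{-i})^2)."""
    n = len(xs)
    loo = [stat(xs[:i] + xs[i + 1:]) for i in range(n)]
    tbar = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((t - tbar) ** 2 for t in loo))


rng = random.Random(7)
incomes = [math.exp(rng.gauss(3.0, 1.0)) for _ in range(200)]  # simulated log-normal
print(mps_lognormal(incomes), delta_se(incomes), jackknife_se(incomes, mps_lognormal))
```

    On genuinely log-normal data the two standard errors should be of similar size; a large gap on the real incomes (or a clear mismatch with the estimate from part (b)) is one hint that the log-normal assumption may not hold.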

    Supplemental problems (not to be handed in):

    3. Suppose that $X_1, \ldots, X_n$ are independent random variables with common density $f(x - \theta)$, where f is symmetric around 0 (i.e. f(x) = f(−x)) and θ is an unknown location parameter. If $Var(X_i)$ is finite, then the sample mean $\bar X$ will be a reasonable estimator of θ; however, if f has heavy tails, then $\bar X$ will be less efficient than other estimators of θ, for example, the sample median.

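    A useful alternative to the sample mean is the α-trimmed mean, which trims the smallest α and largest α fractions of the data and averages the middle order statistics. Specifically, if we define $r = \lfloor n\alpha \rfloor$ (where $\lfloor x \rfloor$ is the integer part of x), then the α-trimmed mean $\hat\theta(\alpha)$ is defined by

    $$\hat\theta(\alpha) = \frac{1}{n-2r}\sum_{i=r+1}^{n-r} X_{(i)}$$

    where $X_{(1)} \le \cdots \le X_{(n)}$ are the order statistics.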
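    The definition translates directly into code. The Python sketch below (the function name trimmed_mean is a hypothetical choice) drops the $r = \lfloor n\alpha \rfloor$ smallest and r largest values and averages the rest. R's mean(x, trim = alpha) uses the same floor(n * trim) rule, so it can serve as a cross-check.

```python
import math


def trimmed_mean(xs, alpha):
    """alpha-trimmed mean: drop the r = floor(n * alpha) smallest and largest values."""
    n = len(xs)
    r = math.floor(n * alpha)
    if 2 * r >= n:
        raise ValueError("alpha is too large: no observations left after trimming")
    middle = sorted(xs)[r:n - r]  # X_(r+1), ..., X_(n-r)
    return sum(middle) / (n - 2 * r)


data = [2.1, -0.3, 1.7, 25.0, 0.9, 1.2, -14.0, 1.5, 0.4, 2.6]  # toy data with two outliers
for alpha in (0.0, 0.1, 0.2):
    print(alpha, trimmed_mean(data, alpha))
```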

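    (a) Suppose (for simplicity) that $\lfloor n\alpha \rfloor = \lfloor (n-1)\alpha \rfloor$ and define $\hat\theta_{-i}(\alpha)$ to be the α-trimmed mean with $X_{(i)}$ deleted from the sample. Find expressions for $\hat\theta_{-1}(\alpha), \ldots, \hat\theta_{-n}(\alpha)$; in particular, note that

    $$\hat\theta_{-1}(\alpha) = \cdots = \hat\theta_{-r}(\alpha) \quad \text{and} \quad \hat\theta_{-(n-r+1)}(\alpha) = \cdots = \hat\theta_{-n}(\alpha).$$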

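    (b) Using the setup in part (a), show that the pseudo-values $\{\Phi_i\}$ (where $\Phi_i = n\hat\theta(\alpha) - (n-1)\hat\theta_{-i}(\alpha)$) are given by

    $$\Phi_i = \frac{(n-1)X_{(r+1)}}{n-1-2r} - \frac{2r}{(n-2r)(n-1-2r)}\sum_{j=r+1}^{n-r} X_{(j)} \quad \text{for } i = 1, \ldots, r+1$$

    $$\Phi_i = \frac{(n-1)X_{(i)}}{n-1-2r} - \frac{2r}{(n-2r)(n-1-2r)}\sum_{j=r+1}^{n-r} X_{(j)} \quad \text{for } i = r+2, \ldots, n-r-1$$

    $$\Phi_i = \frac{(n-1)X_{(n-r)}}{n-1-2r} - \frac{2r}{(n-2r)(n-1-2r)}\sum_{j=r+1}^{n-r} X_{(j)} \quad \text{for } i = n-r, \ldots, n$$

    and give a formula for the jackknife estimator of variance of $\hat\theta(\alpha)$. (Think about how you might use this variance estimator to choose an “optimal” value of r.)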
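    The pseudo-values can be checked numerically. The hypothetical Python sketch below computes $\Phi_i = n\hat\theta(\alpha) - (n-1)\hat\theta_{-i}(\alpha)$ by brute force (deleting each order statistic and trimming r values from each end of the remaining n - 1, which is the simplifying assumption of part (a)), compares the result with the closed form $\frac{(n-1)X_{(k)}}{n-1-2r} - \frac{2r\sum_{j=r+1}^{n-r}X_{(j)}}{(n-2r)(n-1-2r)}$, where k = r + 1 for i ≤ r + 1, k = i in the middle range, and k = n - r for i ≥ n - r, and then applies the usual jackknife variance $\frac{1}{n(n-1)}\sum_i (\Phi_i - \bar\Phi)^2$. The toy data and function names are assumptions.

```python
def trimmed(sorted_xs, r):
    """Mean of sorted_xs after removing r values from each end."""
    m = len(sorted_xs)
    return sum(sorted_xs[r:m - r]) / (m - 2 * r)


def pseudo_values_bruteforce(xs, r):
    """Phi_i = n * theta_hat - (n - 1) * theta_hat_{-i}, deleting X_(i) for i = 1..n."""
    s = sorted(xs)
    n = len(s)
    full = trimmed(s, r)
    return [n * full - (n - 1) * trimmed(s[:i] + s[i + 1:], r) for i in range(n)]


def pseudo_values_formula(xs, r):
    """Closed form: (n-1) X_(k) / (n-1-2r) - 2r * sum_{j=r+1}^{n-r} X_(j) / ((n-2r)(n-1-2r))."""
    s = sorted(xs)  # s[k - 1] is the order statistic X_(k)
    n = len(s)
    c = 2 * r * sum(s[r:n - r]) / ((n - 2 * r) * (n - 1 - 2 * r))
    out = []
    for i in range(1, n + 1):
        k = min(max(i, r + 1), n - r)  # k = r+1, i, or n-r depending on the range of i
        out.append((n - 1) * s[k - 1] / (n - 1 - 2 * r) - c)
    return out


def jackknife_variance(pseudo):
    """Jackknife variance estimate: sum_i (Phi_i - mean Phi)^2 / (n (n - 1))."""
    n = len(pseudo)
    pbar = sum(pseudo) / n
    return sum((p - pbar) ** 2 for p in pseudo) / (n * (n - 1))


data = [3.1, -0.4, 2.2, 7.9, 0.5, 1.6, -2.3, 4.4, 0.9, 12.0]  # toy data, r = 1
brute = pseudo_values_bruteforce(data, 1)
closed = pseudo_values_formula(data, 1)
print(max(abs(a - b) for a, b in zip(brute, closed)), jackknife_variance(closed))
```

    Evaluating jackknife_variance for each candidate r on the same data is one way to explore the "optimal r" question.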

    4. Suppose that $\hat\theta_1$ and $\hat\theta_2$ are unbiased estimators of a parameter θ and consider estimators of the form

    $$\tilde\theta = a\hat\theta_1 + (1-a)\hat\theta_2.$$

    (a) Show that $\tilde\theta$ is unbiased for any a.

    (b) Find the value of a that minimizes $Var(\tilde\theta)$ in terms of $Var(\hat\theta_1)$, $Var(\hat\theta_2)$, and $Cov(\hat\theta_1, \hat\theta_2)$. Under what conditions would a = 1? Can a be greater than 1 or less than 0?
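    Part (b) can be explored numerically before doing the algebra. The hypothetical Python sketch below simulates a pair of unbiased estimators of a symmetric location (the sample mean and the sample median of N(0, 1) samples of odd size), then searches a grid of a values, including some outside [0, 1], for the one with the smallest empirical variance of $a\hat\theta_1 + (1-a)\hat\theta_2$. The sample size, number of replications and seed are arbitrary choices.

```python
import random
import statistics


def empirical_variance(vals):
    """Plain-float variance with divisor len(vals)."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)


def simulate(n=25, reps=4000, seed=7):
    """Return replicated (sample mean, sample median) values for N(0, 1) samples of size n."""
    rng = random.Random(seed)
    t1, t2 = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        t1.append(statistics.fmean(xs))
        t2.append(statistics.median(xs))
    return t1, t2


t1, t2 = simulate()


def combo_variance(a):
    """Empirical variance of a * theta1_hat + (1 - a) * theta2_hat over the replications."""
    return empirical_variance([a * u + (1 - a) * v for u, v in zip(t1, t2)])


grid = [k / 100 for k in range(-50, 151)]  # a from -0.5 to 1.5
best_a = min(grid, key=combo_variance)
print("empirical minimizer a =", best_a)
```

    Swapping in a different parent distribution (for example, a heavy-tailed one) and rerunning shows how the minimizing a depends on the variances and covariance of the two estimators.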

    5. A histogram is a very simple example of a density estimator. For a sample $X_1, \ldots, X_n$ from a continuous distribution with density f(x), we define breakpoints $a_0, \ldots, a_k$ satisfying

    $$a_0 < \min(X_1, \ldots, X_n) < a_1 < a_2 < \cdots < a_{k-1} < \max(X_1, \ldots, X_n) < a_k$$

    and define, for $x \in [a_{j-1}, a_j)$,

    $$\hat f(x) = \frac{1}{n(a_j - a_{j-1})}\sum_{i=1}^{n} I(a_{j-1} \le X_i < a_j)$$

    with $\hat f(x) = 0$ for $x < a_0$ and $x \ge a_k$.
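    The estimator is easy to code directly. The Python sketch below (function name and toy data hypothetical) uses bisect to find the bin $[a_{j-1}, a_j)$ containing x, returns 0 outside $[a_0, a_k)$, and allows unequal bin widths. Summing $\hat f$ times the bin widths gives a numerical version of the check asked for in part (a).

```python
import bisect


def histogram_density(data, breaks):
    """Return fhat for the histogram with breakpoints a_0 < a_1 < ... < a_k (breaks)."""
    n = len(data)
    k = len(breaks) - 1
    counts = [0] * k
    for x in data:
        j = bisect.bisect_right(breaks, x) - 1  # breaks[j] <= x < breaks[j + 1]
        if 0 <= j < k:
            counts[j] += 1

    def fhat(x):
        j = bisect.bisect_right(breaks, x) - 1
        if j < 0 or j >= k:  # x < a_0 or x >= a_k
            return 0.0
        return counts[j] / (n * (breaks[j + 1] - breaks[j]))

    return fhat


data = [0.2, 0.5, 0.7, 1.1, 1.4, 2.9]
breaks = [0.0, 0.5, 2.0, 3.0]  # unequal widths are allowed
fhat = histogram_density(data, breaks)
area = sum(fhat(lo) * (hi - lo) for lo, hi in zip(breaks, breaks[1:]))
print(area)
```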

    (a) Show that $\hat f$ is a density function.

    (b) For a given value of x, evaluate the mean and variance of $\hat f(x)$.

    (c) What conditions on $a_0, \ldots, a_k$ are needed for the bias and variance of $\hat f(x)$ to go to 0 as $n \to \infty$?

    6. Another measure of inequality based on the Lorenz curve is the Pietra index, defined by

    $$P(F) = \max_{0 \le t \le 1} \{t - L_F(t)\}$$

    where $L_F(t)$ is the Lorenz curve.

    (a) Show that $g(t) = t - L_F(t)$ is maximized at t satisfying $F^{-1}(t) = \mu(F)$.

    (b) Using the result of part (a), show that

    $$P(F) = \frac{E_F[|X - \mu(F)|]}{2\mu(F)}.$$

    (You may assume that F has a density f.)

    (c) Give a substitution principle estimator for the Pietra index P(F) based on the empirical distribution function of $X_1, \ldots, X_n$. Using the data in Problem 2, compute an estimate of P(F) and use the jackknife to compute an estimate of its standard error.
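    A hedged Python sketch of one substitution-principle estimator follows. It replaces F by the empirical distribution in $E_F[|X - \mu(F)|]/(2\mu(F))$, cross-checks that value against $\max_t\{t - \hat L(t)\}$ for the empirical Lorenz curve (evaluated at t = i/n, where its vertices lie), and reuses the jackknife standard-error formula. The function names and the toy data are assumptions; the incomes from Problem 2 are not reproduced.

```python
import math


def pietra_hat(xs):
    """Plug-in estimate: mean |X_i - Xbar| / (2 Xbar)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum(abs(x - xbar) for x in xs) / n / (2.0 * xbar)


def pietra_lorenz(xs):
    """max over t = i/n of t - L_n(t), with L_n the empirical Lorenz curve."""
    s = sorted(xs)
    n, total = len(s), sum(s)
    best, partial = 0.0, 0.0
    for i, x in enumerate(s, start=1):
        partial += x
        best = max(best, i / n - partial / total)
    return best


def jackknife_se(xs, stat):
    """Jackknife SE: sqrt((n-1)/n * sum_i (T_{-i} - mean of T_{-i})^2)."""
    n = len(xs)
    loo = [stat(xs[:i] + xs[i + 1:]) for i in range(n)]
    tbar = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((t - tbar) ** 2 for t in loo))


incomes = [12.0, 18.5, 25.0, 31.0, 47.5, 60.0, 95.0, 240.0]  # toy values
print(pietra_hat(incomes), pietra_lorenz(incomes), jackknife_se(incomes, pietra_hat))
```

    The two point estimates agree up to rounding because the empirical Lorenz curve is piecewise linear, so $t - \hat L(t)$ is maximized at a vertex.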
