Learning R packages: broom


Author: 敖浩程 | Published: 2019-05-26 23:26

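    # The broom package takes the messy output of built-in R functions such as lm, nls, or t.test and turns it into tidy data frames.

    # In short, it reorganizes messy output that is not a data frame into a data frame.

    # broom is meant to be used together with dplyr.

    # It provides three functions: tidy, augment, and glance.

    # Example 1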

    ```

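    # Fit a simple linear regression of fuel economy (mpg) on car weight (wt)

    lmfit <- lm(mpg ~ wt, mtcars)

    # Printing the model or its summary gives console text, not a data frame

    lmfit

    summary(lmfit)

    library(broom)

    # tidy() turns the coefficient table into a data frame

    tidy(lmfit)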

    ```

    # tidy() returns a data frame; the row names of the coefficient table become a column named term.
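To make the idea behind tidy() concrete, here is a minimal base-R sketch for an lm fit, assuming only the stats package. The helper `tidy_lm_sketch()` is hypothetical and not part of broom. Its columns are modeled loosely on what tidy() prints; broom itself returns a tibble and supports more options.

```r
# Hypothetical helper (not part of broom): approximate tidy() for an lm fit
tidy_lm_sketch <- function(fit) {
  # summary() stores the coefficients as a matrix whose row names are the terms
  coefs <- coef(summary(fit))
  data.frame(
    term      = rownames(coefs),                 # row names become a column
    estimate  = unname(coefs[, "Estimate"]),
    std.error = unname(coefs[, "Std. Error"]),
    statistic = unname(coefs[, "t value"]),
    p.value   = unname(coefs[, "Pr(>|t|)"]),
    row.names = NULL,                            # plain 1, 2, ... row names
    stringsAsFactors = FALSE
  )
}

lmfit <- lm(mpg ~ wt, data = mtcars)
tidy_lm_sketch(lmfit)
```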

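    # Rather than the coefficients, you may be interested in the fitted values and residuals for each original point in the regression.

    # For that, use augment(), which augments the original data with information from the model.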

    augment(lmfit)

    # The added column names start with a dot (.) so that they never overwrite the original columns.
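For comparison, here is a hedged base-R sketch of what augment() adds for an lm fit. The helper `augment_lm_sketch()` is hypothetical. It rebuilds the data with `model.frame()`, so, like augment() called without a data argument, it only contains the columns the model used.

```r
# Hypothetical helper (not part of broom): approximate augment() for an lm fit
augment_lm_sketch <- function(fit) {
  # model.frame() rebuilds the data from the model: only the columns it used
  mf <- model.frame(fit)
  data.frame(
    .rownames = rownames(mf),            # original row names kept as a column
    mf,
    .fitted   = unname(fitted(fit)),     # predicted values
    .resid    = unname(residuals(fit)),  # observed minus fitted
    row.names = NULL,
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
}

lmfit <- lm(mpg ~ wt, data = mtcars)
head(augment_lm_sketch(lmfit))
```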

    # Several summary statistics describe the regression as a whole; glance() returns them together.

    glance(lmfit)
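A minimal base-R sketch of the one-row summary that glance() produces, assuming only the stats package. `glance_lm_sketch()` is a hypothetical helper that covers only a subset of the columns broom reports.

```r
# Hypothetical helper (not part of broom): a one-row model summary for lm
glance_lm_sketch <- function(fit) {
  s <- summary(fit)
  f <- s$fstatistic                      # named vector: value, numdf, dendf
  data.frame(
    r.squared     = s$r.squared,
    adj.r.squared = s$adj.r.squared,
    sigma         = s$sigma,             # residual standard error
    statistic     = unname(f["value"]),  # overall F statistic
    p.value       = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)),
    nobs          = nobs(fit)
  )
}

lmfit <- lm(mpg ~ wt, data = mtcars)
glance_lm_sketch(lmfit)
```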

    # Example 2

    ```

    # Generalized linear and non-linear models

    # Logistic regression: transmission type (am, 0/1) as a function of weight

    glmfit <- glm(am ~ wt, mtcars, family = "binomial")

    tidy(glmfit)

    augment(glmfit)

    glance(glmfit)

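    # The same three functions also work for non-linear models fitted with nls()

    # Fit mpg = k / wt + b, starting the search from k = 1, b = 0

    nlsfit <- nls(mpg ~ k / wt + b, mtcars, start = list(k = 1, b = 0))

    tidy(nlsfit)

    # Supply the original data so augment() attaches .fitted and .resid to it

    augment(nlsfit, mtcars)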

    glance(nlsfit)

    #The tidy function can also be applied to htest objects,

    #such as those output by popular built-in functions like

    #t.test, cor.test, and wilcox.test.

    tt <- t.test(wt ~ am, mtcars)

    tidy(tt)

    # Name the result wct so it is not confused with the wt column

    wct <- wilcox.test(wt ~ am, mtcars)

    tidy(wct)

    glance(tt)

    glance(wct)

    # For htest objects, the augment method is defined only for chi-squared tests

    chit <- chisq.test(xtabs(Freq ~ Sex + Class, data = as.data.frame(Titanic)))

    tidy(chit)

    augment(chit)

    ```
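An htest object is just a named list (statistic, parameter, p.value, conf.int, method, ...), so flattening it into one row is straightforward. The sketch below uses a hypothetical helper, `htest_row_sketch()`, and is an illustration rather than broom's implementation. It fills missing components with NA so that rows from different tests can be stacked with rbind(). The second part builds a per-cell table for the chi-squared test from the observed, expected, residuals and stdres components returned by chisq.test(), similar in spirit to augment(chit).

```r
# Hypothetical helper (not part of broom): flatten an htest object into one row
htest_row_sketch <- function(x) {
  # Return a typed NA when a component is missing from the htest list
  field <- function(v, na = NA_real_) if (is.null(v)) na else unname(v)
  data.frame(
    statistic   = field(x$statistic),
    parameter   = field(x$parameter),               # degrees of freedom, if any
    p.value     = x$p.value,
    conf.low    = field(x$conf.int[1]),             # NULL[1] is NULL, giving NA
    conf.high   = field(x$conf.int[2]),
    method      = x$method,
    alternative = field(x$alternative, NA_character_),
    stringsAsFactors = FALSE
  )
}

tt   <- t.test(wt ~ am, data = mtcars)
wct  <- suppressWarnings(wilcox.test(wt ~ am, data = mtcars))  # ties in wt trigger a warning
chit <- chisq.test(xtabs(Freq ~ Sex + Class, data = as.data.frame(Titanic)))

# Same columns for every test, so the rows stack cleanly
rbind(htest_row_sketch(tt), htest_row_sketch(wct), htest_row_sketch(chit))

# Per-cell table for the chi-squared test
cells <- as.data.frame(as.table(chit$observed))
cells$.expected  <- as.vector(chit$expected)
cells$.resid     <- as.vector(chit$residuals)   # Pearson residuals
cells$.std.resid <- as.vector(chit$stdres)      # standardized residuals
cells
```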

    # All functions

    # The output of the tidy, augment and glance functions is always a data frame.

    # The output never has rownames. This ensures that you can combine it with other tidy outputs without

    # fear of losing information (since rownames in R cannot contain duplicates).

    # Some column names are kept consistent, so that they can be combined across different models and so

    # that you know what to expect (in contrast to asking “is it pval or PValue?” every time). The examples

    # below are not all the possible column names, nor will all tidy output contain all or even any of these

    # columns.
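To see why avoiding row names matters, here is a small base-R sketch that stacks coefficient tables from two models. The helper `coef_table()` is hypothetical. With the term stored as a column, both intercept rows keep their label. Stacking the raw coefficient tables instead forces R to rename duplicate row names.

```r
# Hypothetical helper: coefficient table with the term stored as a column
coef_table <- function(fit, model_name) {
  coefs <- coef(summary(fit))
  data.frame(model = model_name, term = rownames(coefs),
             estimate = unname(coefs[, "Estimate"]),
             row.names = NULL, stringsAsFactors = FALSE)
}

fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)

# term is an ordinary column, so both "(Intercept)" rows keep their label
combined <- rbind(coef_table(fit1, "wt"), coef_table(fit2, "wt + hp"))
combined

# Stacking the raw coefficient tables instead: row names must be unique,
# so R renames the second "(Intercept)" and "wt" rows
raw <- rbind(as.data.frame(coef(summary(fit1))), as.data.frame(coef(summary(fit2))))
rownames(raw)
```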

    # tidy functions

    # Each row in a tidy output typically represents some well-defined concept, such as one term in a

    # regression, one test, or one cluster/class. This meaning varies across models but is usually self-evident.

    # The one thing each row cannot represent is a point in the initial data (for that, use the augment method).

    # Common column names include:

    # term: the term in a regression or model that is being estimated.

    # p.value: this spelling was chosen (over common alternatives such as pvalue, PValue, or pval) to be consistent with functions in R’s built-in stats package.

    # statistic: a test statistic, usually the one used to compute the p-value. Combining these across many sub-groups is a reliable way to perform, for example, bootstrap hypothesis testing.

    # estimate: the estimated value, such as an effect size or a regression slope.

    # conf.low: the low end of a confidence interval on the estimate.

    # conf.high: the high end of a confidence interval on the estimate.

    # df: degrees of freedom.
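The remark about combining results across many sub-groups can be illustrated with a percentile bootstrap, sketched here in base R. This is a generic bootstrap technique, not a broom function. The seed, the 200 resamples and the choice of the mpg ~ wt slope are arbitrary illustrative assumptions. Each resample contributes one row, the rows are stacked into a single data frame, and the quantiles of the estimates become conf.low and conf.high.

```r
set.seed(2019)                       # arbitrary seed for reproducibility
n_boot <- 200                        # illustrative number of resamples

# One row per bootstrap resample: refit the model and record the wt slope
boot_rows <- lapply(seq_len(n_boot), function(i) {
  idx <- sample(nrow(mtcars), replace = TRUE)
  fit <- lm(mpg ~ wt, data = mtcars[idx, ])
  data.frame(replicate = i, term = "wt", estimate = unname(coef(fit)["wt"]))
})
boot_estimates <- do.call(rbind, boot_rows)

# Percentile interval computed from the stacked estimates
boot_ci <- data.frame(
  term      = "wt",
  estimate  = unname(coef(lm(mpg ~ wt, data = mtcars))["wt"]),
  conf.low  = unname(quantile(boot_estimates$estimate, 0.025)),
  conf.high = unname(quantile(boot_estimates$estimate, 0.975))
)
boot_ci
```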

    # augment functions

    # augment(model, data) adds columns to the original data.

    # If the data argument is missing, augment attempts to reconstruct the data from the model (note that this may not always be possible, and the result usually won’t contain columns that were not used in the model).

    # Each row in an augment output matches the corresponding row in the original data.

    # If the original data contained rownames, augment turns them into a column called .rownames.

    # Newly added column names begin with . to avoid overwriting columns in the original data.

    # Common column names include:

    # .fitted: the predicted values, on the same scale as the data.

    # .resid: residuals, i.e. the actual y values minus the fitted values.

    # .cluster: cluster assignments.
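To illustrate .rownames and .cluster, the following base-R sketch clusters mtcars with kmeans() and attaches one cluster label to each original row. The choice of columns (mpg and wt, standardized), the seed and the three clusters are arbitrary assumptions for illustration. broom's augment() for kmeans objects produces a .cluster column in a similar spirit.

```r
set.seed(42)                                   # kmeans uses random starts
X  <- scale(mtcars[, c("mpg", "wt")])          # standardize the two columns
km <- kmeans(X, centers = 3, nstart = 10)

augmented <- data.frame(
  .rownames = rownames(mtcars),                # row names moved into a column
  mtcars[, c("mpg", "wt")],
  .cluster  = factor(km$cluster),              # one label per original row
  row.names = NULL,
  check.names = FALSE,
  stringsAsFactors = FALSE
)
head(augmented)
```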

    # glance functions

    # glance always returns a one-row data frame.

    # The only exception is that glance(NULL) returns an empty data frame.

    # We avoid including arguments that were given to the modeling function. For example, a glm glance

    # output does not need to contain a field for family, since that is decided by the user calling glm rather

    # than the modeling function itself.

    # Common column names include:

    # r.squared: the fraction of variance explained by the model.

    # adj.r.squared: R^2 adjusted based on the degrees of freedom.

    # sigma: the square root of the estimated variance of the residuals.
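Here is a hedged base-R sketch of a one-row, glance-style summary for the logistic regression from Example 2. The helper `glance_glm_sketch()` is hypothetical. It reports fit statistics but no family column, in line with the point that arguments the user passed to glm() are not repeated.

```r
glmfit <- glm(am ~ wt, data = mtcars, family = binomial)

# Hypothetical helper (not part of broom): one-row summary of a glm fit
glance_glm_sketch <- function(fit) {
  data.frame(
    null.deviance = fit$null.deviance,
    df.null       = fit$df.null,
    logLik        = as.numeric(logLik(fit)),
    AIC           = AIC(fit),
    BIC           = BIC(fit),
    deviance      = deviance(fit),
    df.residual   = df.residual(fit),
    nobs          = nobs(fit)
  )
}

glance_glm_sketch(glmfit)
```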


Source: https://www.haomeiwen.com/subject/xfuqtctx.html