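#The broom package takes the messy output of built-in R functions, such as lm, nls, or t.test, and turns it into tidy data frames.
#In other words, it converts messy results that are not data frames into data frames.
#broom works well in combination with dplyr
#It provides three functions: tidy, augment, and glance
#Example 1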
```
lmfit <- lm(mpg ~ wt, mtcars)
lmfit
summary(lmfit)
library(broom)
tidy(lmfit)
```
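#This returns a data frame; the row names have become a column named term
# Instead of looking at the coefficients, you may be interested in the fitted value and residual for each original data point.
# Use augment, which augments the original data with information from the model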
augment(lmfit)
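#The added columns are prefixed with a dot (.) to avoid overwriting the original columns
#Several summary statistics are computed for the regression as a whole; the glance function returns them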
glance(lmfit)
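# Because augment returns an ordinary data frame, dplyr verbs can be applied to it directly.
# The snippet below is a minimal sketch and not part of the original example: it refits the same model
# and lists the three observations with the largest absolute residuals.
# The names largest_resid and abs_resid are made up for illustration.

```r
library(broom)
library(dplyr)

# Same model as in Example 1
lmfit <- lm(mpg ~ wt, data = mtcars)

# augment() returns a data frame, so dplyr verbs work on it directly
largest_resid <- augment(lmfit) %>%
  mutate(abs_resid = abs(.resid)) %>%   # abs_resid: illustrative helper column
  arrange(desc(abs_resid)) %>%          # largest absolute residual first
  head(3)

largest_resid
```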
#Example 2
```
#Generalized linear and non-linear models
glmfit <- glm(am ~ wt, mtcars, family="binomial")
tidy(glmfit)
augment(glmfit)
glance(glmfit)
#These functions work just as well for non-linear models
nlsfit <- nls(mpg ~ k / wt + b, mtcars, start=list(k=1, b=0))
tidy(nlsfit)
augment(nlsfit, mtcars)
glance(nlsfit)
#The tidy function can also be applied to htest objects,
#such as those output by popular built-in functions like
#t.test, cor.test, and wilcox.test.
tt <- t.test(wt ~ am, mtcars)
tidy(tt)
# Named wt_test so the result is not confused with the wt column of mtcars
wt_test <- wilcox.test(wt ~ am, mtcars)
tidy(wt_test)
glance(tt)
glance(wt_test)
#augment method is defined only for chi-squared tests
chit <- chisq.test(xtabs(Freq ~ Sex + Class, data = as.data.frame(Titanic)))
tidy(chit)
augment(chit)
```
# All functions
# The output of the tidy, augment and glance functions is always a data frame.
# The output never has rownames. This ensures that you can combine it with other tidy outputs without
# fear of losing information (since rownames in R cannot contain duplicates).
# Some column names are kept consistent, so that they can be combined across different models and so
# that you know what to expect (in contrast to asking “is it pval or PValue?” every time). The examples
# below are not all the possible column names, nor will all tidy output contain all or even any of these
# columns.
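# The point about combining tidy outputs can be sketched as follows. This is an illustration under
# assumptions, not code from the original notes: it fits mpg ~ wt separately for each value of cyl in
# mtcars and stacks the tidy results with dplyr's group_modify. The name by_cyl is hypothetical.

```r
library(broom)
library(dplyr)

# One regression per cylinder group; each tidy() result is a data frame
# without rownames, so the pieces stack cleanly into a single data frame.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  group_modify(~ tidy(lm(mpg ~ wt, data = .x))) %>%
  ungroup()

by_cyl
```

# Each row is one term of one group's model, and the cyl column records which group it came from,
# so no information depends on rownames.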
# tidy functions
# Each row in a tidy output typically represents some well-defined concept, such as one term in a
# regression, one test, or one cluster/class. This meaning varies across models but is usually self-evident.
# The one thing each row cannot represent is a point in the initial data (for that, use the augment method).
# Common column names include:
# term: the term in a regression or model that is being estimated.
# p.value: this spelling was chosen (over common alternatives such as pvalue, PValue, or pval) to
# be consistent with functions in R’s built-in stats package.
# statistic: a test statistic, usually the one used to compute the p-value. Combining these across
# many sub-groups is a reliable way to perform (e.g.) bootstrap hypothesis testing.
# estimate: the estimated value of the term, such as a regression coefficient.
# conf.low: the low end of a confidence interval on the estimate.
# conf.high: the high end of a confidence interval on the estimate.
# df: degrees of freedom.
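# As a small sketch of these column names in practice (an added illustration, assuming the lm model
# from Example 1): tidy for lm objects accepts conf.int = TRUE, which adds the conf.low and conf.high
# columns. The name coefs is made up for this example.

```r
library(broom)

lmfit <- lm(mpg ~ wt, data = mtcars)

# Request confidence intervals alongside the usual coefficient table
coefs <- tidy(lmfit, conf.int = TRUE, conf.level = 0.95)

names(coefs)   # inspect which of the common column names are present
coefs
```

# Not every column listed above appears for every model; for this lm fit, for instance, there is no df column.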
# augment functions
# augment(model, data) adds columns to the original data.
# If the data argument is missing, augment attempts to reconstruct the data from the model (note that
# this may not always be possible, and usually won’t contain columns not used in the model).
# Each row in an augment output matches the corresponding row in the original data.
# If the original data contained rownames, augment turns them into a column called .rownames.
# Newly added column names begin with . to avoid overwriting columns in the original data.
# Common column names include:
# .fitted: the predicted values, on the same scale as the data.
# .resid: the residuals, that is, the actual y values minus the fitted values
# .cluster: cluster assignments
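# The effect of the data argument can be sketched like this (an added illustration, assuming the lm
# model from Example 1; aug_model and aug_full are hypothetical names). Without data, augment works
# from what the model stored, so a column such as cyl, which the model did not use, is typically absent.

```r
library(broom)

lmfit <- lm(mpg ~ wt, data = mtcars)

aug_model <- augment(lmfit)                 # data reconstructed from the model
aug_full  <- augment(lmfit, data = mtcars)  # original data supplied explicitly

# Compare which columns each version carries
"cyl" %in% names(aug_model)
"cyl" %in% names(aug_full)

# .resid should equal the observed response minus .fitted
all.equal(unname(aug_full$.resid), unname(aug_full$mpg - aug_full$.fitted))
```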
# glance functions
# glance always returns a one-row data frame.
# The only exception is that glance(NULL) returns an empty data frame.
# We avoid including arguments that were given to the modeling function. For example, a glm glance
# output does not need to contain a field for family, since that is decided by the user calling glm rather
# than the modeling function itself.
# Common column names include:
# r.squared: the fraction of variance explained by the model
# adj.r.squared: R^2 adjusted based on the degrees of freedom
# sigma: the square root of the estimated variance of the residuals
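# Since glance returns one row per model, the rows from several models can be stacked for comparison.
# The following is a minimal sketch rather than part of the original notes; the second model
# (mpg ~ wt + hp) and the names fit_wt, fit_wt_hp, and model_summaries are assumptions for illustration.

```r
library(broom)
library(dplyr)

fit_wt    <- lm(mpg ~ wt, data = mtcars)
fit_wt_hp <- lm(mpg ~ wt + hp, data = mtcars)

# One glance() row per model; .id stores the argument names in a "model" column
model_summaries <- bind_rows(
  "mpg ~ wt"      = glance(fit_wt),
  "mpg ~ wt + hp" = glance(fit_wt_hp),
  .id = "model"
)

model_summaries %>% select(model, r.squared, adj.r.squared, sigma)
```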