(To be released as 0.7.0)
Removing a few bugs and achieving internal consistency has led to a few breaking
changes in this version of broom. We list them below:
- New glance.aov method replaces the older one, which relied on glance.lm. The new model summary data frame contains the following information: logLik, AIC, BIC, deviance, nobs. This is in response to the conversation that took place in #212. Note that tidy.aov can be used to get numerator and denominator degrees of freedom.
- Augment method for factanal objects now returns a tibble with name pattern .fs (e.g., .fs1, .fs2, .fs3, etc.), instead of factor (e.g., factor1, factor2, factor3, etc.) (#650).
- We have removed all support for the quick argument in tidy() methods. TODO: explain why, and discuss alternatives.
We have overhauled augment() for general consistency improvements (hopefully, pending getting safepredict() going urgh)
- If you pass a dataset to augment() via the data or newdata arguments, you are now guaranteed that the augmented dataset will have exactly the same number of rows as the original dataset. This differs from previous behavior primarily when there are missing values. Previously augment() would drop rows containing NA. This should no longer be the case.
- augment() no longer accepts an na.action argument
- We no longer cram everything through augment.lm(), and it has subsequently lost a lot of arguments that were needed when it was a Frankenstein do-everything function
- augment() tries to give an informative error when data isn't the original training data
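The row-count guarantee can be checked with a short sketch. It assumes a broom version that behaves as described in the list above and uses the built-in mtcars data; the object names fit, new_cars and aug are illustrative only.

```r
library(broom)

# Fit on complete data, then augment new data that contains a missing value.
fit <- lm(mpg ~ wt, data = mtcars)

new_cars <- mtcars[1:6, ]
new_cars$wt[2] <- NA  # introduce a missing predictor value

aug <- augment(fit, newdata = new_cars)

# The row with the missing predictor is kept; its fitted value is NA.
nrow(aug) == nrow(new_cars)
is.na(aug$.fitted[2])
```

Rows with missing predictors are retained with NA in the computed columns rather than being dropped, so the result lines up row by row with the input.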
- Most of the glance methods return a nobs column now! (TODO: KUDOS)
Previously the df column in glance reported the rank of the design matrix.
Now it reports degrees of freedom of the numerator for the overall F-statistic.
This is always equal to the rank of the model matrix minus one, so the
new df will always be the old df minus one.
TODO: sort out what happens to glance.aov()
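The relationship between the old and new df values can be illustrated with base R alone. This is a sketch that assumes a linear model with an intercept; it does not call glance() itself, and the names old_df and new_df are illustrative.

```r
# Compare the rank of the design matrix with the numerator degrees of
# freedom of the overall F-statistic for the same fit.
fit <- lm(mpg ~ wt + hp, data = mtcars)

old_df <- fit$rank                                   # intercept plus two slopes
new_df <- unname(summary(fit)$fstatistic["numdf"])   # numerator df of the F-test

new_df == old_df - 1
```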
- We now use rlang::arg_match() when possible instead of match.arg() to give more informative errors on argument mismatches.
- Moved core tests to the modeltests package
- Added new vignette detailing use of modelgenerics and modeltests packages
- Added data argument to augment() generic (did this happen?)
- All conf.int arguments now default to FALSE. This primarily affects tidy.survreg(), which previously always returned confidence intervals.
- All conf.level arguments now default to 0.95.
- tidy.lsmobj() gained a conf.int argument
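A minimal sketch of the new conf.int and conf.level defaults, using tidy() on an lm fit as a stand-in for the methods listed above (an assumption for illustration; the changelog singles out tidy.survreg() as the method most affected):

```r
library(broom)

fit <- lm(mpg ~ wt, data = mtcars)

# Default call: conf.int = FALSE, so no interval columns are returned.
no_ci <- tidy(fit)

# Intervals must be requested explicitly; conf.level defaults to 0.95
# and is overridden here only to show the argument.
with_ci <- tidy(fit, conf.int = TRUE, conf.level = 0.90)

names(no_ci)
names(with_ci)
```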
- Added tidier for summary.manova (#729)
- Added tidy() and glance() methods for speedglm objects from the speedglm package
- Added tidier for epiR::epi.2by2 (#711)
- Make .fitted values respect type.predict argument of augment.clm(). (#617)
- Return factor rather than numeric class predictions in .fitted of augment.polr(). (#619)
- ordinal tidier rewrite
- Added tidiers for rma objects from the metafor package (#674, @malcolmbarrett, @softloud)
- Added support for tidy.lavaan() to take quick = TRUE. (#628)
- Added tidiers for pam objects from the cluster package. (#637)
- Added tidy.svyglm() and glance.svyglm() (#611)
- Previously, F-statistics for weak instruments were returned through glance.ivreg(). F-statistics are now returned through tidy.ivreg(instruments = TRUE). Default is tidy.ivreg(instruments = FALSE). glance.ivreg() still returns Wu-Hausman and Sargan test statistics.
- Added tidy.regsubsets() for best subsets linear regression from the leaps package
- Added method tidy.lm.beta() to tidy lm.beta class models (#545 by @mattle24)
- Add feature for glance.biglm to return df.residual
- Patch bug in glance.lavaan (#577)
- Added tidiers for lmrob and glmrob objects from the robustbase package (#205, #505).
- Added method tidy.systemfit() to tidy systemfit class models (by @jaspercooper)
- Added tidiers for drc::drm models (#574 by @edild)
- tidy.prcomp() parameter matrix gained new options "scores", "loadings", and "eigenvalues" (#557 by @GegznaV)
- tidy.kmeans() now uses the names of the input variables in the output by default. Set col.names = NULL to recover the old behavior.
- tidy_optim() now provides the standard error if the Hessian is present. (#529 by @billdenney) (TODO: think about this)
- glance.biglm() now returns a df.residual column
- tidy.htest() column names are now run through make.names() to ensure syntactic correctness (#549 by @karissawhiting) (TODO: use tidyverse name repair?)
- Many glance() methods now return the number of observations in a nobs column, which is typically the rightmost column.
- tidy.lmodel2() now returns a p.value column (#570)
- Added tidy.summary_emm() (#691 by @crsh)
- tidy.zoo() now doesn't change column names that have spaces or other special characters (previously they were converted to data.frame friendly column names by make.names)
- augment.htest():
  - .residuals -> .resid
  - .stdres -> .std.resid
  - These changes only affect chi-squared tests
- tidy.ridgelm() will now always return a GCV column and never returns an xm column (#532)
- Bug fix to better allow tidy.boot() to support confidence intervals (#581)
- Bug fix to allow augment.kmeans() to work with masked data (#609)
- Bug fix to allow augment.Mclust() to work on univariate data (#490)
- Bug fix to allow tidy.htest() to support equal variances (#608)
- Bug fix for tidy.mlm() when passed quick = TRUE (#539 by @MatthieuStigler)
- Bug fix for tidy.polr() when passed conf.int = TRUE (#498)
- Bug fix in glance.lavaan() (#577)
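As an illustration of the tidy.prcomp() matrix options mentioned in the list above, here is a hedged sketch on the built-in USArrests data. It assumes a broom version that includes these options; the object names are illustrative.

```r
library(broom)

pca <- prcomp(USArrests, scale. = TRUE)

# One tidy table per component of the decomposition.
scores      <- tidy(pca, matrix = "scores")       # observations in PC space
loadings    <- tidy(pca, matrix = "loadings")     # variable loadings
eigenvalues <- tidy(pca, matrix = "eigenvalues")  # one row per principal component

eigenvalues
```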
Planned
- Data frame, rowwise data frame, vector and matrix tidiers have been removed
- bootstrap() has been removed
Unplanned
The following tidiers have been removed from broom but were not soft-deprecated
in the previous release:
- tidy.summaryDefault(), glance.summaryDefault()
- glance.summary.lm()
- augment.glmRob()
We regret that we were unable to provide any warning for these changes.
The robust package does not provide the functionality necessary to implement
an augment method. We are looking into supporting the robustbase package in
the future.
The following have all been deprecated in favor of broom.mixed:
- tidy.brmsfit()
- tidy.merMod(), glance.merMod(), augment.merMod()
- tidyMCMC(), tidy.rjags(), tidy.stanfit()
- tidy.lme(), glance.lme(), augment.lme()
- tidy.stanreg(), glance.stanreg()
- tidy.table() and tidy.ftable() have been deprecated in favor of tibble::as_tibble()
- tidy.summaryDefault() has been deprecated in favor of skimr::skim()
tidy(), glance() and augment() are now re-exported from the generics package.
Tidiers now return tibble::tibble()s. This release also includes several new
tidiers, new vignettes and a large number of bugfixes. We've also begun to more
rigorously define tidier specifications: we've laid part of the groundwork for
stricter and more consistent tidying, but the new tidier specifications are not
yet complete. These will appear in the next release.
Additionally, users should note that we are in the process of migrating tidying
methods for mixed models and Bayesian models to broom.mixed. broom.mixed is
not on CRAN yet, but all mixed model and Bayesian tidiers will be deprecated
once broom.mixed is on CRAN. No further development of mixed model tidiers
will take place in broom.
Almost all tidiers should now return tibbles rather than data.frames.
Deprecated tidying methods, Bayesian and mixed model tidiers still return
data.frames.
Users are most likely to experience issues when using augment in situations
where tibbles are stricter than data frames. For example, specifying model
covariates as a matrix object will now error:
library(broom)
library(quantreg)
fit <- rq(stack.loss ~ stack.x, tau = .5)
broom::augment(fit)
#> Error: Column `stack.x` must be a 1d atomic vector or a list
This is because the default data argument data = model.frame(fit) cannot be
coerced to tibble.
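The matrix column behind this error can be seen with base R alone. The sketch below substitutes lm() for rq() to avoid the quantreg dependency (an assumption made for illustration); the model frame is built the same way from the formula.

```r
# stack.loss and stack.x ship with R's datasets package; stack.x is a matrix.
fit <- lm(stack.loss ~ stack.x)
mf <- model.frame(fit)

ncol(mf)               # two columns: the response and one matrix column
is.matrix(mf$stack.x)  # the covariate is stored as a single matrix column
dim(mf$stack.x)        # 21 rows, 3 columns inside that one column
```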
Another consequence of this is that augment.survreg and augment.coxph from
the survival package now require that the user explicitly passes data to
either the data or newdata arguments.
These restrictions will be relaxed in an upcoming release of broom pending
support for matrix-columns in tibbles.
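For the survival tidiers, passing the data explicitly might look like the sketch below. It assumes the lung dataset from the survival package and an arbitrary choice of covariates; the exact columns returned may differ between broom versions.

```r
library(broom)
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Supply the original data explicitly rather than relying on a default.
aug <- augment(fit, data = lung)

names(aug)
```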
Developers are likely to experience issues:
- subsetting tibbles with [, which returns a tibble rather than a vector.
- setting rownames on tibbles, which is deprecated.
- using matrix and vector tidiers, now deprecated.
- handling the additional tibble classes tbl_df and tbl beyond the data.frame class
- linking to defunct documentation files -- broom recently moved all tidiers to a roxygen2 template based documentation system.
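The first and fourth points in the list above can be seen in a small comparison. This is a sketch with made-up data; df and tb are illustrative names.

```r
library(tibble)

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
tb <- as_tibble(df)

df[, "x"]   # a data frame drops a single column to a vector
tb[, "x"]   # a tibble stays a one-column tibble
tb[["x"]]   # use [[ (or $) to extract the vector from a tibble

class(tb)   # tibbles carry tbl_df and tbl in addition to data.frame
```

Code that relied on `[` returning a vector is a likely place for breakage; switching to `[[` gives the same result for both data frames and tibbles.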
This version of broom includes several new vignettes:
- vignette("available-methods", package = "broom") contains a table detailing which tidying methods are available
- vignette("adding-tidiers", package = "broom") is an in-progress guide for contributors on how to add new tidiers to broom
- vignette("glossary", package = "broom") contains tables describing acceptable argument names and column names for the in-progress new specification.
Several old vignettes have also been updated:
vignette("bootstrapping", package = "broom") now relies on the rsample package and a workflow built on tidyr::nest, purrr::map and tidyr::unnest. This is now the recommended workflow for working with multiple models, as opposed to the old workflow based on dplyr::rowwise and dplyr::do.
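A minimal sketch of the nest, map, unnest pattern for fitting one model per group follows. It assumes tidyr 1.0 or later for the nest(data = -cyl) syntax and uses mtcars purely for illustration; it is not the code from the vignette.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

per_cyl <- mtcars %>%
  nest(data = -cyl) %>%                           # one row per cylinder group
  mutate(
    fit    = map(data, ~ lm(mpg ~ wt, data = .x)), # fit one model per group
    tidied = map(fit, tidy)                        # tidy each fitted model
  ) %>%
  select(cyl, tidied) %>%
  unnest(tidied)                                   # one row per group and term

per_cyl
```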
- Matrix and vector tidiers have been deprecated in favor of tibble::as_tibble and tibble::enframe
- Dataframe tidiers and rowwise dataframe tidiers have been deprecated
- bootstrap() has been deprecated in favor of the rsample package
- inflate has been removed from broom
- The alpha argument has been removed from quantreg tidy methods
- The separate.levels argument has been removed from tidy.TukeyHSD. To obtain the effect of separate.levels = TRUE, users may call tidyr::separate after tidying. This is consistent with the multcomp tidier behavior.
- The fe.error argument was removed from tidy.felm. When fixed effects are tidied, their standard errors are now always included.
- The diag argument in tidy.dist has been renamed diagonal
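For the deprecated matrix and vector tidiers, the tibble replacements named above can be used roughly as follows. The inputs v and m are made up for illustration.

```r
library(tibble)

# Named vector -> two-column tibble with columns name and value.
v <- c(a = 1, b = 2)
enframe(v)

# Matrix -> tibble, keeping the row names in a column called "row".
m <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
as_tibble(m, rownames = "row")
```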
- Advice to help beginners make PRs (#397 by @karldw)
- glance support for arima objects fit with method = "CSS" (#396 by @josue-rodriguez)
- A bug fix to re-enable tidying glmnet objects with family = multinomial (#395 by @erleholgersen)
- A bug fix to allow tidying quantreg intercept only models (#378 by @erleholgersen)
- A bug fix for aovlist objects (#377 by @mvevans89)
- Support for glmnetUtils objects (#352 by @Hong-Revo)
- A bug fix to allow tidy_emmeans to handle column names with dashes (#351 by @bmannakee)
- augment.felm no longer returns .fe_ and .comp columns
- Support saved formulas in augment.felm (#347 by @ShreyasSingh)
- confint_tidy now drops rows of all NA (#345 by @atyre2)
- A new tidier for caret::confusionMatrix objects (#344 by @mkuehn10)
- Tidiers for Kendall::Kendall objects (#343 by @cimentadaj)
- A new tidying method for car::durbinWatsonTest objects (#341 by @mkuehn10)
- glance throws an informative error for quantreg::rq models fit with multiple tau values (#338 by @bfgray3)
- tidy.glmnet gains the ability to retain zero-valued coefficients with a return_zeros argument that defaults to FALSE (#337 by @bfgray3)
- tidy.manova now retains a Residuals row (#334 by @jarvisc1)
- Tidiers for ordinal::clm, ordinal::clmm, survey::svyolr and MASS::polr ordinal model objects (#332 by @larmarange)
- Support for anova objects from car::Anova (#325 by @mariusbarth)
- Tidiers for tseries::garch models (#323 by @wilsonfreitas)
- Removed dependency on psych package (#313 by @nutterb)
- Improved error messages (#303 by @michaelweylandt)
- Compatibility with new rstanarm and loo packages (#298 by @jgabry)
- Support for tidying lists returned by irlba::irlba
- A truly huge increase in unit tests (#267 by @dchiu911)
- Bug fix for tidy.prcomp when missing labels (#265 by @corybrunson)
- Added a pkgdown site at https://broom.tidyverse.org/ (#260 by @jayhesselberth)
- Added tidiers for AER::ivreg models (#247 by @hughjonesd)
- Added tidiers for the lavaan package (#233 by @puterleat)
- Added conf.int argument to tidy.coxph (#220 by @larmarange)
- Added augment method for chi-squared tests (#138 by @larmarange)
- Changed default se.type for tidy.rq to match that of quantreg::summary.rq() (#404 by @ethchr)
- Added argument quick for tidy.plm and tidy.felm (#502 and #509 by @MatthieuStigler)
- Many small improvements throughout
Many many thanks to all the following for their thoughtful comments on design, bug reports and PRs! The community of broom contributors has been kind, supportive and insightful and I look forward to working with you all again!
@atyre2, @batpigandme, @bfgray3, @bmannakee, @briatte, @cawoodjm, @cimentadaj, @dan87134, @dgrtwo, @dmenne, @ekatko1, @ellessenne, @erleholgersen, @ethchr, @Hong-Revo, @huftis, @IndrajeetPatil, @jacob-long, @jarvisc1, @jenzopr, @jgabry, @jimhester, @josue-rodriguez, @karldw, @kfeilich, @larmarange, @lboller, @mariusbarth, @michaelweylandt, @mine-cetinkaya-rundel, @mkuehn10, @mvevans89, @nutterb, @ShreyasSingh, @stephlocke, @strengejacke, @topepo, @willbowditch, @WillemSleegers, @wilsonfreitas, and @MatthieuStigler
- Fixed gam tidiers to work with "Gam" objects, due to an update in gam 1.15. This fixes failing CRAN tests
- Improved test coverage (thanks to #267 from Derek Chiu)
- Changed the deprecated dplyr::failwith to purrr::possibly
- augment and glance on NULLs now return an empty data frame
- Deprecated the inflate() function in favor of tidyr::crossing
- Fixed confidence intervals in the gmm tidier (thanks to #242 from David Hugh-Jones)
- Fixed a bug in bootstrap tidiers (thanks to #167 from Jeremy Biesanz)
- Fixed tidy.lm with quick = TRUE to return terms as character rather than factor (thanks to #191 from Matteo Sostero)
- Added tidiers for ivreg objects from the AER package (thanks to #245 from David Hugh-Jones)
- Added tidiers for survdiff objects from the survival package (thanks to #147 from Michał Bojanowski)
- Added tidiers for emmeans from the emmeans package (thanks to #252 from Matthew Kay)
- Added tidiers for speedlm and speedglm from the speedglm package (thanks to #248 from David Hugh-Jones)
- Added tidiers for muhaz objects from the muhaz package (thanks to #251 from Andreas Bender)
- Added tidiers for decompose and stl objects from stats (thanks to #165 from Aaron Jacobs)
- Added tidiers for lsmobj and ref.grid objects from the lsmeans package
- Added tidiers for betareg objects from the betareg package
- Added tidiers for lmRob and glmRob objects from the robust package
- Added tidiers for brms objects from the brms package (thanks to #149 from Paul Buerkner)
- Fixed tidiers for orcutt 2.0
- Changed tidy.glmnet to filter out rows where estimate == 0.
- Updates to rstanarm tidiers (thanks to #177 from Jonah Gabry)
- Fixed issue with survival package 2.40-1 (thanks to #180 from Marcus Walz)
- Added AppVeyor, codecov.io, and code of conduct
- Changed name of "NA's" column in summaryDefault output to "na"
- Fixed tidy.TukeyHSD to include term column. Also added separate.levels argument, with option to separate comparison into level1 and level2
- Fixed tidy.manova to use correct column name for test (previously, always pillai)
- Added kde_tidiers to tidy kernel density estimates
- Added orcutt_tidiers to tidy the results of cochrane.orcutt from the orcutt package
- Added tidy.dist to tidy the distance matrix output of dist from the stats package
- Added tidy and glance for lmodel2 objects from the lmodel2 package
- Added tidiers for poLCA objects from the poLCA package
- Added tidiers for sparse matrices from the Matrix package
- Added tidiers for prcomp objects
- Added tidiers for Mclust objects from the mclust package
- Added tidiers for acf objects
- Fixed to be compatible with dplyr 0.5, which is being submitted to CRAN
- Added tidiers for geeglm, nlrq, roc, boot, bgterm, kappa, binWidth, binDesign, rcorr, stanfit, rjags, gamlss, and mle2 objects.
- Added tidy methods for lists, including u, d, v lists from svd, and x, y, z lists used by image and persp
- Added quick argument to tidy.lm, tidy.nls, and tidy.biglm, to create a smaller and faster version of the output.
- Changed rowwise_df_tidiers to allow the original data to be saved as a list column, then provided as a column name to augment. This required removing data from the augment S3 signature. Also added tests-rowwise.R
- Fixed various issues in ANOVA output
- Fixed various issues in lme4 output
- Fixed issues in tests caused by dev version of ggplot2
- Added tidiers for "plm" (panel linear model) objects from the plm package.
- Added tidy.coeftest for coeftest objects from the lmtest package.
- Set up tidy.lm to work with "mlm" (multiple linear model) objects (those with multiple response columns).
- Added tidy and glance for "biglm" and "bigglm" objects from the biglm package.
- Fixed bug in tidy.coxph when one-row matrices are returned
- Added tidy.power.htest
- Added tidy and glance for summaryDefault objects
- Added tidiers for "lme" (linear mixed effects models) from the nlme package
- Added tidy and glance for multinom objects from the nnet package.
- Fixed bug in tidy.pairwise.htest, which now can handle cases where the grouping variable is numeric.
- Added tidy.aovlist method. This added stringr package to IMPORTS to trim whitespace from the beginning and end of the term and stratum columns. This also required adjusting tidy.aov so that it could handle strata that are missing p-values.
- Set up glance.lm to work with aov objects along with lm objects.
- Added tidy and glance for matrix objects, with tidy.matrix converting a matrix to a data frame with rownames included, and glance.matrix returning the same result as glance.data.frame.
- Changed DESCRIPTION Authors@R to new format
- Fixed small bug in felm where the .fitted and .resid columns were matrices rather than vectors.
- Added tidiers for rlm (robust linear model) and gam (generalized additive model) objects, including adjustments to "lm" tidiers in order to handle them. See ?rlm_tidiers and ?gam_tidiers for more.
- Removed rownames from tidy.cv.glmnet output
- The behavior of augment, particularly with regard to missing data and the na.exclude argument, has through the use of the augment_columns function been made consistent across the following models:
  - lm
  - glm
  - nls
  - merMod (lme4)
  - survreg (survival)
  - coxph (survival)
- Unit tests in tests/testthat/test-augment.R were added to ensure consistency across these models.
tidy, augment and glance methods were added for rowwise_df objects, and are set up to apply across their rows. This allows for simple patterns such as:
regressions <- mtcars %>% group_by(cyl) %>% do(mod = lm(mpg ~ wt, .))
regressions %>% tidy(mod)
regressions %>% augment(mod)
See ?rowwise_df_tidiers for more.
- Added tidy and glance methods for Arima objects, and tidy for pairwise.htest objects.
- Fixes for CRAN: change package description to title case, removed NOTES, mostly by adding globals.R to declare global variables.
- This is the original version published on CRAN.
- Tidiers have been added for S3 objects from the following packages:
  - lme4
  - glmnet
  - survival
  - zoo
  - felm
  - MASS (ridgelm objects)
- tidy and glance methods for data.frames have also been added, and augment.data.frame produces an error (rather than returning the same data.frame).
- stderror has been changed to std.error (affects many functions) to be consistent with broom's naming conventions for columns.
- A function bootstrap has been added based on this example, to perform the common use case of bootstrapping models.
- Added "augment" S3 generic and various implementations. "augment" does something different from tidy: it adds columns to the original dataset, including predictions, residuals, or cluster assignments. This was originally described as "fortify" in ggplot2.
- Added "glance" S3 generic and various implementations. "glance" produces a one-row data frame summary, which is necessary for tidy outputs with values like R^2 or F-statistics.
- Re-wrote intro broom vignette/README to introduce all three methods.
- Wrote a new kmeans vignette.
- Added tidying methods for multcomp, sp, and map objects (from fortify-multcomp, fortify-sp, and fortify-map from ggplot2).
- Because this integrates substantial amounts of ggplot2 code (with permission), added Hadley Wickham as an author in DESCRIPTION.