diff --git a/R/check_autocorrelation.R b/R/check_autocorrelation.R index 52ac9b53c..70930e1d6 100644 --- a/R/check_autocorrelation.R +++ b/R/check_autocorrelation.R @@ -11,6 +11,8 @@ #' @return Invisibly returns the p-value of the test statistics. A p-value < 0.05 #' indicates autocorrelated residuals. #' +#' @family checking model assumptions and quality +#' #' @details Performs a Durbin-Watson-Test to check for autocorrelated residuals. #' In case of autocorrelation, robust standard errors return more accurate #' results for the estimates, or maybe a mixed model with error term for the diff --git a/R/check_collinearity.R b/R/check_collinearity.R index 636ac4a98..8eb2a27de 100644 --- a/R/check_collinearity.R +++ b/R/check_collinearity.R @@ -110,6 +110,8 @@ #' common statistical problems: Data exploration. Methods in Ecology and #' Evolution (2010) 1:3–14. #' +#' @family checking model assumptions and quality +#' #' @note The code to compute the confidence intervals for the VIF and tolerance #' values was adapted from the Appendix B from the Marcoulides et al. paper. #' Thus, credits go to these authors the original algorithm. There is also diff --git a/R/check_convergence.R b/R/check_convergence.R index 4580001e6..192f09377 100644 --- a/R/check_convergence.R +++ b/R/check_convergence.R @@ -12,38 +12,39 @@ #' @return `TRUE` if convergence is fine and `FALSE` if convergence #' is suspicious. Additionally, the convergence value is returned as attribute. #' -#' @details \subsection{Convergence and log-likelihood}{ -#' Convergence problems typically arise when the model hasn't converged -#' to a solution where the log-likelihood has a true maximum. This may result -#' in unreliable and overly complex (or non-estimable) estimates and standard -#' errors. 
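The Durbin-Watson test that `check_autocorrelation()` documents boils down to a single statistic on the residuals, DW = Σ(e_t − e_{t−1})² / Σ e_t², with values near 2 indicating no first-order autocorrelation. A minimal numeric sketch (Python rather than R, purely for illustration; the helper name is made up here, not part of the package):

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: ~2 means no first-order autocorrelation;
    values toward 0 suggest positive, toward 4 negative autocorrelation."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(1)
white = rng.normal(size=500)      # independent residuals -> DW near 2
ar = np.empty(500)
ar[0] = white[0]
for t in range(1, 500):           # AR(1) residuals with rho = 0.8 -> DW well below 2
    ar[t] = 0.8 * ar[t - 1] + white[t]
print(round(durbin_watson(white), 2))  # close to 2
print(round(durbin_watson(ar), 2))     # well below 2
```

Since DW ≈ 2(1 − ρ̂), the robust-standard-error advice in the docs kicks in exactly when this statistic drifts away from 2 (p-value < 0.05 in the packaged test).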
-#' } -#' \subsection{Inspect model convergence}{ -#' **lme4** performs a convergence-check (see `?lme4::convergence`), -#' however, as as discussed [here](https://github.com/lme4/lme4/issues/120) -#' and suggested by one of the lme4-authors in -#' [this comment](https://github.com/lme4/lme4/issues/120#issuecomment-39920269), -#' this check can be too strict. `check_convergence()` thus provides an -#' alternative convergence test for `merMod`-objects. -#' } -#' \subsection{Resolving convergence issues}{ -#' Convergence issues are not easy to diagnose. The help page on -#' `?lme4::convergence` provides most of the current advice about -#' how to resolve convergence issues. Another clue might be large parameter -#' values, e.g. estimates (on the scale of the linear predictor) larger than -#' 10 in (non-identity link) generalized linear model *might* indicate -#' [complete separation](https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faqwhat-is-complete-or-quasi-complete-separation-in-logisticprobit-regression-and-how-do-we-deal-with-them/). -#' Complete separation can be addressed by regularization, e.g. penalized -#' regression or Bayesian regression with appropriate priors on the fixed effects. -#' } -#' \subsection{Convergence versus Singularity}{ -#' Note the different meaning between singularity and convergence: singularity -#' indicates an issue with the "true" best estimate, i.e. whether the maximum -#' likelihood estimation for the variance-covariance matrix of the random effects -#' is positive definite or only semi-definite. Convergence is a question of -#' whether we can assume that the numerical optimization has worked correctly -#' or not. -#' } +#' @section Convergence and log-likelihood: +#' Convergence problems typically arise when the model hasn't converged +#' to a solution where the log-likelihood has a true maximum. This may result +#' in unreliable and overly complex (or non-estimable) estimates and standard +#' errors. 
+#' +#' @section Inspect model convergence: +#' **lme4** performs a convergence-check (see `?lme4::convergence`), +#' however, as discussed [here](https://github.com/lme4/lme4/issues/120) +#' and suggested by one of the lme4-authors in +#' [this comment](https://github.com/lme4/lme4/issues/120#issuecomment-39920269), +#' this check can be too strict. `check_convergence()` thus provides an +#' alternative convergence test for `merMod`-objects. +#' +#' @section Resolving convergence issues: +#' Convergence issues are not easy to diagnose. The help page on +#' `?lme4::convergence` provides most of the current advice about +#' how to resolve convergence issues. Another clue might be large parameter +#' values: estimates (on the scale of the linear predictor) larger than +#' 10 in (non-identity link) generalized linear models *might* indicate +#' [complete separation](https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faqwhat-is-complete-or-quasi-complete-separation-in-logisticprobit-regression-and-how-do-we-deal-with-them/). +#' Complete separation can be addressed by regularization, e.g. penalized +#' regression or Bayesian regression with appropriate priors on the fixed effects. +#' +#' @section Convergence versus Singularity: +#' Note the different meaning between singularity and convergence: singularity +#' indicates an issue with the "true" best estimate, i.e. whether the maximum +#' likelihood estimation for the variance-covariance matrix of the random effects +#' is positive definite or only semi-definite. Convergence is a question of +#' whether we can assume that the numerical optimization has worked correctly +#' or not. +#' +#' @family checking model assumptions and quality #' #' @examples #' if (require("lme4")) { diff --git a/R/check_heteroscedasticity.R b/R/check_heteroscedasticity.R index 2d074737c..55bf00da8 100644 --- a/R/check_heteroscedasticity.R +++ b/R/check_heteroscedasticity.R @@ -18,6 +18,8 @@ #' #' @references Breusch, T.
S., and Pagan, A. R. (1979) A simple test for heteroscedasticity and random coefficient variation. Econometrica 47, 1287-1294. #' +#' @family checking model assumptions and quality +#' #' @examples #' m <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars) #' check_heteroscedasticity(m) diff --git a/R/check_homogeneity.R b/R/check_homogeneity.R index 1a8fc9a68..f746037cf 100644 --- a/R/check_homogeneity.R +++ b/R/check_homogeneity.R @@ -18,6 +18,8 @@ #' #' @note There is also a [`plot()`-method](https://easystats.github.io/see/articles/performance.html) implemented in the \href{https://easystats.github.io/see/}{\pkg{see}-package}. #' +#' @family checking model assumptions and quality +#' #' @examples #' model <- lm(len ~ supp + dose, data = ToothGrowth) #' check_homogeneity(model) diff --git a/R/check_model.R b/R/check_model.R index f057d4b49..c4f7a280d 100644 --- a/R/check_model.R +++ b/R/check_model.R @@ -131,6 +131,8 @@ #' look at the `check` argument and see if some of the model checks could be #' skipped, which also increases performance. #' +#' @family checking model assumptions and quality +#' #' @examples #' \dontrun{ #' m <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars) diff --git a/R/check_multimodal.R b/R/check_multimodal.R index fb837c517..9223ee59a 100644 --- a/R/check_multimodal.R +++ b/R/check_multimodal.R @@ -2,7 +2,7 @@ #' #' For univariate distributions (one-dimensional vectors), this function #' performs an Ameijeiras-Alonso et al. (2018) excess mass test. For multivariate -#' distributions (dataframes), it uses mixture modelling. However, it seems that +#' distributions (data frames), it uses mixture modelling. However, it seems that #' it always returns a significant result (suggesting that the distribution is #' multimodal). A better method might be needed here. #' diff --git a/R/check_outliers.R b/R/check_outliers.R index ec25e3aa0..d02b1f145 100644 --- a/R/check_outliers.R +++ b/R/check_outliers.R @@ -34,6 +34,8 @@ #' function.
Note that the function will (silently) return a vector of `FALSE` #' for non-supported data types such as character strings. #' +#' @family checking model assumptions and quality +#' #' @note There is also a #' [`plot()`-method](https://easystats.github.io/see/articles/performance.html) #' implemented in the diff --git a/R/check_overdispersion.R b/R/check_overdispersion.R index 4661da515..a84ae9273 100644 --- a/R/check_overdispersion.R +++ b/R/check_overdispersion.R @@ -16,19 +16,17 @@ #' with the mean and, therefore, variance usually (roughly) equals the mean #' value. If the variance is much higher, the data are "overdispersed". #' -#' \subsection{Interpretation of the Dispersion Ratio}{ +#' @section Interpretation of the Dispersion Ratio: #' If the dispersion ratio is close to one, a Poisson model fits well to the #' data. Dispersion ratios larger than one indicate overdispersion, thus a #' negative binomial model or similar might fit better to the data. A p-value < #' .05 indicates overdispersion. -#' } #' -#' \subsection{Overdispersion in Poisson Models}{ +#' @section Overdispersion in Poisson Models: #' For Poisson models, the overdispersion test is based on the code from -#' \cite{Gelman and Hill (2007), page 115}. -#' } +#' _Gelman and Hill (2007), page 115_. #' -#' \subsection{Overdispersion in Mixed Models}{ +#' @section Overdispersion in Mixed Models: #' For `merMod`- and `glmmTMB`-objects, `check_overdispersion()` #' is based on the code in the #' [GLMM FAQ](http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html), @@ -36,16 +34,15 @@ #' function only returns an *approximate* estimate of an overdispersion #' parameter, and is probably inaccurate for zero-inflated mixed models (fitted #' with `glmmTMB`). 
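The dispersion ratio described in these sections can be illustrated numerically: sum of squared Pearson residuals over the residual degrees of freedom, in the spirit of the Gelman and Hill (2007) code the docs cite. A sketch in Python purely for illustration (the function name is hypothetical, and the true means are plugged in directly instead of coming from a fitted model):

```python
import numpy as np

def dispersion_ratio(y, mu, n_params):
    """Pearson-based overdispersion check for a Poisson model: sum of
    squared Pearson residuals over residual degrees of freedom.
    Ratio ~1: Poisson fits; much larger than 1: overdispersion."""
    y, mu = np.asarray(y, dtype=float), np.asarray(mu, dtype=float)
    pearson = (y - mu) / np.sqrt(mu)  # Poisson variance equals the mean
    return np.sum(pearson ** 2) / (len(y) - n_params)

rng = np.random.default_rng(42)
mu = np.full(1000, 5.0)                 # true mean of the counts
poisson_y = rng.poisson(mu)             # equidispersed counts
# negative binomial with mean 5 but variance 17.5 -> overdispersed
negbin_y = rng.negative_binomial(2, 2 / (2 + 5.0), size=1000)
print(dispersion_ratio(poisson_y, mu, 1))  # near 1
print(dispersion_ratio(negbin_y, mu, 1))   # clearly above 1
```

This mirrors the interpretation rule above: a ratio close to one supports the Poisson model, a ratio well above one suggests switching to a negative binomial (or similar) family.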
-#' } #' -#' \subsection{How to fix Overdispersion}{ +#' @section How to fix Overdispersion: #' Overdispersion can be fixed by either modeling the dispersion parameter, or #' by choosing a different distributional family (like Quasi-Poisson, or -#' negative binomial, see \cite{Gelman and Hill (2007), pages 115-116}). -#' } +#' negative binomial, see _Gelman and Hill (2007), pages 115-116_). #' -#' @references +#' @family checking model assumptions and quality #' +#' @references #' - Bolker B et al. (2017): #' [GLMM FAQ.](http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html) #' diff --git a/R/check_singularity.R b/R/check_singularity.R index b8cf5f52b..0891c0324 100644 --- a/R/check_singularity.R +++ b/R/check_singularity.R @@ -12,39 +12,39 @@ #' @return `TRUE` if the model fit is singular. #' #' @details If a model is "singular", this means that some dimensions of the -#' variance-covariance matrix have been estimated as exactly zero. This -#' often occurs for mixed models with complex random effects structures. -#' \cr \cr -#' \dQuote{While singular models are statistically well defined (it is -#' theoretically sensible for the true maximum likelihood estimate to -#' correspond to a singular fit), there are real concerns that (1) singular -#' fits correspond to overfitted models that may have poor power; (2) chances -#' of numerical problems and mis-convergence are higher for singular models -#' (e.g. it may be computationally difficult to compute profile confidence -#' intervals for such models); (3) standard inferential procedures such as -#' Wald statistics and likelihood ratio tests may be inappropriate.} -#' (\cite{lme4 Reference Manual}) -#' \cr \cr -#' There is no gold-standard about how to deal with singularity and which -#' random-effects specification to choose. 
Beside using fully Bayesian methods -#' (with informative priors), proposals in a frequentist framework are: -#' -ize{ -#' - avoid fitting overly complex models, such that the -#' variance-covariance matrices can be estimated precisely enough -#' (\cite{Matuschek et al. 2017}) -#' - use some form of model selection to choose a model that balances -#' predictive accuracy and overfitting/type I error (\cite{Bates et al. 2015}, -#' \cite{Matuschek et al. 2017}) -#' - \dQuote{keep it maximal}, i.e. fit the most complex model consistent -#' with the experimental design, removing only terms required to allow a -#' non-singular fit (\cite{Barr et al. 2013}) -#' } -#' Note the different meaning between singularity and convergence: singularity -#' indicates an issue with the "true" best estimate, i.e. whether the maximum -#' likelihood estimation for the variance-covariance matrix of the random -#' effects is positive definite or only semi-definite. Convergence is a -#' question of whether we can assume that the numerical optimization has -#' worked correctly or not. +#' variance-covariance matrix have been estimated as exactly zero. This +#' often occurs for mixed models with complex random effects structures. +#' +#' "While singular models are statistically well defined (it is theoretically +#' sensible for the true maximum likelihood estimate to correspond to a singular +#' fit), there are real concerns that (1) singular fits correspond to overfitted +#' models that may have poor power; (2) chances of numerical problems and +#' mis-convergence are higher for singular models (e.g. it may be computationally +#' difficult to compute profile confidence intervals for such models); (3) +#' standard inferential procedures such as Wald statistics and likelihood ratio +#' tests may be inappropriate." (_lme4 Reference Manual_) +#' +#' There is no gold-standard about how to deal with singularity and which +#' random-effects specification to choose. 
Besides using fully Bayesian methods +#' (with informative priors), proposals in a frequentist framework are: +#' +#' - avoid fitting overly complex models, such that the variance-covariance +#' matrices can be estimated precisely enough (_Matuschek et al. 2017_) +#' - use some form of model selection to choose a model that balances +#' predictive accuracy and overfitting/type I error (_Bates et al. 2015_, +#' _Matuschek et al. 2017_) +#' - "keep it maximal", i.e. fit the most complex model consistent with the +#' experimental design, removing only terms required to allow a non-singular +#' fit (_Barr et al. 2013_) +#' +#' Note the different meaning between singularity and convergence: singularity +#' indicates an issue with the "true" best estimate, i.e. whether the maximum +#' likelihood estimation for the variance-covariance matrix of the random +#' effects is positive definite or only semi-definite. Convergence is a +#' question of whether we can assume that the numerical optimization has +#' worked correctly or not. +#' +#' @family checking model assumptions and quality #' #' @references #' - Bates D, Kliegl R, Vasishth S, Baayen H. Parsimonious Mixed Models. diff --git a/R/check_zeroinflation.R b/R/check_zeroinflation.R index 24c9435b1..5cc1f75de 100644 --- a/R/check_zeroinflation.R +++ b/R/check_zeroinflation.R @@ -2,22 +2,24 @@ #' @name check_zeroinflation #' #' @description `check_zeroinflation()` checks whether count models are -#' over- or underfitting zeros in the outcome. +#' over- or underfitting zeros in the outcome. #' -#' @param x Fitted model of class `merMod`, `glmmTMB`, `glm`, -#' or `glm.nb` (package \pkg{MASS}). +#' @param x Fitted model of class `merMod`, `glmmTMB`, `glm`, or `glm.nb` +#' (package **MASS**). #' @param tolerance The tolerance for the ratio of observed and predicted -#' zeros to considered as over- or underfitting zeros.
A ratio -#' between 1 +/- `tolerance` is considered as OK, while a ratio -#' beyond or below this threshold would indicate over- or underfitting. +#' zeros to be considered as over- or underfitting zeros. A ratio +#' between 1 +/- `tolerance` is considered as OK, while a ratio +#' beyond or below this threshold would indicate over- or underfitting. #' #' @return A list with information about the amount of predicted and observed -#' zeros in the outcome, as well as the ratio between these two values. +#' zeros in the outcome, as well as the ratio between these two values. #' #' @details If the amount of observed zeros is larger than the amount of -#' predicted zeros, the model is underfitting zeros, which indicates a -#' zero-inflation in the data. In such cases, it is recommended to use -#' negative binomial or zero-inflated models. +#' predicted zeros, the model is underfitting zeros, which indicates +#' zero-inflation in the data. In such cases, it is recommended to use +#' negative binomial or zero-inflated models. +#' +#' @family checking model assumptions and quality #' #' @examples #' if (require("glmmTMB")) { diff --git a/man/check_autocorrelation.Rd b/man/check_autocorrelation.Rd index e29ae8892..29b91de42 100644 --- a/man/check_autocorrelation.Rd +++ b/man/check_autocorrelation.Rd @@ -34,3 +34,16 @@ cluster groups should be used.
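The idea behind `check_zeroinflation()` — compare the observed zeros with the zeros the fitted model expects — can be sketched with the Poisson case, where P(Y = 0) = exp(−μ) for each observation. Python for illustration only; the helper name and the ratio convention (predicted over observed, flagged against 1 ± tolerance) are assumptions of this sketch, not the package's exact implementation:

```python
import numpy as np

def check_zero_inflation(y, mu, tolerance=0.05):
    """Compare observed zeros with the zeros a Poisson model predicts
    (sum of exp(-mu) over observations). A ratio outside 1 +/- tolerance
    flags over- or underfitting of zeros."""
    y, mu = np.asarray(y, dtype=float), np.asarray(mu, dtype=float)
    observed = int(np.sum(y == 0))
    predicted = float(np.sum(np.exp(-mu)))
    ratio = predicted / observed
    return {"observed": observed, "predicted": predicted,
            "ratio": ratio, "ok": abs(ratio - 1.0) <= tolerance}

rng = np.random.default_rng(7)
mu = np.full(2000, 1.5)
poisson_y = rng.poisson(mu)  # zeros match the model
# structural extra zeros (30% of observations forced to 0) -> zero-inflated
inflated_y = np.where(rng.random(2000) < 0.3, 0, poisson_y)
print(check_zero_inflation(poisson_y, mu)["ratio"])   # near 1
print(check_zero_inflation(inflated_y, mu)["ratio"])  # well below 1
```

A ratio well below one means the model predicts fewer zeros than observed, i.e. it is underfitting zeros, which is exactly the situation where the details section recommends negative binomial or zero-inflated models.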
m <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars) check_autocorrelation(m) } +\seealso{ +Other checking model assumptions and quality: +\code{\link{check_collinearity}()}, +\code{\link{check_convergence}()}, +\code{\link{check_heteroscedasticity}()}, +\code{\link{check_homogeneity}()}, +\code{\link{check_model}()}, +\code{\link{check_outliers}()}, +\code{\link{check_overdispersion}()}, +\code{\link{check_singularity}()}, +\code{\link{check_zeroinflation}()} +} +\concept{checking model assumptions and quality} diff --git a/man/check_collinearity.Rd b/man/check_collinearity.Rd index bbb3b7567..af8a79386 100644 --- a/man/check_collinearity.Rd +++ b/man/check_collinearity.Rd @@ -159,3 +159,16 @@ common statistical problems: Data exploration. Methods in Ecology and Evolution (2010) 1:3–14. } } +\seealso{ +Other checking model assumptions and quality: +\code{\link{check_autocorrelation}()}, +\code{\link{check_convergence}()}, +\code{\link{check_heteroscedasticity}()}, +\code{\link{check_homogeneity}()}, +\code{\link{check_model}()}, +\code{\link{check_outliers}()}, +\code{\link{check_overdispersion}()}, +\code{\link{check_singularity}()}, +\code{\link{check_zeroinflation}()} +} +\concept{checking model assumptions and quality} diff --git a/man/check_convergence.Rd b/man/check_convergence.Rd index 5f8886e45..ce2370847 100644 --- a/man/check_convergence.Rd +++ b/man/check_convergence.Rd @@ -22,14 +22,16 @@ is suspicious. Additionally, the convergence value is returned as attribute. \code{check_convergence()} provides an alternative convergence test for \code{merMod}-objects. } -\details{ -\subsection{Convergence and log-likelihood}{ +\section{Convergence and log-likelihood}{ + Convergence problems typically arise when the model hasn't converged to a solution where the log-likelihood has a true maximum. This may result in unreliable and overly complex (or non-estimable) estimates and standard errors. 
} -\subsection{Inspect model convergence}{ + +\section{Inspect model convergence}{ + \strong{lme4} performs a convergence-check (see \code{?lme4::convergence}), however, as as discussed \href{https://github.com/lme4/lme4/issues/120}{here} and suggested by one of the lme4-authors in @@ -37,7 +39,9 @@ and suggested by one of the lme4-authors in this check can be too strict. \code{check_convergence()} thus provides an alternative convergence test for \code{merMod}-objects. } -\subsection{Resolving convergence issues}{ + +\section{Resolving convergence issues}{ + Convergence issues are not easy to diagnose. The help page on \code{?lme4::convergence} provides most of the current advice about how to resolve convergence issues. Another clue might be large parameter @@ -47,7 +51,9 @@ values, e.g. estimates (on the scale of the linear predictor) larger than Complete separation can be addressed by regularization, e.g. penalized regression or Bayesian regression with appropriate priors on the fixed effects. } -\subsection{Convergence versus Singularity}{ + +\section{Convergence versus Singularity}{ + Note the different meaning between singularity and convergence: singularity indicates an issue with the "true" best estimate, i.e. whether the maximum likelihood estimation for the variance-covariance matrix of the random effects @@ -55,7 +61,7 @@ is positive definite or only semi-definite. Convergence is a question of whether we can assume that the numerical optimization has worked correctly or not. 
} -} + \examples{ if (require("lme4")) { data(cbpp) @@ -83,3 +89,16 @@ if (require("glmmTMB")) { } } } +\seealso{ +Other checking model assumptions and quality: +\code{\link{check_autocorrelation}()}, +\code{\link{check_collinearity}()}, +\code{\link{check_heteroscedasticity}()}, +\code{\link{check_homogeneity}()}, +\code{\link{check_model}()}, +\code{\link{check_outliers}()}, +\code{\link{check_overdispersion}()}, +\code{\link{check_singularity}()}, +\code{\link{check_zeroinflation}()} +} +\concept{checking model assumptions and quality} diff --git a/man/check_heteroscedasticity.Rd b/man/check_heteroscedasticity.Rd index 77906cfa0..470706201 100644 --- a/man/check_heteroscedasticity.Rd +++ b/man/check_heteroscedasticity.Rd @@ -43,3 +43,16 @@ if (require("see")) { \references{ Breusch, T. S., and Pagan, A. R. (1979) A simple test for heteroscedasticity and random coefficient variation. Econometrica 47, 1287-1294. } +\seealso{ +Other checking model assumptions and quality: +\code{\link{check_autocorrelation}()}, +\code{\link{check_collinearity}()}, +\code{\link{check_convergence}()}, +\code{\link{check_homogeneity}()}, +\code{\link{check_model}()}, +\code{\link{check_outliers}()}, +\code{\link{check_overdispersion}()}, +\code{\link{check_singularity}()}, +\code{\link{check_zeroinflation}()} +} +\concept{checking model assumptions and quality} diff --git a/man/check_homogeneity.Rd b/man/check_homogeneity.Rd index fa775f693..68d8654dc 100644 --- a/man/check_homogeneity.Rd +++ b/man/check_homogeneity.Rd @@ -43,3 +43,16 @@ if (require("see")) { plot(result) } } +\seealso{ +Other checking model assumptions and quality: +\code{\link{check_autocorrelation}()}, +\code{\link{check_collinearity}()}, +\code{\link{check_convergence}()}, +\code{\link{check_heteroscedasticity}()}, +\code{\link{check_model}()}, +\code{\link{check_outliers}()}, +\code{\link{check_overdispersion}()}, +\code{\link{check_singularity}()}, +\code{\link{check_zeroinflation}()} +} +\concept{checking model 
assumptions and quality} diff --git a/man/check_model.Rd b/man/check_model.Rd index 7c2c09c99..d682d8098 100644 --- a/man/check_model.Rd +++ b/man/check_model.Rd @@ -202,3 +202,16 @@ if (require("rstanarm")) { } } } +\seealso{ +Other checking model assumptions and quality: +\code{\link{check_autocorrelation}()}, +\code{\link{check_collinearity}()}, +\code{\link{check_convergence}()}, +\code{\link{check_heteroscedasticity}()}, +\code{\link{check_homogeneity}()}, +\code{\link{check_outliers}()}, +\code{\link{check_overdispersion}()}, +\code{\link{check_singularity}()}, +\code{\link{check_zeroinflation}()} +} +\concept{checking model assumptions and quality} diff --git a/man/check_multimodal.Rd b/man/check_multimodal.Rd index 6e670ad85..43153734a 100644 --- a/man/check_multimodal.Rd +++ b/man/check_multimodal.Rd @@ -14,7 +14,7 @@ check_multimodal(x, ...) \description{ For univariate distributions (one-dimensional vectors), this function performs an Ameijeiras-Alonso et al. (2018) excess mass test. For multivariate -distributions (dataframes), it uses mixture modelling. However, it seems that +distributions (data frames), it uses mixture modelling. However, it seems that it always returns a significant result (suggesting that the distribution is multimodal). A better method might be needed here. } diff --git a/man/check_outliers.Rd b/man/check_outliers.Rd index 2e1212b2c..1552a2e7d 100644 --- a/man/check_outliers.Rd +++ b/man/check_outliers.Rd @@ -347,3 +347,16 @@ outliers and leverage points. Journal of the American Statistical association, 85(411), 633-639.
} } +\seealso{ +Other checking model assumptions and quality: +\code{\link{check_autocorrelation}()}, +\code{\link{check_collinearity}()}, +\code{\link{check_convergence}()}, +\code{\link{check_heteroscedasticity}()}, +\code{\link{check_homogeneity}()}, +\code{\link{check_model}()}, +\code{\link{check_overdispersion}()}, +\code{\link{check_singularity}()}, +\code{\link{check_zeroinflation}()} +} +\concept{checking model assumptions and quality} diff --git a/man/check_overdispersion.Rd b/man/check_overdispersion.Rd index 3feb5c32e..b61ecffa5 100644 --- a/man/check_overdispersion.Rd +++ b/man/check_overdispersion.Rd @@ -25,20 +25,23 @@ Overdispersion occurs when the observed variance is higher than the variance of a theoretical model. For Poisson models, variance increases with the mean and, therefore, variance usually (roughly) equals the mean value. If the variance is much higher, the data are "overdispersed". +} +\section{Interpretation of the Dispersion Ratio}{ -\subsection{Interpretation of the Dispersion Ratio}{ If the dispersion ratio is close to one, a Poisson model fits well to the data. Dispersion ratios larger than one indicate overdispersion, thus a negative binomial model or similar might fit better to the data. A p-value < .05 indicates overdispersion. } -\subsection{Overdispersion in Poisson Models}{ +\section{Overdispersion in Poisson Models}{ + For Poisson models, the overdispersion test is based on the code from -\cite{Gelman and Hill (2007), page 115}. +\emph{Gelman and Hill (2007), page 115}. } -\subsection{Overdispersion in Mixed Models}{ +\section{Overdispersion in Mixed Models}{ + For \code{merMod}- and \code{glmmTMB}-objects, \code{check_overdispersion()} is based on the code in the \href{http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html}{GLMM FAQ}, @@ -48,12 +51,13 @@ parameter, and is probably inaccurate for zero-inflated mixed models (fitted with \code{glmmTMB}). 
} -\subsection{How to fix Overdispersion}{ +\section{How to fix Overdispersion}{ + Overdispersion can be fixed by either modeling the dispersion parameter, or by choosing a different distributional family (like Quasi-Poisson, or -negative binomial, see \cite{Gelman and Hill (2007), pages 115-116}). -} +negative binomial, see \emph{Gelman and Hill (2007), pages 115-116}). } + \examples{ \dontshow{if (getRversion() >= "4.0.0" && require("glmmTMB", quietly = TRUE)) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} @@ -79,3 +83,16 @@ multilevel/hierarchical models. Cambridge; New York: Cambridge University Press. } } +\seealso{ +Other checking model assumptions and quality: +\code{\link{check_autocorrelation}()}, +\code{\link{check_collinearity}()}, +\code{\link{check_convergence}()}, +\code{\link{check_heteroscedasticity}()}, +\code{\link{check_homogeneity}()}, +\code{\link{check_model}()}, +\code{\link{check_outliers}()}, +\code{\link{check_singularity}()}, +\code{\link{check_zeroinflation}()} +} +\concept{checking model assumptions and quality} diff --git a/man/check_singularity.Rd b/man/check_singularity.Rd index e5f897cc0..1fe0aec0d 100644 --- a/man/check_singularity.Rd +++ b/man/check_singularity.Rd @@ -25,32 +25,30 @@ Check mixed models for boundary fits. If a model is "singular", this means that some dimensions of the variance-covariance matrix have been estimated as exactly zero. This often occurs for mixed models with complex random effects structures. -\cr \cr -\dQuote{While singular models are statistically well defined (it is -theoretically sensible for the true maximum likelihood estimate to -correspond to a singular fit), there are real concerns that (1) singular -fits correspond to overfitted models that may have poor power; (2) chances -of numerical problems and mis-convergence are higher for singular models -(e.g. 
it may be computationally difficult to compute profile confidence -intervals for such models); (3) standard inferential procedures such as -Wald statistics and likelihood ratio tests may be inappropriate.} -(\cite{lme4 Reference Manual}) -\cr \cr + +"While singular models are statistically well defined (it is theoretically +sensible for the true maximum likelihood estimate to correspond to a singular +fit), there are real concerns that (1) singular fits correspond to overfitted +models that may have poor power; (2) chances of numerical problems and +mis-convergence are higher for singular models (e.g. it may be computationally +difficult to compute profile confidence intervals for such models); (3) +standard inferential procedures such as Wald statistics and likelihood ratio +tests may be inappropriate." (\emph{lme4 Reference Manual}) + There is no gold-standard about how to deal with singularity and which random-effects specification to choose. Beside using fully Bayesian methods (with informative priors), proposals in a frequentist framework are: --ize{ \itemize{ -\item avoid fitting overly complex models, such that the -variance-covariance matrices can be estimated precisely enough -(\cite{Matuschek et al. 2017}) +\item avoid fitting overly complex models, such that the variance-covariance +matrices can be estimated precisely enough (\emph{Matuschek et al. 2017}) \item use some form of model selection to choose a model that balances -predictive accuracy and overfitting/type I error (\cite{Bates et al. 2015}, -\cite{Matuschek et al. 2017}) -\item \dQuote{keep it maximal}, i.e. fit the most complex model consistent -with the experimental design, removing only terms required to allow a -non-singular fit (\cite{Barr et al. 2013}) +predictive accuracy and overfitting/type I error (\emph{Bates et al. 2015}, +\emph{Matuschek et al. 2017}) +\item "keep it maximal", i.e. 
fit the most complex model consistent with the +experimental design, removing only terms required to allow a non-singular +fit (\emph{Barr et al. 2013}) } + Note the different meaning between singularity and convergence: singularity indicates an issue with the "true" best estimate, i.e. whether the maximum likelihood estimation for the variance-covariance matrix of the random @@ -58,7 +56,6 @@ effects is positive definite or only semi-definite. Convergence is a question of whether we can assume that the numerical optimization has worked correctly or not. } -} \examples{ \dontshow{if (require("lme4")) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf} library(lme4) @@ -93,3 +90,16 @@ I error and power in linear mixed models. Journal of Memory and Language, \item lme4 Reference Manual, \url{https://cran.r-project.org/package=lme4} } } +\seealso{ +Other checking model assumptions and quality: +\code{\link{check_autocorrelation}()}, +\code{\link{check_collinearity}()}, +\code{\link{check_convergence}()}, +\code{\link{check_heteroscedasticity}()}, +\code{\link{check_homogeneity}()}, +\code{\link{check_model}()}, +\code{\link{check_outliers}()}, +\code{\link{check_overdispersion}()}, +\code{\link{check_zeroinflation}()} +} +\concept{checking model assumptions and quality} diff --git a/man/check_zeroinflation.Rd b/man/check_zeroinflation.Rd index 90744e1f1..2460bf2a7 100644 --- a/man/check_zeroinflation.Rd +++ b/man/check_zeroinflation.Rd @@ -7,8 +7,8 @@ check_zeroinflation(x, tolerance = 0.05) } \arguments{ -\item{x}{Fitted model of class \code{merMod}, \code{glmmTMB}, \code{glm}, -or \code{glm.nb} (package \pkg{MASS}).} +\item{x}{Fitted model of class \code{merMod}, \code{glmmTMB}, \code{glm}, or \code{glm.nb} +(package \strong{MASS}).} \item{tolerance}{The tolerance for the ratio of observed and predicted zeros to considered as over- or underfitting zeros. 
A ratio @@ -36,3 +36,16 @@ if (require("glmmTMB")) { check_zeroinflation(m) } } +\seealso{ +Other checking model assumptions and quality: +\code{\link{check_autocorrelation}()}, +\code{\link{check_collinearity}()}, +\code{\link{check_convergence}()}, +\code{\link{check_heteroscedasticity}()}, +\code{\link{check_homogeneity}()}, +\code{\link{check_model}()}, +\code{\link{check_outliers}()}, +\code{\link{check_overdispersion}()}, +\code{\link{check_singularity}()} +} +\concept{checking model assumptions and quality}