Home forums Mixed Models Significance of random effects

• Author
Posts
• #67
Tanya Jonker
Participant

I would like to test whether including a random effect in my model accounts significant variance in my data. What is the best way to test this? Does afex have a built-in function for comparing models with and without a random effect?

• #68
henrik
Keymaster

This is not a simple question, but the short answer is that `afex` does not have a function for automatically testing whether or not a certain random effects term accounts for a significant portion of the variance. The reason for the absence of this functionality is that following Barr, Levy, Scheepers, and Tily (2013) and Schielzeth and Forstmeier (2009) including all possible random effects (i.e., ‘the maximal random effects structure justified by the design’) is the conservative thing to do. So random effects that are part of the design should in principle be included (and there are rumors that reviewers require this).

A slightly different perspective on this question comes from `lme4` author Doug Bates, Harald Baayen, Reinhold Kliegl and others (Bates, Kliegl, Vasishth, & Baayen, 2015; Matuschek, Kliegl, Vasishth, Baayen, & Bates, 2017) who highlight that an overparameterized model (i.e., models that contain more random effects than can be estimated given the current data) may be prone to failing to converge and exhibit reduced statistical power. Note that these authors nevertheless agree that removing random effects for which there is actually random variation is potentially dangerous (in terms of increasing the Type 1 error rate). So the question then becomes what is the threshold for which we want to say that a random effects term is large enough or ‘significant’. In Matuschek et al., they suggest a downward procedure (i.e., beginning with the maximal model and beginning with the largest effect) with alpha = .2 (instead of the more common alpha = .05). Again, the idea is to avoid situations in which a random effects term is reduced that actually shows random variability.

To gather information about the size of a random effect (irrespective of significance), one can simply call `summary` on mixed objects:

``````require(afex)
data("sk2011.2")
sk2_aff <- droplevels(sk2011.2[sk2011.2\$what == "affirmation",])
sk_m1 <- mixed(response ~ instruction*inference*type+(inference*type|id), sk2_aff)
summary(sk_m1)``````

Which gives:

``````Linear mixed model fit by REML ['merModLmerTest']
Formula: response ~ instruction * inference * type + (inference * type |      id)
Data: sk2_aff

REML criterion at convergence: 10393.7

Scaled residuals:
Min      1Q  Median      3Q     Max
-3.9241 -0.3314  0.0654  0.3796  3.7675

Random effects:
Groups   Name             Variance Std.Dev. Corr
id       (Intercept)      141.84   11.910
inference1       158.83   12.603   -0.71
type1             39.81    6.310   -0.24 -0.09
type2             10.43    3.230    0.43  0.23 -0.41
inference1:type1  54.38    7.374   -0.05 -0.33 -0.55 -0.42
inference1:type2   4.33    2.081   -0.39  0.50  0.16 -0.31 -0.31
Residual                  443.39   21.057
Number of obs: 1134, groups:  id, 63
[...]``````

The size of the random effects is given by the variance or standard deviation of the parameters (and the variances can be related to the residual variance also given). Note that `summary` is an `lme4` function and no `afex` function so the output is given in terms of the model parameters and not the factors. So for `type`, which is a factor with three levels, we have two parameters where neither can be easily mapped to the three factor levels and should be considered jointly (the same is true for the `inference:type` interaction).
We can now fit another model for which we remove the highest order random effect, the `inference:type` interaction. Then we can compare the model fit of the two models using the `anova` function:

``````sk_m2 <- mixed(response ~ instruction*inference*type+(inference+type|id), sk2_aff)
anova(sk_m1, sk_m2)``````

This gives:

``````Data: sk2_aff
Models:
sk_m2: response ~ instruction * inference * type + (inference + type |
sk_m2:     id)
sk_m1: response ~ instruction * inference * type + (inference * type |
sk_m1:     id)
Df   AIC   BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
sk_m2 23 10496 10612 -5225.1    10450
sk_m1 34 10462 10633 -5196.9    10394 56.568     11  3.997e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1``````

Here the likelihood ratio tests performed by `anova` indicates that removing the highest order effect leads to a significant reduction in model fit with chi^2 (11) = 56.57, p < .0001.

Note that for `afex` versions prior to 0.17 (which will arrive on CRAN shortly), there was an error in the model labels when calling `anova` in which case the following produces the same output:
`anova(sk_m1\$full_model, sk_m2\$full_model, refit = FALSE)`

References:
– Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious Mixed Models. arXiv:1506.04967 [Stat]. Retrieved from http://arxiv.org/abs/1506.04967
– Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. https://doi.org/10.1016/j.jml.2012.11.001
– Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315. https://doi.org/10.1016/j.jml.2017.01.001
– Schielzeth, H., & Forstmeier, W. (2009). Conclusions beyond support: overconfident estimates in mixed models. Behavioral Ecology, 20(2), 416–420. https://doi.org/10.1093/beheco/arn145