September 26, 2017 at 14:45 UTC #133
Douglas Bates states (slide 91 in this presentation) that the following models are equivalent in case of compound symmetry:
m1 <- lmer(y ~ factor + (0 + factor|group)) m2 <- lmer(y ~ factor + (1|group) + (1|group:factor))
If one assumes that group = subjects (which is quite often the case in factorial experiments) this reductions looks (in my eyes) very similar to the reduction from
m4in Henrik Singmann’s post. As m4 corresponds to the RM-ANOVA model (which implicitly assumes compound symmtry/sphericity) I assume that the equivalence of
m2given compound symmetry is the same as the equivalence of
m4given compound symmetry:
m3 <- mixed(Recall~Task*Valence + (Valence+Task|Subject), df) m4 <- mixed(Recall~Task*Valence + (1|Subject) + (1|Task:Subject) + (1|Valence:Subject), df)
1. Does this equivalence hold true if I use
2. Can anyone explain why
m2in case of compound symmetry?
Note that I asked the 2nd question here, but there is still no answer.
October 2, 2017 at 16:23 UTC #134
The slides are a little bit old, so I am not sure if what Doug Bates said in 2008 still holds in the current
lmerversion. However, it is nevertheless worth while to think about what each model actually estimates. Let us do this with some random data generated based on a
stats.stackexchangequestion from some time ago.
We first create some data.
require(afex) set.seed(666) d <- data.frame( y = rnorm(96), group = factor(rep(1:12, 4)), factor = factor(rep(1:2, each=24)))
Then we fit a bunch of different models. Note that we set
REML = FALSEor equivalently
method = "LRT"for
mixed. That makes the interpretation somewhat easier.
m1 <- lmer(y ~ factor + (0 + factor|group), d, REML = FALSE) summary(m1) # Random effects: # Groups Name Variance Std.Dev. Corr # group factor1 0.04164 0.2041 # factor2 0.06435 0.2537 0.79 # Residual 1.02127 1.0106 # Number of obs: 96, groups: group, 12
Model one estimates two random-effects variances (one for each level of
factor) and one correlation.
m2 <- lmer(y ~ factor + (1|group) + (1|group:factor), d, REML = FALSE) summary(m2) # Random effects: # Groups Name Variance Std.Dev. # group:factor (Intercept) 0.01216 0.1103 # group (Intercept) 0.04083 0.2021 # Residual 1.02127 1.0106 # Number of obs: 96, groups: group:factor, 24; group, 12
In contrast, model 2 only estimates two random effects variances. However, the residual variance seems to stay the same
We can extend this and look at a few more models.
m3 <- mixed(y ~ factor + (factor|group), d, method = "LRT") summary(m3) # Random effects: # Groups Name Variance Std.Dev. Corr # group (Intercept) 0.046914 0.21660 # factor1 0.006082 0.07799 -0.34 # Residual 1.021271 1.01058 # Number of obs: 96, groups: group, 12
contr.sum(because that is the default with
mixed) but is otherwise similar to
m1: 2 variances and 1 correlation. The residual variance is still the same.
m4 <- mixed(y ~ factor + (factor||group), d, expand_re = TRUE, method = "LRT") summary(m4) # Random effects: # Groups Name Variance Std.Dev. # group (Intercept) 0.046914 0.21660 # group.1 re1.factor1 0.006082 0.07799 # Residual 1.021271 1.01058 # Number of obs: 96, groups: group, 12
m4now is similar to
m2, two variances only. However,
m4has the same parameter estimates as
m3for the variances.
m5 <- mixed(y ~ factor + (0 + factor|group), d, method = "LRT") summary(m5) # Random effects: # Groups Name Variance Std.Dev. Corr # group factor1 0.04164 0.2041 # factor2 0.06435 0.2537 0.79 # Residual 1.02127 1.0106 # Number of obs: 96, groups: group, 12
m5is identical to
m1(in terms of the random effects). This is maybe somewhat surprising because it uses
m6 <- mixed(y ~ factor + (factor||group), d, expand_re = TRUE, check_contrasts = FALSE, method = "LRT") summary(m6) # Random effects: # Groups Name Variance Std.Dev. # group (Intercept) 0.04107 0.2027 # group.1 re1.factor2 0.02328 0.1526 # Residual 1.02156 1.0107 # Number of obs: 96, groups: group, 12
m6uses the default contrasts again (
contr.treatment), and also estimates two variances.
Now let us take a look at the overall model fit. First for all models that estimate only the variances:
anova(m2, m4$full_model, m6$full_model) # Data: d # Models: # m2: y ~ factor + (1 | group) + (1 | group:factor) # m4$full_model: y ~ factor + ((1 | group) + (0 + re1.factor1 | group)) # m6$full_model: y ~ factor + ((1 | group) + (0 + re1.factor2 | group)) # Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq) # m2 5 288.77 301.59 -139.39 278.77 # m4$full_model 5 288.77 301.59 -139.39 278.77 0.0000 0 < 2.2e-16 *** # m6$full_model 5 288.75 301.58 -139.38 278.75 0.0165 0 < 2.2e-16 *** # --- # Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Somewhat surprisingly, these models all provide essentially the same account (if we are graceful with
m6). The same holds if we look at all models that include the correlation. They are all the same as well.
anova(m1, m3$full_model, m5$full_model) # Data: d # Models: # m1: y ~ factor + (0 + factor | group) # m3$full_model: y ~ factor + (factor | group) # m5$full_model: y ~ factor + (0 + factor | group) # Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq) # m1 6 290.75 306.14 -139.38 278.75 # m3$full_model 6 290.75 306.14 -139.38 278.75 0 0 1 # m5$full_model 6 290.75 306.14 -139.38 278.75 0 0 <2e-16 ***
In general it should holds that all models without the correlation provide the same account and all models with correlation provide the same account (as long as
REML = FALSEfor both cases), independent of their exact parameterization (i.e.,
set_sum_contrastsplays no role here; this is only relevant for interactions of fixed effects). I am not sure what is happening with
m6here, but let us ignore this for now. I think from this it follows that
m2are equivalent, if the correlation is 0. I am not sure if this is implied by Doug Bates quote, but is what I take from it.
Independent of the parameterization, models with two variances allow each individual level of group to have a random intercept and each level of group to have their own idiosyncratic effect of factor. And these effects are uncorrelated. If we now add the correlations, we get a slightly better fit because they do not appear to be uncorrelated.
Interestingly, this also holds if we compare
m6(without correlation) with the equivalently parameterized model
deviance(m6$full_model) - deviance(m3$full_model) #  3.211216e-05
Note that only one random effect also provides a worse account.
m7 <- lmer(y ~ factor + (1|group:factor), d, REML = FALSE) deviance(m7) #  278.9834
October 12, 2017 at 15:55 UTC #140
Thanks for this detailed answer!
You state that the fits are the same as long as one uses
REML = FALSE. Why would the results be different if one used
REML = TRUE?
October 12, 2017 at 16:38 UTC #141
For models estimated with maximum likelihood estimation the parameterization does in general not matter (this obviously breaks down in certain cases as seen for
m6because of differences in hierarchical shrinkage, but is approximately true).
In contrast, for models estimated with restricted maximum likelihood estimation (i.e.,
REML = TRUE) the random effect estimates (i.e., the so-called conditional modes which are not actual parameters here) are obtained independently from the fixed effects estimates. If I recall correctly, the random effects are estimated first and then the fixed effects. Here the differences in hierarchical shrinkage play a substantively larger role as they are estimated independently and we basically always see differences due to parameterization. Note that in the grand scheme of things these differences are still comparatively small.
The individual deviations or offsets estimated for each random effect parameter are estimated to be normally distributed around the fixed effect parameter, given the parameterization of the model. For example, for a model with treatment contrasts the offsets for the intercept (i.e., the random intercept) are distributed around the first factor level. The offsets for the random slopes are distributed around the deviation from the first factor level. In contrast, for a model with
contr.sumparameterization the random intercepts are distributed around the grand mean and the random slopes are distributed around the differences from the grand mean.
These differences in parameterization can lead to different results if shrinkage is taken into account. For example, the estimate for a participant with an extreme effect for the first factor level would receive comparatively large shrinkage for the random intercept with treatment contrasts. This would then affect all other random effect estimates as well (as the intercept is always included). With a different parameterization the shrinkage on the random intercept would be lower and the other factor levels thus less influenced by the large outlier for one case.
October 13, 2017 at 17:38 UTC #143
Thank you again for explaining the issue in a very comprehensible way!
You must be logged in to reply to this topic.