Home forums Mixed Models Random effect equivalence

• Author
Posts
• #133

Douglas Bates states (slide 91 in this presentation) that the following models are equivalent in case of compound symmetry:

``````m1 <- lmer(y ~ factor + (0 + factor|group))
m2 <- lmer(y ~ factor + (1|group) + (1|group:factor))``````

If one assumes that group = subjects (which is quite often the case in factorial experiments) this reductions looks (in my eyes) very similar to the reduction from `m3` to `m4` in Henrik Singmann’s post. As m4 corresponds to the RM-ANOVA model (which implicitly assumes compound symmtry/sphericity) I assume that the equivalence of `m1` and `m2` given compound symmetry is the same as the equivalence of `m3` and `m4` given compound symmetry:

``````m3 <- mixed(Recall~Task*Valence + (Valence+Task|Subject), df)

1. Does this equivalence hold true if I use `set_sum_contrasts()`?
2. Can anyone explain why `m1` reduces to `m2` in case of compound symmetry?

Note that I asked the 2nd question here, but there is still no answer.

• #134

The slides are a little bit old, so I am not sure if what Doug Bates said in 2008 still holds in the current `lmer` version. However, it is nevertheless worth while to think about what each model actually estimates. Let us do this with some random data generated based on a `stats.stackexchange` question from some time ago.

We first create some data.

``````require(afex)

set.seed(666)
d <- data.frame(
y = rnorm(96),
group = factor(rep(1:12, 4)),
factor = factor(rep(1:2, each=24)))
``````

Then we fit a bunch of different models. Note that we set `REML = FALSE` or equivalently `method = "LRT"` for `mixed`. That makes the interpretation somewhat easier.

``````m1 <- lmer(y ~ factor + (0 + factor|group), d, REML = FALSE)
summary(m1)
# Random effects:
#  Groups   Name    Variance Std.Dev. Corr
#  group    factor1 0.04164  0.2041
#           factor2 0.06435  0.2537   0.79
#  Residual         1.02127  1.0106
# Number of obs: 96, groups:  group, 12``````

Model one estimates two random-effects variances (one for each level of `factor`) and one correlation.

``````m2 <- lmer(y ~ factor + (1|group) + (1|group:factor), d, REML = FALSE)
summary(m2)
# Random effects:
#  Groups       Name        Variance Std.Dev.
#  group:factor (Intercept) 0.01216  0.1103
#  group        (Intercept) 0.04083  0.2021
#  Residual                 1.02127  1.0106
# Number of obs: 96, groups:  group:factor, 24; group, 12``````

In contrast, model 2 only estimates two random effects variances. However, the residual variance seems to stay the same

We can extend this and look at a few more models.

``````m3 <- mixed(y ~ factor + (factor|group), d, method = "LRT")
summary(m3)
# Random effects:
#  Groups   Name        Variance Std.Dev. Corr
#  group    (Intercept) 0.046914 0.21660
#           factor1     0.006082 0.07799  -0.34
#  Residual             1.021271 1.01058
# Number of obs: 96, groups:  group, 12``````

`m3` uses `contr.sum` (because that is the default with `mixed`) but is otherwise similar to `m1`: 2 variances and 1 correlation. The residual variance is still the same.

``````m4 <- mixed(y ~ factor + (factor||group), d, expand_re = TRUE, method = "LRT")
summary(m4)
# Random effects:
#  Groups   Name        Variance Std.Dev.
#  group    (Intercept) 0.046914 0.21660
#  group.1  re1.factor1 0.006082 0.07799
#  Residual             1.021271 1.01058
# Number of obs: 96, groups:  group, 12``````

`m4` now is similar to `m2`, two variances only. However, `m4` has the same parameter estimates as `m3` for the variances.

``````m5 <- mixed(y ~ factor + (0 + factor|group), d, method = "LRT")
summary(m5)
# Random effects:
#  Groups   Name    Variance Std.Dev. Corr
#  group    factor1 0.04164  0.2041
#           factor2 0.06435  0.2537   0.79
#  Residual         1.02127  1.0106
# Number of obs: 96, groups:  group, 12``````

`m5` is identical to `m1` (in terms of the random effects). This is maybe somewhat surprising because it uses `contr.sum` instead of `contr.treatment`.

``````m6 <- mixed(y ~ factor + (factor||group), d, expand_re = TRUE, check_contrasts = FALSE, method = "LRT")
summary(m6)
# Random effects:
#  Groups   Name        Variance Std.Dev.
#  group    (Intercept) 0.04107  0.2027
#  group.1  re1.factor2 0.02328  0.1526
#  Residual             1.02156  1.0107
# Number of obs: 96, groups:  group, 12``````

Finally, `m6` uses the default contrasts again (`contr.treatment`), and also estimates two variances.

Now let us take a look at the overall model fit. First for all models that estimate only the variances:

``````anova(m2, m4\$full_model, m6\$full_model)
# Data: d
# Models:
# m2: y ~ factor + (1 | group) + (1 | group:factor)
# m4\$full_model: y ~ factor + ((1 | group) + (0 + re1.factor1 | group))
# m6\$full_model: y ~ factor + ((1 | group) + (0 + re1.factor2 | group))
#               Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
# m2             5 288.77 301.59 -139.39   278.77
# m4\$full_model  5 288.77 301.59 -139.39   278.77 0.0000      0  < 2.2e-16 ***
# m6\$full_model  5 288.75 301.58 -139.38   278.75 0.0165      0  < 2.2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1``````

Somewhat surprisingly, these models all provide essentially the same account (if we are graceful with `m6`). The same holds if we look at all models that include the correlation. They are all the same as well.

``````anova(m1, m3\$full_model, m5\$full_model)
# Data: d
# Models:
# m1: y ~ factor + (0 + factor | group)
# m3\$full_model: y ~ factor + (factor | group)
# m5\$full_model: y ~ factor + (0 + factor | group)
#               Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)
# m1             6 290.75 306.14 -139.38   278.75
# m3\$full_model  6 290.75 306.14 -139.38   278.75     0      0          1
# m5\$full_model  6 290.75 306.14 -139.38   278.75     0      0     <2e-16 ***``````

In general it should holds that all models without the correlation provide the same account and all models with correlation provide the same account (as long as `REML = FALSE` for both cases), independent of their exact parameterization (i.e., `set_sum_contrasts` plays no role here; this is only relevant for interactions of fixed effects). I am not sure what is happening with `m6` here, but let us ignore this for now. I think from this it follows that `m1` and `m2` are equivalent, if the correlation is 0. I am not sure if this is implied by Doug Bates quote, but is what I take from it.

Independent of the parameterization, models with two variances allow each individual level of group to have a random intercept and each level of group to have their own idiosyncratic effect of factor. And these effects are uncorrelated. If we now add the correlations, we get a slightly better fit because they do not appear to be uncorrelated.

Interestingly, this also holds if we compare `m6` (without correlation) with the equivalently parameterized model `m3` (with correlation):

``````deviance(m6\$full_model) - deviance(m3\$full_model)
#  3.211216e-05``````

Note that only one random effect also provides a worse account.

``````m7 <- lmer(y ~ factor + (1|group:factor), d, REML = FALSE)
deviance(m7)
#  278.9834``````
• #140

You state that the fits are the same as long as one uses `REML = FALSE`. Why would the results be different if one used `REML = TRUE`?

• #141

For models estimated with maximum likelihood estimation the parameterization does in general not matter (this obviously breaks down in certain cases as seen for `m6` because of differences in hierarchical shrinkage, but is approximately true).

In contrast, for models estimated with restricted maximum likelihood estimation (i.e., `REML = TRUE`) the random effect estimates (i.e., the so-called conditional modes which are not actual parameters here) are obtained independently from the fixed effects estimates. If I recall correctly, the random effects are estimated first and then the fixed effects. Here the differences in hierarchical shrinkage play a substantively larger role as they are estimated independently and we basically always see differences due to parameterization. Note that in the grand scheme of things these differences are still comparatively small.

The individual deviations or offsets estimated for each random effect parameter are estimated to be normally distributed around the fixed effect parameter, given the parameterization of the model. For example, for a model with treatment contrasts the offsets for the intercept (i.e., the random intercept) are distributed around the first factor level. The offsets for the random slopes are distributed around the deviation from the first factor level. In contrast, for a model with `contr.sum` parameterization the random intercepts are distributed around the grand mean and the random slopes are distributed around the differences from the grand mean.

These differences in parameterization can lead to different results if shrinkage is taken into account. For example, the estimate for a participant with an extreme effect for the first factor level would receive comparatively large shrinkage for the random intercept with treatment contrasts. This would then affect all other random effect estimates as well (as the intercept is always included). With a different parameterization the shrinkage on the random intercept would be lower and the other factor levels thus less influenced by the large outlier for one case.

• #143

Thank you again for explaining the issue in a very comprehensible way!