Home › forums › Mixed Models › Random effect equivalence
This topic contains 4 replies, has 2 voices, and was last updated by statmerkur 1 year, 8 months ago.

AuthorPosts

September 26, 2017 at 14:45 GMT+0000 #133
Douglas Bates states (slide 91 in this presentation) that the following models are equivalent in case of compound symmetry:
m1 < lmer(y ~ factor + (0 + factorgroup)) m2 < lmer(y ~ factor + (1group) + (1group:factor))
If one assumes that group = subjects (which is quite often the case in factorial experiments) this reductions looks (in my eyes) very similar to the reduction from
m3
tom4
in Henrik Singmann’s post. As m4 corresponds to the RMANOVA model (which implicitly assumes compound symmtry/sphericity) I assume that the equivalence ofm1
andm2
given compound symmetry is the same as the equivalence ofm3
andm4
given compound symmetry:m3 < mixed(Recall~Task*Valence + (Valence+TaskSubject), df) m4 < mixed(Recall~Task*Valence + (1Subject) + (1Task:Subject) + (1Valence:Subject), df)
1. Does this equivalence hold true if I use
set_sum_contrasts()
?
2. Can anyone explain whym1
reduces tom2
in case of compound symmetry?Note that I asked the 2nd question here, but there is still no answer.

October 2, 2017 at 16:23 GMT+0000 #134
The slides are a little bit old, so I am not sure if what Doug Bates said in 2008 still holds in the current
lmer
version. However, it is nevertheless worth while to think about what each model actually estimates. Let us do this with some random data generated based on astats.stackexchange
question from some time ago.We first create some data.
require(afex) set.seed(666) d < data.frame( y = rnorm(96), group = factor(rep(1:12, 4)), factor = factor(rep(1:2, each=24)))
Then we fit a bunch of different models. Note that we set
REML = FALSE
or equivalentlymethod = "LRT"
formixed
. That makes the interpretation somewhat easier.m1 < lmer(y ~ factor + (0 + factorgroup), d, REML = FALSE) summary(m1) # Random effects: # Groups Name Variance Std.Dev. Corr # group factor1 0.04164 0.2041 # factor2 0.06435 0.2537 0.79 # Residual 1.02127 1.0106 # Number of obs: 96, groups: group, 12
Model one estimates two randomeffects variances (one for each level of
factor
) and one correlation.m2 < lmer(y ~ factor + (1group) + (1group:factor), d, REML = FALSE) summary(m2) # Random effects: # Groups Name Variance Std.Dev. # group:factor (Intercept) 0.01216 0.1103 # group (Intercept) 0.04083 0.2021 # Residual 1.02127 1.0106 # Number of obs: 96, groups: group:factor, 24; group, 12
In contrast, model 2 only estimates two random effects variances. However, the residual variance seems to stay the same
We can extend this and look at a few more models.
m3 < mixed(y ~ factor + (factorgroup), d, method = "LRT") summary(m3) # Random effects: # Groups Name Variance Std.Dev. Corr # group (Intercept) 0.046914 0.21660 # factor1 0.006082 0.07799 0.34 # Residual 1.021271 1.01058 # Number of obs: 96, groups: group, 12
m3
usescontr.sum
(because that is the default withmixed
) but is otherwise similar tom1
: 2 variances and 1 correlation. The residual variance is still the same.m4 < mixed(y ~ factor + (factorgroup), d, expand_re = TRUE, method = "LRT") summary(m4) # Random effects: # Groups Name Variance Std.Dev. # group (Intercept) 0.046914 0.21660 # group.1 re1.factor1 0.006082 0.07799 # Residual 1.021271 1.01058 # Number of obs: 96, groups: group, 12
m4
now is similar tom2
, two variances only. However,m4
has the same parameter estimates asm3
for the variances.m5 < mixed(y ~ factor + (0 + factorgroup), d, method = "LRT") summary(m5) # Random effects: # Groups Name Variance Std.Dev. Corr # group factor1 0.04164 0.2041 # factor2 0.06435 0.2537 0.79 # Residual 1.02127 1.0106 # Number of obs: 96, groups: group, 12
m5
is identical tom1
(in terms of the random effects). This is maybe somewhat surprising because it usescontr.sum
instead ofcontr.treatment
.m6 < mixed(y ~ factor + (factorgroup), d, expand_re = TRUE, check_contrasts = FALSE, method = "LRT") summary(m6) # Random effects: # Groups Name Variance Std.Dev. # group (Intercept) 0.04107 0.2027 # group.1 re1.factor2 0.02328 0.1526 # Residual 1.02156 1.0107 # Number of obs: 96, groups: group, 12
Finally,
m6
uses the default contrasts again (contr.treatment
), and also estimates two variances.Now let us take a look at the overall model fit. First for all models that estimate only the variances:
anova(m2, m4$full_model, m6$full_model) # Data: d # Models: # m2: y ~ factor + (1  group) + (1  group:factor) # m4$full_model: y ~ factor + ((1  group) + (0 + re1.factor1  group)) # m6$full_model: y ~ factor + ((1  group) + (0 + re1.factor2  group)) # Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq) # m2 5 288.77 301.59 139.39 278.77 # m4$full_model 5 288.77 301.59 139.39 278.77 0.0000 0 < 2.2e16 *** # m6$full_model 5 288.75 301.58 139.38 278.75 0.0165 0 < 2.2e16 *** #  # Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Somewhat surprisingly, these models all provide essentially the same account (if we are graceful with
m6
). The same holds if we look at all models that include the correlation. They are all the same as well.anova(m1, m3$full_model, m5$full_model) # Data: d # Models: # m1: y ~ factor + (0 + factor  group) # m3$full_model: y ~ factor + (factor  group) # m5$full_model: y ~ factor + (0 + factor  group) # Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq) # m1 6 290.75 306.14 139.38 278.75 # m3$full_model 6 290.75 306.14 139.38 278.75 0 0 1 # m5$full_model 6 290.75 306.14 139.38 278.75 0 0 <2e16 ***
In general it should holds that all models without the correlation provide the same account and all models with correlation provide the same account (as long as
REML = FALSE
for both cases), independent of their exact parameterization (i.e.,set_sum_contrasts
plays no role here; this is only relevant for interactions of fixed effects). I am not sure what is happening withm6
here, but let us ignore this for now. I think from this it follows thatm1
andm2
are equivalent, if the correlation is 0. I am not sure if this is implied by Doug Bates quote, but is what I take from it.Independent of the parameterization, models with two variances allow each individual level of group to have a random intercept and each level of group to have their own idiosyncratic effect of factor. And these effects are uncorrelated. If we now add the correlations, we get a slightly better fit because they do not appear to be uncorrelated.
Interestingly, this also holds if we compare
m6
(without correlation) with the equivalently parameterized modelm3
(with correlation):deviance(m6$full_model)  deviance(m3$full_model) # [1] 3.211216e05
Note that only one random effect also provides a worse account.
m7 < lmer(y ~ factor + (1group:factor), d, REML = FALSE) deviance(m7) # [1] 278.9834

October 12, 2017 at 15:55 GMT+0000 #140
Thanks for this detailed answer!
You state that the fits are the same as long as one usesREML = FALSE
. Why would the results be different if one usedREML = TRUE
? 
October 12, 2017 at 16:38 GMT+0000 #141
For models estimated with maximum likelihood estimation the parameterization does in general not matter (this obviously breaks down in certain cases as seen for
m6
because of differences in hierarchical shrinkage, but is approximately true).In contrast, for models estimated with restricted maximum likelihood estimation (i.e.,
REML = TRUE
) the random effect estimates (i.e., the socalled conditional modes which are not actual parameters here) are obtained independently from the fixed effects estimates. If I recall correctly, the random effects are estimated first and then the fixed effects. Here the differences in hierarchical shrinkage play a substantively larger role as they are estimated independently and we basically always see differences due to parameterization. Note that in the grand scheme of things these differences are still comparatively small.The individual deviations or offsets estimated for each random effect parameter are estimated to be normally distributed around the fixed effect parameter, given the parameterization of the model. For example, for a model with treatment contrasts the offsets for the intercept (i.e., the random intercept) are distributed around the first factor level. The offsets for the random slopes are distributed around the deviation from the first factor level. In contrast, for a model with
contr.sum
parameterization the random intercepts are distributed around the grand mean and the random slopes are distributed around the differences from the grand mean.These differences in parameterization can lead to different results if shrinkage is taken into account. For example, the estimate for a participant with an extreme effect for the first factor level would receive comparatively large shrinkage for the random intercept with treatment contrasts. This would then affect all other random effect estimates as well (as the intercept is always included). With a different parameterization the shrinkage on the random intercept would be lower and the other factor levels thus less influenced by the large outlier for one case.

October 13, 2017 at 17:38 GMT+0000 #143
Thank you again for explaining the issue in a very comprehensible way!

AuthorPosts
You must be logged in to reply to this topic.