Home forums Mixed Models Random effect equivalence

Viewing 4 reply threads
  • Author
    Posts
    • #133
      statmerkur
      Participant

      Douglas Bates states (slide 91 in this presentation) that the following models are equivalent in case of compound symmetry:

      m1 <- lmer(y ~ factor + (0 + factor|group))
      m2 <- lmer(y ~ factor + (1|group) + (1|group:factor))

      If one assumes that group = subjects (which is quite often the case in factorial experiments) this reductions looks (in my eyes) very similar to the reduction from m3 to m4 in Henrik Singmann’s post. As m4 corresponds to the RM-ANOVA model (which implicitly assumes compound symmtry/sphericity) I assume that the equivalence of m1 and m2 given compound symmetry is the same as the equivalence of m3 and m4 given compound symmetry:

      m3 <- mixed(Recall~Task*Valence + (Valence+Task|Subject), df)
      m4 <- mixed(Recall~Task*Valence + (1|Subject) + (1|Task:Subject) + (1|Valence:Subject), df)

      1. Does this equivalence hold true if I use set_sum_contrasts()?
      2. Can anyone explain why m1 reduces to m2 in case of compound symmetry?

      Note that I asked the 2nd question here, but there is still no answer.

    • #134
      henrik
      Keymaster

      The slides are a little bit old, so I am not sure if what Doug Bates said in 2008 still holds in the current lmer version. However, it is nevertheless worth while to think about what each model actually estimates. Let us do this with some random data generated based on a stats.stackexchange question from some time ago.

      We first create some data.

      require(afex)
      
      set.seed(666)
      d <- data.frame(
          y = rnorm(96),
          group = factor(rep(1:12, 4)),
          factor = factor(rep(1:2, each=24)))
      

      Then we fit a bunch of different models. Note that we set REML = FALSE or equivalently method = "LRT" for mixed. That makes the interpretation somewhat easier.

      m1 <- lmer(y ~ factor + (0 + factor|group), d, REML = FALSE)
      summary(m1)
      # Random effects:
      #  Groups   Name    Variance Std.Dev. Corr
      #  group    factor1 0.04164  0.2041       
      #           factor2 0.06435  0.2537   0.79
      #  Residual         1.02127  1.0106       
      # Number of obs: 96, groups:  group, 12

      Model one estimates two random-effects variances (one for each level of factor) and one correlation.

      m2 <- lmer(y ~ factor + (1|group) + (1|group:factor), d, REML = FALSE)
      summary(m2)
      # Random effects:
      #  Groups       Name        Variance Std.Dev.
      #  group:factor (Intercept) 0.01216  0.1103  
      #  group        (Intercept) 0.04083  0.2021  
      #  Residual                 1.02127  1.0106  
      # Number of obs: 96, groups:  group:factor, 24; group, 12

      In contrast, model 2 only estimates two random effects variances. However, the residual variance seems to stay the same

      We can extend this and look at a few more models.

      m3 <- mixed(y ~ factor + (factor|group), d, method = "LRT")
      summary(m3)
      # Random effects:
      #  Groups   Name        Variance Std.Dev. Corr 
      #  group    (Intercept) 0.046914 0.21660       
      #           factor1     0.006082 0.07799  -0.34
      #  Residual             1.021271 1.01058       
      # Number of obs: 96, groups:  group, 12

      m3 uses contr.sum (because that is the default with mixed) but is otherwise similar to m1: 2 variances and 1 correlation. The residual variance is still the same.

      m4 <- mixed(y ~ factor + (factor||group), d, expand_re = TRUE, method = "LRT")
      summary(m4)
      # Random effects:
      #  Groups   Name        Variance Std.Dev.
      #  group    (Intercept) 0.046914 0.21660 
      #  group.1  re1.factor1 0.006082 0.07799 
      #  Residual             1.021271 1.01058 
      # Number of obs: 96, groups:  group, 12

      m4 now is similar to m2, two variances only. However, m4 has the same parameter estimates as m3 for the variances.

      m5 <- mixed(y ~ factor + (0 + factor|group), d, method = "LRT")
      summary(m5)
      # Random effects:
      #  Groups   Name    Variance Std.Dev. Corr
      #  group    factor1 0.04164  0.2041       
      #           factor2 0.06435  0.2537   0.79
      #  Residual         1.02127  1.0106       
      # Number of obs: 96, groups:  group, 12

      m5 is identical to m1 (in terms of the random effects). This is maybe somewhat surprising because it uses contr.sum instead of contr.treatment.

      m6 <- mixed(y ~ factor + (factor||group), d, expand_re = TRUE, check_contrasts = FALSE, method = "LRT")
      summary(m6)
      # Random effects:
      #  Groups   Name        Variance Std.Dev.
      #  group    (Intercept) 0.04107  0.2027  
      #  group.1  re1.factor2 0.02328  0.1526  
      #  Residual             1.02156  1.0107  
      # Number of obs: 96, groups:  group, 12

      Finally, m6 uses the default contrasts again (contr.treatment), and also estimates two variances.

      Now let us take a look at the overall model fit. First for all models that estimate only the variances:

      anova(m2, m4$full_model, m6$full_model)
      # Data: d
      # Models:
      # m2: y ~ factor + (1 | group) + (1 | group:factor)
      # m4$full_model: y ~ factor + ((1 | group) + (0 + re1.factor1 | group))
      # m6$full_model: y ~ factor + ((1 | group) + (0 + re1.factor2 | group))
      #               Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
      # m2             5 288.77 301.59 -139.39   278.77                             
      # m4$full_model  5 288.77 301.59 -139.39   278.77 0.0000      0  < 2.2e-16 ***
      # m6$full_model  5 288.75 301.58 -139.38   278.75 0.0165      0  < 2.2e-16 ***
      # ---
      # Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      Somewhat surprisingly, these models all provide essentially the same account (if we are graceful with m6). The same holds if we look at all models that include the correlation. They are all the same as well.

      anova(m1, m3$full_model, m5$full_model)
      # Data: d
      # Models:
      # m1: y ~ factor + (0 + factor | group)
      # m3$full_model: y ~ factor + (factor | group)
      # m5$full_model: y ~ factor + (0 + factor | group)
      #               Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)    
      # m1             6 290.75 306.14 -139.38   278.75                            
      # m3$full_model  6 290.75 306.14 -139.38   278.75     0      0          1    
      # m5$full_model  6 290.75 306.14 -139.38   278.75     0      0     <2e-16 ***

      In general it should holds that all models without the correlation provide the same account and all models with correlation provide the same account (as long as REML = FALSE for both cases), independent of their exact parameterization (i.e., set_sum_contrasts plays no role here; this is only relevant for interactions of fixed effects). I am not sure what is happening with m6 here, but let us ignore this for now. I think from this it follows that m1 and m2 are equivalent, if the correlation is 0. I am not sure if this is implied by Doug Bates quote, but is what I take from it.

      Independent of the parameterization, models with two variances allow each individual level of group to have a random intercept and each level of group to have their own idiosyncratic effect of factor. And these effects are uncorrelated. If we now add the correlations, we get a slightly better fit because they do not appear to be uncorrelated.

      Interestingly, this also holds if we compare m6 (without correlation) with the equivalently parameterized model m3 (with correlation):

      deviance(m6$full_model) - deviance(m3$full_model)
      # [1] 3.211216e-05

      Note that only one random effect also provides a worse account.

      m7 <- lmer(y ~ factor + (1|group:factor), d, REML = FALSE)
      deviance(m7)
      # [1] 278.9834
    • #140
      statmerkur
      Participant

      Thanks for this detailed answer!
      You state that the fits are the same as long as one uses REML = FALSE. Why would the results be different if one used REML = TRUE?

    • #141
      henrik
      Keymaster

      For models estimated with maximum likelihood estimation the parameterization does in general not matter (this obviously breaks down in certain cases as seen for m6 because of differences in hierarchical shrinkage, but is approximately true).

      In contrast, for models estimated with restricted maximum likelihood estimation (i.e., REML = TRUE) the random effect estimates (i.e., the so-called conditional modes which are not actual parameters here) are obtained independently from the fixed effects estimates. If I recall correctly, the random effects are estimated first and then the fixed effects. Here the differences in hierarchical shrinkage play a substantively larger role as they are estimated independently and we basically always see differences due to parameterization. Note that in the grand scheme of things these differences are still comparatively small.

      The individual deviations or offsets estimated for each random effect parameter are estimated to be normally distributed around the fixed effect parameter, given the parameterization of the model. For example, for a model with treatment contrasts the offsets for the intercept (i.e., the random intercept) are distributed around the first factor level. The offsets for the random slopes are distributed around the deviation from the first factor level. In contrast, for a model with contr.sum parameterization the random intercepts are distributed around the grand mean and the random slopes are distributed around the differences from the grand mean.

      These differences in parameterization can lead to different results if shrinkage is taken into account. For example, the estimate for a participant with an extreme effect for the first factor level would receive comparatively large shrinkage for the random intercept with treatment contrasts. This would then affect all other random effect estimates as well (as the intercept is always included). With a different parameterization the shrinkage on the random intercept would be lower and the other factor levels thus less influenced by the large outlier for one case.

    • #143
      statmerkur
      Participant

      Thank you again for explaining the issue in a very comprehensible way!

Viewing 4 reply threads
  • You must be logged in to reply to this topic.