Compute effect sizes for mixed() objects

Tagged: effect size, lmer, mixed

This topic has 11 replies, 6 voices, and was last updated 4 years, 2 months ago by henrik.


Author

Posts
- July 24, 2018 at 08:52 GMT+0000 #293
  
  blazko m. (b1azk0)
  Participant
  
  Hello,
  
  With the help of the discussion we had here: Finding the optimal structure[…], I finally got my article reviewed, so a big thanks for the hints.
  
  Now, to satisfy one of the reviewers, I was asked to add partial eta squared effect sizes to each of the F/t tests reported in the paper.
  My question is: is there any automatic or semi-automatic method for computing partial eta squared for anova(mixed()) objects, as well as for simple effects or contrasts such as pairs(emmeans(m0, ~A*B|C), interaction=TRUE)?
  
  I couldn’t find any package that computes effect sizes for lmerMod objects, but as afex's ANOVA functions (e.g., aov_car()) can do it pretty easily, I thought I would ask here.
  
  Any help with this issue would be great.
  
  All the best
- July 25, 2018 at 16:01 GMT+0000 #294
  
  blazko m. (b1azk0)
  Participant
  
  In case someone here knows an answer: I have also posted this question on Cross Validated.
  
  I’m actively monitoring CV as well as this topic.
  
  All the best
- July 28, 2018 at 15:07 GMT+0000 #295
  
  henrik
  Keymaster
  
  Unfortunately, this is currently not possible. The problem is that the effect on the response scale needs to be normalized by some estimate of variability (e.g., a standard deviation), and it is not clear which estimate to take in the case of a mixed model, as there are usually several (e.g., the residual standard deviation and the standard deviations of each random intercept and random slope). This is also one of the reasons why there is no easy way to calculate R^2 in LMMs: http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#how-do-i-compute-a-coefficient-of-determination-r2-or-an-analogue-for-glmms
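  To make the "which standard deviation?" problem concrete, here is a minimal base-R sketch. All numbers are invented for illustration and do not come from any fitted model; in a real analysis the candidate SDs would be read off the variance components (e.g., lme4::VarCorr()). The point is only that the same raw difference yields very different d-type values depending on the denominator.

  ```r
  # Hypothetical raw (unstandardized) effect: a 50 ms condition difference
  raw_diff <- 50

  # Hypothetical standard deviations a mixed model might report
  # (all values invented for illustration)
  sds <- c(
    residual          = 120,  # trial-level noise
    participant_int   = 80,   # by-participant random intercept
    participant_slope = 40,   # by-participant random slope for the effect
    item_int          = 30    # by-item random intercept
  )

  # One of many possible "total" SDs: square root of the summed variances.
  # This ignores random-effect correlations and the coding of the predictor,
  # which is exactly the kind of arbitrary choice that makes this unclear.
  sds["total"] <- sqrt(sum(sds^2))

  # The same raw effect standardized by each candidate SD
  d_candidates <- raw_diff / sds
  round(d_candidates, 2)
  ```

  With these made-up values the "effect size" ranges from about 0.33 to about 1.67 for one and the same 50 ms difference.
  
  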
  
  I believe that most of these problems are also discussed in a recent Psych Methods paper which can be found here:
  Rights, J. D., & Sterba, S. K. (2018). Quantifying explained variance in multilevel models: An integrative framework for defining R-squared measures. Psychological Methods. Advance online publication. http://dx.doi.org/10.1037/met0000184
  
  The fact that calculating a global measure of model fit (such as R^2) is already riddled with complications, and that no simple single number can be found, should be a hint that doing so for a subset of the model parameters (i.e., main effects or interactions) is even more difficult. Given this, I would not recommend trying to find a measure of standardized effect size for mixed models.
  
  It is also important to note that the APA in fact recommends unstandardized over standardized effect sizes. This is even mentioned in the first paragraph of the Wikipedia article on effect size: https://en.wikipedia.org/wiki/Effect_size
  
  I believe that a similar message, favoring unstandardized effect sizes, is conveyed in a different recent Psych Methods paper:
  Pek, J., & Flora, D. B. (2018). Reporting effect sizes in original psychological research: A discussion and tutorial. Psychological Methods, 23(2), 208-225. http://dx.doi.org/10.1037/met0000126
  
  I know that you still need to somehow handle the reviewer. My first suggestion is to report unstandardized effect sizes and cite the corresponding APA recommendation (we did this, e.g., here, Table 2). Alternatively, you could try to follow some of the recommendations in the Rights and Sterba paper. Finally, if this also does not help, you might tell the reviewer something like:
  
  Unfortunately, due to the way that variance is partitioned in linear mixed models (e.g., Rights & Sterba, in press, Psych Methods), there is no agreed-upon way to calculate standardized effect sizes for individual model terms such as main effects or interactions. We nevertheless decided to primarily employ mixed models in our analysis, because mixed models are vastly superior to alternative approaches in controlling Type I errors and, consequently, results from mixed models are more likely to generalize to new observations (e.g., Barr, Levy, Scheepers, & Tily, 2013; Judd, Westfall, & Kenny, 2012). Whenever possible, we report unstandardized effect sizes, which is in line with general recommendations on how to report effect sizes (e.g., Pek & Flora, 2018).
  
  References:
  Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. https://doi.org/10.1016/j.jml.2012.11.001
  Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103(1), 54–69. https://doi.org/10.1037/a0028347
  Pek, J., & Flora, D. B. (2018). Reporting effect sizes in original psychological research: A discussion and tutorial. Psychological Methods, 23(2), 208–225. https://doi.org/10.1037/met0000126
  Rights, J. D., & Sterba, S. K. (in press). Quantifying explained variance in multilevel models: An integrative framework for defining R-squared measures. Psychological Methods. https://doi.org/10.1037/met0000184
  - August 6, 2018 at 15:54 GMT+0000 #300
    
    Xinming Xu
    Participant
    
    Then how should one perform a power analysis, which journals also require, if a standardized effect size is not available (e.g., in PANGEA)? Thanks.
    - August 9, 2018 at 08:50 GMT+0000 #306
      
      henrik
      Keymaster
      
      Good question, and honestly, I am not sure. PANGEA seems to require some d measure of standardized effect size. I think you could use one of two approaches:
      1. Use a reasonable default value (e.g., a small effect size) and explain that the resulting power is something of a lower bound, because you do not expect a smaller effect, but likely a larger one.
      2. Alternatively, standardize the observed mean difference from a previous study by a reasonable measure of standard deviation for that specific difference. In the mixed-model case, this could perhaps be the by-participant random-slope standard deviation. How to do this really depends on the specific case, but if one makes a reasonable argument for why it is okay for the sake of a power analysis, then that should be fine.
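      Both approaches can be sketched in base R with stats::power.t.test(). This is only a crude approximation under stated assumptions: the previous-study numbers are hypothetical, and the within-participant design is treated as a one-sample t-test on per-participant difference scores. Using the random-slope SD alone ignores trial-level and item variability, so the resulting sample size is likely optimistic.

      ```r
      # Hypothetical values from a previous study (invented for illustration)
      prev_mean_diff <- 30   # observed condition difference in ms
      prev_slope_sd  <- 60   # by-participant random-slope SD in ms

      # Approach 2: standardize the previous difference by the slope SD
      d_prev <- prev_mean_diff / prev_slope_sd

      # Approach 1: a conventional "small" default effect size
      d_small <- 0.2

      # Crude approximation: one-sample t-test on per-participant difference
      # scores (ignores trial-level noise, items, and everything else in the LMM)
      n_prev <- power.t.test(delta = d_prev, sd = 1, sig.level = 0.05,
                             power = 0.80, type = "one.sample")$n
      n_small <- power.t.test(delta = d_small, sd = 1, sig.level = 0.05,
                              power = 0.80, type = "one.sample")$n

      # Required number of participants, rounded up
      ceiling(c(from_previous_study = n_prev, small_default = n_small))
      ```

      The small default effect demands a far larger sample, which is why it can be framed as a conservative lower bound on power for a given sample.
      
      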
      
      As should perhaps be clear from this answer, I do not often use power analysis myself. For highly parameterized models such as mixed models, it requires so many assumptions that its value is really unclear. If possible, I would avoid it and instead make other arguments for why I decided to collect a specific number of participants (e.g., prior sample sizes, money or time restrictions).
      - August 9, 2018 at 11:31 GMT+0000 #307
        
        Xinming Xu
        Participant
        
        Thanks for your suggestions. I might just use some default values then.
- July 28, 2018 at 15:15 GMT+0000 #296
  
  blazko m. (b1azk0)
  Participant
  
  Henrik, thank you very much!
  This is a really insightful answer and it clears up a lot for me.
  Great help.
  - October 26, 2018 at 16:02 GMT+0000 #325
    
    Pablo Bernabeu
    Participant
    
    +1
    
    Core issues here with a reality check to them.
- May 29, 2019 at 13:14 GMT+0000 #368
  
  LIU Lei
  Participant
  
  Then, are there any accepted methods or packages to compute unstandardized effect sizes? Most of the discussion I found about LMM effect sizes relates to R^2.
  - May 29, 2019 at 13:44 GMT+0000 #369
    
    henrik
    Keymaster
    
    The idea of unstandardized effect sizes is simply to report the effect on the response scale. For example, if the dependent variable is response time in seconds, you report the effect in time (e.g., “the difference between conditions was 0.1 seconds” or “the estimated slope was 1.5 seconds”).
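    A minimal base-R sketch of what "on the response scale" means, using a small made-up data set (all values hypothetical): the effect is simply the difference in condition means, kept in seconds and not divided by any standard deviation. The same applies to a slope, which stays in seconds per unit of the predictor.

    ```r
    # Hypothetical response times (seconds) for 4 participants in two conditions
    dat <- data.frame(
      participant = rep(1:4, times = 2),
      condition   = rep(c("A", "B"), each = 4),
      rt          = c(0.52, 0.61, 0.48, 0.55,   # condition A
                      0.63, 0.70, 0.57, 0.66)   # condition B
    )

    # Condition means on the original (seconds) scale
    cond_means <- tapply(dat$rt, dat$condition, mean)

    # Unstandardized effect: difference in seconds, no division by any SD
    raw_effect <- unname(cond_means["B"] - cond_means["A"])
    raw_effect   # 0.1 seconds

    # Continuous predictor: the slope is in seconds per unit of x
    x <- c(0, 1, 2, 3)
    y <- c(2.0, 3.5, 5.0, 6.5)
    slope <- unname(coef(lm(y ~ x))["x"])
    slope        # 1.5 seconds per unit of x
    ```
    
    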
    - May 14, 2020 at 12:17 GMT+0000 #409
      
      Dom42
      Participant
      
      A follow-up question in terms of the report of unstandardized effect sizes:
      If I fit a model using mixed(), no bs are reported for the different fixed effects (which makes sense, as the factor levels might not be equally spaced or have any numerical representation at all).
      However, how can I then obtain reliable unstandardized effect sizes? Here you mention that the bs from the summary of the full model are essentially not interpretable for factors with k > 2 levels.
      Would it then be more appropriate to report the estimate produced by a post-hoc contrast analysis using emmeans? Can I then aggregate over the levels of the other factors present in the model, resulting in something comparable to a b? Or is this estimate generally only interpretable as a difference in the means estimated by emmeans?
      - May 15, 2020 at 17:58 GMT+0000 #410
        
        henrik
        Keymaster
        
        The best is always to report specific contrasts from emmeans (i.e., the estimate from the contrast). They directly answer your research question and can easily be interpreted.
        
        I do not really understand your further suggestions. There is no such thing as a single b in the case of more than two levels. Consider, for example, the Machines data, in which the factor Machine has three levels.
        
        library("afex")
        library("emmeans")
        data("Machines", package = "MEMSS")
        # note: this option makes emmeans use asymptotic df; the output below
        # reports Kenward-Roger df, so it was presumably produced without it
        emm_options(lmer.df = "asymptotic")
        # simple model with random slopes for the repeated-measures factor
        m1 <- mixed(score ~ Machine + (Machine|Worker), data = Machines)
        em1 <- emmeans(m1, "Machine")
        em1
        #  Machine emmean   SE df lower.CL upper.CL
        #  A         52.4 1.68  5     48.0     56.7
        #  B         60.3 3.53  5     51.3     69.4
        #  C         66.3 1.81  5     61.6     70.9
        #
        # Degrees-of-freedom method: kenward-roger
        # Confidence level used: 0.95
        
        The parameter that is actually estimated from the model is the difference from the intercept (i.e., unweighted grand mean) as the following code shows:
        
        round(summary(m1)$coefficients, 3)
        #             Estimate Std. Error    df t value Pr(>|t|)
        # (Intercept)   59.650      2.145 4.999  27.811    0.000
        # Machine1      -7.294      1.079 4.999  -6.760    0.001
        # Machine2       0.672      1.539 4.999   0.437    0.681
        summary(em1)$emmean - mean(summary(em1)$emmean)
        # [1] -7.2944444  0.6722222  6.6222222
        
        We see that the parameters (the two bs), Machine1 and Machine2, are the first two differences. The third difference is the negative sum of the two. Thus, the average of the three differences is:
        
        mean(summary(em1)$emmean - mean(summary(em1)$emmean))
        # [1] 0
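        The same relationship can be checked by hand in base R, without refitting the model. This sketch copies the rounded estimates from the summary(m1) output above and relies on afex using sum-to-zero coding (contr.sum) for factors by default; it rebuilds each machine's deviation from the intercept and the corresponding marginal means.

        ```r
        # Rounded estimates copied from the summary(m1) output above
        intercept <- 59.650
        b <- c(Machine1 = -7.294, Machine2 = 0.672)

        # Sum-to-zero coding for a 3-level factor: level C gets -1 in both columns
        X <- contr.sum(3)
        rownames(X) <- c("A", "B", "C")

        # Deviation of each machine from the intercept (unweighted grand mean);
        # the deviation for C equals -(b1 + b2)
        deviations <- drop(X %*% b)
        deviations

        # Reconstructed marginal means (match the em1 output up to rounding)
        intercept + deviations

        # The deviations sum to zero (up to floating-point error)
        sum(deviations)
        ```

        Only two parameters are estimated; the third deviation is implied by the coding, which is why no single b summarizes a three-level factor.
        
        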
        
        So try to figure out which contrasts are of interest to you, i.e., which answer your research question, and report those.
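        A contrast estimate is just a weighted sum of the marginal means, so it stays on the scale of the dependent variable (here, score). The base-R sketch below copies the rounded means from the em1 output above; the three contrasts are only examples of what might be of interest. In practice one would let emmeans compute them, together with standard errors and dfs, e.g. via pairs(em1) or contrast(em1, method = list("mean(B, C) - A" = c(-1, 0.5, 0.5))).

        ```r
        # Estimated marginal means copied (rounded) from the em1 output above
        emm <- c(A = 52.4, B = 60.3, C = 66.3)

        # Example contrasts, each a weight vector over the levels A, B, C
        weights <- rbind(
          "B - A"          = c(-1,   1,   0),
          "C - A"          = c(-1,   0,   1),
          "mean(B, C) - A" = c(-1, 0.5, 0.5)
        )

        # Each estimate is a weighted sum of the means, in score units
        estimates <- drop(weights %*% emm)
        estimates   # B - A: 7.9, C - A: 13.9, mean(B, C) - A: 10.9
        ```
        
        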