Forum Replies Created

Viewing 25 posts - 1 through 25 (of 75 total)
    in reply to: Mixed-design experiment with non-binary non-ordinal response variable #382

    If the DV is categorical with five levels, you cannot use such a binomial mixed model. Binomial models only support categorical data with two categories. For more categories you need multinomial logistic models. I find these models quite advanced and their results are not easy to interpret.

    You can however split your categories in what is called nested dichotomies and analyse each of these with a binomial model. The theory is described in the books by John Fox.

    Let’s say you only have three categories, A, B, C. You first pick one category of interest, such as A. You then make a binary variable coding A versus the rest (i.e., B and C) and analyse it with a binomial model. In the next step you discard all observations in which the response is A and analyse only the remaining data with a new binary variable, B versus C. We did exactly this in one paper once:
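    A minimal sketch of this two-step coding in R (the data frame, response values, and the commented modelling calls are all made up for illustration):

    ```r
    # Toy data with a three-category response (hypothetical values)
    d <- data.frame(resp = factor(c("A", "B", "C", "A", "C", "B")))

    # Dichotomy 1: A versus the rest (B and C combined)
    d$a_vs_rest <- as.integer(d$resp == "A")

    # Dichotomy 2: B versus C, using only the observations that are not A
    d2 <- d[d$resp != "A", ]
    d2$b_vs_c <- as.integer(d2$resp == "B")

    # Each dichotomy could then be analysed with its own binomial mixed model,
    # e.g. (assuming condition and id columns exist):
    # mixed(a_vs_rest ~ condition + (1 | id), d,  family = binomial, method = "LRT")
    # mixed(b_vs_c    ~ condition + (1 | id), d2, family = binomial, method = "LRT")
    ```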

    in reply to: Mixed-design experiment with non-binary non-ordinal response variable #376

    I am not sure I fully understand the question, but I guess the simple answer is yes. The question of how to map aspects of your design onto the random-effects part of the model (i.e., for which grouping factors you want random intercepts and for which fixed factors you want random slopes) is independent of the choice of family or link function. So you can easily do this for a response variable that is assumed to have a conditional normal distribution.

    For an introduction to this topic, see our chapter:

    in reply to: Compute effect sizes for mixed() objects #369

    The idea of unstandardised effects is to simply report the effect on the response scale. For example, if the dependent variable is response time in seconds, you report the effect in time (e.g., “the difference between conditions was 0.1 seconds” or “the estimated slope was 1.5 seconds”).

    in reply to: Default Sphericity correction method: GG? #364

    The reason I have not implemented the approach you suggest is because I believe it is not statistically fully appropriate. Specifically, the Mauchly test, as any significance test, has a specific power and Type I error rate (the latter of which is controlled by the alpha/significance level). I do not believe it is worthwhile to incorporate these additional error probabilities into an analysis. Instead, I believe that the small loss in power by always using Greenhouse-Geisser is overall a more appropriate strategy and more conservative.

    Consider the case in which the Mauchly test fails to detect a true violation because its power is too low. If this happens, you get inflated Type I error rates. I consider this a potentially larger drawback than occasionally committing a Type II error in cases where Greenhouse-Geisser is too strict.

    I believe that, in general, using significance tests to test assumptions is not a good idea. For some arguments why, see:

    Nevertheless, the current development version of afex now makes it easier to get assumption tests, see:

    in reply to: Non-significant explanatory variables #358

    It does not matter how many levels each of your factors has. What matters is whether each participant sees all levels of each factor (i.e., whether it varies within participants). If this is the case, corresponding random slopes should be included.

    in reply to: What is happening here? #354

    That can happen due to hierarchical shrinkage, which nudges the individual-level effects to follow a normal distribution. If the individual-level effects are normally distributed around their mean, which is one of the assumptions of the mixed-model framework, this should not have too dramatic an effect. Your plot suggests that this assumption is violated here. In any case, it also suggests that the specific pattern across levels of threat is not very strong.
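    The logic of shrinkage can be illustrated with a toy precision-weighting calculation (this is not lmer’s actual estimation procedure, and all numbers are made up):

    ```r
    # One participant's raw effect is pulled toward the grand mean; the pull is
    # stronger when the raw estimate is noisy relative to the between-person spread.
    grand_mean  <- 0.50  # mean effect across participants
    raw_effect  <- 1.20  # one participant's unpooled effect estimate
    se2_raw     <- 0.30  # sampling variance of that raw estimate
    var_between <- 0.10  # variance of true effects across participants

    w <- var_between / (var_between + se2_raw)  # weight on the raw estimate
    shrunk <- w * raw_effect + (1 - w) * grand_mean
    shrunk  # 0.675: between the grand mean and the raw effect, closer to the mean
    ```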

    It is difficult to say more without additional details (at least the corresponding standard errors). But random slopes should only be removed cautiously; try removing the correlations among the slopes first. Please have a look at this discussion in my chapter:

    in reply to: Non-significant explanatory variables #353

    I forgot to say one more thing. If any of your factors or additional variables varies within participants, it should be added as a random slope. Random-intercept-only models, such as the one you fit, seem generally quite dubious. Please see my discussion of this issue at:

    in reply to: Non-significant explanatory variables #352

    I have some answers to your questions, but it makes sense to read more on variable selection in regression modeling, as this is probably the problem you are struggling with, and it is not a trivial one. A good starting point may be ‘Regression Modeling Strategies’ by Frank Harrell (2015):
    Note that this book deals mostly with simple regression models (i.e., not mixed models), but the problem you have applies in the same manner.

    Model selection using AIC and hypothesis testing using significance tests do different things. One reason they can diverge is that they use different cut-offs. However, a variable that dramatically improves AIC should usually also be significant in a significance test, because what such a test essentially does is fit a restricted model in which this variable is withheld and then compare the fits, using something akin to a likelihood-ratio test of two nested models.

    A variable that improves AIC on its own but not jointly is probably a case of multicollinearity:
    Your additional predictor variables are probably themselves correlated with each other. The predictive ability of one can then often be covered by other variables that are correlated with it.
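    A small simulation illustrates this pattern (all names and numbers are arbitrary):

    ```r
    # Two highly correlated predictors: each improves AIC on its own, but jointly
    # neither adds much beyond the other.
    set.seed(1)
    n  <- 200
    x1 <- rnorm(n)
    x2 <- x1 + rnorm(n, sd = 0.1)  # x2 is almost a copy of x1
    y  <- x1 + rnorm(n)

    m0  <- lm(y ~ 1)        # intercept only
    m1  <- lm(y ~ x1)
    m2  <- lm(y ~ x2)
    m12 <- lm(y ~ x1 + x2)

    AIC(m0, m1, m2, m12)  # m1 and m2 each clearly beat m0; m12 adds little over m1
    summary(m12)          # with both predictors in, the individual t-tests weaken
    ```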

    To answer your question directly: a variable should not really be non-significant and yet improve the model considerably in terms of AIC. At least not when all the other variables are in the model and only this one is withheld.

    I repeat again that variable selection is generally a difficult problem and many complicated procedures (not implemented in afex or lme4) exist to tackle this in a more principled manner.

    in reply to: Including a covariate #351

    Sorry for the slow reply, your post somehow fell through the cracks. You need to set factorize = FALSE in the call to use numerical covariates.

    For the example data that would be:

    data(obk.long, package = "afex")
    aov_ez("id", "value", obk.long, between = c("treatment", "gender"), 
            within = c("phase", "hour"), covariate = "age", 
            observed = c("gender", "age"), factorize = FALSE)

    In your case this would be:

    fit.grat2 <- aov_car(grat ~ tgrat + condition*time + Error(PPT/time),
                         data = mf_long2, covariate = "centgrat", factorize = FALSE)

    I should probably add a corresponding note to the documentation.

    in reply to: Mixed model specification and centering of predictor #343

    Treating something as a random-effects grouping factor (i.e., putting it on the right side of |) leads to shrinkage. Levels of this factor (e.g., specific electrodes) for which the effect differs quite strongly from the mean are shrunk towards the overall mean. If this is an assumption you are willing to make (i.e., that an effect diverging relatively strongly from the others for a specific electrode probably reflects measurement error), then it makes sense to treat it as a random effect.

    The benefit of treating a factor as random-effects grouping factor is that under this assumption (i.e., that the effect across the different electrodes is randomly distributed around the grand mean effect) the overall prediction will be better.

    There are two downsides: (1) if the assumption is false, you miss out on “true” differences. (2) You are unable to easily check for differences between electrodes.

    Having said that, m2 looks somewhat reasonable. However, the same caveat as before regarding the continuous covariates holds. The main effects are tested when the covariates are 0. So 0 needs to be meaningful given the data (and one way to achieve this is by centering them).
    I only wonder why not have the FrB and Distress interaction as random slopes: ... + (FrB * Cluster | PatID)

    in reply to: Trouble with ordered contrasts and lmer_alt #328

    Thanks for the bug report. I have fixed this in the development version on github, which you can get via:

    It might take some time until it gets on CRAN as there are some things I still want to fix before that.

    in reply to: Logistic models using mixed() #326

    I seem to be getting convergence errors when using afex that don’t occur when just using lme4 and glmer(). Does this mean that the results of glmer() shouldn’t be trusted?

    No, this is not a good reason. Note that afex uses glmer. So if the model uses the same parameterization (which can be achieved by running afex::set_sum_contrasts() before fitting with glmer), the (full) model should be identical. In this case, running summary() on both models (the one fitted with afex and the one fitted with glmer) will reveal that they are identical.

    The problem is really that Wald tests for generalized models are not particularly trustworthy. In my opinion, LRTs are much better here. However, as said above, if computationally feasible, parametric bootstrap is the better choice. But of course, if fitting the model takes very long, then this is not a good option (as the parametric bootstrap fits the model several times, preferably 1000 times or more).

    Note that the convergence warnings can be false positives; some more on that here (note that this blog seems somewhat too alarmist for my taste):

    in reply to: Logistic models using mixed() #323

    There are several questions here:

    1. Is afex the best option for me?

    If you want to stay in the frequentist realm, then afex is probably as easy as it gets (but I might of course be biased).

    If you can consider going Bayesian, then both rstanarm and brms are pretty good options. However, note that with those packages you have to make sure to use the correct coding (e.g., afex::set_sum_contrasts()) when running the model. afex does so automatically; the other packages do not.

    2. Should I use parametric bootstrapping?

    If computationally feasible, that would be great. If not, LRTs are your only remaining option.

    3. However, due to the nonlinear nature of most link functions, the interpretations of most model predictions, specifically of lower-order effects in factorial designs, can be quite challenging.

    What we mean here is that, due to the non-linear nature of the link function, the lower-order effects might not faithfully represent the corresponding lower-order effects in the data. So it is worth checking whether the lower-order effects actually represent patterns that are in the data and are not an artifact. To do so, compare the marginal estimates on the response scale with those in the data. If these make sense, you should be mostly fine.
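    One way to see the issue is with base R’s plogis (the inverse logit): averaging on the logit scale and then transforming is not the same as averaging the probabilities themselves (logit values below are made up):

    ```r
    # Symmetric case: the two computations happen to agree
    logits <- c(-2, 2)
    plogis(mean(logits))  # 0.5
    mean(plogis(logits))  # 0.5

    # Asymmetric case: they no longer agree
    logits <- c(0, 3)
    plogis(mean(logits))  # about 0.82
    mean(plogis(logits))  # about 0.73
    ```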

    in reply to: how to pass 'weights' to mixed() #311

    Yeah, that should probably be better documented. You need to pass the whole vector:

    mixed(pCorrect ~ touchType + cued + (1 | exptPID),
          weights = touchCompare$nTotal, family = binomial, data = touchCompare)
    in reply to: Compute effect sizes for mixed() objects #306

    Good question and, honestly, I am not sure. Pangea seems to require some d-type measure of standardized effect size. I think you could use one of two approaches:
    1. Use some reasonable default value (e.g., a small effect size) and explain that this yields something of a lower bound on power, because you do not expect a smaller effect, only a larger one.
    2. Alternatively, simply standardize the observed mean difference from a previous study by a reasonable measure of the standard deviation of that specific difference; in the mixed-model case, maybe the by-participant random-slope standard deviation. It really depends on the specific case. But if one makes a reasonable argument for why this is okay for the sake of the power analysis, then that should be okay.
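    As a made-up numerical example for option 2:

    ```r
    # Hypothetical numbers: a condition difference from a previous study,
    # standardized by the by-participant random-slope SD of that difference.
    mean_diff <- 0.10  # e.g., a 100 ms condition difference (in seconds)
    slope_sd  <- 0.25  # by-participant random-slope standard deviation
    d <- mean_diff / slope_sd
    d  # 0.4, which could then be entered into a tool such as Pangea
    ```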

    As is maybe clear from this paragraph, I do not often use power analysis myself. For highly parameterized models like mixed models, it requires so many assumptions that it is really unclear what its value is. If possible, I would avoid it and make other arguments for why I decided to collect a specific number of participants (e.g., prior sample sizes, money, or time restrictions).

    in reply to: Afex documentation follow up contrasts. #305

    Thanks for reporting this. Was indeed an error. Now fixed on github and soon on CRAN. See:

    in reply to: Model equation for correlated and uncorrelated random slopes #298

    From ?afex::mixed:

    Expand Random Effects

    expand_re = TRUE allows to expand the random effects structure before passing it to lmer. This allows to disable estimation of correlation among random effects for random effects term containing factors using the || notation which may aid in achieving model convergence (see Bates et al., 2015). This is achieved by first creating a model matrix for each random effects term individually, rename and append the so created columns to the data that will be fitted, replace the actual random effects term with the so created variables (concatenated with +), and then fit the model. The variables are renamed by prepending all variables with rei (where i is the number of the random effects term) and replacing ":" with "_by_".

    Hence, try: mixed(Y ~ A*B*C + (A*B*C || Subj), data, expand_re = TRUE)

    in reply to: Compute effect sizes for mixed() objects #295

    Unfortunately, this is currently not possible. The problem is that the effect on the response scale needs to be normalized by some estimate of variability (e.g., a standard deviation), and it is not really clear which estimate to take in the case of a mixed model, as there are usually several. This is also one of the reasons why there is no easy way to calculate R^2 in LMMs:

    I believe that most of these problems are also discussed in a recent Psych Methods paper which can be found here:
    Rights, J. D., & Sterba, S. K. (2018). Quantifying explained variance in multilevel models: An integrative framework for defining R-squared measures. Psychological Methods. Advance online publication.

    The fact that calculating a global measure of model fit (such as R^2) is already riddled with complications, and that no simple single number can be found, should be a hint that doing so for a subset of the model parameters (i.e., main effects or interactions) is even more difficult. Given this, I would not recommend trying to find a measure of standardized effect size for mixed models.

    It is also important to note that the APA in fact recommends unstandardized over standardized effect sizes. This is even listed in the first paragraph on effect sizes on Wikipedia:

    I believe that a similar message of reporting unstandardized effect sizes is being conveyed in a different recent Psych Methods paper:
    Pek, J., & Flora, D. B. (2018). Reporting effect sizes in original psychological research: A discussion and tutorial. Psychological Methods, 23(2), 208-225.

    I know that you still need to somehow handle the reviewer. My first suggestion is to report unstandardized effect sizes and cite the corresponding APA recommendation (we did this e.g., here, Table 2). Alternatively, you could try to follow some of the recommendations in the Rights and Sterba paper. Finally, if this also does not help you, you might tell the reviewer something like:

    Unfortunately, due to the way that variance is partitioned in linear mixed models (e.g., Rights & Sterba, in press, Psych Methods), there does not exist an agreed upon way to calculate standard effect sizes for individual model terms such as main effects or interactions. We nevertheless decided to primarily employ mixed models in our analysis, because mixed models are vastly superior in controlling for Type I errors than alternative approaches and consequently results from mixed models are more likely to generalize to new observations (e.g., Barr, Levy, Scheepers, & Tily, 2013; Judd, Westfall, & Kenny, 2012). Whenever possible, we report unstandardized effect sizes which is in line with general recommendation of how to report effect sizes (e.g., Pek & Flora, 2018).

    Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278.
    Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103(1), 54–69.
    Pek, J., & Flora, D. B. (2018). Reporting effect sizes in original psychological research: A discussion and tutorial. Psychological Methods, 23, 208–225.
    Rights, J. D., & Sterba, S. K. (in press). Quantifying explained variance in multilevel models: An integrative framework for defining R-squared measures. Psychological Methods.

    in reply to: Mixed model specification and centering of predictor #292

    1. The warning by afex is just there to remind you that this is a non-trivial issue. I do not recommend mean centering as a general solution. If you have thought about a good point of centering (maybe at 30 in your case) then you might ignore the warning. It is just there to force you to think about your problem.

    2. That is the problem with continuous variables. You need to make sure that the story your results tell is not contingent on some arbitrary value you choose to center at. Maybe it makes sense to explore the possible range of centering points and report the results throughout (in the sense of a multiverse analysis). You should give the reader the chance to fully understand the consequences of your more or less arbitrary choices for your results.

    Finally, you say:

    Is this really appropriate? IST is a person-specific measure like IQ, measured by a pencil and paper test before the EEG session. Therefore, channels do not contribute to random variation of IST (which is really constant within each person).

    Anyway, even if I try to run the extended model as you suggested, I get convergence errors. I guess this means that even if this model was justified by the design, I would have to reduce it to get a stable model (which would be the model I proposed initially).

    Yes, it is of course appropriate. The ist | chan part allows the effect of ist to differ idiosyncratically per channel. Maybe the channels right above the area responsible for ist react more strongly to the corresponding signal. You really have to think about the effect of ist independently of the effect of id here.

    And if the model shows convergence warnings, try to suppress the estimation of the correlations via || and expand_re = TRUE.
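    For the model discussed in this thread, that would look something like the following (only the formula itself is base R; the mixed() call is left as a comment because it requires the actual data):

    ```r
    # Random-effects correlations suppressed via ||; afex's expand_re = TRUE
    # makes the || notation also work for factors.
    f <- erds ~ difficulty * ist + (difficulty | id) + (difficulty * ist || chan)
    all.vars(f)  # "erds" "difficulty" "ist" "id" "chan"

    # m <- mixed(f, data = erds, method = "S", expand_re = TRUE)
    ```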


    Not really. (1|face_id/face_emotion) is just a shorthand for (1|face_id) + (1|face_id:face_emotion), see:
    This would analogously hold for the random slopes of course.
    Please see also:

    I do not see why treating it as nested would make sense in your case. Nesting is only important when you have a truly hierarchical structure, for example students in classrooms, where each student occurs in exactly one classroom. In your case the lower-level units (i.e., emotions) occur in different faces, so nesting does not really apply.


    In your case, the source of random variability most likely comes from the identity of the face. So this would be the natural random-effects grouping factor (i.e., face_id). The important question now is whether each emotion exists for each face_id. If so, it should be added to the model as well, together with the other random slopes you are missing. That is, a reasonable model could be:
    lmm_model <- mixed(RT ~ session * training * face_emotion + (session * face_emotion | p#) + (session * training * face_emotion | face_id))

    The important thing to keep in mind in this kind of design is that the two random-effects grouping factors are crossed. This means the question of multiple observations needs to be answered for each combination of random-effects grouping factor and fixed effect individually. So, for the face question you need to ask yourself: do I have multiple observations per face_id for each level of session (or emotion or training), across participants? That is, whether or not these multiple observations come from the same participant does not play any role. And I guess that across your experiment you have multiple observations for each face_id for each of your factors. Hence the random-effects structure I suggested above.

    Said more explicitly: for the by-participant random-effects structure you have to ask yourself which factors vary within participants, ignoring the face_id factor. Conversely, for the by-face_id random-effects structure you have to ask yourself which factors vary within face_id, ignoring the participant factor.

    Hope that helps!

    in reply to: Mixed model specification and centering of predictor #285

    (1) Is the model correctly specified? I’m not sure because you could also say that chan is nested in id since we recorded the same 64 EEG channels for each subject.

    If you measured the same 64 channels for each subject, these two variables are crossed, not nested. Nested means that specific levels of one factor (e.g., EEG channel) only appear within specific levels of another factor (e.g., ID). So, for example, a nesting would exist if each participant had an idiosyncratic set of EEG channels, which of course seems quite unlikely.

    However, one thing to consider is that ist is also a within-channel variable. So maybe

    m <- mixed(erds ~ difficulty * ist + (difficulty | id) + (difficulty * ist | chan), erds, method = "S")

    might be more appropriate (i.e., reflect the maximal random-effects structure justified by the design).

    (2) I get different results whether or not I scale the continuous predictor ist. […]
    Why does centering/scaling the ist predictor affect the estimates, and more noticeably the p-values of the difficulty levels? Whereas in the first case, both difficulty1 and difficulty2 are highly significant, these factors are not significant in the latter case anymore. What is going on? What is the correct way to deal with this situation?

    A few things to consider here.

    First, afex tries to discourage you from inspecting the parameter estimates via summary, as you do. These are very often not very helpful, especially in cases such as yours, where you have factors with more than two levels. I would highly suggest using the print method (i.e., just m), nice(), or anova() when your interest is in the fixed effects. You can then use the interplay with emmeans to look at specific effects.

    And now to your specific question: yes, centering can have a dramatic effect on the interpretation of your model. This is, for example, discussed on CrossValidated:
    However, there are also numerous papers discussing this, some specifically in the context of mixed models. Two papers that I know of are:

    Dalal, D. K., & Zickar, M. J. (2012). Some Common Myths About Centering Predictor Variables in Moderated Multiple Regression and Polynomial Regression. Organizational Research Methods, 15(3), 339–362.

    Wang, L., & Maxwell, S. E. (2015). On disaggregating between-person and within-person effects with longitudinal data using multilevel models. Psychological Methods, 20(1), 63–83.

    The thing to keep in mind is that each variable is tested when the other variables are set to 0, so the 0 value should be meaningful. Centering often makes it meaningful, because 0 is then the mean. But other zero points can be meaningful as well (e.g., the midpoint of a scale).
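    The point about the 0 value can be demonstrated with a small simulated regression (all names and numbers are arbitrary):

    ```r
    # With an interaction in the model, the x1 coefficient is its simple effect
    # at x2 = 0, so moving the zero point of x2 (centering) changes it.
    set.seed(2)
    n  <- 300
    x1 <- rnorm(n)
    x2 <- rnorm(n, mean = 5)  # x2 = 0 lies far outside the observed data
    y  <- x1 * x2 + rnorm(n)

    m_raw      <- lm(y ~ x1 * x2)
    m_centered <- lm(y ~ x1 * I(x2 - mean(x2)))

    coef(m_raw)["x1"]       # effect of x1 at x2 = 0: an extrapolation, near 0 here
    coef(m_centered)["x1"]  # effect of x1 at the mean of x2: near 5 here
    ```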

    in reply to: precise estimates from emmeans across lm and aov_ez #272

    A new version of afex, version 0.21-2, that fixes this bug, has just appeared on CRAN:

    I apologize for the delay!


    I suggest you take an intensive look at our chapter and further literature. This is probably more efficient than waiting for my responses. For example, the chapter contains references to the main literature on whether or not it is appropriate to reduce the random-effects structure based on the data. One camp, basically Barr, Levy, Scheepers, and Tily (2013), suggests that this is almost never a good idea. The other camp, Bates et al. and Matuschek et al. (see references below), suggests that it is reasonable in case of convergence problems. I think both sides have good arguments. But if your model does not converge, what else can you do other than reduce the random-effects structure? My preferred first step is to remove the correlations among random effects (also discussed in the chapter).

    I am not sure how relevant it is to try to build the equivalent model to a repeated-measures ANOVA. Mixed models are more adequate for your data, so try to use them in the best way possible.
    Perhaps one concrete response: including trialno as both a fixed and a random effect is essentially equivalent to using the residuals. You “control” for this effect (you can search CV; there are plenty of relevant questions with answers on controlling in a regression framework).

    The old lme syntax did not allow random slopes in the same way lme4 does. People who still use this older approach therefore have to use the nesting syntax. As discussed in detail in, for example, Barr et al., the recommended approach nowadays uses random slopes. Some more details can again be found on CV. For example:

    Good luck!

    Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious Mixed Models. arXiv:1506.04967 [stat]. Retrieved from
    Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315.


    There are many questions here. In general, I would advise you to read our introductory chapter that covers quite a few of them:

    Can I assume that this is a mixed model “equivalent” of a “normal” repeated measure anova (as done with SPSS) computed on aggregated data (wide format with one mean logRT for each combination of A * B * C) ?

    This is of course a different model that will produce different results, but it is probably the recommended model (if you ignore response and trialno). For the within-subject design this is the “maximal model justified by the design”, which is recommended following Barr, Levy, Scheepers, and Tily (2013, JML).

    How can I extend the above syntax to account for trialnumber (a tendency to have a negative correlation between trialno and logRT)?
    What would be considered “best practice” if I wanted to add response as a separate factor making it a A * B * C * response design and considering that adding response breaks the balance of A*B*C? I’m not sure how to approach both fixed and random parts for this

    Let us first consider response. As each participant should have (in principle) both yes and no responses for each cell of the design, the natural way to extend this model would be to add response as both fixed and random:
    mixed(logRT ~ A * B * C * response + (A * B * C * response | Subject), df)
    I do not immediately see how the issue of imbalance is a huge problem. Of course, balance would be better, but the model should be able to deal with it.
    The question of trialno is more difficult. Of course, you could simply add it as a fixed and random effect as well, for example:
    mixed(logRT ~ A * B * C * response + trialno + (A * B * C * response + trialno | Subject), df)
    However, this would only allow a linear effect. Maybe some other functional form is better; if this is important, another type of model such as a GAMM may be better. See:
    Baayen, H., Vasishth, S., Kliegl, R., & Bates, D. (2017). The cave of shadows: Addressing the human factor with generalized additive mixed models. Journal of Memory and Language, 94, 206–234.

    Should I consider any non-default covariance structures for both random and fixed effects? I’ve been advised to use compound symmetry for the fixed part – but to be honest I don’t know any reason for it, nor can I explain the differences between various structures (available for example in SPSS procedure MIXED)

    Unfortunately, this is at the moment not really possible with lme4 (which mixed uses).

    One final thought. Your model seems to have quite a lot of data; in this case method = "S" is probably indicated in the call to mixed(). It is faster and gives very similar results.

    Hope that helps.
