Non-significant explanatory variables

  • #350

    kns
    Participant

    Hi folks,

    I have what’s likely a common and perhaps naive question.

    In a current experiment I have three factors, each with two levels, and
    a number of other measures based on cognitive tasks as well as questionnaires.

    I have taken to leaving the three experimental factors in the model while
    systematically exploring as many as 3 or 4 of the other measures within that
    context. Thus far, as a preliminary step, I’ve been using AIC to guide the
    model selection. I’m limiting the exploration to a relatively small subset
    of the explanatory variables.

    For now, I wish to explore whether, in addition to (perhaps) the factors, the
    other cognitive and trait measures explain my dependent variable. Just to
    be clear, my model might look like the following:

    m1 <- lmer(DV ~ cog1 + cog2 + cog10 + factor1*factor2*factor3 + (1|ID), …)

    What I often find is that the best model includes predictor variables that are
    not significant but dramatically improve the AIC. Keep in mind that I retain
    the experimental factors, which may be of no benefit to the AIC metric/model
    fit.

    In your opinion, should these non-significant variables be left in final
    models, and should one strictly care about the significance of individual
    explanatory variables? I’m not yet interested in prediction, but rather in
    identifying variables and/or models that best account for the DV. So, if a
    variable is not significant but improves the model, what can we say about that
    variable?

    Any comment would be most appreciated.

  • #352

    henrik
    Keymaster

    I have some answers to your questions, but it makes sense to read something more on variable selection in regression modeling, as this is probably the problem you are struggling with and it is not a trivial one. A good starting point may be ‘Regression Modeling Strategies’ by Frank Harrell (2015): https://www.springer.com/gb/book/9781441929181
    Note that this book deals a lot with simple regression models (i.e., not mixed models), but the problems it discusses apply to mixed models in the same manner.

    Model selection using AIC and hypothesis testing using significance tests do different things. One reason they can diverge is that they use different cut-offs. However, a variable that dramatically improves the AIC should usually also be significant in a significance test, because what such a test essentially does is fit a restricted model in which this variable is withheld and then compare the fit of the two nested models using something akin to a likelihood-ratio test.
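
    To illustrate, here is a minimal sketch of such a comparison in R, assuming a data frame dat with the variable names from your post:

    library(lme4)
    ## fit with ML (REML = FALSE) so that AIC values are comparable across models
    ## that differ in their fixed effects
    m_full  <- lmer(DV ~ cog1 + cog2 + cog10 + factor1*factor2*factor3 + (1|ID),
                    data = dat, REML = FALSE)
    m_restr <- update(m_full, . ~ . - cog10)  # same model with one variable withheld
    AIC(m_restr, m_full)    # AIC comparison
    anova(m_restr, m_full)  # likelihood-ratio test of the two nested models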

    If a variable improves AIC on its own but not jointly with the others, this is probably a case of multicollinearity: https://en.wikipedia.org/wiki/Multicollinearity
    Your additional predictor variables are probably themselves correlated with each other, so the predictive ability of one can often be covered by other variables that are correlated with it.
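
    A quick check (again only a sketch, assuming the data frame and variable names above) is to look at the correlations among the candidate predictors:

    cor(dat[, c("cog1", "cog2", "cog10")], use = "pairwise.complete.obs")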

    To answer your question directly: a variable should not really be non-significant yet improve the model considerably in terms of AIC, at least not when all the other variables are in the model and only this one is withheld.

    I repeat that variable selection is generally a difficult problem, and many complicated procedures (not implemented in afex or lme4) exist to tackle it in a more principled manner.

  • #353

    henrik
    Keymaster

    I forgot to say one more thing. If any of your factors or additional variables vary within participants, they should be added as random slopes. Random-intercept-only models, such as the one you fit, are generally quite dubious. Please see my discussion of this issue at: http://singmann.org/download/publications/singmann_kellen-introduction-mixed-models.pdf

  • #357

    kns
    Participant

    Thank you.

    Now, this is embarrassing: since posting I have reviewed the most recent data set, and it seems that the non-significant variable includes a couple of NAs, which was not the case in earlier versions. As a consequence the candidate models were no longer fit to the same observations, and it appears that I was misled by AIC. At this stage I was simply screening.
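
    One way to guard against this (a sketch, assuming the data frame dat and the variables from my earlier post) is to restrict all candidate models to the same complete cases before comparing AIC:

    vars   <- c("DV", "cog1", "cog2", "cog10", "factor1", "factor2", "factor3", "ID")
    dat_cc <- dat[complete.cases(dat[, vars]), ]  # drop rows with an NA on any model variable
    ## every model compared by AIC is then fit with data = dat_cc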

    I will, however, follow up on your suggested readings.

    Right now my factors have only two levels each.

  • #358

    henrik
    Keymaster

    It does not matter how many levels each of your factors has. What matters is whether each participant sees both levels of a factor (i.e., whether it varies within participants). If this is the case, the corresponding random slopes should be included.
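
    For example (a sketch, assuming factor1 and factor2 vary within participants while factor3 is between participants), the random-effects part could look like this:

    m2 <- lmer(DV ~ cog1 + cog2 + factor1*factor2*factor3 + (factor1*factor2 | ID),
               data = dat)
    ## if this does not converge, a simplified random-effects structure
    ## (e.g., (factor1 + factor2 | ID)) may be needed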

  • #359

    kns
    Participant

    Thanks again, your time is most appreciated. I’ll adjust accordingly.
