Mixed Models: Non-significant explanatory variables

    • #350
      kns
      Participant

      Hi folks,

      I have what’s likely a common and perhaps naive question.

      In a current experiment I have three factors, each with two levels, and
      a number of other measures based on cognitive tasks as well as questionnaires.

      I have taken to leaving the three experimental factors in the model while
      systematically exploring as many as 3 or 4 of the other measures within that
      context. Thus far, as a preliminary step, I’ve been using AIC to guide
      the model selection, and I’m limiting the exploration to a relatively small
      subset of the explanatory variables.

      For now, I wish to explore whether, in addition to the factors, the
      other cognitive and trait measures explain my dependent variable. Just to
      be clear, my model might look like the following:

      m1 <- lmer(DV ~ cog1 + cog2 + cog10 + factor1*factor2*factor3 + (1|ID), …)

      What I often find is that the best model includes predictor variables that are
      not significant but dramatically improve the AIC. Keep in mind that I retain
      the experimental factors, which may be of no benefit to the AIC metric/model
      fit.

      In your opinion, should these non-significant variables be left in final
      models, and should one strictly care about the significance of individual
      explanatory variables? I’m not yet interested in prediction but rather in
      identifying variables and/or models that best account for the DV. So, if a
      variable is not significant but improves the model, what can we say about that
      variable?

      Any comment would be most appreciated.

    • #352
      henrik
      Keymaster

      I have some answers to your questions, but it makes sense to read more on variable selection in regression modeling, as this is probably the problem you are struggling with and it is not a trivial one. A good starting point may be ‘Regression Modeling Strategies’ by Frank Harrell (2015): https://www.springer.com/gb/book/9781441929181
      Note that this book deals mostly with simple regression models (i.e., not mixed models), but the problems it discusses apply to mixed models in the same manner.

      Model selection using AIC and hypothesis testing via significance tests do different things. One reason they can diverge is that they use different cut-offs. However, a variable that dramatically improves the AIC should usually also be significant in a significance test, because what such a test essentially does is fit a restricted model in which this variable is withheld and then compare the fits, using something akin to a likelihood-ratio test between two nested models.
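
      (To illustrate with a minimal sketch, assuming a data frame dat containing the
      variables from your example: withholding one predictor and comparing the nested
      fits shows both the AIC difference and the corresponding likelihood-ratio test,
      which will usually point in the same direction.)

      library(lme4)
      # full model and a restricted model with cog10 withheld (hypothetical names)
      m_full  <- lmer(DV ~ cog1 + cog2 + cog10 + factor1*factor2*factor3 + (1|ID),
                      data = dat, REML = FALSE)  # ML fit so the AICs are comparable
      m_restr <- update(m_full, . ~ . - cog10)
      AIC(m_full, m_restr)    # AIC comparison
      anova(m_restr, m_full)  # likelihood-ratio test for the same nested comparison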

      If a variable improves the AIC on its own but not jointly with the others, that is probably a case of multicollinearity: https://en.wikipedia.org/wiki/Multicollinearity
      Your additional predictor variables are probably themselves correlated with each other, so the predictive ability of one can often be covered by other variables it is correlated with.
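
      (As a quick first check, a minimal sketch assuming the predictors are columns of
      the same data frame dat: their pairwise correlations are a simple diagnostic;
      dedicated measures such as variance inflation factors exist as well.)

      # pairwise correlations among the additional predictors (hypothetical names)
      cor(dat[, c("cog1", "cog2", "cog10")], use = "pairwise.complete.obs")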

      To answer your question directly: a variable should not really be non-significant yet improve the model considerably in terms of AIC. At least not when all the other variables are in the model and only this one is withheld.

      Let me repeat that variable selection is generally a difficult problem, and many more sophisticated procedures (not implemented in afex or lme4) exist to tackle it in a more principled manner.

    • #353
      henrik
      Keymaster

      I forgot to say one more thing. If any of your factors or additional variables varies within participants, it should be added as a random slope. Random-intercept-only models, such as the one you fit, seem generally quite dubious. Please see my discussion of this issue at: http://singmann.org/download/publications/singmann_kellen-introduction-mixed-models.pdf
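
      (A minimal sketch of what that could look like, assuming for illustration that
      factor1 and factor2 vary within participants and factor3 between; the actual
      random-effects structure has to follow your design.)

      # random slopes for the (assumed) within-participant factors
      m2 <- lmer(DV ~ cog1 + cog2 + cog10 + factor1*factor2*factor3 +
                   (factor1*factor2 | ID), data = dat)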

    • #357
      kns
      Participant

      Thank you.

      Now, this is embarrassing: since posting, I have reviewed the most recent data set and it seems that the non-significant variable includes a couple of NAs, which was not the case in earlier versions. As a consequence, it appears that I was misled by the AIC. At this stage I was simply screening.
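
      (For the record, my understanding of why this misled the AIC, as a hedged sketch:
      lmer drops rows with missing values on any model variable, so models with and
      without that predictor were fit to different subsets of the data and their AICs
      were not comparable. Restricting all fits to the same complete cases avoids this;
      the column names here are hypothetical.)

      vars   <- c("DV", "cog1", "cog2", "cog10", "factor1", "factor2", "factor3", "ID")
      dat_cc <- dat[complete.cases(dat[, vars]), ]  # one common data set for all models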

      I will, however, follow up on your suggested readings.

      Right now my factors are only 2 levels each.

    • #358
      henrik
      Keymaster

      It does not matter how many levels each of your factors has. What matters is whether each participant sees both levels of each factor (i.e., whether it varies within participants). If this is the case, the corresponding random slopes should be included.

    • #359
      kns
      Participant

      Thanks again, your time is most appreciated. I’ll adjust accordingly.
