April 23, 2019 at 11:15 GMT+0000 #350
I have what’s likely a common and perhaps naive question.
In a current experiment I have three factors, each with two levels, and
a number of other measures based on cognitive tasks as well as questionnaires.
I have taken to leaving the three experimental factors in the model while
systematically exploring as many as 3 or 4 of the other measures within that
context. Thus far, as a preliminary step, I've been using AIC to guide
the model selection, and I'm limiting the exploration to a relatively small
subset of the explanatory variables.
For now, I wish to explore whether, in addition to the factors, the
other cognitive and trait measures explain my dependent variable. Just to
be clear, my model might look like the following:
m1 <- lmer(DV ~ cog1 + cog2 + cog10 + factor1*factor2*factor3 + (1|ID), …)
What I often find is that the best model includes predictor variables that are
not significant but dramatically improve the AIC. Keep in mind that I retain
the experimental factors even when they may be of no benefit to the AIC.
In your opinion, should these non-significant variables be left in final
models, and should one strictly care about the significance of individual
explanatory variables? I'm not yet interested in prediction but rather in
identifying variables and/or models that best account for the DV. So, if a
variable is not significant but improves the model, what can we say about that?
Any comment would be most appreciated.
April 24, 2019 at 11:12 GMT+0000 #352
I have some answers to your questions, but it makes sense to read more on variable selection in regression modeling, as this is probably the problem you are struggling with, and it is not a trivial one. A good starting point may be 'Regression Modeling Strategies' by Frank Harrell (2015): https://www.springer.com/gb/book/9781441929181
Note that this book deals mostly with simple regression models (i.e., not mixed models), but the problem you describe applies in the same manner.
Model selection using AIC and hypothesis testing using significance tests do different things. One reason they can diverge is that they use different cut-offs. However, a variable that dramatically improves AIC should usually also be significant in a significance test, because what such a test essentially does is fit a restricted model in which this variable is withheld and then compare the fit of the two nested models using something akin to a likelihood-ratio test.
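The arithmetic behind that connection can be sketched directly (my illustration, not part of the thread; the ΔAIC of 10 is just an example value): since AIC = 2k − 2·logLik, the AIC difference between two nested models equals the likelihood-ratio statistic minus twice the difference in parameter count.

```python
import math

# AIC = 2k - 2*logLik, so for two nested models the likelihood-ratio
# statistic is the AIC improvement plus 2 per added parameter.
def lrt_stat_from_delta_aic(delta_aic, delta_k):
    return delta_aic + 2 * delta_k

def chi2_sf_1df(x):
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x/2)).
    return math.erfc(math.sqrt(x / 2))

# A single extra variable that lowers AIC by 10 points:
stat = lrt_stat_from_delta_aic(10.0, 1)   # likelihood-ratio statistic
p = chi2_sf_1df(stat)
print(stat, p < 0.05)  # 12.0 True

# Note the cut-offs differ: AIC favours the extra parameter once the
# statistic exceeds 2 per df, whereas the .05 chi-square cut-off for
# 1 df is 3.84, so AIC is the more liberal criterion near the boundary.
```

This is why a "dramatic" AIC improvement (say, 10 points for one parameter) corresponds to a clearly significant likelihood-ratio test, while small divergences near the cut-offs are expected.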
If a variable improves AIC on its own but not jointly with the others, this is probably a case of multicollinearity: https://en.wikipedia.org/wiki/Multicollinearity
Your additional predictor variables are probably themselves correlated with each other, so the predictive ability of one can often be covered by other variables that are correlated with it.
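To put a rough number on how strongly correlated predictors dilute each other's significance, one can look at the variance inflation factor (a minimal stdlib-Python sketch of the standard formula; the specific r values are purely illustrative):

```python
# Variance inflation factor for a predictor whose multiple correlation
# with the other predictors is r: the sampling variance of its
# coefficient is inflated by 1 / (1 - r^2) relative to the
# uncorrelated case, and its standard error by sqrt of that.
def vif(r):
    return 1.0 / (1.0 - r * r)

for r in (0.0, 0.5, 0.9, 0.95):
    print(r, round(vif(r), 2), round(vif(r) ** 0.5, 2))

# At r = .95 the standard error is roughly 3.2 times larger, so a
# variable that looks clearly useful on its own can easily turn
# non-significant once a correlated competitor is in the model.
```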
To answer your question directly: a variable should not really be non-significant yet improve the model considerably in terms of AIC. At least not when the comparison is between a model containing all the variables and the same model with only that one variable withheld.
I repeat: variable selection is generally a difficult problem, and many sophisticated procedures (not implemented in afex or lme4) exist to tackle it in a more principled manner.
April 24, 2019 at 11:18 GMT+0000 #353
I forgot to say one more thing. If any of your factors or additional variables varies within participants, it should be added as a random slope. Random-intercept-only models, such as the one you fit, are generally quite dubious. Please see my discussion of this issue at: http://singmann.org/download/publications/singmann_kellen-introduction-mixed-models.pdf
April 24, 2019 at 15:34 GMT+0000 #357
Now, this is embarrassing: since posting I have reviewed the most recent data set, and it seems that the non-significant variable includes a couple of NAs that were not present in earlier versions. As a consequence, it appears that I was misled by AIC. At this stage I was simply screening.
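A plausible mechanism here (my addition, not from the thread, assuming the usual behaviour of dropping incomplete rows before fitting): models fit to different numbers of rows do not have comparable AICs, because the log-likelihood itself scales with n. A minimal Gaussian sketch:

```python
import math

def gaussian_aic(residuals, k):
    # AIC of an ordinary-least-squares fit with k mean parameters plus
    # the residual variance, via the profiled Gaussian log-likelihood.
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return 2 * (k + 1) - 2 * loglik

# Same residual spread per observation, but the second fit silently
# lost two rows to NAs -- its AIC is smaller purely because n is smaller:
full = [1.0, -1.0, 0.5, -0.5, 1.0, -1.0]
dropped = [1.0, -1.0, 0.5, -0.5]
print(gaussian_aic(full, 1) > gaussian_aic(dropped, 1))  # True
```

So an apparent AIC "improvement" after adding a variable with missing values can simply reflect the smaller data set, which matches the experience described above.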
I will however follow-up with your suggested readings.
Right now my factors have only two levels each.
April 24, 2019 at 15:46 GMT+0000 #358
It does not matter how many levels each of your factors has. What matters is whether each participant sees both levels of each factor (i.e., whether it varies within participants). If that is the case, the corresponding random slopes should be included.
April 24, 2019 at 16:19 GMT+0000 #359
Thanks again, your time is most appreciated. I’ll adjust accordingly.