April 23, 2019 at 11:15 GMT+0000 #350
I have what’s likely a common and perhaps naive question.
In a current experiment I have three factors, each with two levels, and
a number of other measures based on cognitive tasks as well as questionnaires.
I have taken to leaving the three experimental factors in the model while
systematically exploring as many as 3 or 4 of the other measures within that
context. Thus far, as a preliminary step, I've been using AIC to guide
the model selection, and I'm limiting the exploration to a relatively small
subset of the explanatory variables.
For now, I wish to explore whether, in addition to the factors, the
other cognitive and trait measures explain my dependent variable. Just to
be clear, my model might look like the following:
m1 <- lmer(DV ~ cog1 + cog2 + cog10 + factor1*factor2*factor3 + (1|ID), …)
What I often find is that the best model includes predictor variables that are
not significant but dramatically improve the AIC. Keep in mind that I retain
the experimental factors even when they may be of no benefit to the AIC.
In your opinion, should these non-significant variables be left in final
models, and should one strictly care about the significance of individual
explanatory variables? I'm not yet interested in prediction but rather in
identifying variables and/or models that best account for the DV. So, if a
variable is not significant but improves the model, what can we say about that?
Any comment would be most appreciated.
April 24, 2019 at 11:12 GMT+0000 #352
I have some answers to your questions, but it makes sense to read more on variable selection in regression modeling, as this is probably the problem you are struggling with, and it is not a trivial one. A good starting point may be 'Regression Modeling Strategies' by Frank Harrell (2015): https://www.springer.com/gb/book/9781441929181
Note that this book deals mostly with simple regression models (i.e., not mixed models), but the problem you describe applies in the same manner.
Model selection using AIC and hypothesis testing using significance tests do different things. One reason they can diverge is that they use different cut-offs. However, a variable that dramatically improves AIC should usually also be significant in a significance test, because what such a test essentially does is fit a restricted model in which this variable is withheld and then compare the fit of the two nested models using something akin to a likelihood-ratio test.
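The arithmetic behind that connection can be sketched directly (my illustration, not part of the thread; the ΔAIC of 10 is just an example value): since AIC = 2k − 2·logLik, the AIC difference between two nested models equals the likelihood-ratio statistic minus twice the difference in parameter count.

```python
import math

# AIC = 2k - 2*logLik, so for two nested models the likelihood-ratio
# statistic is the AIC improvement plus 2 per added parameter.
def lrt_stat_from_delta_aic(delta_aic, delta_k):
    return delta_aic + 2 * delta_k

def chi2_sf_1df(x):
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x/2)).
    return math.erfc(math.sqrt(x / 2))

# A single extra variable that lowers AIC by 10 points:
stat = lrt_stat_from_delta_aic(10.0, 1)   # likelihood-ratio statistic
p = chi2_sf_1df(stat)
print(stat, p < 0.05)  # 12.0 True

# Note the cut-offs differ: AIC favours the extra parameter once the
# statistic exceeds 2 per df, whereas the .05 chi-square cut-off for
# 1 df is 3.84, so AIC is the more liberal criterion near the boundary.
```

This is why a "dramatic" AIC improvement (say, 10 points for one parameter) corresponds to a clearly significant likelihood-ratio test, while small divergences near the cut-offs are expected.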
If a variable improves AIC on its own but not jointly with the others, this is probably a case of multicollinearity: https://en.wikipedia.org/wiki/Multicollinearity
Your additional predictor variables are probably themselves correlated with each other, so the predictive ability of one can often be covered by other variables that are correlated with it.
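To put a rough number on how strongly correlated predictors dilute each other's significance, one can look at the variance inflation factor (a minimal stdlib-Python sketch of the standard formula; the specific r values are purely illustrative):

```python
# Variance inflation factor for a predictor whose multiple correlation
# with the other predictors is r: the sampling variance of its
# coefficient is inflated by 1 / (1 - r^2) relative to the
# uncorrelated case, and its standard error by sqrt of that.
def vif(r):
    return 1.0 / (1.0 - r * r)

for r in (0.0, 0.5, 0.9, 0.95):
    print(r, round(vif(r), 2), round(vif(r) ** 0.5, 2))

# At r = .95 the standard error is roughly 3.2 times larger, so a
# variable that looks clearly useful on its own can easily turn
# non-significant once a correlated competitor is in the model.
```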
To answer your question directly: a variable should not really be non-significant yet improve the model considerably in terms of AIC. At least not when the comparison is between a model containing all the variables and the same model with only that one variable withheld.
I repeat: variable selection is generally a difficult problem, and many sophisticated procedures (not implemented in afex or lme4) exist to tackle it in a more principled manner.
April 24, 2019 at 11:18 GMT+0000 #353
I forgot to say one more thing. If any of your factors or additional variables varies within participants, it should be added as a random slope. Random-intercept-only models, such as the one you fit, are generally quite dubious. Please see my discussion of this issue at: http://singmann.org/download/publications/singmann_kellen-introduction-mixed-models.pdf
April 24, 2019 at 15:34 GMT+0000 #357
Now, this is embarrassing: since posting I have reviewed the most recent data set, and it seems that the non-significant variable includes a couple of NAs that were not present in earlier versions. As a consequence, it appears that I was misled by AIC. At this stage I was simply screening.
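A plausible mechanism here (my addition, not from the thread, assuming the usual behaviour of dropping incomplete rows before fitting): models fit to different numbers of rows do not have comparable AICs, because the log-likelihood itself scales with n. A minimal Gaussian sketch:

```python
import math

def gaussian_aic(residuals, k):
    # AIC of an ordinary-least-squares fit with k mean parameters plus
    # the residual variance, via the profiled Gaussian log-likelihood.
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return 2 * (k + 1) - 2 * loglik

# Same residual spread per observation, but the second fit silently
# lost two rows to NAs -- its AIC is smaller purely because n is smaller:
full = [1.0, -1.0, 0.5, -0.5, 1.0, -1.0]
dropped = [1.0, -1.0, 0.5, -0.5]
print(gaussian_aic(full, 1) > gaussian_aic(dropped, 1))  # True
```

So an apparent AIC "improvement" after adding a variable with missing values can simply reflect the smaller data set, which matches the experience described above.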
I will however follow-up with your suggested readings.
Right now my factors have only two levels each.
April 24, 2019 at 15:46 GMT+0000 #358
It does not matter how many levels each of your factors has. What matters is whether each participant sees both levels of each factor (i.e., whether it varies within participants). If that is the case, the corresponding random slopes should be included.
April 24, 2019 at 16:19 GMT+0000 #359
Thanks again, your time is most appreciated. I’ll adjust accordingly.