December 7, 2017 at 06:03 GMT+0000 #152
Hello! I would like to use the mixed function to calculate p-values using the parametric bootstrap (PB) method on a GLMM, specifically a log-link Poisson model with an offset. Is this possible? I have included an offset argument in the mixed() function as one would when using glmer(), but I don’t think the mixed function likes it; I get an ‘object of type builtin is not subsettable’ error.
This is the script I am using:
mod1<-mixed((index*abundance)~factor1*factor2+(1|subject), data=exp, family=poisson(link=”log”), offset=(log(abundance), method=”PB”)
It works fine as:
mod1<-glmer((index*abundance)~factor1*factor2+(1|subject), data=exp, family=poisson(link=”log”), offset=(log(abundance), REML=F)
… but it would be great to have some p-values.
Please advise if I should be using a different formula to use an offset with mixed(). Many thanks in advance!
December 7, 2017 at 10:57 GMT+0000 #153henrikKeymaster
Can you provide some example data? Without being able to reproduce the problem I am unable to fix it.
The example data can be as trivial as possible. See the following link for advise on how to create it: https://stackoverflow.com/q/5963269/289572
- This reply was modified 2 years, 4 months ago by henrik.
December 8, 2017 at 21:40 GMT+0000 #157
Whoops, sorry, whilst creating the example data I identified the cause of the ‘type builtin not subsettable’ was related to my rookie mistake naming my dataframe ‘exp’ (doh!).
However things still don’t seem right: the mixed function now runs, but is taking an extremely long time to run compared with glmer, and hasn’t completed calculating P-values after several hours on my full dataset. It also produces a lot of non-convergence warnings.
Here’s an example data subset and code. The mixed function does eventually complete calculations on this data, but still comes up with the non-convergence warnings. It would be great if you could let me know if it’s working as expected or not. I hope I haven’t done anything else silly (I have checked everything I can think of). Thanks very much!!
#R version 3.4.3 (2017-11-30) -- "Kite-Eating Tree" #packages library(lme4) library(afex) #example data zz<-"factor1 factor2 subject abundance max A a 1 659 658 A a 2 1408 1407 A b 1 826 789 A b 2 1348 1271 B a 3 242 242 B a 4 487 480 B b 3 605 585 B b 4 618 599 C a 5 382 346 C a 6 613 598 C b 5 370 336 C b 6 1002 963" df <- read.table(text=zz, header = TRUE) #factors df$factor1<-as.factor(df$factor1) df$factor2<-as.factor(df$factor2) df$subject<-as.factor(df$subject) #create index variable df$index<-df$max/df$abundance #lme4::glmer mod1<-glmer((index*abundance)~factor1*factor2+(1|subject), data=df, family=poisson(link="log"), offset=(log(df$abundance))) #afex::mixed mod2<-mixed((index*abundance)~factor1*factor2+(1|subject), data=df, family=poisson(link="log"), offset=(log(df$abundance)), method="PB")
- This reply was modified 2 years, 4 months ago by henrik. Reason: nicer formatting of code
December 10, 2017 at 17:58 GMT+0000 #160henrikKeymaster
As the help page says, ‘
method = "PB"calculates p-values using parametric bootstrap’. This means, it calculates a reference distribution of chi^2-square values assuming the null model is true, by generating data from the full model and then fitting both models (i.e., full and restricted model) to it. For each of those fits the chi-square value is obtained. The p-values is then the proportion of sampled chi-square values larger than the actual chi-square value based on the real data (it actually is the number of samples larger plus one divided by the number of samples). Because of these repeated fits you also get the repeated non-convergence warnings.
Per default it uses 1000 samples for this per test. This means, it has to be considerably slower than just fitting in
glmer. In fact, roughly 1000 times the number of effects in your model slower. You can change the number of parametric bootstrap samples via the
args_test = list(nsim = 500). However, values lower than 500 seem suspicious. Note that you can also distribute the estimation using multiple cores (see examples of
If you do not have the time to wait, you can also use
method = "LRT". Note however that for this you should have enough levels for each of your random-effects grouping factors (more than 30, better more than 50). Otherwise the results might be anti-conservative.
December 10, 2017 at 20:00 GMT+0000 #162
Thanks Henrik. I will just be more patient then 🙂
I was just concerned because I am running some lmers on the same dataset (different response variables) with mixed using method=”PB” and they only take a few minutes to run, compared with the hours in which the glmer had not completed its calculations. The dataset is not that big, only 144 samples.
Thank you very much for your help and explanations!
- You must be logged in to reply to this topic.