PB p-values for a poisson glmm with an offset?

Home forums Mixed Models PB p-values for a poisson glmm with an offset?

This topic contains 4 replies, has 2 voices, and was last updated by  CMac 1 day, 21 hours ago.

  • Author
    Posts
  • #152

    CMac
    Participant

    Hello! I would like to use the mixed function to calculate p-values using the parametric bootstrap (PB) method on a GLMM, specifically a log-link Poisson model with an offset. Is this possible? I have included an offset argument in the mixed() function as one would when using glmer(), but I don’t think the mixed function likes it; I get an ‘object of type builtin is not subsettable’ error.

    This is the script I am using:
    mod1<-mixed((index*abundance)~factor1*factor2+(1|subject), data=exp, family=poisson(link=”log”), offset=(log(abundance), method=”PB”)

    It works fine as:
    mod1<-glmer((index*abundance)~factor1*factor2+(1|subject), data=exp, family=poisson(link=”log”), offset=(log(abundance), REML=F)

    … but it would be great to have some p-values.
    Please advise if I should be using a different formula to use an offset with mixed(). Many thanks in advance!

  • #153

    henrik
    Keymaster

    Can you provide some example data? Without being able to reproduce the problem I am unable to fix it.

    The example data can be as trivial as possible. See the following link for advise on how to create it: https://stackoverflow.com/q/5963269/289572

    • This reply was modified 5 days, 6 hours ago by  henrik.
  • #157

    CMac
    Participant

    Whoops, sorry, whilst creating the example data I identified the cause of the ‘type builtin not subsettable’ was related to my rookie mistake naming my dataframe ‘exp’ (doh!).

    However things still don’t seem right: the mixed function now runs, but is taking an extremely long time to run compared with glmer, and hasn’t completed calculating P-values after several hours on my full dataset. It also produces a lot of non-convergence warnings.

    Here’s an example data subset and code. The mixed function does eventually complete calculations on this data, but still comes up with the non-convergence warnings. It would be great if you could let me know if it’s working as expected or not. I hope I haven’t done anything else silly (I have checked everything I can think of). Thanks very much!!

    #R version 3.4.3 (2017-11-30) -- "Kite-Eating Tree"
    #packages
    library(lme4)
    library(afex)
    #example data
    zz<-"factor1 factor2 subject abundance max
    A a 1 659 658
    A a 2 1408 1407
    A b 1 826 789
    A b 2 1348 1271
    B a 3 242 242
    B a 4 487 480
    B b 3 605 585
    B b 4 618 599
    C a 5 382 346
    C a 6 613 598
    C b 5 370 336
    C b 6 1002 963"
    df <- read.table(text=zz, header = TRUE)
    #factors
    df$factor1<-as.factor(df$factor1)
    df$factor2<-as.factor(df$factor2)
    df$subject<-as.factor(df$subject)
    #create index variable
    df$index<-df$max/df$abundance
    #lme4::glmer
    mod1<-glmer((index*abundance)~factor1*factor2+(1|subject), data=df, 
                  family=poisson(link="log"), offset=(log(df$abundance)))
    #afex::mixed
    mod2<-mixed((index*abundance)~factor1*factor2+(1|subject), data=df, 
                 family=poisson(link="log"), offset=(log(df$abundance)), method="PB")
    • This reply was modified 1 day, 23 hours ago by  henrik. Reason: nicer formatting of code
  • #160

    henrik
    Keymaster

    As the help page says, ‘method = "PB" calculates p-values using parametric bootstrap’. This means, it calculates a reference distribution of chi^2-square values assuming the null model is true, by generating data from the full model and then fitting both models (i.e., full and restricted model) to it. For each of those fits the chi-square value is obtained. The p-values is then the proportion of sampled chi-square values larger than the actual chi-square value based on the real data (it actually is the number of samples larger plus one divided by the number of samples). Because of these repeated fits you also get the repeated non-convergence warnings.

    Per default it uses 1000 samples for this per test. This means, it has to be considerably slower than just fitting in glmer. In fact, roughly 1000 times the number of effects in your model slower. You can change the number of parametric bootstrap samples via the args_test and nsim, e.g., args_test = list(nsim = 500). However, values lower than 500 seem suspicious. Note that you can also distribute the estimation using multiple cores (see examples of mixed help).

    If you do not have the time to wait, you can also use method = "LRT". Note however that for this you should have enough levels for each of your random-effects grouping factors (more than 30, better more than 50). Otherwise the results might be anti-conservative.

  • #162

    CMac
    Participant

    Thanks Henrik. I will just be more patient then 🙂
    I was just concerned because I am running some lmers on the same dataset (different response variables) with mixed using method=”PB” and they only take a few minutes to run, compared with the hours in which the glmer had not completed its calculations. The dataset is not that big, only 144 samples.

    Thank you very much for your help and explanations!

You must be logged in to reply to this topic.

 

Author: CMac