Home forums Mixed Models PB p-values for a poisson glmm with an offset?

Viewing 4 reply threads
  • Author
    Posts
    • #152
      CMac
      Participant

      Hello! I would like to use the mixed function to calculate p-values using the parametric bootstrap (PB) method on a GLMM, specifically a log-link Poisson model with an offset. Is this possible? I have included an offset argument in the mixed() function as one would when using glmer(), but I don’t think the mixed function likes it; I get an ‘object of type builtin is not subsettable’ error.

      This is the script I am using:
      mod1<-mixed((index*abundance)~factor1*factor2+(1|subject), data=exp, family=poisson(link=”log”), offset=(log(abundance), method=”PB”)

      It works fine as:
      mod1<-glmer((index*abundance)~factor1*factor2+(1|subject), data=exp, family=poisson(link=”log”), offset=(log(abundance), REML=F)

      … but it would be great to have some p-values.
      Please advise if I should be using a different formula to use an offset with mixed(). Many thanks in advance!

    • #153
      henrik
      Keymaster

      Can you provide some example data? Without being able to reproduce the problem I am unable to fix it.

      The example data can be as trivial as possible. See the following link for advise on how to create it: https://stackoverflow.com/q/5963269/289572

      • This reply was modified 6 years, 7 months ago by henrik.
    • #157
      CMac
      Participant

      Whoops, sorry, whilst creating the example data I identified the cause of the ‘type builtin not subsettable’ was related to my rookie mistake naming my dataframe ‘exp’ (doh!).

      However things still don’t seem right: the mixed function now runs, but is taking an extremely long time to run compared with glmer, and hasn’t completed calculating P-values after several hours on my full dataset. It also produces a lot of non-convergence warnings.

      Here’s an example data subset and code. The mixed function does eventually complete calculations on this data, but still comes up with the non-convergence warnings. It would be great if you could let me know if it’s working as expected or not. I hope I haven’t done anything else silly (I have checked everything I can think of). Thanks very much!!

      #R version 3.4.3 (2017-11-30) -- "Kite-Eating Tree"
      #packages
      library(lme4)
      library(afex)
      #example data
      zz<-"factor1 factor2 subject abundance max
      A a 1 659 658
      A a 2 1408 1407
      A b 1 826 789
      A b 2 1348 1271
      B a 3 242 242
      B a 4 487 480
      B b 3 605 585
      B b 4 618 599
      C a 5 382 346
      C a 6 613 598
      C b 5 370 336
      C b 6 1002 963"
      df <- read.table(text=zz, header = TRUE)
      #factors
      df$factor1<-as.factor(df$factor1)
      df$factor2<-as.factor(df$factor2)
      df$subject<-as.factor(df$subject)
      #create index variable
      df$index<-df$max/df$abundance
      #lme4::glmer
      mod1<-glmer((index*abundance)~factor1*factor2+(1|subject), data=df, 
                    family=poisson(link="log"), offset=(log(df$abundance)))
      #afex::mixed
      mod2<-mixed((index*abundance)~factor1*factor2+(1|subject), data=df, 
                   family=poisson(link="log"), offset=(log(df$abundance)), method="PB")
      • This reply was modified 6 years, 7 months ago by henrik. Reason: nicer formatting of code
    • #160
      henrik
      Keymaster

      As the help page says, ‘method = "PB" calculates p-values using parametric bootstrap’. This means, it calculates a reference distribution of chi^2-square values assuming the null model is true, by generating data from the full model and then fitting both models (i.e., full and restricted model) to it. For each of those fits the chi-square value is obtained. The p-values is then the proportion of sampled chi-square values larger than the actual chi-square value based on the real data (it actually is the number of samples larger plus one divided by the number of samples). Because of these repeated fits you also get the repeated non-convergence warnings.

      Per default it uses 1000 samples for this per test. This means, it has to be considerably slower than just fitting in glmer. In fact, roughly 1000 times the number of effects in your model slower. You can change the number of parametric bootstrap samples via the args_test and nsim, e.g., args_test = list(nsim = 500). However, values lower than 500 seem suspicious. Note that you can also distribute the estimation using multiple cores (see examples of mixed help).

      If you do not have the time to wait, you can also use method = "LRT". Note however that for this you should have enough levels for each of your random-effects grouping factors (more than 30, better more than 50). Otherwise the results might be anti-conservative.

    • #162
      CMac
      Participant

      Thanks Henrik. I will just be more patient then 🙂
      I was just concerned because I am running some lmers on the same dataset (different response variables) with mixed using method=”PB” and they only take a few minutes to run, compared with the hours in which the glmer had not completed its calculations. The dataset is not that big, only 144 samples.

      Thank you very much for your help and explanations!

Viewing 4 reply threads
  • You must be logged in to reply to this topic.