Forum Replies Created

    blazej
    Participant

    Henrik,
    thanks a lot for pointing out your chapter (it’s great and I learned a lot!) as well as the links to your CV questions (it seems you had some similar doubts, only a couple of years earlier).

    Anyway, I did my homework and eventually arrived at the “maximal model justified by the design”, as you suggested, on the example data.
    I do, however, get convergence errors on my real data. I’ve been trying to sort that out in various ways – for example, with uncorrelated random effects:

    mixed(logRT ~ A*B*C*response + (A*B*C*response||Subject), 
                data = df, progress = TRUE, method = "S", expand_re = TRUE)

    but then I started to wonder whether those terms are truly uncorrelated, and how to set two factors as uncorrelated while leaving the other two correlated.
    Would this syntax make any sense?

    mixed(logRT ~ A*B*C*response + (A*B||Subject) + (C*response|Subject), 
                data = df, progress = TRUE, method = "S", expand_re = TRUE)

    A and B are set as uncorrelated because conceptually those factors could be uncorrelated; C and response, on the other hand, can be strongly correlated.
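
    If it helps, my plan was to check what the fitted model actually estimates by inspecting the variance components (m_part is just a name I made up, and I’m assuming I can read the afex object correctly):

    m_part <- mixed(logRT ~ A*B*C*response + (A*B||Subject) + (C*response|Subject),
                    data = df, method = "S", expand_re = TRUE)
    summary(m_part)             # correlations should appear only in the C*response block
    VarCorr(m_part$full_model)  # variance components of the underlying lme4 fit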

    Also, do you know of any (preferably easy) way to simulate predicted values from different models (such as the ones above) and then compare their predictions against each other to establish, for example, which of them does better in terms of true prediction (and not only fit)?
    I found Placida’s code on CV for something very similar (I accidentally lost the link) that was based on the sleepstudy data and compared p-values from 1000 simulations of two different random-effects structures. My attempt on the example data (which wasn’t working, for reasons I didn’t understand at first) was this:

    fm.sim <- lmer(logRT ~ A*B*C + (1|Subject), data = df, REML = FALSE)
    anova(fm.sim)
    n.sim <- 50
    sim.data <- vector(mode = "list", length = n.sim)
    tempReaction <- simulate(fm.sim, nsim = n.sim)
    tempdata <- model.frame(fm.sim)
    
    for (i in 1:n.sim) {
          # overwrite the response column the models below are actually fitted
          # on (logRT); writing the simulated values into a different column
          # would leave every refit running on the original, unchanged data
          tempdata$logRT <- tempReaction[, i]
          # note: plain lme4 does not expand factor terms inside ||, so output1
          # may not be fully uncorrelated (afex's expand_re = TRUE handles this)
          output0 <- lmer(logRT ~ A*B*C + (A*B*C|Subject),  data = tempdata, REML = FALSE)
          output1 <- lmer(logRT ~ A*B*C + (A*B*C||Subject), data = tempdata, REML = FALSE)
          temp <- anova(output0, output1)
          pval <- temp$`Pr(>Chisq)`[2]
          sim.data[[i]] <- list(model0 = output0, modelA = output1, pvalue = pval)
    }
    
    pval_l <- sapply(sim.data, "[[", "pvalue")
    hist(pval_l)

    In the original Placida’s answer, the histogram showed a nice uniform distribution of p-values, indicating no true difference between the two models (as I understand it). My code, on the other hand, was producing only one fixed result across all simulations – which, as the comment in the loop above notes, came from assigning the simulated response to a column the refitted models never used.
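
    If I understand its documentation correctly, pbkrtest::PBmodcomp does the same kind of parametric bootstrap out of the box: it simulates from the smaller model, refits both models to each simulated data set, and returns a bootstrap p-value for the likelihood-ratio comparison. A minimal sketch on the example data (model names are my own):

    library(pbkrtest)
    full  <- lmer(logRT ~ A*B*C + (A*B*C|Subject), data = df, REML = FALSE)
    small <- lmer(logRT ~ A*B*C + (1|Subject),     data = df, REML = FALSE)
    PBmodcomp(full, small, nsim = 200)  # nsim kept small for speed; increase for real use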

    If I’m expecting too much of you and your time, please let me know. Otherwise, answers or links to similar questions/solutions are also very helpful.
    Thanks!

    blazej
    Participant

    Thank you.
    I’ll do my homework with all the reading and get back here if any major question pops up.

    blazej
    Participant

    Another way to put my confusion about your syntax versus Field’s syntax is this example:

    m1 <- mixed(logRT ~ A*B*C + (A*B*C|Subject), 
                data = df, progress = TRUE, method = "S")
    m2 <- mixed(logRT ~ A*B*C + (1|Subject/A/B/C), 
                data = df, progress = TRUE, method = "S")

    Considering we are still talking about the same experiment, how do those two models differ, and why should I pick one over the other?
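
    If I understand the nesting shorthand correctly (please correct me if I’m wrong), m2 expands to a set of intercept-only terms, with no random slopes at all:

    m2a <- mixed(logRT ~ A*B*C + (1|Subject) + (1|Subject:A) +
                     (1|Subject:A:B) + (1|Subject:A:B:C),
                 data = df, progress = TRUE, method = "S")
    # i.e. random intercepts for Subject and each nested combination, whereas
    # m1 additionally estimates random slopes and their correlations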

    PS.
    I added a “simulation” of the missing values that are initially present in my data (I don’t know if that makes any difference):

    library(AlgDesign)  # for generating all levels of a factorial design
    
    df <- gen.factorial(c(16, 2, 2, 2, 100), factors = "all",
                        varNames = c("rep", "A", "B", "C", "Subject"))
    df$rep <- as.numeric(df$rep)
    df$Subject <- as.numeric(df$Subject)
    logRT <- rnorm(n = 12800, mean = 7, sd = 1)
    trialno <- rep(1:128, times = 100)
    response <- sample(0:1, 12800, prob = c(0.3, 0.7), replace = TRUE)
    
    # simulate some values to drop (set as missing) due to extremely low latency
    missingRTs <- as.logical(sample(0:1, 12800, prob = c(0.96, 0.04), replace = TRUE))
    logRT[missingRTs] <- NA
    
    df <- cbind(df, logRT, trialno, response)
    df <- df[complete.cases(df), ]
    blazej
    Participant

    Henrik, thank you very much for the links and some clarifications.

    A couple of remarks:

    This is of course a different model that will produce different results, but this is probably the recommended model (if you ignore response and trialno). For the within-subject design this is the “maximal model justified by the design”, which is recommended following Barr, Levy, Scheepers, and Tily (2013, JML).

    Of course, I’m aware that this is a different model. What I wanted to say is that I’m after something as close (conceptually) as possible to a rm ANOVA.
    Is it common to reduce the maximal model based on the results? I’m not familiar with any consensus on what to do and what to avoid in order to get results that allow tests of interactions among the factors of a psychological (social cognition) experiment.

    Let us first consider response. As each participant should have (in principle) both yes and no responses for each cell of the design, the natural way to extend this model would be to add response as both fixed and random:
    mixed(logRT ~ A * B * C * response + (A * B * C * response | Subject), df)
    I do not immediately see how the issue of imbalance is a huge problem. Of course, balance would be better, but the model should be able to deal with it.

    That’s clear and I have already tried this analysis. Unfortunately, I’m getting convergence errors and don’t really know how to deal with them.
    I mentioned imbalance because, the way I see it (mostly from SPSS rm ANOVA), if I aggregate my data according to A*B*C*response I will have empty cells. There is no clear pattern between the yes and no responses besides the fact that, in my entire sample, 30% are NOs and 70% are YESes.
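
    This is how I checked the imbalance, for what it’s worth:

    xtabs(~ A + B + C + response, data = df)  # cell counts over all subjects
    # per subject; zeros in this table are the empty cells I mean
    with(df, table(Subject, interaction(A, B, C, response)))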

    The question of trialno is more difficult. Of course you could simply add this as a fixed and random effect as well, for example:
    mixed(logRT ~ A * B * C * response + trialno + (A * B * C * response + trialno | Subject), df)
    However, this would only allow a linear effect. Maybe some other functional form is better. If this is important another type of model such as a GAMM may be better. See:
    Baayen, H., Vasishth, S., Kliegl, R., & Bates, D. (2017). The cave of shadows: Addressing the human factor with generalized additive mixed models. Journal of Memory and Language, 94, 206–234. https://doi.org/10.1016/j.jml.2016.11.006

    I’ll have a look at the article, thank you. Please let me explain why I want to account for trialno and how I handled it before, when running rm ANOVAs in SPSS on previous research:
    1) there is a general tendency to speed up over consecutive trials (a negative correlation between trialno and logRT), and the degree to which participants speed up varies across subjects.
    2) to deal with the above, in previous research I would compute standardized residuals from an lm(logRT ~ trialno) fit separately for each subject. These standardized residuals would become my main dependent variable, replacing the original log RTs.
    So again – I’m interested in including trialno just to get something conceptually similar to what I did with the SPSS ANOVAs, but in a mixed model.
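
    In R terms, what I did in SPSS amounts to something like this rough sketch (run on the simulated df from my earlier post, which has complete cases only; rtResid is just a name I made up):

    df <- do.call(rbind, lapply(split(df, df$Subject), function(d) {
      # per-subject regression of logRT on trialno; keep standardized residuals
      d$rtResid <- as.numeric(scale(resid(lm(logRT ~ trialno, data = d))))
      d
    }))
    # rtResid then replaced logRT as the dependent variable in the rm ANOVA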

    Unfortunately, this is at the moment not really possible with lme4 (which mixed uses).

    One final thought. Your model seems to have quite a lot of data, in this case method = “S” is probably indicated in the call to mixed(). It is faster and very similar.

    Models on the original data indeed take a lot of time, and I had already figured out that “S” is faster than anything else.

    I saw you commented on CV – if that’s of any value (for you or the CV community), I can rephrase that question the way I did here, allowing you to post a proper answer that I can eventually accept.
    Speaking of my original post on CV – what’s your view on the syntax proposed by A. Field in his book, where the factors defining the repeated-measures structure of the data are specified as lme(logRT ~ 1, random = ~1|Subject/A/B/C, data = df, method = "ML")?
    The same syntax suggestion can be found here, and it confuses me a lot.
    Are those guys wrong? Or is this specific to nlme syntax, and basically random = ~1|Subject/A/B/C means the same as (A*B*C|Subject) in lme4?
    Or maybe it’s because Field does not think of a repeated-measures experiment as having random slopes? But if that’s the case – maybe I shouldn’t do it with my data either…

    Once again, thank you for your time and for helping me out.

    blazej
    Participant

    As a follow-up, here is what I have tested so far:

    The main analysis – a balanced full factorial with 2x2x2 factors:
    
    mixed(logRT ~ A * B * C + (A * B * C | Subject), df)

    Can I assume that this is the mixed-model “equivalent” of a “normal” repeated-measures ANOVA (as done in SPSS) computed on aggregated data (wide format, with one mean logRT for each combination of A * B * C)?
    If not, what syntax would give me something conceptually similar, considering the natural differences between the analyses (mixed model vs. rm ANOVA)?
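
    To make the comparison concrete, I mean something like this (if I read the afex documentation correctly, aov_ez() aggregates to one mean per Subject and design cell by default):

    a1 <- aov_ez(id = "Subject", dv = "logRT", data = df,
                 within = c("A", "B", "C"))
    m1 <- mixed(logRT ~ A * B * C + (A * B * C | Subject), data = df, method = "S")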

    How can I extend the above syntax to account for trial number (there is a tendency towards a negative correlation between trialno and logRT)?

    What would be considered “best practice” if I wanted to add response as a separate factor, making it an A * B * C * response design, considering that adding response breaks the balance of A*B*C? I’m not sure how to approach either the fixed or the random part of the model for this.

    Should I consider any non-default covariance structures for the random and residual parts? I’ve been advised to use compound symmetry – but to be honest I don’t know the reason for it, nor can I explain the differences between the various structures (available, for example, in SPSS’s MIXED procedure).
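
    For reference, this is the kind of thing I mean – compound symmetry on the within-subject errors, as I understand the nlme syntax (an illustration only; I don’t know whether it is appropriate here):

    library(nlme)
    m_cs <- lme(logRT ~ A * B * C, random = ~ 1 | Subject,
                correlation = corCompSymm(form = ~ 1 | Subject),
                data = df, method = "REML")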

    I’m really stuck here and would love to avoid guessing, so that I can eventually describe my results and analysis with a decent understanding of the theoretical background of what I did.

    Thank you 🙂
