Forum Replies Created

Viewing 25 posts - 51 through 75 (of 86 total)
  • henrik
    Keymaster

    Short answer: You have to set factorize = FALSE.

    Longer answer: afex's default behavior is to assume that the user wants an ANOVA and so transforms all variables to factors. This is likely what happens in your case. To turn this off, use the factorize argument.

    The help page (?aov_car) further specifies:

    To run an ANCOVA you need to set factorize = FALSE and make sure that all variables have the correct type (i.e., factors are factors and numeric variables are numeric and centered).

    This is most likely what you want to do.
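    For illustration, here is a minimal sketch of such an ANCOVA call. The data set dat and its columns (the DV dv, the between-subjects factor group, the numeric covariate age, and the participant identifier id) are hypothetical placeholders:

    library("afex")
    dat$age_c <- dat$age - mean(dat$age)  # center the numeric covariate
    a1 <- aov_car(dv ~ group + age_c + Error(id), data = dat,
                  factorize = FALSE)
    a1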

    in reply to: When to uncorrelate random effects #205
    henrik
    Keymaster

    There probably are research questions that explicitly call for suppressing the correlation estimates. However, correlations are mostly suppressed to avoid convergence problems. We wrote in our chapter on mixed models:

    For the limited sample sizes that are common in psychology and related disciplines a common problem is that the maximal model is not fully identified (Bates, Kliegl, Vasishth, & Baayen, 2015), especially for mixed models with complicated random effects structures. Even though the optimization algorithm converges to the optimum (i.e., the maximum-likelihood estimate for a given data set), the variance-covariance matrix of the random effects parameters at the optimum is degenerate or singular. At least for models estimated with lme4 this is often signified by convergence warnings. Other signs of singular fits are variance estimates of or near zero and correlation estimates of $\pm 1$. The occurrence of such situations is due to the fact that the parameters associated with random effects (e.g., $\sigma^2_{S_\delta}$) are more difficult to estimate than fixed effects (e.g., $\beta_{\delta}$).

    The important part for your question is the last sentence. Random-effects parameters are more difficult to estimate than fixed effects. We need considerably more data to estimate a variance or a correlation parameter than a simple fixed effect. Furthermore, correlations require even more data than variances.

    Therefore, in case a model shows convergence problems (which can be indicated by warnings or by problematic parameter values as mentioned in the quote), a good first strategy to address them is to remove the correlations among the random-effects parameters. We do this because they are the most difficult to estimate and are often not of primary interest. Also, removing them often comes with comparatively little cost in terms of model precision.

    More details and some really instructive examples of this are given in the “Parsimonious Mixed Models” paper by Bates, Kliegl, Vasishth and Baayen (2015): https://arxiv.org/abs/1506.04967
    See especially Figures 3 and 5.

    There is also some relevant discussion on stats.stackexchange: https://stats.stackexchange.com/a/323341/442
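    For illustration, a minimal sketch of suppressing the correlations via afex's double-bar syntax (the formula, the variables rt, condition, and subject, and the data set dat are hypothetical placeholders):

    library("afex")
    # maximal model including the correlation parameters:
    m_max <- mixed(rt ~ condition + (condition | subject), data = dat)
    # same random slopes, but with the correlations suppressed:
    m_zcp <- mixed(rt ~ condition + (condition || subject), data = dat,
                   expand_re = TRUE)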

    in reply to: Printing tables with sjt.lmer -function #203
    henrik
    Keymaster

    The output from summary() for mixed() objects is identical to the output of summary() for lmer() objects (this is why you get output at the level of the parameters and not at the level of model terms/effects). So if this is possible for normal lmer() models, it is possible here as well (whether it makes sense in your case is not something I can answer). If you want to get at the contents of the summary output, you can save it and work with it:

    summ <- summary(fit)
    as.data.frame(summ$varcor)
    summ$coefficients
    

    The problem when using expand_re = TRUE is that it creates a new data.frame that is used for fitting. When emmeans tries to access this data, it fails. There are several ways around this.
    1. You could avoid accessing it. emmeans only tries to access the data if lmer.df is set to "satterthwaite":

    library("afex")
    data("Machines", package = "MEMSS") 
    m1 <- mixed(score ~ Machine + (Machine  || Worker), data=Machines, expand_re = TRUE)
    
    emm_options(lmer.df = "kenward-roger")
    emmeans(m1, "Machine")  #works
    #  Machine   emmean       SE    df lower.CL upper.CL
    #  A       52.35556 2.397324  7.35 46.74087 57.97024
    #  B       60.32222 2.633283  9.04 54.36897 66.27548
    #  C       66.27222 2.836796 11.60 60.06746 72.47698
    # 
    # Degrees-of-freedom method: kenward-roger 
    # Confidence level used: 0.95 
    
    emm_options(lmer.df = "satterthwaite")
    emmeans(m1, "Machine") #fails
    # Error in eval(predvars, data, env) : object 're1.Machine1' not found
    
    emm_options(lmer.df = "asymptotic")
    emmeans(m1, "Machine") #works
    #  Machine   emmean       SE  df asymp.LCL asymp.UCL
    #  A       52.35556 2.397324 Inf  47.65689  57.05422
    #  B       60.32222 2.633283 Inf  55.16108  65.48336
    #  C       66.27222 2.836796 Inf  60.71220  71.83224
    # 
    # Degrees-of-freedom method: asymptotic 
    # Confidence level used: 0.95

    2. You can overwrite your existing data.frame with one that contains the missing columns (note that this might change the number of rows if you have missing data). I would only do this on a copy of my real data:

    m1 <- mixed(score ~ Machine + (Machine  || Worker), data=Machines, expand_re = TRUE)
    Machines <- mixed(score ~ Machine + (Machine  || Worker), data=Machines, expand_re = TRUE, return = "data")
    emm_options(lmer.df = "satterthwaite")
    emmeans(m1, "Machine") # works now
    #  Machine   emmean       SE    df lower.CL upper.CL
    #  A       52.35556 2.397324  7.35 46.74094 57.97017
    #  B       60.32222 2.633283  9.04 54.36898 66.27546
    #  C       66.27222 2.836796 11.56 60.06544 72.47901
    # 
    # Degrees-of-freedom method: satterthwaite 
    # Confidence level used: 0.95 
    in reply to: Printing tables with sjt.lmer -function #199
    henrik
    Keymaster

    I am sorry, but I do not understand what you want to back-transform into what. In general, a reproducible example (e.g., http://stackoverflow.com/q/5963269/289572) helps a lot in this regard. But you at least have to show the output you have and then tell me exactly what you want.

    in reply to: Printing tables with sjt.lmer -function #196
    henrik
    Keymaster

    In general, I would give up on the idea of having something like a standardized effect size on the level of the effects (i.e., model terms) in mixed models. They generally do not appear to work as soon as a model includes random slopes or multiple random-effects grouping factors. The same applies to something like R-squared.

    Confidence intervals can be obtained on the level of the cell means (i.e., factor levels or linear combinations thereof) using emmeans. For linear mixed models, emm_options(lmer.df = ...) allows you to choose how the degrees of freedom are calculated, which can have dramatic consequences for speed. Three options exist: lmer.df = c("kenward-roger", "satterthwaite", "asymptotic").
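    For the confidence intervals, a minimal sketch reusing the Machines example from the other replies in this thread:

    library("afex")
    library("emmeans")
    data("Machines", package = "MEMSS")
    m1 <- mixed(score ~ Machine + (Machine | Worker), data = Machines)
    emm_options(lmer.df = "kenward-roger")  # or "satterthwaite" / "asymptotic"
    emmeans(m1, "Machine")  # estimated marginal means with confidence intervals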

    It is not always trivial how to combine the table of effects with a measure of effect size, but one can sometimes find ways. For example see Table 2 in: http://singmann.org/download/publications/2015_kellen_singmann_vogt_klauer.pdf

    in reply to: Printing tables with sjt.lmer -function #194
    henrik
    Keymaster

    I do not know exactly which package r.squaredGLMM comes from, but I expect that, similar to confint, it also works on the level of the model parameters and not on the level of the model effects (or ‘model terms’). So they are in principle incompatible with afex. If in your case all effects consist of only one parameter, then you can of course use them.

    In such a case, the simplest way would be to add these values to the data.frames returned by either nice() or anova(). Continuing from the previous example:

    df1 <- nice(m1)
    df1$new_col <- 2
    df1
    # Mixed Model Anova Table (Type 3 tests, KR-method)
    # 
    # Model: score ~ Machine + (Machine | Worker)
    # Data: Machines
    #    Effect   df        F p.value new_col
    # 1 Machine 2, 4 32.80 **    .003       2
    # ---
    # Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1
    
    df2 <- anova(m1)
    df2$new_col <- 2
    df2
    # Mixed Model Anova Table (Type 3 tests, KR-method)
    # 
    # Model: score ~ Machine + (Machine | Worker)
    # Data: Machines
    #         num Df den Df      F    Pr(>F) new_col
    # Machine      2      4 32.803 0.0033024       2

    As you can see, both functions simply return a data.frame for which specific print methods exist. You can thus add additional columns. If you want to get rid of the specific print method, simply wrap the returned object in an as.data.frame() call. For example:

    as.data.frame(df2)
    #         num Df den Df        F      Pr(>F) new_col
    # Machine      2      4 32.80302 0.003302373       2

    Note that you can also get confidence intervals via the methods implemented in emmeans, which work nicely with afex functions, see: https://cran.r-project.org/package=emmeans

    in reply to: Printing tables with sjt.lmer -function #192
    henrik
    Keymaster

    I do not think afex and the sjt. or sjp. functions are compatible. The main reason is that the sj functions focus on the standard lmer output and its fixed-effects parameters. In contrast, afex focuses on tests of effects. In many cases, effects involve more than one parameter.

    However, afex provides several functions that support nice printing of the "mixed" objects returned from mixed, mainly nice() and anova(). See the following example:

    library("afex")
    data("Machines", package = "MEMSS") 
    
    # simple model with random-slopes for repeated-measures factor
    m1 <- mixed(score ~ Machine + (Machine|Worker), data=Machines)
    m1
    # Mixed Model Anova Table (Type 3 tests, KR-method)
    # 
    # Model: score ~ Machine + (Machine | Worker)
    # Data: Machines
    #    Effect   df        F p.value
    # 1 Machine 2, 4 32.80 **    .003
    # ---
    # Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1
    
    nice(m1)
    # Mixed Model Anova Table (Type 3 tests, KR-method)
    # 
    # Model: score ~ Machine + (Machine | Worker)
    # Data: Machines
    #    Effect   df        F p.value
    # 1 Machine 2, 4 32.80 **    .003
    # ---
    # Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1
    
    knitr::kable(nice(m1))
    # |Effect  |df   |F        |p.value |
    # |:-------|:----|:--------|:-------|
    # |Machine |2, 4 |32.80 ** |.003    |
    
    anova(m1)
    # Mixed Model Anova Table (Type 3 tests, KR-method)
    # 
    # Model: score ~ Machine + (Machine | Worker)
    # Data: Machines
    #         num Df den Df      F   Pr(>F)   
    # Machine      2      4 32.803 0.003302 **
    # ---
    # Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    knitr::kable(anova(m1))
    # |        | num Df| den Df|        F|    Pr(>F)|
    # |:-------|------:|------:|--------:|---------:|
    # |Machine |      2|      4| 32.80302| 0.0033024|

    Please let me know if this is not enough. If you definitely want to use the sj family of functions you need to contact the maintainer and ask if he is willing to support afex functions.

    in reply to: defining model with aov_car #190
    henrik
    Keymaster

    AFAIK, the problem with the sphericity tests is not so much related to the number of conditions or their levels (as long as you have more participants than cells of the design), but is probably more a matter of bad luck with your data. You might try to randomly remove one or two participants to see if this still happens. However, this is not part of afex, so I might be wrong here.

    For balanced designs in which the contrasts are orthogonal (as is always the case for repeated-measures ANOVA via car::Anova()), ignoring individual effects does not change the other effects. Hence you can do it here. However, should one of the previously mentioned conditions (balance and orthogonality) not be met, omitting one of the effects from a model changes the other effects. This could be what your thesis committee had in mind. Furthermore, if a test is always at least implicitly part of the model, then why omit it? If its inclusion does not change the model, knowing its results does not hurt.

    in reply to: defining model with aov_car #185
    henrik
    Keymaster

    You can simply and safely ignore the tests that you are not interested in. A repeated-measures ANOVA as run with car::Anova() (which is exactly what afex uses) **always** tests all effects, so this is a consequence of car::Anova() and not of afex. I know this because I asked John Fox (the author of car) a very similar question some years ago and he responded with:

    The within-subjects contrasts are constructed by Anova() to be orthogonal in the row-basis of the design, so you should be able to safely ignore the effects in which (for some reason that escapes me) you are uninterested.

    The full quote can be seen at the following link, which also provides a link to the full exchange:
    https://stat.ethz.ch/pipermail/r-help/2012-July/319143.html

    I also think there is not much you can do if the sphericity tests fail for your data (at least I do not know what; these tests are also all done by car).

    Hope that helps!

    in reply to: plotting models created by lmer_alt functions #181
    henrik
    Keymaster

    emmip() is not part of afex but of the emmeans package, so I am not the one who can provide adequate support for it. I am also not sure what it is supposed to do in this case or whether it even supports numerical covariates on the x-axis.

    In general, lmer_alt returns the same type of object as lmer or glmer. In the default setting it returns objects of class merModLmerTest for lmer objects. If you prefer the ‘true’ merMod objects returned from lme4, you can change this via
    afex_options(lmer_function = "lme4") (but note that method = "KR" and "S" then stop working).

    mixed returns objects of class mixed, for which the slot full_model contains the same model object as returned by lmer_alt.

    Thus, objects returned from lmer_alt (or the full_model slot of mixed objects) should in principle work with all functions that work with merMod objects. For example, for effects:

    library("afex")
    library("effects")
    data("Machines", package = "MEMSS") # some example data
    
    m1 <- mixed(score ~ Machine + (Machine  || Worker), data=Machines, expand_re = TRUE)
    
    # when using effects with mixed we need to set contr.sum globally:
    set_sum_contrasts()
    plot(Effect("Machine", m1$full_model))
    # equal to:
    emmeans(m1, "Machine")
    
    ## compare again (now gives wrong results):
    set_treatment_contrasts()
    plot(Effect("Machine", m1$full_model))
    

    In principle this should also work for objects returned from lmer_alt. However, it currently does not, because lmer_alt does not set the data argument correctly (I will correct this in the next version and on GitHub). For now, the following works:

    m2 <- lmer_alt(score ~ Machine + (Machine  || Worker), data=Machines)
    m2@call[["data"]] <- as.name("Machines")
    plot(Effect("Machine", m2))
    in reply to: Interpretation of Logistic Mixed Interactions #177
    henrik
    Keymaster

    To be able to provide a more useful answer to your question, I have to ask why you are looking at the fixed effects in the first place. What is your goal with this?

    One of the design goals of afex is to hide the fixed effects from the user, as they are not particularly useful in cases such as yours (i.e., factorial designs with factors with more than two levels). If your goal is to test specific hypotheses on the level of the factor levels, do so using the functionality provided by the lsmeans package (or, from afex v. 0.19.1 onwards, using emmeans). This is described in some detail in the ANOVA vignette and applies in exactly the same way to mixed models as to ANOVA models. For more details on how to set up the contrasts (or post-hoc tests) you are interested in, have a look at the great vignettes of emmeans: https://cran.r-project.org/package=emmeans

    emmeans also automatically calculates CIs for you. This is certainly possible, but needs to be done on the unconstrained (i.e., logit) scale. Only after calculating the CIs can the values be transformed to the response scale (i.e., probabilities). For example, in your case, compare:

    emmeans(full_mixed, c("Method", "Levels"))
    emmeans(full_mixed, c("Method", "Levels"), type = "response")

    Note that you can simply specify any type of contrasts on these estimated marginal means. Several examples of this for logistic mixed models are provided in the supplemental material of a recent paper of mine (https://doi.org/10.1016/j.cogpsych.2017.09.002), which can be found at: https://osf.io/3uajq/
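    For illustration, a minimal sketch of such follow-up contrasts on the estimated marginal means (full_mixed is the model from your question; pairs() is the pairwise-contrast function from emmeans):

    emm <- emmeans(full_mixed, c("Method", "Levels"))
    pairs(emm)  # pairwise comparisons on the linear predictor (logit) scale
    summary(pairs(emm), type = "response")  # back-transformed, here to odds ratios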

    One final thought: as discussed at length by Barr, Levy, Scheepers, and Tily (2013), you most likely do not want a random-intercept-only model. Everything they write in this paper applies in the same manner to logistic mixed models. The recommended maximal model has random slopes for all within-subjects factors (e.g., Correct ~ Method * Levels * MATB + (Method * Levels * MATB | SubjectID)). The model you estimated likely produces anti-conservative results.

    All I have discussed here is also discussed at greater length in our recent chapter: http://singmann.org/download/publications/singmann_kellen-introduction-mixed-models.pdf
    Again, this deals with linear mixed models, but the parts relevant to your question apply in the same manner to generalized linear mixed models.

    in reply to: Estimated marginal means don't match SPSS #171
    henrik
    Keymaster

    As can be seen from the output, the difference is whether one wants to assume equal variances for the follow-up tests (as lsmeans does) or not (as SPSS does).

    One can almost replicate the results from SPSS in R by running independent analyses for each difference. I demonstrate this here for the w1 - w2 contrast:

    library("tidyr")
    
    dw <- spread(d, factor_within, value)
    dw$d1 <- dw$w1 - dw$w2
    
    summary(lm(d1 ~ 1, dw))
    # Call:
    # lm(formula = d1 ~ 1, data = dw)
    # 
    # Residuals:
    #     Min      1Q  Median      3Q     Max 
    # -47.296  -6.296  -1.296   8.204  36.704 
    # 
    # Coefficients:
    #             Estimate Std. Error t value Pr(>|t|)   
    # (Intercept)   10.296      3.016   3.414  0.00211 **
    # ---
    # Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    # 
    # Residual standard error: 15.67 on 26 degrees of freedom 

    Note that to get the standard errors, I rely on the fact that a within-subject contrast of two means is simply a test of whether their difference differs from zero, and this can be tested with a linear model containing only an intercept. I do not know how to get the standard error with t.test. As can be seen, the estimate is again identical and the standard error is now pretty close to the one reported (this also holds for the other two contrasts). I am not sure why the standard error is not identical, because I did a similar analysis some time ago in which it was. Maybe this has something to do with your specific SPSS call (e.g., Polynomial). In any case, the p-value then still needs to be corrected.

    What should be preferred is not a simple question. One can make arguments for both decisions.

    Independent tests (t-tests or ANOVAs, depending on the follow-up test) are the approach recommended in the Maxwell and Delaney stats book (which is a great book). IIRC, their main argument is basically that the equal-variance assumption is likely false in real psychological situations.

    I do not think this is a very strong argument. In contrast, I think that in cases with limited data the positive effect of pooling on the precision of the variance estimate is a stronger argument in favor of pooling. That is why I usually use lsmeans on the ANOVA result. In addition, it makes everything quite easy because one can program all tests directly on the ANOVA output.

    As is so often the case in statistics, this is one of the situations where you as the user have to make an informed decision.

    in reply to: tbl_df vs data.frame #170
    henrik
    Keymaster

    Your question unfortunately provides very little detail for understanding what the actual problem is. I would guess it has something to do with the data types of the variables passed to mixed. Like most R modeling functions, it behaves differently for factors versus numerical variables. You can check the types of your data with str(), e.g., str(my_data).

    Also, mixed should behave completely identically for data.frame and tbl_df, as it converts the data to a data.frame (see line 164 at: https://github.com/singmann/afex/blob/master/R/mixed.R). If you could show that this is not the case, this would be a bug.

    If you want a more precise answer, provide a reproducible example: http://stackoverflow.com/q/5963269/289572

    in reply to: PB p-values for a poisson glmm with an offset? #160
    henrik
    Keymaster

    As the help page says, ‘method = "PB" calculates p-values using parametric bootstrap’. This means it calculates a reference distribution of chi-square values assuming the null model is true, by generating data from the full model and then fitting both models (i.e., the full and the restricted model) to each generated data set. For each of these fits the chi-square value is obtained. The p-value is then the proportion of sampled chi-square values larger than the actual chi-square value based on the real data (more precisely, the number of larger samples plus one, divided by the number of samples). Because of these repeated fits you also get the repeated non-convergence warnings.

    By default it uses 1000 samples per test. This means it has to be considerably slower than just fitting with glmer; in fact, roughly 1000 times the number of effects in your model slower. You can change the number of parametric bootstrap samples via args_test and nsim, e.g., args_test = list(nsim = 500). However, values lower than 500 seem suspicious. Note that you can also distribute the estimation across multiple cores (see the examples in the mixed help).
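    For illustration, a minimal sketch combining both options (the formula, the data set my_data, and the number of cores are hypothetical placeholders):

    library("afex")
    library("parallel")
    cl <- makeCluster(4)  # cluster used to distribute the bootstrap fits
    m_pb <- mixed(count ~ condition + (1 | subject), data = my_data,
                  family = poisson, method = "PB",
                  args_test = list(nsim = 500),  # 500 instead of the default 1000 samples
                  cl = cl)
    stopCluster(cl)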

    If you do not have the time to wait, you can also use method = "LRT". Note however that for this you should have enough levels for each of your random-effects grouping factors (more than 30, better more than 50). Otherwise the results might be anti-conservative.

    in reply to: PB p-values for a poisson glmm with an offset? #153
    henrik
    Keymaster

    Can you provide some example data? Without being able to reproduce the problem I am unable to fix it.

    The example data can be as trivial as possible. See the following link for advice on how to create it: https://stackoverflow.com/q/5963269/289572

    in reply to: Sum coding for ANOVA style results with unbalanced lmer data #147
    henrik
    Keymaster

    Question 1: In principle, yes. If the imbalance is only small and random, then go ahead. If, however, the imbalance is structural and carries information (e.g., you tried to sample more in this cell but it was more complicated than the other condition and participants dropped out), then it is time to think about it more (you should probably use Type 2 sums of squares then). In any case, with sum contrasts and unbalanced groups, the intercept simply represents the unweighted grand mean (i.e., the mean of the cell means and not the overall grand mean).

    Question 2: I am not sure what produces this warning in this case. To fully understand what is going on, I need a reproducible example (see e.g. https://stackoverflow.com/q/5963269/289572). Note that the output indicates that both Coherence and Finitness have two levels, which seems to be intended. You might try to run droplevels() on the data before fitting; maybe this removes the warning.
    Also note that it should be method = 'S' (and not method = afex_options("S")).
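    A minimal sketch of these two suggestions (the formula and the data set dat are placeholders; only the droplevels() call and the method argument matter here):

    dat <- droplevels(dat)  # remove unused factor levels before fitting
    m <- mixed(rt ~ Coherence * Finitness + (1 | subject), data = dat,
               method = "S")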

    Hope that helps.

    in reply to: warning epsilon > 1 for HF correction #146
    henrik
    Keymaster

    This warning hopefully only appears if you actually use HF as the correction. It indicates that the Huynh-Feldt epsilon is above 1. Epsilon is the correction factor applied to the degrees of freedom to account for violations of sphericity. Values above 1 do not make sense, as the df are multiplied by this correction factor and the corrected df should not exceed the uncorrected df (which would happen for epsilons above 1). Thus, for epsilons above 1, a value of 1 is used. See also: https://en.wikipedia.org/wiki/Mauchly%27s_sphericity_test

    You did not include a reproducible example, so I am not sure whether the warning appears incorrectly. In any case, this is not a critical warning; it is simply passed on from summary.Anova.mlm, which produces it.
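    If you want to inspect the corrections yourself, a minimal sketch assuming an existing afex_aov object a1:

    summary(a1)  # univariate tests plus Mauchly tests and GG/HF corrections
    anova(a1, correction = "GG")    # force Greenhouse-Geisser
    anova(a1, correction = "none")  # uncorrected degrees of freedom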

    in reply to: Random effect equivalence #141
    henrik
    Keymaster

    For models estimated with maximum likelihood estimation the parameterization does, in general, not matter (this obviously breaks down in certain cases, as seen for m6, because of differences in hierarchical shrinkage, but it is approximately true).

    In contrast, for models estimated with restricted maximum likelihood (i.e., REML = TRUE) the random-effects estimates (i.e., the so-called conditional modes, which are not actual parameters here) are obtained independently from the fixed-effects estimates. If I recall correctly, the random effects are estimated first and then the fixed effects. Here the differences in hierarchical shrinkage play a substantially larger role, as the random effects are estimated independently, and we basically always see differences due to parameterization. Note that in the grand scheme of things these differences are still comparatively small.

    The individual deviations or offsets estimated for each random-effects parameter are assumed to be normally distributed around the corresponding fixed effect, given the parameterization of the model. For example, for a model with treatment contrasts the offsets for the intercept (i.e., the random intercept) are distributed around the first factor level. The offsets for the random slopes are distributed around the deviations from the first factor level. In contrast, for a model with contr.sum parameterization the random intercepts are distributed around the grand mean and the random slopes are distributed around the differences from the grand mean.

    These differences in parameterization can lead to different results when shrinkage is taken into account. For example, the estimate for a participant with an extreme effect in the first factor level would receive comparatively large shrinkage on the random intercept under treatment contrasts. This would then affect all other random-effects estimates as well (as the intercept is always included). With a different parameterization the shrinkage on the random intercept would be lower, and the other factor levels would thus be less influenced by the large outlier for one case.

    in reply to: Random effect equivalence #134
    henrik
    Keymaster

    The slides are a little bit old, so I am not sure whether what Doug Bates said in 2008 still holds in the current lmer version. However, it is nevertheless worthwhile to think about what each model actually estimates. Let us do this with some random data generated based on a stats.stackexchange question from some time ago.

    We first create some data.

    require(afex)
    
    set.seed(666)
    d <- data.frame(
        y = rnorm(96),
        group = factor(rep(1:12, 4)),
        factor = factor(rep(1:2, each=24)))
    

    Then we fit a bunch of different models. Note that we set REML = FALSE or equivalently method = "LRT" for mixed. That makes the interpretation somewhat easier.

    m1 <- lmer(y ~ factor + (0 + factor|group), d, REML = FALSE)
    summary(m1)
    # Random effects:
    #  Groups   Name    Variance Std.Dev. Corr
    #  group    factor1 0.04164  0.2041       
    #           factor2 0.06435  0.2537   0.79
    #  Residual         1.02127  1.0106       
    # Number of obs: 96, groups:  group, 12

    Model one estimates two random-effects variances (one for each level of factor) and one correlation.

    m2 <- lmer(y ~ factor + (1|group) + (1|group:factor), d, REML = FALSE)
    summary(m2)
    # Random effects:
    #  Groups       Name        Variance Std.Dev.
    #  group:factor (Intercept) 0.01216  0.1103  
    #  group        (Intercept) 0.04083  0.2021  
    #  Residual                 1.02127  1.0106  
    # Number of obs: 96, groups:  group:factor, 24; group, 12

    In contrast, model 2 only estimates two random-effects variances. However, the residual variance stays the same.

    We can extend this and look at a few more models.

    m3 <- mixed(y ~ factor + (factor|group), d, method = "LRT")
    summary(m3)
    # Random effects:
    #  Groups   Name        Variance Std.Dev. Corr 
    #  group    (Intercept) 0.046914 0.21660       
    #           factor1     0.006082 0.07799  -0.34
    #  Residual             1.021271 1.01058       
    # Number of obs: 96, groups:  group, 12

    m3 uses contr.sum (because that is the default with mixed) but is otherwise similar to m1: 2 variances and 1 correlation. The residual variance is still the same.

    m4 <- mixed(y ~ factor + (factor||group), d, expand_re = TRUE, method = "LRT")
    summary(m4)
    # Random effects:
    #  Groups   Name        Variance Std.Dev.
    #  group    (Intercept) 0.046914 0.21660 
    #  group.1  re1.factor1 0.006082 0.07799 
    #  Residual             1.021271 1.01058 
    # Number of obs: 96, groups:  group, 12

    m4 now is similar to m2, two variances only. However, m4 has the same parameter estimates as m3 for the variances.

    m5 <- mixed(y ~ factor + (0 + factor|group), d, method = "LRT")
    summary(m5)
    # Random effects:
    #  Groups   Name    Variance Std.Dev. Corr
    #  group    factor1 0.04164  0.2041       
    #           factor2 0.06435  0.2537   0.79
    #  Residual         1.02127  1.0106       
    # Number of obs: 96, groups:  group, 12

    m5 is identical to m1 (in terms of the random effects). This is maybe somewhat surprising because it uses contr.sum instead of contr.treatment.

    m6 <- mixed(y ~ factor + (factor||group), d, expand_re = TRUE, check_contrasts = FALSE, method = "LRT")
    summary(m6)
    # Random effects:
    #  Groups   Name        Variance Std.Dev.
    #  group    (Intercept) 0.04107  0.2027  
    #  group.1  re1.factor2 0.02328  0.1526  
    #  Residual             1.02156  1.0107  
    # Number of obs: 96, groups:  group, 12

    Finally, m6 uses the default contrasts again (contr.treatment), and also estimates two variances.

    Now let us take a look at the overall model fit. First for all models that estimate only the variances:

    anova(m2, m4$full_model, m6$full_model)
    # Data: d
    # Models:
    # m2: y ~ factor + (1 | group) + (1 | group:factor)
    # m4$full_model: y ~ factor + ((1 | group) + (0 + re1.factor1 | group))
    # m6$full_model: y ~ factor + ((1 | group) + (0 + re1.factor2 | group))
    #               Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
    # m2             5 288.77 301.59 -139.39   278.77                             
    # m4$full_model  5 288.77 301.59 -139.39   278.77 0.0000      0  < 2.2e-16 ***
    # m6$full_model  5 288.75 301.58 -139.38   278.75 0.0165      0  < 2.2e-16 ***
    # ---
    # Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Somewhat surprisingly, these models all provide essentially the same account (if we are a bit lenient with m6). The same holds if we look at all models that include the correlation: they are all the same as well.

    anova(m1, m3$full_model, m5$full_model)
    # Data: d
    # Models:
    # m1: y ~ factor + (0 + factor | group)
    # m3$full_model: y ~ factor + (factor | group)
    # m5$full_model: y ~ factor + (0 + factor | group)
    #               Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)    
    # m1             6 290.75 306.14 -139.38   278.75                            
    # m3$full_model  6 290.75 306.14 -139.38   278.75     0      0          1    
    # m5$full_model  6 290.75 306.14 -139.38   278.75     0      0     <2e-16 ***

    In general it should hold that all models without the correlation provide the same account and all models with the correlation provide the same account (as long as REML = FALSE in both cases), independent of their exact parameterization (i.e., set_sum_contrasts plays no role here; this is only relevant for interactions of fixed effects). I am not sure what is happening with m6 here, but let us ignore this for now. I think it follows from this that m1 and m2 are equivalent if the correlation is 0. I am not sure whether this is implied by Doug Bates' quote, but it is what I take from it.

    Independent of the parameterization, models with two variances allow each individual level of group to have a random intercept and its own idiosyncratic effect of factor, and these effects are uncorrelated. If we now add the correlations, we get a slightly better fit because the effects do not appear to be uncorrelated.

    Interestingly, this also holds if we compare m6 (without correlation) with the equivalently parameterized model m3 (with correlation):

    deviance(m6$full_model) - deviance(m3$full_model)
    # [1] 3.211216e-05

    Note that a model with only one random effect provides a worse account.

    m7 <- lmer(y ~ factor + (1|group:factor), d, REML = FALSE)
    deviance(m7)
    # [1] 278.9834
    in reply to: expand_re with missing values #128
    henrik
    Keymaster

    It is not advised to use any afex functions with missing data. Perhaps more generally, it is not advised to pass missing data to any standard (i.e., part of the stats package) R modeling function. All those functions simply remove missing data on a casewise basis, and it is a good idea to remove them before running any model. Otherwise you might get the false sense that all your data was used for fitting, whereas the observations with missing values were in fact simply removed.

    The way in which mixed works (i.e., in many situations it first creates several model matrices that are then fitted separately) somewhat complicates this. If the different model matrices have different numbers of rows due to missing values, the resulting models are non-commensurable and cannot be used for hypothesis testing. This can also happen when using expand_re = TRUE, as this also creates a new model matrix (as the argument implies, it first expands the random-effects structure into a new model matrix).

    The point of this warning is simply that the way in which the creation of the new model matrix deals with missing values cannot be guaranteed to be the same as in g/lmer. This applies to both lmer_alt and mixed. Hence, I highly advise removing missing values before fitting a model if they appear in any variables that are part of the model (i.e., the dependent variable, fixed effects, and random effects). I would go so far as to suggest this even when using modeling functions outside of afex, unless they explicitly deal with missing values (e.g., via multiple imputation; but note that these also fit models in which each data set has no missings, as they are imputed in each case).
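    For illustration, a minimal sketch of removing incomplete cases beforehand (dat, rt, condition, and subject are hypothetical placeholders):

    model_vars <- c("rt", "condition", "subject")  # all variables used in the model
    dat_complete <- dat[complete.cases(dat[, model_vars]), ]
    m <- mixed(rt ~ condition + (condition || subject), data = dat_complete,
               expand_re = TRUE)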

    I hope this clears it up.

    in reply to: lsmeans compatibility with mixed() and type 2 tests #122
    henrik
    Keymaster

    Okay, my bug fix did not work for models with interactions. I have provided a new bug fix that should work now, simply download the newest github version.

    The last error (Error in calculation of the Satterthwaite's approximation. The output of lme4 package is returned), however, has nothing to do with afex. This is a problem of lsmeans. You need to switch how the df are calculated:

    lsm.options(lmer.df = "Kenward-Roger") # set df for lsmeans to KR
    lsm.options(lmer.df = "Satterthwaite") # the default (does not work in some cases)
    lsm.options(lmer.df = "asymptotic") # the fastest, no df
    in reply to: lsmeans compatibility with mixed() and type 2 tests #120
    henrik
    Keymaster

    Can you please post example code that triggers the error? I never use type II tests so I do not understand what you are up to. If I can recreate the error I will most likely be able to fix it in the github version.

    in reply to: lsmeans compatibility with mixed() and type 2 tests #115
    henrik
    Keymaster

    That is indeed a bug when passing mixed objects with type=2 and method="LRT" (or "PB"). It is now corrected in the development version of afex, which can be installed via: devtools::install_github("singmann/afex@master")

    If it is not possible to use the development version, a work-around along the following lines should work:

    
    require(afex)
    data(md_16.4)
    (mixed2 <- mixed(induct ~ cond + (1|room:cond), md_16.4, type=2, method="LRT"))
    lsmeans(mixed2$full_model[[1]], specs = c("cond"))
    lsmeans(mixed2$full_model[[1]], ~cond)
    lsmeans(mixed2$full_model[[1]], pairwise ~ cond)$contrast

    As you can see, we simply pass the full model directly. The problem for type=2 is that in cases in which nested model comparisons are used (e.g., when method = "LRT") the full_model slot is a list of lists, so we need to use the first element of the slot.

    Thanks again for the report.

    in reply to: Mixed effects ANOVA for Latin square designs #107
    henrik
    Keymaster

    I cannot see the figure, you need to add it in another way.

    Are you sure you are using the latest version of afex and R? Does it also occur with other methods (e.g., method = "S")? If yes to both, this is an error I haven’t seen so far and I will not be able to help you without reproducing it (i.e., I will need the data).

    in reply to: Mixed effects ANOVA for Latin square designs #105
    henrik
    Keymaster

    I am not sure why you need the anova() function, as the returned output of mixed() should already give you the F and p values (although you can use anova() to get a specific ANOVA table). Are you sure that all the variables are of the correct type (i.e., factors are in fact factors; you can check this with str())? Are there any warnings?
    Perhaps it would be best to post the output from mixed().

    Regarding the simulation, I am thinking of one in which all your fixed effects have no effect; that is, you simulate data from the null model. You generate data that is structurally identical to your data in all regards (same factor levels and number of participants), but has no effects besides the random effects, and then fit it with the model you use here. You do this many times (e.g., 1000) and see how often the p-value is significant, which should be about 5% of the time. This is similar to what is done here: singmann.org/mixed-models-for-anova-designs-with-one-observation-per-unit-of-observation-and-cell-of-the-design/
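    For illustration, a rough sketch of such a simulation under hypothetical design parameters (12 participants, one two-level repeated-measures factor, random intercepts only); your actual simulation should of course mirror your own design:

    library("afex")
    set.seed(1)
    p_values <- replicate(1000, {
      dat <- data.frame(
        id   = factor(rep(1:12, each = 2)),
        cond = factor(rep(c("a", "b"), 12))
      )
      # data generated under the null: random intercepts plus residual noise, no fixed effect
      dat$y <- rnorm(12)[dat$id] + rnorm(nrow(dat))
      m <- mixed(y ~ cond + (1 | id), data = dat, method = "S", progress = FALSE)
      anova(m)$`Pr(>F)`[1]
    })
    mean(p_values < .05)  # should be close to the nominal .05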
