Forum Replies Created
February 21, 2018 at 17:24 GMT+0000 in reply to: Some parameters are not estimable, most likely due to empty cells of the design #207
henrik
Keymaster
Short answer: You have to set factorize = FALSE.

Longer answer: afex's default behavior is to assume that the user wants an ANOVA and so transforms all variables to factors. This is what likely happens in your case. To turn this off, use the factorize argument. The help page (?aov_car) further specifies:

To run an ANCOVA you need to set factorize = FALSE and make sure that all variables have the correct type (i.e., factors are factors and numeric variables are numeric and centered).

This is most likely what you want to do.
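For illustration, a minimal sketch of such an ANCOVA call (the data set and variable names are hypothetical placeholders):

library("afex")
# dv: numeric outcome, group: between-subjects factor,
# age: numeric covariate, id: participant identifier
mydata$age_c <- mydata$age - mean(mydata$age)   # center the covariate
aov_car(dv ~ group + age_c + Error(id), data = mydata, factorize = FALSE)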
henrik
Keymaster
There probably are research questions that explicitly call for a suppressed correlation estimate. However, correlations are mostly suppressed to avoid convergence problems. We wrote in our chapter on mixed models:
For the limited sample sizes that are common in psychology and related disciplines a common problem is that the maximal model is not fully identified (Bates, Kliegl, Vasishth, & Baayen, 2015), especially for mixed models with complicated random effects structures. Even though the optimization algorithm converges to the optimum (i.e., the maximum-likelihood estimate for a given data set) the variance-covariance matrix of the random effects parameters at the optimum is degenerate or singular. At least for models estimated with lme4 this is often signified by convergence warnings. Other signs of singular fits are variance estimates of or near zero and correlation estimates of $\pm 1$. The occurrence of such situations is due to the fact that the parameters associated with random effects (e.g., $\sigma^2_{S_\delta}$) are more difficult to estimate than fixed effects (e.g., $\beta_{\delta}$).
The important part for your question is the last sentence. Random-effects parameters are more difficult to estimate than fixed-effects. We need considerably more data to estimate a variance or a correlation parameter than a simple fixed-effect. Furthermore, correlations require even more data than variances.
Therefore, if a model shows convergence problems (which can be indicated by warnings or problematic parameter values, as mentioned in the quote), a good first strategy to address these problems is to remove the correlations among the random-effects parameters. We do this because they are the most difficult to estimate and are often not of primary interest. Also, removing them often comes with comparatively little cost in terms of model precision.
More details and some really instructive examples of this are given in the “Parsimonious Mixed Models” paper by Bates, Kliegl, Vasishth and Baayen (2015): https://arxiv.org/abs/1506.04967
See especially Figures 3 and 5. There is also some relevant discussion on stats.stackexchange: https://stats.stackexchange.com/a/323341/442
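A minimal sketch of this strategy in afex (the data set and variable names are hypothetical placeholders): the double-bar syntax together with expand_re = TRUE suppresses the correlations among the random-effects parameters.

library("afex")
# maximal model including correlations among the random effects
m_full <- mixed(rt ~ condition + (condition | id), data = mydata)
# same random-effects structure, but with the correlations removed
m_nocorr <- mixed(rt ~ condition + (condition || id), data = mydata,
                  expand_re = TRUE)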
henrik
Keymaster
The output from summary() for mixed() objects is identical to the output of summary() for lmer() objects (this is why you get output on the level of the parameters and not at the level of model terms/effects). So if this is possible for normal lmer() models, it is possible here as well (whether it makes sense in your case is not something I can answer). If you want to get at the contents of the summary output, you can save it and work with it:

summ <- summary(fit)
as.data.frame(summ$varcor)
summ$coefficients
The problem when using expand_re = TRUE is that it creates a new data.frame that is used for fitting. When trying to access this, it fails. There are several ways around this.

1. You could avoid accessing it. emmeans only tries to access it if lmer.df is "satterthwaite":

library("afex")
data("Machines", package = "MEMSS")
m1 <- mixed(score ~ Machine + (Machine || Worker), data=Machines, expand_re = TRUE)

emm_options(lmer.df = "kenward-roger")
emmeans(m1, "Machine") # works
#  Machine   emmean       SE    df lower.CL upper.CL
#  A       52.35556 2.397324  7.35 46.74087 57.97024
#  B       60.32222 2.633283  9.04 54.36897 66.27548
#  C       66.27222 2.836796 11.60 60.06746 72.47698
# 
# Degrees-of-freedom method: kenward-roger 
# Confidence level used: 0.95 

emm_options(lmer.df = "satterthwaite")
emmeans(m1, "Machine") # fails
# Error in eval(predvars, data, env) : object 're1.Machine1' not found

emm_options(lmer.df = "asymptotic")
emmeans(m1, "Machine") # works
#  Machine   emmean       SE  df asymp.LCL asymp.UCL
#  A       52.35556 2.397324 Inf  47.65689  57.05422
#  B       60.32222 2.633283 Inf  55.16108  65.48336
#  C       66.27222 2.836796 Inf  60.71220  71.83224
# 
# Degrees-of-freedom method: asymptotic 
# Confidence level used: 0.95
2. You can overwrite your existing data.frame with one that contains the missing columns (note that this might change the number of rows if you have missing data). I would only do this on a copy of my real data:

m1 <- mixed(score ~ Machine + (Machine || Worker), data=Machines, expand_re = TRUE)
Machines <- mixed(score ~ Machine + (Machine || Worker), data=Machines, expand_re = TRUE, return = "data")

emm_options(lmer.df = "satterthwaite")
emmeans(m1, "Machine") # works now
#  Machine   emmean       SE    df lower.CL upper.CL
#  A       52.35556 2.397324  7.35 46.74094 57.97017
#  B       60.32222 2.633283  9.04 54.36898 66.27546
#  C       66.27222 2.836796 11.56 60.06544 72.47901
# 
# Degrees-of-freedom method: satterthwaite 
# Confidence level used: 0.95
henrik
Keymaster
I am sorry, but I do not understand what you want to back-transform into what. In general, a reproducible example (e.g., http://stackoverflow.com/q/5963269/289572) helps a lot in this regard. But you at least have to show the output you have and then tell me exactly what you want.
henrik
Keymaster
In general, I would give up on the idea of having something like a standardized effect size on the level of the effects (i.e., model terms) in mixed models. They generally do not appear to work as soon as a model includes random slopes or multiple random-effects grouping factors. The same applies to something like R-squared.
Confidence intervals can be obtained on the level of the cell means (i.e., factor levels or linear combinations thereof) using emmeans. For linear mixed models, emm_options(lmer.df = ...) allows you to choose how the degrees of freedom are calculated, which can have dramatic consequences for speed. Three options exist: lmer.df = c("kenward-roger", "satterthwaite", "asymptotic").

It is not always trivial to combine the table of effects with a measure of effect size, but one can sometimes find ways. For example, see Table 2 in: http://singmann.org/download/publications/2015_kellen_singmann_vogt_klauer.pdf
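For illustration, a minimal sketch (m1 and the factor Machine are placeholders for a model fitted with mixed() and one of its factors):

library("emmeans")
emm_options(lmer.df = "asymptotic")     # fastest, no degrees of freedom
emmeans(m1, "Machine")                  # cell means with asymptotic CIs
emm_options(lmer.df = "kenward-roger")  # slower, Kenward-Roger df
emmeans(m1, "Machine")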
henrik
Keymaster
I do not know exactly which package r.squaredGLMM comes from, but I expect that, similar to confint, it also works on the level of the model parameters and not on the level of the model effects (or ‘model terms’). So they are in principle incompatible with afex. If in your case all effects only consist of one parameter, then you can of course use them.

In such a case, the simplest way would be to add these values to the data.frames returned by either nice() or anova(). Continuing from the previous example:

df1 <- nice(m1)
df1$new_col <- 2
df1
# Mixed Model Anova Table (Type 3 tests, KR-method)
# 
# Model: score ~ Machine + (Machine | Worker)
# Data: Machines
#    Effect   df        F p.value new_col
# 1 Machine 2, 4 32.80 **    .003       2
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1

df2 <- anova(m1)
df2$new_col <- 2
df2
# Mixed Model Anova Table (Type 3 tests, KR-method)
# 
# Model: score ~ Machine + (Machine | Worker)
# Data: Machines
#         num Df den Df      F    Pr(>F) new_col
# Machine      2      4 32.803 0.0033024       2
As you can see, both functions simply return a data.frame for which specific print methods exist. But you can simply add additional columns. If you want to remove the specific print methods, simply wrap the returned object in an as.data.frame() call. For example:

as.data.frame(df2)
#         num Df den Df        F      Pr(>F) new_col
# Machine      2      4 32.80302 0.003302373       2
Note that you can also get confidence intervals via the methods implemented in emmeans, which work nicely with afex functions, see: https://cran.r-project.org/package=emmeans

henrik
Keymaster
I do not think afex and sjt. or sjp. functions are compatible. The main reason is that the sj functions focus on the standard lmer output and its fixed-effects parameters. In contrast, afex focuses on tests of effects. In many cases, effects involve more than one parameter.

However, afex provides several functions that support nice printing of the "mixed" objects returned from mixed, mainly nice() and anova(). See the following example:

library("afex")
data("Machines", package = "MEMSS")

# simple model with random-slopes for repeated-measures factor
m1 <- mixed(score ~ Machine + (Machine|Worker), data=Machines)

m1
# Mixed Model Anova Table (Type 3 tests, KR-method)
# 
# Model: score ~ Machine + (Machine | Worker)
# Data: Machines
#    Effect   df        F p.value
# 1 Machine 2, 4 32.80 **    .003
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1

nice(m1)
# Mixed Model Anova Table (Type 3 tests, KR-method)
# 
# Model: score ~ Machine + (Machine | Worker)
# Data: Machines
#    Effect   df        F p.value
# 1 Machine 2, 4 32.80 **    .003
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1

knitr::kable(nice(m1))
# |Effect  |df   |F        |p.value |
# |:-------|:----|:--------|:-------|
# |Machine |2, 4 |32.80 ** |.003    |

anova(m1)
# Mixed Model Anova Table (Type 3 tests, KR-method)
# 
# Model: score ~ Machine + (Machine | Worker)
# Data: Machines
#         num Df den Df      F   Pr(>F)   
# Machine      2      4 32.803 0.003302 **
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

knitr::kable(anova(m1))
# |        | num Df| den Df|        F|    Pr(>F)|
# |:-------|------:|------:|--------:|---------:|
# |Machine |      2|      4| 32.80302| 0.0033024|
Please let me know if this is not enough. If you definitely want to use the sj family of functions, you need to contact their maintainer and ask if he is willing to support afex functions.

henrik
Keymaster
AFAIK, the problem with the sphericity tests is not so much related to the number of conditions or their levels (as long as you have more participants than cells of the design), but probably more to bad luck with your data. You might try to randomly remove one or two participants to see if this still happens. However, this is not part of afex, so I might also be wrong here.

For balanced designs where the contrasts are orthogonal (as is always the case for repeated-measures ANOVA via car::Anova()), ignoring individual effects does not change the other effects. Hence you can do it here. However, should one of the previously mentioned conditions (balance and orthogonality) not be met, omitting one of the effects from the model changes the other effects. This could be what your thesis committee had in mind. Furthermore, if a test is always at least implicitly part of the model, then why omit it? If its inclusion does not change the model, knowing its results does not hurt.

henrik
Keymaster
You can simply and safely ignore the tests that you are not interested in. A repeated-measures ANOVA as run with car::Anova() (this is exactly what afex uses) **always** tests all effects (so this is a consequence of car::Anova() and not of afex). I know this because I asked John Fox (the author of car) a very similar question some years ago and he responded with:

The within-subjects contrasts are constructed by Anova() to be orthogonal in the row-basis of the design, so you should be able to safely ignore the effects in which (for some reason that escapes me) you are uninterested.

The full quote can be seen at the following link, which also provides a link to the full exchange: https://stat.ethz.ch/pipermail/r-help/2012-July/319143.html

I also think there is not much you can do if the sphericity tests fail for your data (at least I do not know of anything; these tests are also all done by car).

Hope that helps!
henrik
Keymaster
emmip() is not part of afex, but part of the emmeans package. So I am not the one who can provide adequate support. I am also not sure what it is supposed to do in this case or if it even supports numerical covariates on the x-axis.

In general, lmer_alt returns the same object as lmer or glmer. In the default setting it returns objects of class merModLmerTest for lmer objects. If you prefer the ‘true’ merMod objects returned from lme4, you can change this via afex_options(lmer_function = "lme4") (but note that then method = "KR" and "S" stop working). mixed returns objects of class mixed, for which the slot full_model contains the same model object as lmer_alt.

Thus, objects returned from lmer_alt (or the full_model slot of mixed objects) should in principle work with all functions which work with merMod objects. For example, for effects:

library("afex")
library("effects")
data("Machines", package = "MEMSS") # some example data

m1 <- mixed(score ~ Machine + (Machine || Worker), data=Machines, expand_re = TRUE)

# when using effects with mixed we need to set contr.sum globally:
set_sum_contrasts()
plot(Effect("Machine", m1$full_model)) # equal to: emmeans(m1, "Machine")

## compare again (now gives wrong results):
set_treatment_contrasts()
plot(Effect("Machine", m1$full_model))
In principle this should also work for objects returned from lmer_alt. However, it does not right now, because it does not set the data argument correctly (I will correct this in the next version and on github). For now the following works:

m2 <- lmer_alt(score ~ Machine + (Machine || Worker), data=Machines)
m2@call[["data"]] <- as.name("Machines")
plot(Effect("Machine", m2))
henrik
Keymaster
To be able to provide a more useful answer to your question, I have to ask why you are looking at the fixed effects in the first place. What is your goal with this?
One of the design goals of afex is to hide the fixed effects from the user, as they are not particularly useful in cases such as yours (i.e., factorial designs with factors with more than two levels). If your goal is to test specific hypotheses on the level of the factor levels, do so using the functionality provided by lsmeans (or, from afex v. 0.19.1 onwards, using emmeans). This is described in some detail in the ANOVA vignette and applies in exactly the same way to mixed models as to ANOVA models. For more details on how to set up the contrasts (or post-hoc tests) you are interested in, have a look at the great vignettes of emmeans: https://cran.r-project.org/package=emmeans

emmeans also automatically calculates CIs for you. This is certainly possible, but needs to be done on the unconstrained (i.e., logistic) scale. Only after calculating the CIs can the values be transformed onto the response scale (i.e., probabilities). For example, in your case compare:

emmeans(full_mixed, c("Method", "Levels"))
emmeans(full_mixed, c("Method", "Levels"), type = "response")
Note that you can simply specify any type of contrasts on these estimated marginal means. Several examples of this for logistic mixed models are provided in the supplemental material of a recent paper of mine (https://doi.org/10.1016/j.cogpsych.2017.09.002), which can be found at: https://osf.io/3uajq/
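For example, a rough sketch of pairwise contrasts on these estimated marginal means (full_mixed, Method, and Levels are taken from the example above; the Holm adjustment is just one possible choice). The contrasts are computed on the link scale and only then back-transformed:

library("emmeans")
emm <- emmeans(full_mixed, c("Method", "Levels"))
pairs(emm, adjust = "holm")              # pairwise contrasts on the logit scale
summary(pairs(emm), type = "response")   # back-transformed (odds ratios)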
One final thought: as discussed at length by Barr, Levy, Scheepers, and Tily (2013), you most likely do not want a random-intercept-only model. Anything they write in this paper applies in the same manner to logistic mixed models. The recommended maximal model has random slopes for all within-subjects factors (e.g., Correct ~ Method * Levels * MATB + (Method * Levels * MATB | SubjectID)). The model you estimated likely produces anti-conservative results.

All I have discussed here is also discussed at more length in our recent chapter: http://singmann.org/download/publications/singmann_kellen-introduction-mixed-models.pdf
Again, this deals with linear mixed models, but the parts relevant to your question apply in the same manner to generalized linear mixed models.

henrik
Keymaster
As can be seen from the output, the difference is whether or not one wants to assume equal variances for the follow-up tests (as does lsmeans) or not (as does SPSS).

One can almost replicate the results from SPSS in R by running independent analyses for each difference. I demonstrate this here for the w1 - w2 contrast:

library("tidyr")
dw <- spread(d, factor_within, value)
dw$d1 <- dw$w1 - dw$w2

summary(lm(d1 ~ 1, dw))
# Call:
# lm(formula = d1 ~ 1, data = dw)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -47.296  -6.296  -1.296   8.204  36.704 
# 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)   
# (Intercept)   10.296      3.016   3.414  0.00211 **
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 15.67 on 26 degrees of freedom
Note that to get the standard errors, I rely on the fact that a within-subject contrast of two means is simply a test of whether the difference differs from zero. And this can be tested with a linear model with an intercept only. I do not know how to get the standard error with t.test. As can be seen, the estimate is again identical and the standard error is now pretty close to the one reported (this also holds for the other two contrasts). I am not sure why the standard error is not identical, because I did a similar analysis some time ago in which it was. Maybe this has something to do with your specific SPSS call (e.g., Polynomial). In any case, the p-value then still needs to be corrected.

What should be preferred is not a simple question. One can make arguments for both decisions.
Independent tests (t-tests or ANOVAs, depending on the follow-up test) are the approach recommended in the Maxwell and Delaney stats book (which is a great book). IIRC, their main argument is basically that the equal-variance assumption is likely false in real psychological situations.

I do not think this is a very strong argument. In contrast, I think that in cases with limited data the positive effect of pooling on the precision of the variance estimate is a stronger argument for pooling. That is why I usually use lsmeans on the ANOVA result. In addition, it makes everything quite easy because one can program all tests directly on the ANOVA output.

As is often the case in statistics, this is one of the situations where you as the user have to make an informed decision.
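For reference, a rough sketch of the pooled approach via emmeans, the successor of lsmeans (a1 is a hypothetical afex ANOVA object, e.g. a1 <- aov_ez("id", "value", d, within = "factor_within"); the column names follow the code above and the id column is assumed):

library("emmeans")
emm <- emmeans(a1, "factor_within")   # estimated marginal means from the ANOVA object
pairs(emm, adjust = "holm")           # all pairwise follow-up contrasts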
henrik
Keymaster
Your question unfortunately provides very little detail for understanding what the actual problem is. I would guess it has something to do with the data types of the variables passed to mixed. Like most R modeling functions, it behaves differently for factors versus numerical variables. You can check the type of your data with str(), e.g., str(my_data).

Also, mixed should behave completely identically for data.frame and tbl_df, as it converts the data to a data.frame (see line 164 at: https://github.com/singmann/afex/blob/master/R/mixed.R). If you could show that this is not the case, this would be a bug.

If you want a more precise answer, provide a reproducible example: http://stackoverflow.com/q/5963269/289572
December 10, 2017 at 17:58 GMT+0000 in reply to: PB p-values for a poisson glmm with an offset? #160

henrik
Keymaster
As the help page says, ‘method = "PB" calculates p-values using parametric bootstrap’. This means it calculates a reference distribution of chi-square values assuming the null model is true, by generating data from the full model and then fitting both models (i.e., the full and the restricted model) to it. For each of those fits the chi-square value is obtained. The p-value is then the proportion of sampled chi-square values larger than the actual chi-square value based on the real data (it is actually the number of larger samples plus one, divided by the number of samples). Because of these repeated fits you also get the repeated non-convergence warnings.

Per default it uses 1000 samples per test. This means it has to be considerably slower than just fitting in glmer; in fact, roughly 1000 times the number of effects in your model slower. You can change the number of parametric bootstrap samples via the args_test argument and nsim, e.g., args_test = list(nsim = 500). However, values lower than 500 seem suspicious. Note that you can also distribute the estimation across multiple cores (see the examples on the mixed help page).
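For illustration, a minimal sketch of such a call (the model formula, data, and variable names are hypothetical placeholders; args_test is passed on to pbkrtest::PBmodcomp, which accepts nsim and a cluster object cl for parallel estimation):

library("afex")
library("parallel")
cl <- makeCluster(2)  # number of cores to use for the bootstrap fits
m_pb <- mixed(count ~ condition + offset(log(exposure)) + (1 | id),
              data = mydata, family = poisson, method = "PB",
              args_test = list(nsim = 500, cl = cl))
stopCluster(cl)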
If you do not have the time to wait, you can also use method = "LRT". Note however that for this you should have enough levels for each of your random-effects grouping factors (more than 30, better more than 50). Otherwise the results might be anti-conservative.

henrik
Keymaster
Can you provide some example data? Without being able to reproduce the problem I am unable to fix it.

The example data can be as trivial as possible. See the following link for advice on how to create it: https://stackoverflow.com/q/5963269/289572
November 15, 2017 at 22:51 GMT+0000 in reply to: Sum coding for ANOVA style results with unbalanced lmer data #147

henrik
Keymaster
Question 1: In principle, yes. If the imbalance is only small and random, then go ahead. If the imbalance is, however, structural and provides information (e.g., you tried to sample more in one cell, but it was more complicated than the other condition and participants dropped out), then it is time to think about it more (you should probably use Type 2 sums of squares then). In any case, with sum contrasts and unbalanced groups, the intercept simply represents the unweighted grand mean (i.e., the mean of the cell means and not the overall grand mean).
Question 2: I am not sure what produces this warning in this case. To fully understand what is going on, I need a reproducible example (see here). Note that the output indicates that both Coherence and Finitness have two levels, which seems to be intended. You might try to run droplevels() on the data before fitting; maybe this removes it.
Also note that it should be method = 'S' (and not method = afex_options("S")).

Hope that helps.
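For reference, a rough sketch of Question 1 in code (the outcome, random-effects structure, and data set are hypothetical placeholders; Coherence and Finitness are the factor names from the question, and mixed sets sum-to-zero contrasts by default):

library("afex")
# Type 2 tests with Satterthwaite df on an unbalanced design
m_type2 <- mixed(dv ~ Coherence * Finitness + (1 | subject),
                 data = mydata, type = 2, method = "S")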
henrik
Keymaster
This warning hopefully only appears if you actually use HF as correction. It indicates that the Huynh-Feldt epsilon is above 1. This is the correction factor applied to the degrees of freedom for violations of sphericity. Values above 1 do not make any sense, as the df are multiplied with this correction factor and should not exceed the uncorrected df (which would happen for values above 1). Thus, for epsilons above 1, a value of 1 is used. See also: https://en.wikipedia.org/wiki/Mauchly%27s_sphericity_test

You did not include a reproducible example, so I am not sure whether this warning appears incorrectly. In any case, this is not a critical warning and is simply passed on from summary.Anova.mlm, which produces it.
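For illustration, a minimal sketch of how the sphericity correction is selected in afex (the data set and variable names are hypothetical placeholders):

library("afex")
a1 <- aov_ez("id", "dv", mydata, within = "cond",
             anova_table = list(correction = "HF"))  # or "GG" or "none"
a1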
henrik
Keymaster
For models estimated with maximum likelihood estimation the parameterization does in general not matter (this obviously breaks down in certain cases, as seen for m6, because of differences in hierarchical shrinkage, but it is approximately true).

In contrast, for models estimated with restricted maximum likelihood estimation (i.e., REML = TRUE) the random-effect estimates (i.e., the so-called conditional modes, which are not actual parameters here) are obtained independently from the fixed-effects estimates. If I recall correctly, the random effects are estimated first and then the fixed effects. Here the differences in hierarchical shrinkage play a substantially larger role, as they are estimated independently, and we basically always see differences due to parameterization. Note that in the grand scheme of things these differences are still comparatively small.

The individual deviations or offsets estimated for each random-effect parameter are estimated to be normally distributed around the fixed-effect parameter, given the parameterization of the model. For example, for a model with treatment contrasts the offsets for the intercept (i.e., the random intercept) are distributed around the first factor level. The offsets for the random slopes are distributed around the deviation from the first factor level. In contrast, for a model with contr.sum parameterization the random intercepts are distributed around the grand mean and the random slopes are distributed around the differences from the grand mean.

These differences in parameterization can lead to different results if shrinkage is taken into account. For example, the estimate for a participant with an extreme effect for the first factor level would receive comparatively large shrinkage for the random intercept with treatment contrasts. This would then affect all other random-effect estimates as well (as the intercept is always included). With a different parameterization the shrinkage on the random intercept would be lower and the other factor levels thus less influenced by the large outlier for one case.
henrik
Keymaster
The slides are a little bit old, so I am not sure if what Doug Bates said in 2008 still holds in the current lmer version. However, it is nevertheless worthwhile to think about what each model actually estimates. Let us do this with some random data generated based on a stats.stackexchange question from some time ago.

We first create some data.

require(afex)
set.seed(666)
d <- data.frame(
  y = rnorm(96),
  group = factor(rep(1:12, 4)),
  factor = factor(rep(1:2, each=24)))
Then we fit a bunch of different models. Note that we set REML = FALSE or, equivalently, method = "LRT" for mixed. That makes the interpretation somewhat easier.

m1 <- lmer(y ~ factor + (0 + factor|group), d, REML = FALSE)
summary(m1)
# Random effects:
#  Groups   Name    Variance Std.Dev. Corr
#  group    factor1 0.04164  0.2041       
#           factor2 0.06435  0.2537   0.79
#  Residual         1.02127  1.0106       
# Number of obs: 96, groups:  group, 12
Model 1 estimates two random-effects variances (one for each level of factor) and one correlation.

m2 <- lmer(y ~ factor + (1|group) + (1|group:factor), d, REML = FALSE)
summary(m2)
# Random effects:
#  Groups       Name        Variance Std.Dev.
#  group:factor (Intercept) 0.01216  0.1103  
#  group        (Intercept) 0.04083  0.2021  
#  Residual                 1.02127  1.0106  
# Number of obs: 96, groups:  group:factor, 24; group, 12
In contrast, model 2 only estimates two random-effects variances. However, the residual variance seems to stay the same.
We can extend this and look at a few more models.
m3 <- mixed(y ~ factor + (factor|group), d, method = "LRT")
summary(m3)
# Random effects:
#  Groups   Name        Variance Std.Dev. Corr 
#  group    (Intercept) 0.046914 0.21660       
#           factor1     0.006082 0.07799  -0.34
#  Residual             1.021271 1.01058       
# Number of obs: 96, groups:  group, 12
m3 uses contr.sum (because that is the default with mixed) but is otherwise similar to m1: 2 variances and 1 correlation. The residual variance is still the same.

m4 <- mixed(y ~ factor + (factor||group), d, expand_re = TRUE, method = "LRT")
summary(m4)
# Random effects:
#  Groups   Name        Variance Std.Dev.
#  group    (Intercept) 0.046914 0.21660 
#  group.1  re1.factor1 0.006082 0.07799 
#  Residual             1.021271 1.01058 
# Number of obs: 96, groups:  group, 12
m4 is now similar to m2, two variances only. However, m4 has the same parameter estimates as m3 for the variances.

m5 <- mixed(y ~ factor + (0 + factor|group), d, method = "LRT")
summary(m5)
# Random effects:
#  Groups   Name    Variance Std.Dev. Corr
#  group    factor1 0.04164  0.2041       
#           factor2 0.06435  0.2537   0.79
#  Residual         1.02127  1.0106       
# Number of obs: 96, groups:  group, 12
m5 is identical to m1 (in terms of the random effects). This is maybe somewhat surprising because it uses contr.sum instead of contr.treatment.

m6 <- mixed(y ~ factor + (factor||group), d, expand_re = TRUE, check_contrasts = FALSE, method = "LRT")
summary(m6)
# Random effects:
#  Groups   Name        Variance Std.Dev.
#  group    (Intercept) 0.04107  0.2027  
#  group.1  re1.factor2 0.02328  0.1526  
#  Residual             1.02156  1.0107  
# Number of obs: 96, groups:  group, 12
Finally, m6 uses the default contrasts again (contr.treatment) and also estimates two variances.

Now let us take a look at the overall model fit. First for all models that estimate only the variances:

anova(m2, m4$full_model, m6$full_model)
# Data: d
# Models:
# m2: y ~ factor + (1 | group) + (1 | group:factor)
# m4$full_model: y ~ factor + ((1 | group) + (0 + re1.factor1 | group))
# m6$full_model: y ~ factor + ((1 | group) + (0 + re1.factor2 | group))
#               Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
# m2             5 288.77 301.59 -139.39   278.77                             
# m4$full_model  5 288.77 301.59 -139.39   278.77 0.0000      0  < 2.2e-16 ***
# m6$full_model  5 288.75 301.58 -139.38   278.75 0.0165      0  < 2.2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Somewhat surprisingly, these models all provide essentially the same account (if we are graceful with m6). The same holds if we look at all models that include the correlation. They are all the same as well.

anova(m1, m3$full_model, m5$full_model)
# Data: d
# Models:
# m1: y ~ factor + (0 + factor | group)
# m3$full_model: y ~ factor + (factor | group)
# m5$full_model: y ~ factor + (0 + factor | group)
#               Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)    
# m1             6 290.75 306.14 -139.38   278.75                            
# m3$full_model  6 290.75 306.14 -139.38   278.75     0      0          1    
# m5$full_model  6 290.75 306.14 -139.38   278.75     0      0     <2e-16 ***
In general it should hold that all models without the correlation provide the same account and all models with the correlation provide the same account (as long as REML = FALSE in both cases), independent of their exact parameterization (i.e., set_sum_contrasts plays no role here; this is only relevant for interactions of fixed effects). I am not sure what is happening with m6 here, but let us ignore this for now. I think from this it follows that m1 and m2 are equivalent if the correlation is 0. I am not sure if this is implied by Doug Bates' quote, but it is what I take from it.

Independent of the parameterization, models with two variances allow each individual level of group to have a random intercept and each level of group to have its own idiosyncratic effect of factor. And these effects are uncorrelated. If we now add the correlations, we get a slightly better fit because the effects do not appear to be uncorrelated.
Interestingly, this also holds if we compare m6 (without correlation) with the equivalently parameterized model m3 (with correlation):

deviance(m6$full_model) - deviance(m3$full_model)
# [1] 3.211216e-05
Note that a model with only one random effect also provides a worse account.

m7 <- lmer(y ~ factor + (1|group:factor), d, REML = FALSE)
deviance(m7)
# [1] 278.9834
henrik
Keymaster
It is not advised to use any afex functions with missing data. Perhaps more generally, it is not advised for any standard (i.e., part of the stats package) R modeling function to have missing data. All those functions simply remove missing data on a casewise basis, and it is a good idea to remove them before running any model. Otherwise you might get the false sense that all your data was used for fitting, whereas observations with missing values were in fact simply removed.

The way in which mixed works (i.e., in many situations it first creates several model matrices that are then fitted separately) somewhat complicates this. If the different model matrices have different numbers of rows due to missing values, the ensuing models are non-commensurable and cannot be used for hypothesis testing. This can also appear when using expand_re = TRUE, as this also creates a new model matrix (as the argument implies, it first expands the random-effects structure into a new model matrix).

The point of this warning is simply that the way in which the creation of the new model matrix deals with missing values cannot be guaranteed to be the same as is done in g/lmer. This also applies to both lmer_alt and mixed. Hence, I highly advise removing missing values before fitting a model if they appear in any variables that are part of the model (i.e., for dependent variables, fixed effects, and random effects). I would go so far as to suggest this even when using modeling functions outside of afex, unless they explicitly deal with missing values (e.g., multiple imputation, but note that these also fit models in which each data set has no missings, as they are imputed in each case).

I hope this clears it up.
July 31, 2017 at 16:51 GMT+0000 in reply to: lsmeans compatibility with mixed() and type 2 tests #122

henrik
Keymaster
Okay, my bug fix did not work for models with interactions. I have provided a new bug fix that should work now; simply download the newest GitHub version.
The last error (Error in calculation of the Satterthwaite's approximation. The output of lme4 package is returned) however has nothing to do with afex. This is a problem of lsmeans. You need to switch how the df are calculated:

lsm.options(lmer.df = "Kenward-Roger")  # set df for lsmeans to KR
lsm.options(lmer.df = "Satterthwaite")  # the default (does not work in some cases)
lsm.options(lmer.df = "asymptotic")     # the fastest, no df
July 31, 2017 at 09:53 GMT+0000 in reply to: lsmeans compatibility with mixed() and type 2 tests #120

henrik
Keymaster
Can you please post example code that triggers the error? I never use type II tests, so I do not understand what you are up to. If I can recreate the error, I will most likely be able to fix it in the GitHub version.
July 28, 2017 at 13:45 GMT+0000 in reply to: lsmeans compatibility with mixed() and type 2 tests #115

henrik
Keymaster
That is indeed a bug when passing mixed objects with type=2 and method="LRT" (or "PB"). It is now corrected in the development version of afex, which can be installed via:

devtools::install_github("singmann/afex@master")
If it is not possible to use the development version, a work-around along the following lines should work:
require(afex)
data(md_16.4)
(mixed2 <- mixed(induct ~ cond + (1|room:cond), md_16.4, type=2, method="LRT"))
lsmeans(mixed2$full_model[[1]], specs = c("cond"))
lsmeans(mixed2$full_model[[1]], ~cond)
lsmeans(mixed2$full_model[[1]], pairwise ~ cond)$contrast
As you can see, we simply pass the full model directly. The problem for type=2 is that in cases in which nested model comparisons are used (e.g., when method = "LRT"), the full_model slot is a list of lists, so we need to use the first element of the slot.

Thanks again for the report.
henrik
Keymaster
I cannot see the figure; you need to add it in another way.
Are you sure you are using the latest version of afex and R? Does it also occur with other methods (e.g., method = "S")? If yes to both, this is an error I haven’t seen so far, and I will not be able to help you without reproducing it (i.e., I will need the data).
KeymasterI am not sure why you need the
anova()
function, as the return output ofmixed()
should already give you theF
andp
values (although you can useanova()
to get a specific ANOVA table). Are you sure that all the variables are of the correct type (i.e., factors are in fact factors, you can check this withstr()
)? Are there any warnings?
Perhaps it would be best to post the output frommixed()
.Regarding the simulation, I think about one where all your fixed effects have no effect. You simulate data from the Null model. So you generate data that is structurally identical to your data in all regards (same factor levels and number of participants), but has no effect besides the random effects, and then fit it with the model you use here. You do this a lot of times (e.g., 1000) and see how often the p-value is significant. Should be 5% of the times. Similar as done here: singmann.org/mixed-models-for-anova-designs-with-one-observation-per-unit-of-observation-and-cell-of-the-design/
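As a rough sketch of such a null simulation (the design, sample size, and variance values below are hypothetical placeholders; in practice they should mirror your actual design and model):

library("afex")
set.seed(1)
n_subj <- 20
n_sim  <- 100   # use, e.g., 1000 in practice
p_vals <- numeric(n_sim)

for (i in seq_len(n_sim)) {
  # two within-subjects conditions, no fixed effect:
  # only random subject intercepts plus residual noise
  sim <- expand.grid(id = factor(1:n_subj), cond = factor(c("a", "b")))
  subj_int <- rnorm(n_subj, sd = 0.5)
  sim$dv <- subj_int[sim$id] + rnorm(nrow(sim), sd = 1)
  fit <- mixed(dv ~ cond + (1 | id), data = sim, method = "S",
               progress = FALSE)
  p_vals[i] <- fit$anova_table[["Pr(>F)"]][1]
}
mean(p_vals < .05)  # should be close to .05 if the test is calibrated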