Estimated marginal means don't match SPSS

Home forums ANOVA Estimated marginal means don't match SPSS

Tagged: ,

This topic contains 2 replies, has 2 voices, and was last updated by  statmerkur 1 year ago.

  • Author
    Posts
  • #163

    statmerkur
    Participant

    I followed your ANOVA vignette and passed the afex_aov object to lsmeans to obtain the marginal means of my within-subjects factor. The emmeans are consistent with the SPSS emmeans but the standard errors and thus the pairwise comparisons don’t match:

    R

     factor_within   lsmean       SE    df lower.CL upper.CL
     w1            273.8519 6.325405 28.92 260.9134 286.7903
     w2            263.5556 6.325405 28.92 250.6171 276.4940
     w3            266.4074 6.325405 28.92 253.4689 279.3459
    
     contrast  estimate       SE df t.ratio p.value
     w1 - w2  10.296296 3.310553 48   3.110  0.0094
     w1 - w3   7.444444 3.310553 48   2.249  0.0875
     w2 - w3  -2.851852 3.310553 48  -0.861  1.0000

    SPSS

     factor_within  lsmean    SE lower.CL upper.CL
     w1            273.852 6.779  259.860  287.842
     w2            263.556 5.639  251.917  275.194
     w3            266.407 6.502  252.988  279.827
    
     contrast  estimate    SE p.value
     w1 - w2     10.296 3.064   0.008
     w1 - w3      7.444 3.836   0.192
     w2 - w3     -2.852 2.962   1.000

    This is the R code I used:

    # library("haven")
    # library("reshape2")
    library("afex")
    library("lsmeans")
    
    d <- data.frame(id = gl(27, 3),
                    factor_between = gl(3, 27, labels = c("b1", "b2", "b3")), 
                    factor_within = gl(3, 1, 81, labels = c("w1", "w2", "w3")),
                    value = c(296, 270, 281, 208, 207, 199, 313, 292, 278, 323, 296, 331, 
                              227, 224, 226, 290, 327, 323, 224, 233, 221, 296, 282, 286, 304, 
                              298, 281, 305, 273, 278, 287, 279, 271, 282, 272, 285, 251, 235, 
                              236, 243, 250, 285, 231, 224, 223, 295, 278, 269, 313, 266, 280, 
                              243, 249, 246, 267, 262, 247, 311, 291, 304, 262, 253, 248, 272, 
                              265, 270, 239, 229, 249, 234, 233, 220, 327, 302, 291, 270, 253, 
                              260, 281, 273, 305))
    
    # export to SPSS
    # d_spss <- dcast(d, id + factor_between ~ factor_within, value.var = "value")
    # write_sav(d_spss, "d_spss.sav")
    
    set_sum_contrasts()
    a1 <- aov_ez(id = "id",
                 dv =  "value",
                 data = d,
                 between = "factor_between",
                 within = "factor_within")
    
    nice(a1, es = "pes")  # matches with SPSS
    
    print(m1 <- lsmeans(a1, ~ factor_within))  # SEs don't match the SPSS SEs ...
    pairs(m1, adjust = "bonferroni")  # ... therefore pairwise comparisons don't match

    And this is the SPSS syntax I used:

    DATASET NAME DataSet1 WINDOW=FRONT. 
    GLM w1 w2 w3 BY factor_between 
      /WSFACTOR=factor_within 3 Polynomial 
      /MEASURE=value 
      /METHOD=SSTYPE(3) 
      /EMMEANS=TABLES(factor_within) COMPARE ADJ(BONFERRONI) 
      /PRINT=ETASQ 
      /CRITERIA=ALPHA(.05) 
      /WSDESIGN=factor_within 
      /DESIGN=factor_between.

    Is there something wrong with my R code or why do I get these diverging results?

  • #171

    henrik
    Keymaster

    As can be seen from the output, the difference is whether or not one wants to assume equal variances for the follow-up tests (as does lsmeans) or nor (as does SPSS).

    One can almost replicate the results from SPSS in R, by running independent analyses for each difference. I demonstrate this here for the w1 - w2 contrast:

    library("tidyr")
    
    dw <- spread(d, factor_within, value)
    dw$d1 <- dw$w1 - dw$w2
    
    summary(lm(d1 ~ 1, dw))
    # Call:
    # lm(formula = d1 ~ 1, data = dw)
    # 
    # Residuals:
    #     Min      1Q  Median      3Q     Max 
    # -47.296  -6.296  -1.296   8.204  36.704 
    # 
    # Coefficients:
    #             Estimate Std. Error t value Pr(>|t|)   
    # (Intercept)   10.296      3.016   3.414  0.00211 **
    # ---
    # Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    # 
    # Residual standard error: 15.67 on 26 degrees of freedom 

    Note that to get the standard errors, I rely on the fact that a within-subject contrast of two means is simply the test on whether the difference differs from zero. And this can be tested with a linear model with an Intercept only. I do not know how to get the standard error with t.test. As can be seen, the estimate is again identical and now the standard error is pretty close to the one reported (this also holds for the other two contrasts). I am not sure why the standard error is not identical, because I did a similar analysis some time ago in which it was. Maybe this has something to do with your specific SPSS call (e.g., Polynomial). In any case, the p-value then still needs to be corrected.

    What should be preferred is not a simple question. One can make arguments for both decisions.

    Independent tests (t-tests or ANOVAs depending on the follow-up test) is the approach recommended in the Maxwell and Delaney stats book (which is a great book). IIRC, their main argument is basically that the equal variance assumption is likely false in real psychological situations.

    I do not think this is a very strong argument (why?). In contrast, I think that in cases with limited data the positive effect of pooling on precision of the variance estimate seems to be a stronger argument for pooling. That is why I usually use lsmeans on the ANOVA result. In addition, it makes everything quite easy because one can program all tests directly on the ANOVA output.

    As many time in statistics, this is one of the situations where you as the user has to make an informed decision.

  • #172

    statmerkur
    Participant

    This makes perfect sense and I can think of both situations where using lsmeans would be useful and where it wouldn’t. So in case of unequal cell sizes, should the dependent t-tests still match the SPSSs EMMEANS or how can one do an equivalent analysis in R then?

You must be logged in to reply to this topic.

 

Author: statmerkur