afex - Analysis of Factorial Experiments
Factorial experiments are at the heart of many fields with a strong experimental methodology such as experimental Psychology. The traditional method to analyze these experiments are analysis of variance (ANOVA) and analysis of covariance (ANCOVA). Although these methods may seem "somewhat oldfashioned" (Dalgaard, 2007, p. 2) compared to mixed models, they are still very popular and seen as the default analysis strategy in a wide variety of fields. Somewhat surprisingly, R lacks convenient functions for performing these analysis on unbalanced data or designs with repeated-measures factors. This seems especially unsatisfactory as the commercial statistical packages such as SPSS and SAS offer simple and powerful functionalities for performing those analyses.
afex tries to fill this gap (similar to package ez by Mike Lawrence) by providing functions that allow for conveniently running different types of ANOVAs and ANCOVAs using a common interface and replicating the default behavior of SPSS and SAS, namely using Type 3 sums of squares. Therefore, afex should be specifically interesting for new users of R who need to perform ANOVAs and want to replicate their results from SPSS or SAS. When learning a new statistics software replicating established results is probably a common strategy that allows one to get accustomed to the new software while being safe side that errors will be spotted (i.e., if the results cannot be replicated).
Additionally, afex offers one feature that goes beyond replicating the default behavior of commercial statistical software. When there is more than one observation per cell of the design and individual (common when using repeated-measures/within-subjects factors), afex automatically aggregates the data even allowing to specify the aggregation function (the default is the mean, but any other function, e.g., median is possible).
afex offers the convenience function
mixed() to obtain p-values for all effects in a linear-mixed model (see documentation of mixed available via
?mixed for more information). For an introduction to mixed modeling for experimental data see Judd, Westfal & Kenny (2012, who also recommend using the Kenward-Rogers approximation for degrees of freedom), Barr, Levy, Scheepers, and Tily (2013; for recommendations on how to specify the random effects structure), or Baayen, Davidson and Bates (2006; unfortunately they do not use p-values).
Finally, afex offers function
compare.2.vectors(x, y) for conveniently comparing two vectors
y using a t, Wilcoxon and permutation test.
afex is available from CRAN. Just install from within R with:
and can be installed via:
install.packages("afex", repos="http://R-Forge.R-project.org", type = "source")
After installation you need to load afex every time you start R by:
Important note: One needs to have at least pbkrtest version 0.3-6 if one wants to obtain p-values for GLMMs using parametric bootstrap (method = "PB") (time of writing, 11.10.2013).
Please see the help files in the package for detailed instructions:
?aov.car for help with ANOVA
?mixed for help with mixed models.
Note that afex was heavily influenced by the ez package from Mike Lawrence.
Reporting unexpected behavior or questions
When reporting unexpected behaviors, bugs etc. in the packages, PLEASE supply:
- A reproducible example in terms of a short code fragment.
- The data. The preferred way of sending the data "mydata" is to copy and paste the result from running dput(mydata).
- The result of running the sessionInfo() function.
(This text is stolen from here)
Most of these questions are about the more popular function
mixed() for (G)LMMs. If something is missing, please send me a mail.
- What does the warning "
failure to converge in 10000 evaluations" mean?
This message indicates that the optimizer did not find a suitable solution or optimum (i.e., the parameters, goodness of fit, deviance, AIC, BIC, ... cannot be trusted). This is simply a sign that the default number of maximum iterations which the optimizer uses is not sufficient which can easily happen even with medium sized datasets (especially with complex random structures). To avoid this message one only needs to increase the number of iterations that are allowed for the optimizer via the control argument, e.g. for a LMM:
control = lmerControl(optCtrl = list(maxfun = 100000)) or for a GLMM
control = glmerControl(optCtrl = list(maxfun = 100000)). One could also even use larger numbers here if necessary. See also the next question.
- How does
mixed()deal with the situation when the model does not converge?
mixed() doesn't do anything in these case, but pass the warnings through (unless a cluster is used via the
cl argument). You need to assure that you pass the appropriate control arguments (i.e.,
mixed which are passed down to
(g)lmer. Hence, if there are no warnings when
mixed finishes, there have been no warnings from the fitting routine (i.e., all models converged).
mixed returns an object (a list of class
"mixed") whose third element is a list with all the submodels fitted. (The second element of said list is the full model). Each of those submodels, at least in the new version of lme4, contains a slot which also contains fitting information such as nonconvergence or other warnings. The following returns the warnings for a model
lapply(m1[], function(x) x@optinfo$warnings)
mixed()also work if you pass it an already fitted model as first argument and not a formula?
This behavior is not supported and is known to fail when fitting is distributed using the
cl argument. Furthermore, it does not save computational time as the full model is fitted again in all cases. Hence, I strongly recommend to always use a call similar to the original (g)lmer call that includes the data and formula with arguments for
method (and potentially
- Why does afex set the default contrasts to effect coding (
afex is intended for factorial experiments, i.e., orthogonal factors. For those, there is, as far as I have read, little use for other codings. If so, I am really interested to hear the arguments. If you do not want this setting simply reset the default factors via the following command, all afex functions should still try to use effects coding unless told not to do so:
Finally, afex is open source, if you do not like it, branch afex and delete this. I think it is an extremely useful default, way more useful the the default treatment or dummy coding!
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412. doi:10.1016/j.jml.2007.12.005
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. doi:10.1016/j.jml.2012.11.001
Dalgaard, P. (2007). New Functions for Multivariate Analysis. R News, 7, 2–7.
Fox, J., & Weisberg, S. (2011). An R Companion to Applied Regression. Thousand Oaks, Calif.: SAGE Publications.
Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103(1), 54–69. doi:10.1037/a0028347
O’Brien, R. G., & Kaiser, M. K. (1985). MANOVA method for analyzing repeated measures designs: An extensive primer. Psychological Bulletin, 97(2), 316–333. doi:10.1037/0033-2909.97.2.316
Wickham, H. (2007). Reshaping Data with the reshape Package. Journal of Statistical Software, 21(12), 1–20.