Reporting results for designed experiments in applied biology
The great majority of designed experiments in applied biology (especially field experiments) aim to compare several treatment levels in a randomised block design (or similar), with few replicates (normally 3 to 5). In the end, biologists usually want to report the results as a table or graph of observed means, together with an appropriate measure of uncertainty. Such a measure is effectively mandatory for publication in scientific journals, though choosing it may pose a few problems and become an obstacle during the review process.
How do we make a reasonable choice? Two things should be kept in mind: (1) for each treatment level we have a sample of measurements taken from a wider population: it is the population that matters, not the sample! (2) The mean and the uncertainty measure should preferably be expressed in the same measurement unit. Bearing this in mind, the choice mainly depends on what distributional assumptions we are willing to make about the population that generated our dataset. Here are some simple suggestions.
1. If we do not want to (or cannot) make any assumptions about the original population and we would only like to describe the variability of each sample around its mean, we had better report the standard deviation (SD) for each sample.
2. If we can assume that our samples come from normal populations and we would like to describe the variability of the means that could be expected if the same experiment were repeated many times, we had better report the standard error (SE) of each mean. The SE is a good aid to express the repeatability of results beyond the observed sample, though it is not a good measure of the variability of data around the mean (with more than one replicate, the SE is always smaller than the SD). Therefore, we may consider adding the sample size (n) to the table/graph, so that everybody can recover the SD from the SE.
3. If we can assume that the sampled populations are normal and have similar variances (homoscedasticity), we will probably go for an ANOVA. In this case, we can describe the variability of the means (as in #2) by using the pooled standard error of a mean (SEM), derived from the residual mean square in ANOVA. A single value applies to all means, if the design is balanced! Also in this case, we may consider adding the sample size, to help readers recover the corresponding SD.
4. In the same situation as above (normality and homoscedasticity), we might like to guide the reader in making comparisons among treatment levels. To this aim, we might replace the SEM with the pooled standard error of a difference (SED). Reporting the degrees of freedom for the SED is recommended, to help readers calculate a critical difference for each comparison (Least Significant Difference or Honestly Significant Difference). This latter display has some limitations, which should be carefully considered. First of all, pairwise comparisons are inelegant when dealing with a quantitative series of treatment levels (which should preferably be compared in a regression setting). Secondly, with unbalanced data, the SED is no longer a single value: it differs depending on which pair of means we intend to compare.
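The measures suggested above differ mainly in which variance goes under the square root. The following is a minimal Python sketch, not a definitive implementation: the `yields` dataset is invented for illustration, the design is assumed to be balanced, and a completely randomised layout is assumed for simplicity (so no block effect is removed from the residual).

```python
import math
import statistics

# Hypothetical balanced dataset (invented values): 3 levels, 4 replicates each
yields = {
    "T1": [10.0, 12.0, 11.0, 13.0],
    "T2": [15.0, 14.0, 16.0, 17.0],
    "T3": [20.0, 22.0, 19.0, 21.0],
}
n = 4  # replicates per treatment level (balanced design assumed)

# Per-sample SD and SE: no pooling across treatment levels
for level, values in yields.items():
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    se = sd / math.sqrt(n)         # standard error of that sample's mean
    print(f"{level}: mean = {statistics.mean(values):.2f}, SD = {sd:.3f}, SE = {se:.3f}")

# Pooled measures: based on the residual mean square of a one-way ANOVA
residual_ss = sum(
    sum((v - statistics.mean(values)) ** 2 for v in values)
    for values in yields.values()
)
residual_df = sum(len(values) - 1 for values in yields.values())
residual_ms = residual_ss / residual_df

sem = math.sqrt(residual_ms / n)      # pooled standard error of a mean
sed = math.sqrt(2 * residual_ms / n)  # pooled standard error of a difference
print(f"Residual MS = {residual_ms:.3f} on {residual_df} df")
print(f"SEM = {sem:.3f}, SED = {sed:.3f}")
```

A critical difference such as the LSD would be the SED multiplied by a t quantile on the residual degrees of freedom. The standard library has no t distribution, so that step is left out here; `scipy.stats.t.ppf(0.975, residual_df)` is one way to obtain the quantile, if SciPy is available.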
What should we do when we have adopted some stabilising transformation prior to ANOVA?
Linear models (ANOVA and regression) are routinely used in applied biology. Whenever normality and homoscedasticity of residuals cannot be assumed, we may adopt a stabilising transformation (logarithm, square root, arcsin square root…) prior to ANOVA. In this case, the above suggestions still hold if we intend to show the means of transformed data. However, such a display may hinder the clarity of results, as the original measurement unit is lost. What should we do, then? Here are some further suggestions.
1. As usual, we can present the means of original data with SDs (as in #1 above). This is clearly less than optimal, if we want to suggest more than the bare variability of the observed sample. Furthermore, please remember that the means of original data may not be a good measure of central tendency, if the original population is strongly 'asymmetric' (skewed)!
2. The best option in this case is to show back-transformed means (i.e. if we have used, for example, a square root transformation, we can take the means of transformed data and square them). Back-transformed means 'estimate' the medians of the original populations, which may be regarded as better measures of central tendency for skewed data.
3. To complete our work, we should add to each back-transformed mean its back-transformed standard error, obtained with the so-called delta method. I'll clarify this with an example.
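The claim that back-transformed means 'estimate' medians can be illustrated with a quick simulation. This is only a sketch under stated assumptions: the population is taken to be log-normal (log-values normal with mean 2 and SD 1), and these parameters, the sample size and the seed are all arbitrary choices made for illustration.

```python
import math
import random
import statistics

random.seed(1)  # arbitrary seed, used only to make the run reproducible

# Simulated skewed data: the logarithms are normal with mean 2 and SD 1
sample = [random.lognormvariate(2.0, 1.0) for _ in range(100_000)]

arithmetic_mean = math.fsum(sample) / len(sample)
median = statistics.median(sample)

# Back-transformed mean: average on the log scale, then exponentiate
log_mean = math.fsum(math.log(x) for x in sample) / len(sample)
back_transformed_mean = math.exp(log_mean)

print(f"Arithmetic mean:       {arithmetic_mean:.2f}")
print(f"Median:                {median:.2f}")
print(f"Back-transformed mean: {back_transformed_mean:.2f}")
```

Under these assumptions, the back-transformed mean should land close to the sample median, while the arithmetic mean is pulled upwards by the long right tail.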
An example with counts
Consider the following dataset, which represents the counts of insects on 15 independent leaves, treated with the insecticides A, B and C (5 replicates each):
| Level | 1 | 2 | 3 | 4 | 5 | Mean |
|---|---|---|---|---|---|---|
| A | 448 | 906 | 484 | 477 | 634 | 589.8 |
| B | 211 | 276 | 415 | 587 | 298 | 357.4 |
| C | 50 | 90 | 73 | 44 | 26 | 56.6 |
This variable represents a count and, as expected, the variances for the treatment levels A, B and C are markedly different. A logarithmic transformation can help here, producing a new dataset that is approximately normal and homoscedastic. Therefore, we take the log-transformed variable and submit it to ANOVA:
Analysis of Variance Table

| Source | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Level | 3 | 448.19 | 149.396 | 946.66 | 1.624e-14 *** |
| Residuals | 12 | 1.89 | 0.158 | | |
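The Residuals line of the table above can be checked by hand from the counts. The sketch below uses plain Python and the natural logarithm. The Level line has 3 degrees of freedom for three insecticides, which is what a model fitted without an intercept would give; that reading is an inference from the numbers, not something the text states, so the corresponding sum of squares is computed under that assumption.

```python
import math

# Insect counts from the dataset above (5 leaves per insecticide)
counts = {
    "A": [448, 906, 484, 477, 634],
    "B": [211, 276, 415, 587, 298],
    "C": [50, 90, 73, 44, 26],
}

def mean(values):
    return math.fsum(values) / len(values)

def sample_variance(values):
    m = mean(values)
    return math.fsum((v - m) ** 2 for v in values) / (len(values) - 1)

# Variances on the original scale differ widely among levels
raw_variances = {level: sample_variance(v) for level, v in counts.items()}
print("Variances of raw counts:", {k: round(v, 1) for k, v in raw_variances.items()})

# Log-transform, then pool the within-level sums of squares
logs = {level: [math.log(v) for v in values] for level, values in counts.items()}
residual_ss = math.fsum(
    (x - mean(values)) ** 2 for values in logs.values() for x in values
)
residual_df = sum(len(values) - 1 for values in logs.values())
residual_ms = residual_ss / residual_df

# Assumption: no-intercept model, so the Level sum of squares is n * mean^2 per level
level_ss = math.fsum(len(values) * mean(values) ** 2 for values in logs.values())

print(f"Residual SS = {residual_ss:.2f}, df = {residual_df}, MS = {residual_ms:.3f}")
print(f"Level SS (no-intercept assumption) = {level_ss:.2f}")
```

Only the residual mean square and its degrees of freedom are needed for the standard errors discussed next.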
If we were to report the means of the log-transformed variable (log-Means), we might follow the above suggestion and show:
| Level | log-Means |
|---|---|
| A | 6.343 |
| B | 5.815 |
| C | 3.949 |
| SEM | 0.178 |
where the SEM is obtained as \( \sqrt{0.158/5} \). Unfortunately, we lose clarity: how many insects did we have on each leaf?
A possible way out is to back-transform the above log-Means (for example \( \exp(6.343) = 568.499 \)) and use the delta method to back-transform the standard error. This is straightforward: (1) take the first derivative of the back-transformation function [in this case, the first derivative of \( \exp(X) \) is \( \exp(X) \) itself], evaluate it at the mean of the transformed data and (2) multiply it by the standard error of the transformed data. For level A, this gives: \( \exp(6.343) \times 0.178 = 101.19 \). Therefore, the data might be presented as follows:
| Level | Back-transformed means (SE) |
|---|---|
| A | 568.5 (101.19) |
| B | 335.1 (59.68) |
| C | 51.88 (9.23) |
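The two steps of the delta method described above can be sketched in a few lines of Python. The helper `delta_method_se` is a hypothetical name introduced here for illustration. The residual mean square (0.158) is taken from the ANOVA table and the counts from the dataset.

```python
import math

# Insect counts from the dataset above (5 leaves per insecticide)
counts = {
    "A": [448, 906, 484, 477, 634],
    "B": [211, 276, 415, 587, 298],
    "C": [50, 90, 73, 44, 26],
}
residual_ms = 0.158  # residual mean square from the ANOVA on log counts
n = 5                # replicates per level

sem_log = math.sqrt(residual_ms / n)  # pooled SEM on the log scale

def delta_method_se(estimate, se, derivative):
    """First-order delta method: SE of g(estimate) is about |g'(estimate)| * se.

    Hypothetical helper: `derivative` is the first derivative of the
    back-transformation function g.
    """
    return abs(derivative(estimate)) * se

results = {}
for level, values in counts.items():
    log_mean = math.fsum(math.log(v) for v in values) / len(values)
    back_mean = math.exp(log_mean)  # back-transformed mean
    # The derivative of exp(x) is exp(x) itself
    back_se = delta_method_se(log_mean, sem_log, math.exp)
    results[level] = (log_mean, back_mean, back_se)
    print(f"{level}: log-mean = {log_mean:.3f}, "
          f"back-transformed mean = {back_mean:.1f}, SE = {back_se:.2f}")
```

Because the sketch carries unrounded log-means through the calculation, its standard errors can differ somewhat from figures obtained from rounded intermediate values.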
Far clearer, isn't it? If we had used a square root transformation, the back-transformation function would be \( X^2 \). Its first derivative is \( 2 \times X \); evaluated at the mean of the transformed data and multiplied by the SEM, it gives the back-transformed standard error. If you want to know more about the delta method, you might start from my post here. Some colleagues and I have discussed these issues in our paper
'Current statistical issues in Weed Research' (Onofri A., Carbonell E., Piepho H.-P., Mortimer A.M. & Cousens R.D., 2010. Weed Research, 50, 5-24).
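The same rule covers the other transformations mentioned earlier. As a sketch, write \( \bar{z} \) for a mean on the transformed scale, \( \mathrm{SE}(\bar{z}) \) for its standard error and \( g \) for the back-transformation function (these symbols are introduced here for illustration and are not taken from the paper). The first-order delta method then gives:

\[ \mathrm{SE}\left[ g(\bar{z}) \right] \approx \left| g'(\bar{z}) \right| \times \mathrm{SE}(\bar{z}) \]

For the three transformations, assuming the arcsin square root is applied to proportions and expressed in radians:

\[
\begin{aligned}
\text{logarithm:} \quad & g(\bar{z}) = \exp(\bar{z}), & g'(\bar{z}) &= \exp(\bar{z}) \\
\text{square root:} \quad & g(\bar{z}) = \bar{z}^2, & g'(\bar{z}) &= 2 \bar{z} \\
\text{arcsin square root:} \quad & g(\bar{z}) = \sin^2(\bar{z}), & g'(\bar{z}) &= 2 \sin(\bar{z}) \cos(\bar{z}) = \sin(2 \bar{z})
\end{aligned}
\]

The approximation is a first-order one, so it is expected to work best when the standard error on the transformed scale is small.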
In all cases, whatever measure of uncertainty you might like to use, do not forget to state it clearly and be consistent throughout the paper!