Saturday 8 February 2014

The delta method for dummies

Reporting variances (or standard deviations) for all estimates is normally regarded as a basic requirement for scientific publications. In my work as a consultant I am very frequently asked questions such as: “I have the measure Q with variance q and the measure W with variance w. What is the variance of exp(Q + W)?”

I recently wrote a post relating to linear transformations (see here), but I feel that being able to manage nonlinear transformations is much more useful. Frequentist statistics offers an approximate solution to this problem, which is normally known as the 'delta' method.

The basic idea is to create a linear approximation to our nonlinear function and use it to approximate the variance, with the same methods given in the post mentioned above. Indeed, every nonlinear function G of X (differentiable and close to linear) can be locally approximated by the tangent line at a point \(\mu\) (in the close vicinity of X), with slope equal to the first derivative \(G'(\mu)\) (first-order Taylor series approximation):

\[G(X) \simeq G(\mu ) + G'(\mu )(X - \mu )\]

Having transformed a nonlinear function into a linear function, we can approximate the variance by using:

\[Var[G(X)] \simeq [G'(\mu)]^2 Var(X) \]

In words: if I have a nonlinear (close to linear) function G(X), where X is a random variable with small variance and normal errors, the variance of G(X) can be approximated from the first derivative G' and the variance of X.
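
This recipe can be packed into a minimal R sketch (the helper name deltaVar and its arguments are my own, not part of any package; it assumes the function is written as an expression in a variable called x):

deltaVar <- function(g, mu, varx) {
    # first derivative of g, evaluated at x = mu
    d <- eval(D(g, "x"), list(x = mu))
    # delta-method approximation: [G'(mu)]^2 * Var(X)
    d^2 * varx
}
deltaVar(expression(exp(x)), mu = 3.5, varx = 0.15)  # approximately 164.5 (see Example 1)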

Example 1: exp(X)

To calculate the derivatives, we can use an R function:

D(expression(G(x)), "x")

In this case:

D(expression(exp(b0)), "b0")
## exp(b0)

If we have:

X <- 3.5
VarX <- 0.15
exp(X)
## [1] 33.12
VarX * exp(X)^2
## [1] 164.5

In R there is a shortcut function for the delta method, provided by the car package. The syntax is:

library(car)
deltaMethod(object = c(X = X), g = "exp(X)", vcov = matrix(VarX, 1, 1))
##        Estimate    SE
## exp(X)    33.12 12.83

By using this function we obtain the standard error, i.e. the standard deviation of the sampling distribution. By squaring the above SE (12.82556) we recover the variance computed by hand (164.5).
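
Just as a check (not in the original code), taking the square root of the hand-computed variance gives back the same standard error:

sqrt(VarX * exp(X)^2)
# approximately 12.83, i.e. the SE reported by deltaMethod()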

Example 2: 1/X

The first derivative is:

D(expression(1/x), "x")
## -(1/x^2)

The delta-method variance (whose square root is the standard error) may be obtained as:

X <- 3.5
VarX <- 0.15
1/X
## [1] 0.2857
VarX * (-(1/X^2))^2
## [1] 0.0009996

With R:

library(car)
deltaMethod(object = c(X = X), g = "1/X", vcov = matrix(VarX, 1, 1))
##     Estimate      SE
## 1/X   0.2857 0.03162

Example 3: log(X)

The first derivative is:

D(expression(log(x)), "x")
## 1/x

The delta-method variance (whose square root is the standard error) may be obtained as:

X <- 3.5
VarX <- 0.15
log(X)
## [1] 1.253
VarX * (1/X)^2
## [1] 0.01224
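
For consistency with the other examples, the same result can be obtained with deltaMethod() (a sketch for completeness; the reported SE should be the square root of 0.01224, i.e. about 0.111):

library(car)
deltaMethod(object = c(X = X), g = "log(X)", vcov = matrix(VarX, 1, 1))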

Example 4: Inverse logit function

First derivative is:

D(expression(exp(X)/(1 + exp(X))), "X")
## exp(X)/(1 + exp(X)) - exp(X) * exp(X)/(1 + exp(X))^2

The delta-method variance (whose square root is the standard error) may be obtained as:

X <- 3.5
VarX <- 1.5
exp(X)/(1 + exp(X))
## [1] 0.9707
VarX * (exp(X)/(1 + exp(X)) - exp(X) * exp(X)/(1 + exp(X))^2)^2
## [1] 0.001214
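
Again, deltaMethod() can be used as a cross-check (sketch; the reported SE should be the square root of 0.001214, i.e. about 0.035):

library(car)
deltaMethod(object = c(X = X), g = "exp(X)/(1 + exp(X))", vcov = matrix(VarX, 1, 1))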

The extension to the case of two or more random variables is straightforward: we need the partial derivatives with respect to each variable, plus all the covariances:

\[ var[G(X, Y, \ldots)] = D\,S\,D^T \]

where D is the (row) vector of partial derivatives of G with respect to all the variables and S is the variance-covariance matrix of those variables.
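
A minimal R sketch of this formula (the helper name deltaVarMulti and its arguments are my own, not from any package; it assumes the function is given as an expression in the named variables):

deltaVarMulti <- function(g, values, sigma) {
    # partial derivatives of g, evaluated at the given values
    d <- sapply(names(values), function(v) eval(D(g, v), as.list(values)))
    Dm <- matrix(d, nrow = 1)
    # delta-method approximation: D S D^T
    as.numeric(Dm %*% sigma %*% t(Dm))
}
deltaVarMulti(expression(X * Y), c(X = 3.5, Y = 4.2),
    matrix(c(1.5, 0.8, 0.8, 1.2), 2, 2))  # approximately 64.68 (Example 5 below)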

Example 5: X Y

First derivatives are:

D(expression(X * Y), "X")
## Y
D(expression(X * Y), "Y")
## X

The delta-method variance (whose square root is the standard error) may be obtained as:

X <- 3.5
Y <- 4.2
VarX <- 1.5
VarY <- 1.2
CoVar <- 0.8
X * Y
## [1] 14.7
(Y^2) * VarX + (X^2) * VarY + 2 * X * Y * CoVar
## [1] 64.68

With R:

library(car)
sigma <- matrix(c(VarX, CoVar, CoVar, VarY), 2, 2, byrow = T)
deltaMethod(object = c(X = X, Y = Y), g = "X*Y", vcov = sigma)
##       Estimate    SE
## X * Y     14.7 8.042
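
As a cross-check (not in the original post), the same variance can be obtained directly from the matrix formula D S D^T, using the derivatives computed above:

Dm <- matrix(c(Y, X), nrow = 1)  # row vector of partial derivatives of X*Y: (Y, X)
Dm %*% sigma %*% t(Dm)           # approximately 64.68; its square root is the SE above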

Example 6: X/Y

First partial derivatives are:

D(expression(X/Y), "X")
## 1/Y
D(expression(X/Y), "Y")
## -(X/Y^2)

The delta-method variance (whose square root is the standard error) may be calculated as:

X <- 3.5
Y <- 4.5
VarX <- 0.9
VarY <- 0.8
CoVar <- 0.5
X/Y
## [1] 0.7778
(1/Y)^2 * VarX + (-(X/(Y^2)))^2 * VarY + 2 * (1/Y) * (-X/(Y^2)) * CoVar
## [1] 0.02993

With R:

library(car)
sigma <- matrix(c(VarX, CoVar, CoVar, VarY), 2, 2, byrow = T)
deltaMethod(object = c(X = X, Y = Y), g = "X/Y", vcov = sigma)
##     Estimate    SE
## X/Y   0.7778 0.173

Example 7: exp(X + Y)

Partial derivatives are:

D(expression(exp(X + Y)), "X")
## exp(X + Y)
D(expression(exp(X + Y)), "Y")
## exp(X + Y)

The delta-method variance (whose square root is the standard error) may be obtained as:

X <- 3.5
Y <- 0.5
VarX <- 0.15
VarY <- 0.08
CoVar <- 0.05
exp(X + Y)
## [1] 54.6
(exp(X + Y)^2) * VarX + (exp(X + Y)^2) * VarY + 2 * exp(X + Y) * exp(X + Y) * 
    CoVar
## [1] 983.7

With R:

library(car)
sigma <- matrix(c(VarX, CoVar, CoVar, VarY), 2, 2, byrow = T)
deltaMethod(object = c(X = X, Y = Y), g = "exp(X+Y)", vcov = sigma)
##            Estimate    SE
## exp(X + Y)     54.6 31.36

Two final notes relating to nonlinear transformations are very important:

  1. The estimated variance is always approximate.
  2. If the original variables are Gaussian, the transformed variable is generally not Gaussian (see the simulation sketch below).
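
Both points can be appreciated with a quick Monte Carlo sketch (the seed and the sample size are arbitrary choices of mine), reusing the figures from Example 1:

set.seed(1234)
x <- rnorm(100000, mean = 3.5, sd = sqrt(0.15))
var(exp(x))   # noticeably larger than the delta approximation of 164.5
hist(exp(x))  # clearly right-skewed: exp(x) is not Gaussian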

The propagation of measurement errors

Measurements are usually divided into two groups: direct and derived. In the latter case we have two separate direct measurements (Q and W) and need to calculate a linear combination Z = AQ + BW, where A and B are two generic coefficients. We already know that Q and W are affected by random measurement errors (and thus they are random variables) and we may wonder how such errors propagate to Z.

We'll start by answering a simpler question: “I have the estimate Q with variance q; what is the variance of A + B Q?” (A and B are two generic coefficients).

The answer is very simple:

\[ var(A + BQ) = B^2 var(Q) = B^2 q \]

and it is very easy to prove. I'll just give an example: let's consider three values (e.g. 12, 15 and 19), with mean equal to 15.33 and variance equal to 12.33. If we transform each of the three values by multiplying it by 3 and adding 5, we can easily verify that the variance of the transformed sample is \(3^2 \times 12.33 = 111\).
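
The same check in R:

x <- c(12, 15, 19)
var(x)           # 12.33
var(3 * x + 5)   # 111, i.e. 3^2 * 12.33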

And what if we have two estimates Q and W, with variances equal to q and w respectively, and we want to estimate the variance of A Q + B W?

In this case we need to know whether the two quantities Q and W are independent or, on the contrary, what their covariance (cov) is. It can be easily proved that:

\[ var(AQ + BW) = {A^2}var(Q) + {B^2}var(W) + 2AB\,cov(Q,W) \]

A formal proof can be found in every statistics book, e.g. in Sokal and Rohlf (1981) at page 818. We'll show this with an R example:
Q <- c(12, 14, 11, 9)
W <- c(2, 4, 7, 8)
a <- 2
b <- 3
var(a * Q + b * W)
## [1] 35.58
a^2 * var(Q) + b^2 * var(W) + 2 * a * b * cov(Q, W)
## [1] 35.58

This implies that for two uncorrelated variables the variance of the sum equals the sum of the variances.

Furthermore, if we remember the following equality:

\[r = \frac{cov(Q,W)}{\sqrt {var(Q)var(W)} }\]

where r is the correlation coefficient, then:

\[cov(Q,W) = r\,sd(Q)\,sd(W)\]

where sd is the standard deviation. We can modify the above equation accordingly if we prefer to use r instead of the covariance.
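
With the vectors Q and W defined above, this equality is easy to verify in R:

cov(Q, W)                  # about -4.167
cor(Q, W) * sd(Q) * sd(W)  # the same value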

And what if we have more than two quantities to combine?

The above equation can be generalised as follows (in words): the variance of a linear combination of n random variables (X1, X2, X3, …, Xn) with n coefficients (B1, B2, B3, …, Bn) is equal to the sum of the squares of each coefficient multiplied by the variance of the corresponding variable, plus twice the sum, over all pairs of variables, of the product of the two coefficients and the covariance of the two variables.
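
In symbols, writing the variables as \(X_i\) and the coefficients as \(B_i\):

\[ var\left( \sum\limits_{i = 1}^n B_i X_i \right) = \sum\limits_{i = 1}^n B_i^2 \, var(X_i) + 2\sum\limits_{i < j} B_i B_j \, cov(X_i, X_j) \]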

This may be easily posed in matrix notation as:

\[ var[A X] = A\,S\,A^T \]

where X is the vector of random variables to be combined and A is the vector of coefficients. For example, if we want to assess the variance of the combination 3 + 6 + 4, where all the coefficients are equal to one, all the variances are equal to 0.5 and all the covariances are equal to 0.75, we can do so with the following code:
X <- matrix(c(3, 6, 4), 3, 1)
A <- matrix(c(1, 1, 1), 1, 3)
A %*% X
##      [,1]
## [1,]   13
(Sigma <- matrix(c(0.5, 0.75, 0.75, 0.75, 0.5, 0.75, 0.75, 0.75, 0.5), 3, 3, 
    byrow = T))
##      [,1] [,2] [,3]
## [1,] 0.50 0.75 0.75
## [2,] 0.75 0.50 0.75
## [3,] 0.75 0.75 0.50
A %*% Sigma %*% t(A)
##      [,1]
## [1,]    6

Two final notes relating to linear transformations are very important:

  1. The estimated variance is always exact.
  2. If the original variables are Gaussian, the transformed variable is also Gaussian.