# Richardson Extrapolation

In this section we introduce the mathematical machinery of *Richardson's method*.

## Motivation

Say you want to compute the value of some function $g: \mathbb{R}_+ \to \mathbb{R}^{m\times n}, h \mapsto g(h)$ as $h \to 0$; however, $\lim_{h\to 0} g(h)\neq g(0)$ (in fact, $g(0)$ may not even be defined). We can approximate the limit by evaluating the function at values close to zero on a computer. The error of our approximation naturally depends on $g$. In certain cases it is possible to express this error in a specific way, in which case we can improve upon the order of our error using Richardson's method.

### Example

Let's start with an easy case where $f: \mathbb{R} \to \mathbb{R}$ is the function of interest. Using central differences we can approximate $f'$ at some point $x \in \mathbb{R}$ by $g(h) := \frac{f(x+h) - f(x-h)}{2h}$. Note that $g(h) \to f'(x)$ as $h \to 0$ if $f$ is differentiable at $x$; however, $g(0)$ is not defined and hence in particular unequal to $f'(x)$. To quantify the error of using $g(h)$ instead of $f'(x)$ we can rely on Taylor's Theorem (assuming that $f$ has a Taylor representation):

$$
\begin{aligned}
f(x+h) &= f(x) + f'(x)h + f''(x)\frac{h^2}{2} + f'''(x)\frac{h^3}{6} + \dots\\
f(x-h) &= f(x) - f'(x)h + f''(x)\frac{h^2}{2} - f'''(x)\frac{h^3}{6} + \dots\\[1em]
\implies& f(x+h) - f(x-h) = 2hf'(x) + 2\frac{h^3}{6} f'''(x) + 2\frac{h^5}{5!} f^{(5)}(x) + \dots \\
\implies& g(h) \stackrel{def}{=} \frac{f(x+h) - f(x-h)}{2h} = f'(x) + h^2 \frac{f'''(x)}{3!} + h^4 \frac{f^{(5)}(x)}{5!} + \dots \\
\implies& g(h) = f'(x) + \sum_{i=0}^{\infty} a_i h^{2+2i} = f'(x) + \mathcal{O}(h^2)
\end{aligned}
$$

where $\mathcal{O}(\cdot)$ denotes the Landau notation. Richardson's method can be used to improve the error rate $\mathcal{O}(h^2)$.

## General case

In general, Richardson's method considers sequences that can be written as

$$
g(h) = L + \sum_{i=0}^{\infty} a_i h^{\theta + i \phi},
$$

where $L \in \mathbb{R}$ denotes the limit of interest, $\theta$ the *base order of the approximation* and $\phi$ the *exponential step*. Although Richardson's method works for general sequences, we are mostly interested in the sequences arising when estimating derivatives.

### Example (contd.)

For standard derivative estimates we have

| Method         | $L$     | $\theta$ | $\phi$ |
| -------------- | ------- | -------- | ------ |
| forward diff.  | $f'(x)$ | 1        | 1      |
| backward diff. | $f'(x)$ | 1        | 1      |
| central diff.  | $f'(x)$ | 2        | 2      |

## Richardson Extrapolation

From the above table we see that, in general, central differences have a lower approximation error $\mathcal{O}(h^2)$ than forward or backward differences $\mathcal{O}(h)$.

> **Question**: Can we improve upon this further?

Let us evaluate $g$ at multiple values $h_0, h_1, h_2, \dots$, where it will turn out to be useful to choose the values $h, h/2, h/4, h/8, \dots$ for some pre-chosen $h > 0$; more generally, $\{ h_n \}_n$ with $h_n = h/2^n$ for $n \in \mathbb{N}_0$.
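To see the $\mathcal{O}(h^2)$ rate and the grid $h_n = h/2^n$ in action, here is a minimal numerical sketch; the test function `np.sin`, the point `x = 1.0`, and the starting step `h = 0.1` are illustrative choices, not prescribed by the text.

```python
import numpy as np

def central_diff(f, x, h):
    """Central-difference estimate g(h) = (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Illustrative choices (not from the text): f = sin, x = 1, h = 0.1.
f, x, h = np.sin, 1.0, 0.1
true = np.cos(x)  # exact f'(x) for f = sin

for n in range(5):
    h_n = h / 2**n
    err = abs(central_diff(f, x, h_n) - true)
    # Halving h should shrink the error by roughly a factor of 2^2 = 4 (O(h^2)).
    print(f"h_{n} = {h_n:.5f}   error = {err:.3e}")
```

Each halving of the step size reduces the error by roughly a factor of four, consistent with the $\mathcal{O}(h^2)$ rate derived above.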
This allows us to write

$$
\begin{aligned}
g(h) &= L + \sum_{i=0}^{\infty} a_i h^{\theta +i \phi}\\
g(h/2) &= L + \sum_{i=0}^{\infty} a_i h^{\theta +i \phi} \frac{1}{2^{\theta +i \phi}}\\
g(h/4) &= L + \sum_{i=0}^{\infty} a_i h^{\theta +i \phi} \frac{1}{4^{\theta +i \phi}}\\
&\;\;\vdots
\end{aligned}
$$

Now approximate the $g(h_n)$ by dropping all terms of the infinite sum after $i=1$ and collect the approximation error in the term $\eta(h_n)$:

$$
\begin{aligned}
g(h) &= \tilde{g}(h) + \eta(h), \qquad \tilde{g}(h) := L + \sum_{i=0}^{1} a_i h^{\theta +i \phi} \\
g(h/2) &= \tilde{g}(h/2) + \eta(h/2), \qquad \tilde{g}(h/2) := L + \sum_{i=0}^{1} a_i h^{\theta +i \phi} \frac{1}{2^{\theta +i \phi}}\\
g(h/4) &= \tilde{g}(h/4) + \eta(h/4), \qquad \tilde{g}(h/4) := L + \sum_{i=0}^{1} a_i h^{\theta +i \phi} \frac{1}{4^{\theta +i \phi}}\\
&\;\;\vdots
\end{aligned}
$$

Notice that we are now able to summarize the equations as

$$
\begin{bmatrix} g(h) \\ g(h/2) \\ g(h/4) \end{bmatrix}
=
\begin{bmatrix}
1 & h^\theta & h^{\theta + \phi} \\
1 & {h^\theta}/{2^\theta} & {h^{\theta + \phi}}/{(2^{\theta + \phi})} \\
1 & {h^\theta}/{4^\theta} & {h^{\theta + \phi}}/{(4^{\theta + \phi})} \\
\end{bmatrix}
\begin{bmatrix} L \\ a_0 \\ a_1 \end{bmatrix}
+
\begin{bmatrix} \eta (h)\\ \eta (h/2) \\ \eta (h/4) \end{bmatrix}
$$

which we write in shorthand notation as

$$
(\ast): \,\,\, g = H \begin{bmatrix} L \\ a_0 \\ a_1 \end{bmatrix} + \eta \,.
$$

From equation ($\ast$) we see that an improved estimate of $L$ can be obtained by projecting $g$ onto the column space of $H$.

### Remark

To get a better intuition for ($\ast$), consider $H$ in more detail. For the sake of clarity let $\theta = \phi = 2$.

$$
H =
\begin{bmatrix}
1 & h^2 & h^4 \\
1 & h^2/2^2 & h^4/2^4 \\
1 & h^2/4^2 & h^4/4^4 \\
\end{bmatrix}
=
\begin{bmatrix}
1 & h^2 & h^4 \\
1 & (h/2)^2 & (h/2)^4 \\
1 & (h/4)^2 & (h/4)^4 \\
\end{bmatrix}
$$

Hence $H$ is a design matrix constructed from polynomial terms of degree $0,2,4,\dots$ (in general: $0,\theta, \theta + \phi, \theta + 2\phi,\dots$) evaluated at the observed points $h, h/2, h/4, h/8, \dots$. In other words, depending on the step size $h$ of the derivative estimate, we fit a polynomial model to the derivative estimates and approximate the true derivative by the fitted intercept. The usual estimate is then given by $\hat{L} := e_1^T (H^T H)^{-1} H^T g$, which is equal to $e_1^T H^{-1} g = \sum_{i} \{H^{-1}\}_{1,i} g_i$ in case $H$ is regular.

## Did we improve the error rate?

Let us first consider the error function $\eta: h \mapsto \eta (h)$ in more detail. We see that

$$
\eta(h) = g(h) - \tilde{g}(h) = L + \sum_{i=0}^{\infty} a_i h^{\theta +i \phi} - \Big(L + \sum_{i=0}^{1} a_i h^{\theta +i \phi}\Big) = \sum_{i=2}^{\infty} a_i h^{\theta +i \phi} = \mathcal{O}(h^{\theta +2 \phi}) \,.
$$

Now consider the case where $H$ is regular (which, for this Vandermonde-type matrix with distinct evaluation points, is exactly the case when $H$ is square). Using ($\ast$) we then have

$$
g = H \begin{bmatrix} L \\ a_0 \\ a_1 \end{bmatrix} + \eta
\implies
H^{-1} g = \begin{bmatrix} L \\ a_0 \\ a_1 \end{bmatrix} + H^{-1} \eta
$$
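Before turning to the example, here is a minimal sketch of the estimator $\hat{L} = e_1^T (H^T H)^{-1} H^T g$ from the Remark above; the helper name `richardson_estimate` and the use of NumPy's least-squares solver are illustrative choices, not part of the text.

```python
import numpy as np

def richardson_estimate(g_vals, h, theta, phi):
    """Fit the intercept of the polynomial model in (*) to evaluations g(h / 2^n).

    Builds the design matrix H with columns 1, h_n^theta, h_n^(theta + phi), ...
    and returns e_1^T (H^T H)^{-1} H^T g (computed via least squares).
    """
    g_vals = np.asarray(g_vals, dtype=float)
    N = len(g_vals)
    h_n = h / 2 ** np.arange(N)                        # evaluation points h, h/2, h/4, ...
    exponents = theta + phi * np.arange(N - 1)         # theta, theta + phi, ...
    H = np.column_stack([np.ones(N)] + [h_n**e for e in exponents])
    coef, *_ = np.linalg.lstsq(H, g_vals, rcond=None)  # [L, a_0, a_1, ...]
    return coef[0]                                     # fitted intercept = estimate of L

# Usage with central differences (theta = phi = 2); f = sin, x = 1, h = 0.5 are illustrative.
f, x, h = np.sin, 1.0, 0.5
g = lambda s: (f(x + s) - f(x - s)) / (2 * s)
print(richardson_estimate([g(h / 2**n) for n in range(3)], h, theta=2, phi=2), np.cos(x))
```

With three evaluations $H$ is square and regular, so the least-squares fit coincides with $e_1^T H^{-1} g$.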
To get a better view of the error rate, consider our ongoing example again.

### Example (contd.)

With

$$
H =
\begin{bmatrix}
1 & h^2 & h^4 \\
1 & (h/2)^2 & (h/2)^4 \\
1 & (h/4)^2 & (h/4)^4 \\
\end{bmatrix}
$$

we get

$$
H^{-1} = \frac{1}{45}
\begin{bmatrix}
1 & -20 & 64\\
-20/h^2 & 340/h^2 & -320/h^2\\
64/h^4 & -320/h^4 & 256/h^4
\end{bmatrix}
$$

Further, since for central differences $\theta = \phi = 2$, we have $\eta (h_n) = \mathcal{O}(h^6)$ for all $n$. Because the first row of $H^{-1}$ does not depend on $h$, the first entry of $H^{-1}\eta$ stays $\mathcal{O}(h^6)$, and thus

$$
H^{-1} \eta = H^{-1}
\begin{bmatrix}
\eta(h) \\ \eta (h/2) \\ \eta (h/4) \\
\end{bmatrix}
=
\begin{bmatrix}
\mathcal{O}(h^6) \\ \dots \\ \dots \\
\end{bmatrix}
\implies \hat{L} = \{H^{-1} g \}_1 = L + \mathcal{O}(h^6)
$$

And so we have indeed improved the error rate, from $\mathcal{O}(h^2)$ for a single central-difference evaluation to $\mathcal{O}(h^6)$ for the extrapolated estimate.
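As a quick numerical check of this rate, the first row of $H^{-1}$ above gives the explicit formula $\hat{L} = \frac{1}{45}\big(g(h) - 20\,g(h/2) + 64\,g(h/4)\big)$; the sketch below compares it against a single central difference for the illustrative choices $f = \sin$ and $x = 1$ (not from the text). Halving $h$ should shrink the extrapolated error by roughly $2^6 = 64$, versus $2^2 = 4$ for the raw central difference.

```python
import numpy as np

f, x = np.sin, 1.0
true = np.cos(x)  # exact f'(x)

def g(s):
    """Central-difference estimate (theta = phi = 2)."""
    return (f(x + s) - f(x - s)) / (2 * s)

for h in [0.4, 0.2, 0.1]:
    plain = abs(g(h) - true)                              # O(h^2)
    L_hat = (g(h) - 20 * g(h / 2) + 64 * g(h / 4)) / 45   # first row of H^{-1} applied to g
    extrap = abs(L_hat - true)                            # O(h^6)
    print(f"h = {h:.2f}   central diff: {plain:.2e}   extrapolated: {extrap:.2e}")
```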