In a previous blog post we explained how testing new algorithms is difficult. We discussed the forward error (how far from the actual solution are we?) and the backward error (what problem have we actually solved?) and how we'd like the backward error to be close to the unit roundoff, u.
For matrix functions, we also mentioned the idea of using identities such as sin2A + cos2A = I to test algorithms. In practice, rather than I, we might find that we obtain a matrix R close to I, perhaps with ||R-I|| ≈ 10-13. What does this tell us about how the algorithms for sin A and cos A are performing? In particular, does it tell us anything about the backward errors? We've just written a paper which aims to answer these questions. This work is an output of NAG's Knowledge Transfer Partnership with the University of Manchester, so we thought we'd blog about it here.
Let's consider the identity exp(log A) - A = 0. Suppose that when we evaluate the left-hand side in floating point arithmetic we get a nonzero residual R rather than 0. We'll assume that this residual is caused by some backward errors E1 and E2 so that exp(log(A + E1) + E2) = R. We'd like to investigate how big R can be when E1 and E2 are small, so we expand the left-hand side in a Taylor series to linear order. After a bit of algebra, the result is a linear operator relating R to E1 and E2: R = L(E1, E2). The operator is different for each identity considered, but it always involves the Fréchet derivatives of the matrix functions in the identity (the full gory details, including formulae for the linear operators associated with various identities, are in our paper).
We now have two options. The first option is to estimate the norm of the linear operator. This enables us to determine the maximum value of ||R|| consistent with backward errors of size u. The second option is to use the linear operator to explicitly estimate E1 and E2. This works just as well, but it is more expensive.
Our paper contains several examples demonstrating our approach in practice, so we'll just pick one here, involving the matrix exponential. The 10 x 10 Forsythe matrix is
where we set the parameter u = 2-53 ≈ 10-16 (the unit roundoff). It's known that the Schur-Parlett general-purpose matrix function algorithm struggles with this matrix, whereas algorithms based on scaling and squaring have no such problems. Does our approach reflect this behaviour? Using the identity eA e-A = I, the maximum normwise residual consistent with backward stability is found to be 7.1 x 10-15. The Schur-Parlett approach gives a normwise residual of 7.7 x 10-11, but the scaling and squaring algorithm gives a residual of 2.2 x 10-15. Consistent with this, the backward error estimates are 2.5 x 10-11 and 4.5 x 10-16 respectively. So our approach has correctly identified the preferred algorithm.
There are some subleties which we haven't mentioned (for example certain matrices and identities can produce misleading cancellation effects) but we now have a useful method for testing matrix function algorithms when high-precision arithmetic isn't available. The relevant routines for computing matrix functions and Fréchet derivatives are available in chapter F01 of the NAG Library.