Last year I wrote a blog post about NAG’s work on parallelising the computation of the matrix square root. More recently, as part of our Matrix Functions Knowledge Transfer Partnership with the University of Manchester, we’ve been investigating parallel implementations of the Schur-Parlett algorithm.
Most algorithms for computing functions of matrices are tailored for a specific function, such as the matrix exponential or the matrix square root. The Schur-Parlett algorithm is much more general; it will work for any “well behaved” function (this general term can be given a more mathematically precise meaning). For a function such as
using the Schur-Parlett algorithm should be quicker than computing the individual components separately. This algorithm made its first appearance in Mark 23 of the NAG C Library. Our parallel improvements will be found in Mark 24 of the C Library, next year.
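To make the generality concrete, here is an illustrative composite function of the kind the algorithm handles in a single pass (this particular $f$ is my own example, not necessarily the one shown in the post):

$$ f(A) = e^{A} + \sin(A) $$

Evaluating $e^A$ and $\sin(A)$ with two separate specialised algorithms and adding the results requires two full matrix-function computations, whereas Schur-Parlett evaluates $f(A)$ directly.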
At the heart of the algorithm lies a Taylor series (which is truncated when it has converged sufficiently). For example:
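If the function were the matrix exponential, say, the series being truncated would be:

$$ e^{A} = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots $$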
Before this, however, the matrix A is reduced to upper triangular Schur form. This serves two purposes: it reduces the number of arithmetic operations needed to evaluate the Taylor series, and it gives us the eigenvalues of A. The Taylor series converges fastest when the eigenvalues lie close together. The clever part of the algorithm is to reorder the Schur form of A so that similar eigenvalues are grouped together. The diagonal of the matrix is then split into sub-matrices of varying sizes according to this grouping, and a Taylor series is evaluated for each of these diagonal blocks. The off-diagonal blocks are then obtained by solving the ‘Parlett recurrence’. This is all rather easier to follow in the diagram below:
Here we can see the original matrix (in black), the triangular Schur form (in red), the blocking strategy (in blue) and, finally, the computation of the off-diagonal blocks (in green). The exact distribution of eigenvalues, and therefore the number and size of the diagonal blocks, depends entirely on the input matrix.
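The whole pipeline can be sketched in a few lines. The sketch below is a simplification, not the NAG implementation (`parlett_funm` is a made-up name): it uses the simplest possible blocking, one eigenvalue per block, so the diagonal "blocks" need no Taylor series at all and the Parlett recurrence is applied scalar by scalar. That shortcut assumes the eigenvalues of A are distinct; the eigenvalue grouping described above is precisely what rescues the division flagged in the comment when eigenvalues coincide or lie close together.

```python
import numpy as np
from scipy.linalg import schur, expm

def parlett_funm(A, f):
    """Evaluate f(A) via the point (1x1-block) Parlett recurrence.

    Simplified sketch: assumes the eigenvalues of A are distinct.
    The full Schur-Parlett algorithm groups close eigenvalues into
    larger blocks and evaluates a Taylor series on each block.
    """
    # Reduce to (complex) Schur form: A = Z T Z^H with T upper triangular.
    T, Z = schur(A, output='complex')
    n = T.shape[0]
    F = np.zeros((n, n), dtype=complex)
    # Diagonal: apply f directly to the eigenvalues of A.
    for i in range(n):
        F[i, i] = f(T[i, i])
    # Off-diagonal part: solve T F = F T one superdiagonal at a time.
    for d in range(1, n):
        for i in range(n - d):
            j = i + d
            s = T[i, j] * (F[i, i] - F[j, j])
            for k in range(i + 1, j):
                s += F[i, k] * T[k, j] - T[i, k] * F[k, j]
            # Breaks down when eigenvalues coincide -- hence the blocking.
            F[i, j] = s / (T[i, i] - T[j, j])
    # Transform back from Schur coordinates.
    return Z @ F @ Z.conj().T

# Sanity check against SciPy's dedicated matrix exponential.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
E = expm(A)
err = np.linalg.norm(parlett_funm(A, np.exp) - E) / np.linalg.norm(E)
print(f"relative error vs scipy.linalg.expm: {err:.2e}")
```

Note that the superdiagonals must be computed in order of increasing distance from the diagonal, since each entry of F depends on entries closer to the diagonal; the same dependency pattern governs the block version, and exploiting the independence of entries within one (block) superdiagonal is one natural source of parallelism.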