Storing Banded Matrices for Speed
Introduction

A real n-by-n matrix A will, in general, be defined by its n² elements, all of which would need to be stored for computations to be performed using A. If n is very large, this can amount to a lot of storage.
Fortunately, many real-world applications use sparse matrices, which consist mainly of zero-valued elements; the number of defining elements that need to be stored is then typically some fixed multiple of n (30n, say). A special case is the banded matrix:
While this is appealing in terms of being able to reconstruct the original matrix readily in our mind's eye, it is less well suited to solving problems (such as AX = B) as quickly as possible. The problem is that most matrix computations rely, for their speed, on very fast kernel operations (written by the likes of Intel or AMD) that work on smallish matrix blocks. This requires matrix blocks to be stored as you would expect for a general matrix, while in B above the block elements have become higgledy-piggledy. What is required is a storage format that keeps some of the block structure of the original matrix A.
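For concreteness, here is a small sketch of the LAPACK-style band storage referred to above, assuming the standard symmetric band format in which element A(i, j) of the lower band is stored at row i − j (0-based) of a (k+1)-by-n array, one diagonal of A per row. The helper name `to_lapack_band` is my own, not a LAPACK routine.

```python
def to_lapack_band(A, k):
    """Pack the lower band of a symmetric matrix A into a (k+1) x n array.

    Standard LAPACK symmetric band storage: A(i, j) lands at AB(i - j, j)
    (0-based), so row 0 is the main diagonal and row d the d-th subdiagonal.
    """
    n = len(A)
    AB = [[0.0] * n for _ in range(k + 1)]
    for j in range(n):
        for i in range(j, min(j + k + 1, n)):
            AB[i - j][j] = A[i][j]
    return AB

# Example: a 6x6 symmetric matrix with bandwidth k = 2, where each band
# entry's value encodes its (1-based) position as 10*i + j.
n, k = 6, 2
A = [[0.0] * n for _ in range(n)]
for j in range(n):
    for i in range(j, min(j + k + 1, n)):
        A[i][j] = A[j][i] = 10.0 * (i + 1) + (j + 1)

AB = to_lapack_band(A, k)
print(AB[0])   # main diagonal: [11.0, 22.0, 33.0, 44.0, 55.0, 66.0]
```

Columns of A are preserved as columns of AB, which is the property the conversion later in this article relies on.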
The Simplest Case - to ease you in

To begin to describe an alternative storage scheme that preserves block structure, it is probably best to begin with the simplest scenario ... a symmetric matrix with bandwidth k and dimension n = (k+1)m (that is, n is an integer multiple of k+1).
Once you partition a lower band (including main diagonal) into triangles as above, the possibilities for improved storage forms become immediately clear. The B triangular blocks can either be slid up or slid across to the right to complete a square (k+1 by k+1) block with a neighbouring A triangular block. To make it easier to convert from LAPACK storage format, I will go for sliding the B blocks up, giving:
What can you do with these blocks?

Those in touch with their linear algebra side will now wonder what can be done with the banded matrix stored in this form. In particular, they will whisper the words Cholesky decomposition. Don't worry, it is just the first of two steps in solving AX = Y, a set of simultaneous equations. Without going into the nitty-gritty, I will just say that the lower triangular banded L formed in this decomposition can be partitioned exactly as above, and the decomposition can work with each triangular block in turn, overwriting them as it goes. The second stage of the solution, backsubstitution, then works with the triangular blocks of L to return the solution X.
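To make the first stage concrete, here is a plain, unblocked band Cholesky sketch — not the blocked triangular-block version described above, just an illustration of what the decomposition computes. It exploits the fact that L inherits the bandwidth k of A, so all sums can be restricted to the band. The function name is my own.

```python
from math import sqrt

def band_cholesky(A, k):
    """Unblocked Cholesky A = L L^T for a symmetric positive definite
    matrix A with bandwidth k; only entries inside the band are computed,
    since L has the same bandwidth as A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        start = max(0, j - k)                    # L[j][p] = 0 for p < j - k
        s = sum(L[j][p] ** 2 for p in range(start, j))
        L[j][j] = sqrt(A[j][j] - s)
        for i in range(j + 1, min(j + k + 1, n)):
            start = max(0, i - k)                # L[i][p] = 0 for p < i - k
            s = sum(L[i][p] * L[j][p] for p in range(start, j))
            L[i][j] = (A[i][j] - s) / L[j][j]
    return L

# Example: the classic tridiagonal "second difference" matrix (k = 1).
n, k = 5, 1
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    A[i][i] = 2.0
    if i + 1 < n:
        A[i + 1][i] = A[i][i + 1] = -1.0

L = band_cholesky(A, k)
# L is lower triangular with the same bandwidth, and L L^T reproduces A.
```

Backsubstitution then amounts to solving L z = Y followed by L^T X = z, again touching only entries inside the band.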
How do I put A into this strange storage format?

You wouldn't be expected to do this yourself. On the other hand, it would be easy to write a utility routine that converts from the LAPACK-style format to the above triangular block style; and this can be done in-place, since columns are preserved. The mapping between elements of the blocked storage matrix (C, say) and the elements of A is also quite straightforward:
C(q, j) = A(i, j) for j = 1, ..., n and i = j, ..., j+k, where q = mod(i-1, k+1) + 1.

This makes life easy when A(i, j) is a function of i and j.
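That mapping translates directly into code. Here is a sketch (the name `to_blocked` is mine) that packs the lower band of a full matrix A into the blocked array C, using the 0-based form of the formula: A[i][j] goes to row i mod (k+1) of column j.

```python
def to_blocked(A, k):
    """Pack the lower band (diagonal included) of A into the blocked
    format C: C[i % (k+1)][j] = A[i][j], which is the 0-based version
    of q = mod(i-1, k+1) + 1."""
    n = len(A)
    C = [[0.0] * n for _ in range(k + 1)]
    for j in range(n):
        for i in range(j, min(j + k + 1, n)):
            C[i % (k + 1)][j] = A[i][j]
    return C

# Same style of example as before: a 6x6 matrix with k = 2, band values
# encoding their (1-based) position as 10*i + j, so the mapping can be
# read straight off the result.
n, k = 6, 2
A = [[0.0] * n for _ in range(n)]
for j in range(n):
    for i in range(j, min(j + k + 1, n)):
        A[i][j] = 10.0 * (i + 1) + (j + 1)

C = to_blocked(A, k)
```

Note why the in-place conversion works: LAPACK band storage puts A(i, j) at row i − j of column j, and the blocked format puts it at row (i − j + j) mod (k+1) — so each column of the LAPACK array is simply rotated cyclically by j mod (k+1), column by column, with no data ever moving between columns.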
The next simplest case -- to lessen your ease

In the simplest case everything fitted just nicely, but for more general n there are going to be some leftovers at the bottom-right of A. A picture tells the story ...
And the mapping above tells us where to put the leftovers ...
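The formula already copes with general n: simply cap i at n, so i runs from j to min(j+k, n). As a quick illustration of my own (not from the original figure), here is which cells of C end up occupied when n = 7 and k = 2, i.e. when n is not a multiple of k+1:

```python
n, k = 7, 2   # n = 7 is not a multiple of k + 1 = 3
used = [[False] * n for _ in range(k + 1)]
for j in range(n):
    for i in range(j, min(j + k + 1, n)):   # band cut off at the bottom edge
        used[i % (k + 1)][j] = True

# Early columns of C are completely filled; the last k columns have gaps
# where the band ran out at the bottom-right of A.
for row in used:
    print(''.join('x' if u else '.' for u in row))
# xxxxxxx
# xxxxx..
# xxxxxx.
```

The leftovers land in the final columns of C exactly where the formula sends them, leaving a small unused triangle where there was no B block to slide up.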