Thursday, 23 June 2011

Storing Banded Matrices for Speed

Introduction

A real n-by-n matrix A will, in general, be defined by its n2 elements that would need to be stored for computations to be performed using A. If n is very large, this can amount to a lot of storage.



Fortunately, many real world applications use sparse matrices that are comprised mainly of zero-valued elements; the number of defining elements that need to be stored is typically some fixed multiple of n (30n, say). A special case is the banded matrix:

Illustration 1: A banded matrix A, with kl sub- and ku super-diagonals
This banded matrix has elements defined on its main diagonal (the dashed line), on kl sub-diagonals (blue region) and on kusuper-diagonals (green region). The total number of non-zero elements is (kl+ku+1)n minus a few corner points. What we would really like to do is solve problems using this matrix, like AX = Y, but storing only the non-zero elements. Various storage formats emerged from various packages, but the most familiar to people solving such problems is that adopted by the linear algebra package LAPACK (and also used extensively in the NAG Libraries). This works by storing the non-zero elements of A in a rectangular, (kl+ku+1) by n, matrix B, such that the diagonals of A become rows of B while the columns of A remain as columns of B:

Illustration 2: The matrix A stored in LAPACK banded storage format as matrix B.

While this is appealing in terms of being able to reconstruct readily in our mind's eye the original matrix, it is less well suited to solving problems (such as AX = B) as quickly as possible. The problem is that most matrix computations rely, for their speed, on very fast kernel operations (written by the likes of Intel or AMD) working on smallish matrix blocks; this requires matrix blocks to be stored as you would expect for a general matrix, while in B above the block elements have become higgledy-piggledy. What is required is a storage format that keeps some of the block structure of the original matrix A

 

The Simplest Case - to ease you in

To begin to describe an alternative storage scheme that preserves block structure, it is probably best to begin with the simplest scenario ... a symmetric matrix with bandwidth k and dimension n = (k+1)m (that is, n is an integer multiple of k+1)
Illustration 3: Symmetric banded matrix with lower half partitioned into triangular blocks.

Once you partition a lower band (including main diagonal) into triangles as above, the possibilities for improved storage forms become immediately clear. The B triangular blocks can either be slid up or slid across to the right to complete a square (k+1 by k+1) block with a neighbouring A triangular block. To make it easier to convert from LAPACK storage format, I will go for sliding the B blocks up, giving:

Illustration 4: Storage of symmetric band matrix preserving block structure.
The zero paranoiacs might want to remove the trailing zeros above by storing the last A3 triangle in Rectangular Full Packed (RFP) format [see http://www.nag.co.uk/numeric/fl/nagdoc_fl23/xhtml/F07/f07intro.xml#recomm_32a], but this may be a little over-the-top.

 

What can you do with these blocks?

Those in touch with their linear algebra side will now wonder what can be done with the banded matrix stored in this form. In particular, they will whisper the words Cholesky decomposition. Don't worry, it is just the first of two steps in solving AX = Y, a set of simultaneous equations. Without going into the nitty-gritty, I will just say that the lower triangular banded L formed in this decomposition can be partitioned exactly as above, and it can work with each triangular block in turn, overwriting them as it goes. The second stage of solution, backsubstitution, then works with the triangular blocks of L to return the solution X



How do I put A into this strange storage format?

You couldn't be expected to do this yourself. On the other hand it would be easy to write a utility routine that converted from the LAPACK style format to the above triangular block style; and this can be done in-place since columns are preserved. The mapping between elements of the blocked storage matrix (C, say) and the elements of A is also quite straightforward:
Cqj = Aij for j = 1,...,n and i = j,...,j+k, where q = mod(i-1,k+1) + 1.
This makes life easy when Aij is a function of i and j.

 

The next simplest case -- to lessen your ease

In the simplest case everything fitted just nicely, but for more general n there are going to be some leftovers at the bottom-right of A. A picture tells the story ...

Illustration 5: Symmetric banded matrix (lower half) with left-overs.

And the mapping above tells us where to put the left-overs ...

Illustration 6: Block storage pattern for symmteric banded matrix for general n.

 

Next?

This stuff is all very well when we have symmetric matrices, but what about non-symmetric matrices like that in Illustration 1? I'll leave that till the next blog (very soon). Lawrence

3 comments:

  1. This looks great Lawrence. Will you be posting explicit demonstrations showing how much this storage scheme can improve code execution time?

    ReplyDelete
  2. Hi Mike,
    my next blog in the series will give the basic idea for
    general banded matrices. Following that, the third in the series will show some experimental results for solving a symmetric linear system.
    Lawrence

    ReplyDelete
  3. Eagerly awaiting the next article in the series...

    ReplyDelete

NAG moderates all replies and reserves the right to not publish posts that are deemed inappropriate.