### Storing Banded Matrices for Speed

## Introduction

A real *n*-by-*n* matrix **A** will, in general, be defined by its *n*<sup>2</sup> elements, which would need to be stored for computations to be performed using **A**. If *n* is very large, this can amount to a lot of storage.

Fortunately, many real-world applications use sparse matrices that are comprised mainly of zero-valued elements; the number of defining elements that need to be stored is typically some fixed multiple of *n* (30*n*, say). A special case is the banded matrix:

Such a matrix has non-zero elements only on the main diagonal, on *k*<sub>l</sub> sub-diagonals (blue region) and on *k*<sub>u</sub> super-diagonals (green region). The total number of non-zero elements is (*k*<sub>l</sub>+*k*<sub>u</sub>+1)*n*, minus a few corner points. What we would really like to do is solve problems using this matrix, like **AX** = **Y**, but storing only the non-zero elements. Various storage formats emerged from various packages, but the most familiar to people solving such problems is that adopted by the linear algebra package LAPACK (and also used extensively in the NAG Libraries). This works by storing the non-zero elements of **A** in a rectangular, (*k*<sub>l</sub>+*k*<sub>u</sub>+1) by *n*, matrix **B**, such that the diagonals of **A** become rows of **B** while the columns of **A** remain as columns of **B**:

While this is appealing in terms of being able to reconstruct readily in our mind's eye the original matrix **A**, it is less well suited to solving problems (such as **AX** = **Y**) as quickly as possible. The problem is that most matrix computations rely, for their speed, on very fast kernel operations (written by the likes of Intel or AMD) working on smallish matrix blocks; this requires matrix blocks to be stored as you would expect for a general matrix, while in **B** above the block elements have become higgledy-piggledy. What is required is a storage format that keeps some of the block structure of the original matrix **A**.
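As a concrete sketch of the LAPACK-style packing just described (in Python/NumPy with 0-based indexing; the function name and the small test matrix are my own illustration, not LAPACK's API):

```python
import numpy as np

def to_lapack_band(A, kl, ku):
    """Pack the non-zero diagonals of a banded n x n matrix A into a
    (kl+ku+1) x n array B: diagonals of A become rows of B, while
    columns of A remain columns of B (0-based version of the
    LAPACK general-band convention)."""
    n = A.shape[0]
    B = np.zeros((kl + ku + 1, n))
    for j in range(n):
        # only rows within the band contribute to column j
        for i in range(max(0, j - ku), min(n, j + kl + 1)):
            B[ku + i - j, j] = A[i, j]
    return B
```

Note how element A[i, j] keeps its column index j; only its row index is remapped, to ku + i - j.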

## The Simplest Case - to ease you in

To begin to describe an alternative storage scheme that preserves block structure, it is probably best to begin with the simplest scenario ... a symmetric matrix with bandwidth *k* and dimension *n* = (*k*+1)*m* (that is, *n* is an integer multiple of *k*+1).

Once you partition the lower band (including the main diagonal) into triangles as above, the possibilities for improved storage forms become immediately clear. The **B** triangular blocks can either be slid up, or slid across to the right, to complete a square (*k*+1 by *k*+1) block with a neighbouring **A** triangular block. To make it easier to convert from LAPACK storage format, I will go for sliding the **B** blocks up, giving:

(The final *A*<sub>3</sub> triangle could instead be stored in *Rectangular Full Packed* (RFP) format [see http://www.nag.co.uk/numeric/fl/nagdoc_fl23/xhtml/F07/f07intro.xml#recomm_32a], but this may be a little over-the-top.)

### What can you do with these blocks?

Those in touch with their linear algebra side will now wonder what can be done with the banded matrix stored in this form. In particular, they will whisper the words *Cholesky decomposition*. Don't worry, it is just the first of two steps in solving **AX** = **Y**, a set of simultaneous equations. Without going into the nitty-gritty, I will just say that the lower triangular banded **L** formed in this decomposition can be partitioned exactly as above, and it can work with each triangular block in turn, overwriting them as it goes. The second stage of solution, *backsubstitution*, then works with the triangular blocks of **L** to return the solution **X**.
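To make the first stage concrete, here is a minimal, unoptimised sketch of a banded Cholesky factorisation working directly on LAPACK-style lower band storage (pure NumPy; the function name is my own, there is no pivoting or error handling, and a production code would of course call a tuned routine such as LAPACK's `DPBTRF` instead):

```python
import numpy as np

def band_cholesky(ab):
    """Cholesky factor of a symmetric positive-definite band matrix.
    ab is lower band storage ((k+1) x n, 0-based: ab[d, j] = A[j+d, j]);
    returns the banded factor L, A = L L^T, in the same storage."""
    k1, n = ab.shape
    k = k1 - 1
    L = ab.copy()
    for j in range(n):
        L[0, j] = np.sqrt(L[0, j])          # diagonal entry L[j, j]
        m = min(k, n - 1 - j)               # entries below it in the band
        L[1:m + 1, j] /= L[0, j]
        # symmetric rank-1 update of the trailing band block
        for c in range(1, m + 1):
            for r in range(c, m + 1):
                L[r - c, j + c] -= L[r, j] * L[c, j]
    return L
```

The column-at-a-time structure is what the blocked format exploits: each update only ever touches the (*k*+1)-sized neighbourhood that the triangular blocks keep contiguous.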

### How do I put A into this strange storage format?

You couldn't be expected to do this yourself. On the other hand, it would be easy to write a utility routine that converted from the LAPACK-style format to the above triangular block style; and this can be done in-place, since columns are preserved. The mapping between elements of the blocked storage matrix (**C**, say) and the elements of **A** is also quite straightforward:

*C*<sub>qj</sub> = *A*<sub>ij</sub> for *j* = 1,...,*n* and *i* = *j*,...,*j*+*k*, where *q* = mod(*i*-1, *k*+1) + 1.

That is, the location in **C** of element *A*<sub>ij</sub> is a simple function of *i* and *j*.
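Because columns are preserved and *q* = mod(*i*-1, *k*+1) + 1 depends on *i* cyclically, the conversion amounts to a cyclic shift of each stored column. A sketch in NumPy (0-based; the function name, and returning a copy rather than shifting truly in place, are my own illustrative choices):

```python
import numpy as np

def band_to_blocked(Bb, k):
    """Convert lower symmetric band storage (rows = diagonals,
    (k+1) x n, 0-based: Bb[d, j] = A[j+d, j]) to blocked storage
    C with C[q, j] = A[i, j], where q = i mod (k+1)."""
    n = Bb.shape[1]
    C = np.empty_like(Bb)
    for j in range(n):
        # column j is cyclically shifted down by j mod (k+1)
        C[:, j] = np.roll(Bb[:, j], j % (k + 1))
    return C
```

Since each column is only permuted within itself, the same loop applied to `Bb` directly would perform the conversion in place, as the text suggests.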

## The Next Simplest Case - to lessen your ease

In the simplest case everything fitted just nicely, but for more general *n* there are going to be some leftovers at the bottom-right of **A**. A picture tells the story ...

And the mapping above tells us where to put the left-overs ...

### Comments

This looks great Lawrence. Will you be posting explicit demonstrations showing how much this storage scheme can improve code execution time?

Hi Mike,

my next blog in the series will give the basic idea for general banded matrices. Following that, the third in the series will show some experimental results for solving a symmetric linear system.

Lawrence

Eagerly awaiting the next article in the series...