Why is writing good numerical software so hard?

People sometimes say to us "Why does NAG continue to exist? Surely all the good numerical algorithms have already been devised and implemented?" Well, that question ignores the fact that people keep finding new kinds of problems to solve. It also forgets that we need our current software to keep working robustly and efficiently as new hardware and software infrastructures develop - our code has to run on all the platforms and in all the environments that our customers use.

Making the NAG Library work with a new compiler is always a non-trivial task (and sometimes it can seem more like a nightmare - just ask some of my colleagues!). Sometimes the difficulty arises because some of our code hasn't yet been tested in a wide enough variety of environments. Sometimes compiler optimization switches reorder our code in ways that make the floating-point arithmetic behave in ways we hadn't anticipated (which could be the compiler's fault or ours, depending). But whoever is to blame, we have to fix it.

When you are dealing with complicated numerical codes, tracking down a problem can take a lot of hard work with a debugger (or with print statements, if you're unlucky enough to be in an environment where a debugger just won't work). And even what look like the simplest operations can turn out to be a lot more complicated than you'd expect.

As an example, take dividing one complex number by another. Languages like Fortran have built-in support for complex numbers and complex arithmetic, but not all languages do, so the NAG Library contains some routines to help.

Performing a complex division is simple in principle. If the numbers are called X and Y, in general they are composed of real and imaginary parts - call them X.re and X.im, Y.re and Y.im. To compute Z = X / Y, multiply both the top and bottom of the quotient by the complex conjugate of Y (i.e. the complex number you get when you negate the imaginary part of Y). Since we're multiplying top and bottom by the same thing, there's no net effect on the answer; the clever part is that when you multiply a complex number by its own conjugate, the result is real. That means Y * conjugate(Y) is real, and the complex division is reduced to an (easier) complex multiplication followed by a few real divisions.

The complex multiplication is done by cross-multiplying (in real arithmetic) the real and imaginary parts of X and Y, then adding and subtracting the appropriate pieces to form the real and imaginary parts of the result. Counting up all the floating-point operations, this comes to 6 multiplies, 3 additions or subtractions, and 2 divides - a total of 11 operations. That's quite significant compared to a single real division, but still not a lot of work.

So - it can't be all that hard to get right, can it?

Wrong. We need to take some care here. For example, if the complex numbers X and Y are very large (and I'm talking about larger than you might ordinarily meet - say around or above the square root of the largest number your computer can represent), then although Z = X / Y may be a perfectly reasonable number, some of those 11 intermediate computations can cause arithmetic overflow. You'll get garbage as a result - not good. People are not happy if they divide a large number by itself and get something not equal to 1!

The implementation of complex division that had been in the NAG Library since the early 1970s took care to avoid unnecessary overflow and underflow by doing some cunning scaling, rearranging the work into 4 multiplies, 4 adds or subtracts, and 3 divides. The algorithm is documented in the Mark 22 NAG Library Manual, and I'd never given it much thought - I always assumed it was fine.

Well, we recently discovered that the method used by routine A02AC was not as perfect as I'd thought! The reason? Old age, really. The algorithm was based on a method by Robert Smith, published in Communications of the ACM back in August 1962 - developed in the dark days before the IEEE standard for floating-point arithmetic (IEEE 754-1985, the Standard for Binary Floating-Point Arithmetic) was designed in the 1980s.

The IEEE standard introduced the concept of denormalized numbers - tiny but non-zero numbers which have less precision than regular numbers. It turned out that if you fed these into the NAG routine, you could still end up with garbage. For example, dividing a number X by itself, where both the real and imaginary parts of X are equal to the same denormalized number, returned not the expected value 1 but a complex number with real part infinity and imaginary part NaN (NaN stands for "Not a Number"). Yuk.

Does it matter? Probably not much - not many people would call a NAG routine for something as "simple" as complex division (they'd rely on their compiler), and even if they did, the chances are they would not be using denormalized numbers. But a curse of working at NAG is that as soon as you find out about a problem like this you feel obliged to do something about it, and that's what my colleague Mat did. He located newer published methods which did take account of denormalized numbers by introducing extra scaling operations, and based a revised version of the NAG routine on one of them. I won't bore you with the details here - suffice it to say that the newer method is a bit more complicated, but deals with the problem nicely, and it will debut in the next version of the NAG Library.

This simple case illustrates that even for the most basic numerical software, things can go wrong if you don't take care. Imagine what it's like for something as complicated as the quadratic formula! (I'm only half joking - safely finding the roots of a quadratic equation takes a frighteningly large number of lines of code.)

