I've written in this blog before about the problems of Wandering Precision - where the results computed by a program are not consistent, even when running exactly the same program on the same machine with the same data several times in a row.

At SC12 in Salt Lake City a couple of weeks ago I took part in a "Birds of a Feather" session organised by Intel, where problems like these, associated with bitwise reproducibility of results, were discussed. On the panel, apart from myself, were representatives of Intel's Math Kernel Library, The MathWorks (producers of MATLAB), the German engineering company GNS, and the University of California, Berkeley.

We all gave brief presentations discussing the impact of non-reproducible results on our work or on our users, and ways around the problem. It turned out that most of our presentations were remarkably similar, involving tales of disgruntled users who were unhappy about accepting such varying results. This is true even when those users fully understand that numerical software is subject to rounding errors, and that the varying results they see are not incorrect - they may be equally good approximations to an ideal mathematical result. They just don't like inconsistency; in fact, they would often prefer an answer that is accurate to fewer digits of precision, so long as it is consistent, over results that are slightly more accurate but inconsistent.

As it happens, we do have some control over these inconsistent results, which are largely due to optimizations introduced by clever compilers designed to take advantage of the fast SIMD instructions, such as SSE and AVX, available on modern hardware. By using appropriate compiler flags on the Intel Fortran and C compilers, for example, we can avoid these optimizations, at the cost of making the code run perhaps 12 to 15% slower (according to Intel).
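For the Intel compilers, the relevant switch is the floating-point model option. The sketch below shows the general idea; exact spellings and defaults vary by compiler version, so check your compiler's documentation before relying on it.

```shell
# Sketch only: flag behaviour varies across Intel compiler versions.
# -fp-model precise disables value-unsafe optimizations, such as
# reassociating floating-point reductions for SIMD vectorization.
icc   -fp-model precise -o myprog myprog.c
ifort -fp-model precise -o myprog myprog.f90
```

The trade-off is exactly the one described above: the generated code is consistent from run to run, but gives up some of the speed the vectorized reductions would have provided.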

For NAG Libraries, we've decided what we're going to do in future. Most NAG Library products are distributed in two variants - one which is based on fast vendor library kernels (like MKL) and one that is not, but consists of all-NAG versions of routines like BLAS and LAPACK. Typically we expect the all-NAG variant to run slower than the MKL-based variant, so what we plan to do for the next Marks of our libraries is to compile the all-NAG variant avoiding the SIMD optimizations, but to compile the MKL-based library still to use them. That way, we hope to get the best of both worlds - our users can choose whether they want better consistency or better performance.

The first NAG Library product to be affected by this decision will be Mark 24 of the NAG Fortran Library, which is scheduled for release to users sometime in the first part of 2013.

Interesting. But what about SMP libraries, where we tend to assume the associative law holds when adding up the results from different processors? Are any notes from this panel available? It would be nice to have something to cite.

Yes, of course SMP does also throw a spanner in the works, and can lead to similar problems of reproducibility (which can also sometimes have the effect of masking real errors).

I don't think any panel notes are available yet, but I've asked Intel.

Thanks for this, Mick. As usual, you've made us all better informed. It'll be interesting to see how many users choose the more consistent version at Mark 24.

BTW, is the NAG Toolbox for MATLAB MKL-based or not? In other words, are results from MATLAB using NAG reproducible?

With respect to RobMNAG's question: I am pretty sure ALL major banks will use the non-MKL version. Banks are required to report consistent numbers.

Thanks.

Some versions of the NAG Toolbox use MKL and some ACML. None of the current versions of the Toolbox (they are all at Mark 23) were compiled with the flags that would guarantee bitwise reproducibility, so we'll need to think carefully about what we do at Mark 24.
