### NAG Routines in Different Precisions

Written by David Sayers, NAG Principal Technical Consultant

How would you like to call a NAG routine and have the precision it computes in determined by the precision of the arguments? So, for example, your program might contain:
r = s15adf(x,ifail)
If x were single precision then a single precision would be returned to r. If x were double or quadruple precision then the corresponding double or quadruple computations would be performed.
I can imagine a number of scenarios where this facility might be useful, but I invite your comments on how common these might be:
1. Single precision might be useful as once again single precision calculation becomes faster than double precision. So the user might want to trade accuracy for speed.
2. Secondly, single precision used to be the natural interface to many graphics packages. I am not sure whether this is still the case. Would it be useful to have the capability for this reason?
3. Double precision, the precision now offered, is generally ‘about right’ for most computations, but sometimes extra precision is needed for the trickiest calculation. Under these circumstances quadruple precision might be necessary. How common is this requirement amongst our users?
4. If a computation is made with two different precisions and the resulting answers are broadly similar then a certain level of confidence might be gained from these computations. If on the other hand the results are very different this may indicate that the original problem was ill-conditioned. The ability to do this test easily might be something our users want. Do you think this is so?
From the developers point of view modern Fortran makes it easy to provide these facilities:

Module dks1

Interface dks
Module Procedure dks_1, dks_2, dks_3
End Interface dks

Contains
Subroutine dks_1(x)
Real (Kind=1) :: x

Print *, 'single precision'
Return
End Subroutine dks_1

Subroutine dks_2(x)
Real (Kind=2) :: x

Print *, 'double precision'
Return
End Subroutine dks_2

Subroutine dks_3(x)
Real (Kind=3) :: x

Print *, 'quadruple precision'
Return
End Subroutine dks_3

End Module dks1

Program main
Use dks1
Real (Kind=1) :: x1
Real (Kind=2) :: x2
Real (Kind=3) :: x3

Call dks(x1)
Call dks(x2)
Call dks(x3)
Stop 'ok'
End Program main

(Here the Kind convention is the natural one for the NAG Compiler; others might use a ‘Byte convention’ i.e. the kind number matches the number of bytes of storage. Thus some compilers use ‘Real (Kind=4)’ to denote single precision and ‘Real (Kind = 8)’ to denote double precision.
My implementation colleagues will point out that instead of checking over 3000 results per implementation they would have over 9000 if we were to provide this capability. How appreciative would our users be if we were to do this? I am sure that I wouldn't manage to convince my colleagues to take on this extra work without evidence that our customers would like it. So if you think this is worthwhile then please let us know.

### Comments

1. Thanks to the huge difference in speed between single and double precision calculations in accelerators such as GPUs, the use of single precision has become more prevalent recently.

What worries me, however, is that an awful lot of people seem to say 'Oh, I'm sure single precision will be fine' without spending the time to properly check. This leads to the very real possibility of them getting the wrong answer very VERY quickly.

A great middle ground are mixed precision methods (e.g. http://www.mpi-inf.mpg.de/~strzodka/projects/double/) where one uses single precision where such is 'good enough' before switching to double precision to finish the calculation off. If done correctly, the result will be accurate to double precision but with a nice speed boost.

Quadruple precision is also increasing popular and I get a few requests a year asking about it. The best solution in MATLAB is The Multiprecision Computing Toolbox which has very fast quad precision routines http://www.advanpix.com/2013/01/20/fast-quadruple-precision-computations/

In short, I think its worth it to include both single and quadruple precision in the library. Of course, if you could throw in arbitary precision, we'd be in heaven but you'd have even more test cases to worry about!

1. When I first started at NAG, many years ago now, single precision was appreciably quicker than double precision computation and we produced two implementations for the VAX VMS system. The single precision version had names ending in ‘E’ and the ‘standard precision’ was double and so had routines ending in ‘F’. This was OK because the VAX arithmetic was pretty good. At the behest of two companies, one working in aircraft design, the other in the nuclear industry, we produced implementations for the IBM with its arithmetic. I never felt very confident about those single precision routines…

As chips evolved the superior performance of single precision, at the cost of numerical precision, diminished and there was little demand for single precision. Chips and architectures have now changed. As you point out GPU computing is an example where single precision has again become popular. The pendulum is beginning to swing back perhaps? My BLOG was trying to ascertain what interest there might be in these alternative precisions.

As you may be aware, my colleagues who work on the NAG Library and LAPACK have already acknowledged this change and Mark 23 of our library already contains mixed-precision Cholesky solvers, where single precision calculations is performed for its speed and combined with double precision to get the greater accuracy. We may well see this trend continue.

As your remark users might feel that single precision is ‘good enough’ when it isn’t. If both precisions were available, they could test this belief and incidentally get useful information on the conditioning of their problem. I would still recommend double precision as the ‘standard precision’ though and I would expect our example programs to remain unaltered, shepherding users down this route.

I get the occasional request for quadruple precision. My tendency is to be slightly sceptical, but there are occasions when it would be useful.

The point of my BLOG was to assess the likely demand for either or both of an extra single or double precision facility within the library. If demand still remains limited then I can’t see NAG undertaking the extra work, but for anyone who has a genuine need we could produce a bespoke version of a routine on a consultancy basis at a very reasonable cost. Very little of our source code would need to change; only the verification of the results would be an issue.

Genuine multiple precision is much more of a challenge and I have experienced little demand for this. We might extend our A02 chapter on arithmetic to provide multiple precision arithmetic though. Would this be of interest?

2. I can comment on item 3, and indirectly on 4:

3. I have been using routine E04UCF in quadruple precision for about 15 years with great success. The routine finds the minimum of a function of many variables. The reason for the need for "REAL*16" is that the values of the variables (and their contributions to the objective function) can differ my many, many orders of magnitude.

I've never had the feeling that I am one among many users with similar requirements, although I do see occasional queries about the use of extended precision on the forums of the compiler manufacturers (specifically Intel).

4. I recently had a need for a "mixed precision" calculation for a reason that might also apply to other users and problems: when using E04UCF in quadruple precision to solve a problem with many variables (>3000) the program was slowed to a crawl in a service routine that appeared only to be manipulating elements of large arrays.

The routine could not be parallelised, but reverting to double precision in this service routine appeared to give good (and much quicker) results. Fortunately this problem needed only a few extra digits of precision to the normal 16 or so attained in double precision, and not the 32 of quadruple precision. Otherwise the approach might not have worked.

I did not investigate the exact reason for the slowdown (much greater than the usual factor of 5 to 10 from running in quadruple precision). However, the fact that I encountered the problem does suggest that there may be more to creating a library of routines to run in either double or quadruple precisions, at the choice of the user, than just making use of the interface example in the blog entry.

1. Hi Simon,

I remember your need for quadruple precision and this partly motivated this blog. I was attempting to gauge the interest in ‘alternative precision’ versions of the library.

You are of course correct to say that once again the pendulum has swung in favour of computing with ‘mixed precisions’. At the moment only a few Nag routines that reflect LAPACK software take advantage of this. I am sure that Nag would be interested in getting a little more detail about the specific routine. In particular it would be interesting to see whether a double precision implementation would be speeded up by having an auxiliary using REAL rather than DOUBLE PRECISION arithmetic and whether there are any major drawbacks to this.

We are emerging from a period when DOUBLE PRECISION and REAL arithmetic had similar costs, so most NAG library code is DOUBLE PRECISION. Now however I agree with you that we should consider very carefully whether mixing the precisions can give worthwhile improvements. Our code base is so large that I can’t imagine this happening with old code on a wholesale basis unless we get specific pointers from users such as yourself, so I appreciate your comments. I am sure that new materials will use mixed precisions wherever it is appropriate to do so.

We will be grateful for any information you can send us.

Many Thanks,
David

#### Post a Comment

NAG moderates all replies and reserves the right to not publish posts that are deemed inappropriate.