Wednesday, 15 May 2013

Calling the NAG C Library from kdb+

Kx Systems was founded in 1993 to address a simple problem: the inability of traditional relational database technology to keep up with escalating volumes of data. This problem is even more acute today, where data records, especially in the financial industry, number in the trillions. To provide users to fast data, the founders of Kx have developed kdb+ and vector-processing programming language 'Q'.

Kdb+ has the ability to quickly process data and for more complex analysis it has access to other programming languages. To call the NAG C Library from kdb+, a wrapper must be placed around the NAG routine, taking care when passing data types. The wrapped functions must then be compiled into an external library. Below is an example for calling a NAG regression routine:

#include <nag.h>
#include <nagg02.h>
#include <string.h>
#include <nag_stdlib.h>
#define KXVER
3          // Using version 3, this must come before #include "k.h"
#include"k.h"              // kx header file containing K structures

K regression(K x, K y)          //The two arrays we will input from kdb+
{
       double *wt=0;
       NagError fail;            
       INIT_FAIL(fail);
       K kout;                         //Array that we will return to kdb+
       kout=ktn(KF,7);            //Allocate kout - it will have 7 elements. KF indicates
                                          // it is an array of floating point numbers
       F*out=kF(kout);            //Set pointer to kout - comes after kout is allocated

       F*(x_input)=kF(x);         //Need to set pointers to input arrays into NAG
       F*(y_input)=kF(y);       
    
       nag_simple_linear_regression( Nag_AboutMean, x->n, x_input, y_input, wt, &out[0], &out[1], &out[2], &out[3], &out[4], &out[5], &out[6], &fail );

       if(fail.code != NE_NOERROR)
              return r0(kout),krr(ss(fail.message));    //Free memory and return any
                                                                         //NAG fail to Q
       return kout;
}

In production level code, we should really check array bounds and data types. If you would like, you can add the following checks:

     if(x->n != y->n) return krr("data not same length");
     if(x->t != -KF || y->t != -KF) return krr("input data wrong type");

We must then compile the above code into a sharable library:
gcc -shared -fPIC nag_regression.c -I/[NAG_Directory]/include  [NAG_Directory]/lib/libnagc_nag.so -o nag_regression.so

Before we start the Q language, make sure the library can be located:
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:[path_to_nag_regression.so]

Finally, we load the library in Q and test the NAG example program:
q)nag_reg:`nag_regression 2:(`regression;2)
q)nag_reg[1.0 0.0 4.0 7.5 2.5 0.0 10.0 5.0;20.0 15.5 28.3 45.0 24.5 10.0 99.0 31.2]
7.598166   7.090489   6.685763   1.322359   .8273433   965.2454   6

Outputs from the regression are the regression coefficients, the standard errors, the R-squared, sum of squares, and degrees of freedom (which all agree with the NAG example).

Interested in other NAG routines in kdb+? Email support@nag.com and let us know! Many thanks to the support team at Kx Systems.

Thursday, 28 March 2013

Evaluation of non-linear expressions using Excel

NAG provides VB headers for all the routines in the library allowing them to be called easily from Microsoft Excel given a little knowledge of Visual Basic. Along with the headers, some Excel examples are distributed with the product. You may also find demo worksheets for a variety of routines on the NAG web site.

Combining the ease of use of Excel and the power of NAG routines is a great way of creating simple, flexible interfaces for solving difficult problems. Typically you write VBA code to handle the reading and writing of data to the Excel worksheet and the passing of parameters to the NAG routines. Once this VBA code has been written the problem parameters can be easily changed and tweaked from the worksheet without having to alter any code again.

Routines which take only simple parameters such as scalar values, vectors and matrices can have all their arguments input via the Excel worksheet however for routines that require non-linear expression such as those routines with callback functions it isn’t so straightforward. One method is to simply write the callback in VBA however you lose the interactivity gained from using Excel as every time you want to change the problem you would have to edit the VBA code. This isn’t a big deal but there is another way!

The VBA function evaluate() can be used for just this purpose and, along with a few tricks, allows  natural input of non-linear expressions via the worksheet. The demo for the NAG optimization routine e04jcf takes advantage of this method. As you can see the objective function is expressed naturally on the sheet along with the simple input parameters required for the routine e04jcf:


Those with a keen eye will have picked up on the fact that the objective function is described in terms of X1…X4 but the initial estimation is given in cells B11…B14. The trick is that once the initial estimate has been read from the worksheet and stored in a VBA array, it gets copied back to the sheet in cells X1…X4 before the NAG routine e04jcf is called. Of course you could describe the objective function in terms of which-ever cells you like. It should be noted that this is not the most efficient method due to the extra copying of data that is required so it may not be suitable if your problem is large.

This technique can be exploited with many other routines in the NAG Library such as root finding, quadrature, systems of non-linear equations and many more. Once a worksheet has been created, it can be used by anyone who is familiar with Excel even if they do not know VBA.

Tip: You can hide column X by right-clicking the column header and selecting hide or set the font colour to white if you don’t want the intermediate values displayed.

The demo for e04jcf can be downloaded from our Excel page.

Friday, 15 March 2013

3 Years Later

Do you remember those episodes of Friends that consisted entirely of flashbacks - presumably the cast were too busy with other duties that week so couldn't film, or the writers too busy to think up a real story ...

The NAG Blog has been running for just over 3 years now. In that time we have had 118 posts on a wide variety of numerical computing topics including HPC, .NET, customer engagement, GPUs, optimization, and much more.

The most popular posts (pageviews) have often been the most technical ones - including:

On the HPC theme, there have been posts discussing factors of 1000x in performance, describing supercomputing to friends, family and politicians, and the relationship between supercomputers and ice cubes.

I hope you have found our blog to be a nice mix of useful content, occasional fun, and topical thoughts. Do you have a favourite NAG Blog post - or more importantly - is there a topic you would like one of our numerical computing experts to write about? From talking to customers we know that you’d like to see more technical articles which we are always working on, but we also would like to see more contributors to the blog. So if you have something to say about numerical algorithms or HPC that relates to NAG and the work that we do, get in touch and maybe we can collaborate.

Our other social networks – the NAG Linkedin Group and @NAGTalk on Twitter have grown to be fairly engaged spaces. Our aim with all our groups is to create a friendly space where users and staff can readily exchange ideas, innovate and share relevant stories. The Linkedin Group is a vetted group which means that we keep it free from spam and recruiters. Do join if you want to connect with other NAG users, collaborators and developers.

Wednesday, 27 February 2013

NAG Routines in Different Precisions

Written by David Sayers, NAG Principal Technical Consultant


How would you like to call a NAG routine and have the precision it computes in determined by the precision of the arguments? So, for example, your program might contain:
r = s15adf(x,ifail)
If x were single precision then a single precision would be returned to r. If x were double or quadruple precision then the corresponding double or quadruple computations would be performed.
I can imagine a number of scenarios where this facility might be useful, but I invite your comments on how common these might be:
  1. Single precision might be useful as once again single precision calculation becomes faster than double precision. So the user might want to trade accuracy for speed.
  2. Secondly, single precision used to be the natural interface to many graphics packages. I am not sure whether this is still the case. Would it be useful to have the capability for this reason?
  3. Double precision, the precision now offered, is generally ‘about right’ for most computations, but sometimes extra precision is needed for the trickiest calculation. Under these circumstances quadruple precision might be necessary. How common is this requirement amongst our users?
  4. If a computation is made with two different precisions and the resulting answers are broadly similar then a certain level of confidence might be gained from these computations. If on the other hand the results are very different this may indicate that the original problem was ill-conditioned. The ability to do this test easily might be something our users want. Do you think this is so?
From the developers point of view modern Fortran makes it easy to provide these facilities:

Module dks1

      Interface dks
        Module Procedure dks_1, dks_2, dks_3
      End Interface dks

    Contains
      Subroutine dks_1(x)
        Real (Kind=1) :: x

        Print *, 'single precision'
        Return
      End Subroutine dks_1

      Subroutine dks_2(x)
        Real (Kind=2) :: x

        Print *, 'double precision'
        Return
      End Subroutine dks_2

      Subroutine dks_3(x)
        Real (Kind=3) :: x

        Print *, 'quadruple precision'
        Return
      End Subroutine dks_3

    End Module dks1

    Program main
      Use dks1
      Real (Kind=1) :: x1
      Real (Kind=2) :: x2
      Real (Kind=3) :: x3

      Call dks(x1)
      Call dks(x2)
      Call dks(x3)
      Stop 'ok'
    End Program main

(Here the Kind convention is the natural one for the NAG Compiler; others might use a ‘Byte convention’ i.e. the kind number matches the number of bytes of storage. Thus some compilers use ‘Real (Kind=4)’ to denote single precision and ‘Real (Kind = 8)’ to denote double precision.
My implementation colleagues will point out that instead of checking over 3000 results per implementation they would have over 9000 if we were to provide this capability. How appreciative would our users be if we were to do this? I am sure that I wouldn't manage to convince my colleagues to take on this extra work without evidence that our customers would like it. So if you think this is worthwhile then please let us know. 

 

Wednesday, 16 January 2013

Matrix Functions in Parallel


Last year I wrote a blog post about NAG’s work on parallelising the computation of the matrix square root. More recently, as part of our Matrix Functions Knowledge Transfer Partnership with the University of Manchester, we’ve been investigating parallel implementations of the Schur-Parlett algorithm [1].
Most algorithms for computing functions of matrices are tailored for a specific function, such as the matrix exponential or the matrix square root. The Schur-Parlett algorithm is much more general; it will work for any “well behaved” function (this general term can be given a more mathematically precise meaning). For a function such as
using the Schur-Parlett algorithm should be quicker than computing the individual components separately.  This algorithm made its first appearance in Mark 23 of the NAG C Library. Our parallel improvements will be found in Mark 24 of the C Library, next year.
At the heart of the algorithm lies a Taylor series (which is truncated when it has converged sufficiently). For example:
Before this however, the matrix A is converted into upper triangular Schur form. This serves two purposes. It reduces the number of arithmetic operations required to evaluate the Taylor series. It also gives us the eigenvalues of A. The Taylor series will converge fastest when the eigenvalues lie close together. The clever bit of the algorithm is to rearrange the Schur form of A, grouping similar eigenvalues together. The diagonal part of the matrix is then split into multiple different sized sub-matrices according to this grouping. A Taylor series is evaluated for each of these diagonal blocks. The off-diagonal blocks are then computed by solving the ‘Parlett recurrence’. This is all very confusing and is best explained by the diagram below:
Here we can see the original matrix (in black), the triangular Schur form (in red), the blocking strategy (in blue) and finally the computation of the off-diagonal blocks (in green). The exact distribution of eigenvalues and therefore the number and size of diagonal blocks is totally dependent on the input matrix.

Wednesday, 2 January 2013

Our own Hall of Fame – the NAG Life Service Recognition Award

A bit of background…

NAG was established as a not-for-profit organisation and remains so, which means it has no shareholders or investors to answer to. All surpluses made are ploughed back into R&D. NAG is ‘owned’ by its members of which are people that founded NAG, current and past employees, collaborators and friends of the company. This all serves to make NAG a pretty unique place to work which might be a reason for the longevity of serving staff (NAG also happens to be an Investors in People Gold company).

It’s certainly not rare for those of us working here to see our colleagues reach the milestone that could be considered ‘lifetime service’. Many staff work at NAG for the majority of their careers and continue to do so well into their ‘golden years’.

Rewarding greatness…

During NAG’s 40th anniversary celebrations in 2011, a NAG ‘Hall of Fame’ was established to recognise outstanding efforts and commitment to the company by a member or collaborator of NAG. The Life Service Recognition Award has now been bestowed twice.

Wednesday, 28 November 2012

Bitwise Reproducibility with the NAG Libraries

I've written in this blog before about the problems of Wandering Precision - where the results computed by a program are not consistent, even when running exactly the same program on the same machine with the same data several times in a row.

At SC12 in Salt Lake City a couple of weeks ago I took part in a "Birds of a Feather" session organised by Intel, where problems like these, associated with bitwise reproducibility of results, were discussed. On the panel, apart from myself, were representatives of Intel's Math Kernel Library, The MathWorks (producers of MATLAB), German engineering company GNS, and the University Of California, Berkeley.

We all gave brief presentations discussing the impact of non-reproducible results on our work or on our users, and ways around the problem. It turned out that most of
our presentations were remarkably similar, involving tales of disgruntled users who were not happy about accepting such varying results. This is true even when the same
users completely understand the fact that numerical software is subject to rounding errors, and that the varying results they see are not incorrect - they may be equally good approximations to an ideal mathematical result. But they just don't like inconsistency - in fact, often they would prefer to get an answer that is accurate to fewer digits of precision so long as it is consistent, rather than inconsistent but slightly more accurate results.

As it happens, we do have some control over these inconsistent results, which are largely due to optimizations introduced by clever compilers that are designed to take advantage of fast SIMD instructions like SSE and AVX that are available on modern hardware. By using appropriate compiler flags on the Intel Fortran and C compilers, for example, we can avoid these optimizations, at the cost of making the code run up to 12 or 15% slower (according to Intel).

Wednesday, 24 October 2012

Now the games are over – what’s our legacy? How can we inspire a generation?

London 2012 Olympic Park
I really didn’t expect to love the Olympics as much as I did. I’m not really into sport and have never taken much interest before, but this time I was absolutely enthralled. London, being the host city, obviously had a bearing on my enthusiasm and enjoyment, but what now the games are over? A lot of talk centred around the Olympics 2012 ethos of ‘inspiring a generation’ and this got me thinking about NAG’s legacy and how we too can inspire.

NAG has had a busy year as well; we've managed to produce significant new releases of our four main numerical software products, the C and Fortran Libraries, the Library for SMP & Multicore and the Toolbox for MATLAB®. If you study the history of the Library you start to understand how truly amazing it is. So is the Library NAG’s legacy? Well, funnily enough, our legacy is right there in each of the new releases. That’s why the NAG Library is amazing. It’s an ever changing, ever improving set of codes that have been developed by people that work here or contributed by well-known and maybe not so quite well-known numerical analysts, statisticians and computer scientists from all around the world.

Back in the day – 1970 to be precise – Brian Ford, NAG’s Founding Director, was inspired with a vision of a collective inter-university numerical library and set about creating the first NAG Library. Mentored by the legendary Jim Wilkinson, Brian established NAG as an organisation that was, and still is not-for-profit. Brian said of Jim,
“he gave us his invaluable numerical linear algebra software, and his contacts".
Brian's key ideals were voluntary collaboration and quality in every phase of the activity. Over the years mathematical and statistical routines have been contributed by some highly regarded people:

Friday, 12 October 2012

The making of “1000x” – unbalanced supercomputing

I had a bit of a rant in my article published at HPCwire this week - “Chasing1000x: The future of supercomputing is unbalanced”.

The gist of my rant is that the supercomputing community pays great attention to the next factor of 1000x in performance – and I firmly agree that next 1000x is highly valuable to the HPC community and the wider economy. But, we should give equal attention to 1000x in other areas, notably ease-of-use and growth of the user-base. And, critically, give equal peer recognition to those promoting that growth and pursuing ease-of-use and ease-of-access, not reserve all our “visionary” accolades for those figuring out the details of the first exascale computers.

However, I planted an appropriate pun in the title of the article itself. The obvious meaning, in the context of the article, is that the future of supercomputing is unbalanced with respect to the focus on performance versus community growth etc. However, the double meaning should be readily recognizable to anyone active watching or developing the HPC technology roadmaps. The future of supercomputers is unbalanced – i.e., the broad summary of the many technology roadmaps out there is that future supercomputers (at all scales) will be less balanced in some key performance attributes than current technology.

Friday, 28 September 2012

The NAG SMP and Multicore Library on the Cloud

In my last post we looked at the NAG serial library on Amazon's EC2. We noticed that the CPU utilization was not anywhere close to the capacity I had on the Cloud. Below, I have loaded the NAG SMP and Multicore Library on Amazon's EC2 in hopes of utilizing all the virtual cores.
Routines Tested
The EC2 instance used was a High-CPU, c1.xlarge instance (7GB of memory and 8 Virtual Cores). The SMP Library contains 204 tuned and a total of 337 enhanced routines for use on Multicore machines from which I tested f11me(Sparse Matrix Factorization) and g02bn(Kendall/Spearman rank coefficient). While I had a maximum of 8 cores available, I decided to increase the number of threads beyond this, just to see the result. Below you will see how the time taken scales as the number of threads increases (click to enlarge):


Both routines scale well as you increase the number of threads, but f11me takes longer with 12 threads as opposed to 8! I suspect the slowdown is a result of dependencies between threads. In order for some to start, they may have to wait on other parts of the matrix factorization to finish. G02bn on the other hand doesn't require communication between threads so each one can run independently. This routine slightly benefited from running on 12 vs. 8 threads.

Wednesday, 29 August 2012

NAG on the Cloud: Part 2

In case you missed Part 1 of NAG on the Cloud, I discussed calling the Library on Windows Azure using the C# library and Microsoft's Cloud Numerics. We now turn to a different cloud provider. Amazon's Elastic Cloud Computing (EC2) allows the user to buy or rent instances of virtual computers and pay for only what you use. Below, I'll discuss how to put the NAG Library on EC2 and give some initial performance tests.

Creating an Instance 
There are a plethora of videos on creating Amazon EC2 instances. If you are going to install and run the NAG Library, make sure you start a Virtual Private Cloud when creating the instance. Private Clouds can be created at no additional cost on EC2 and remain compatible with NAG's Kusari License management system. I decided to create a couple tests:

My EC2 Console Instances

The above instances vary in size, memory, and type of OS running. When the instance is created, you can gain access to it via ssh. To load the library onto EC2, just download the .tgz file of your choice from the NAG website and secure copy it up there:

$scp -i VPCkey.pem fll6a23dfl.tgz ubuntu@50.112.131.89:/home/ubuntu

Here VPCkey.pem is a key file that only allows my computer to access this Cloud instance. Once the Library is copied, we can unzip it and the installation instructions remain the same as if the computer were right in front of you!