As a non-Fortran programmer, I wanted to run some tests with the new NAG Library for SMP and Multicore, so I chose a few simple tests that could use both cores of my laptop. I created some .NET console applications: wrappers around the Fortran routines that pass all the arguments to the routines in a .NET fashion and include a timing function. Many parts of the Library are parallelized, including quadrature, partial differential equations, interpolation, curve and surface fitting, linear algebra, correlation and regression analysis, multivariate methods, random number generators, time series analysis, sorting and special functions. I chose routines from the correlation, curve and surface fitting, and random number generation chapters.

The results were calculated on an Intel® Core™2 Duo Processor P8700 (2.53 GHz, 1066 MHz, 3 MB) machine with Windows Vista as the operating system and Microsoft Visual Studio 2008 (32‐bit project) as the compiler. Each function was used to solve a problem large enough to benefit from parallelism. Each test case was run with one thread and then with two threads. The number of threads was set using the environment variable ‘OMP_NUM_THREADS’ in the Visual Studio console window.
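When comparing one-thread and two-thread runs it can help to print the thread setting next to the timing, so the two outputs are not mixed up. The following is a minimal sketch, not part of the original examples: the class name `ThreadSetting` and the method `Describe` are hypothetical, and the code only reads the variable that the console window has set; it makes no assumption about how the Library itself interprets it.

```csharp
using System;

internal static class ThreadSetting
{
    // Returns a label describing the OMP_NUM_THREADS value visible to this process.
    public static string Describe()
    {
        string value = Environment.GetEnvironmentVariable("OMP_NUM_THREADS");
        return string.IsNullOrEmpty(value)
            ? "OMP_NUM_THREADS is not set"
            : "OMP_NUM_THREADS=" + value;
    }

    private static void Main()
    {
        // Print the setting before the timed routine is called.
        Console.WriteLine(Describe());
    }
}
```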

Each example uses the Windows API function ‘QueryPerformanceCounter’ to measure the time the routine needs to calculate its result. Below is the class used in each of the examples: first create a HiPerfTimer instance, then call the methods Start and Stop around the routine, and finally read the Duration property:

```csharp
HiPerfTimer pt = new HiPerfTimer();
pt.Start();
// call the routine to be timed here
pt.Stop();
double seconds = pt.Duration; // Duration is a property, not a method
```

```csharp
using System.ComponentModel;          // Win32Exception
using System.Runtime.InteropServices; // DllImport
using System.Threading;               // Thread

internal class HiPerfTimer
{
    [DllImport("Kernel32.dll")]
    private static extern bool QueryPerformanceCounter(out long lpPerformanceCount);

    [DllImport("Kernel32.dll")]
    private static extern bool QueryPerformanceFrequency(out long lpFrequency);

    private long startTime, stopTime;
    private long freq;

    // Constructor: fails if no high-resolution counter is available
    public HiPerfTimer()
    {
        startTime = 0;
        stopTime = 0;
        if (QueryPerformanceFrequency(out freq) == false)
        {
            throw new Win32Exception();
        }
    }

    // Start the timer
    public void Start()
    {
        Thread.Sleep(0); // yield the rest of the time slice before starting
        QueryPerformanceCounter(out startTime);
    }

    // Stop the timer
    public void Stop()
    {
        QueryPerformanceCounter(out stopTime);
    }

    // Returns the duration of the timer (in seconds)
    public double Duration
    {
        get { return (double)(stopTime - startTime) / (double)freq; }
    }
}
```

After compiling the examples with the C# compiler ‘csc’, the number of threads is first set to 1 and then to 2. The only result printed is the time the routine needs to calculate the result.
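The .NET Framework class `System.Diagnostics.Stopwatch` offers a comparable high-resolution timer without any P/Invoke declarations. The following is a minimal sketch of that alternative, not the code used for the timings in this post; `StopwatchTiming`, `TimeSeconds` and the placeholder workload are hypothetical names introduced for illustration.

```csharp
using System;
using System.Diagnostics;

internal static class StopwatchTiming
{
    // Runs the given work once and returns the elapsed wall-clock time in seconds.
    public static double TimeSeconds(Action work)
    {
        Stopwatch sw = Stopwatch.StartNew();
        work();
        sw.Stop();
        return sw.Elapsed.TotalSeconds;
    }

    private static void Main()
    {
        // Placeholder workload (an assumption): replace it with the wrapped routine call.
        double seconds = TimeSeconds(() =>
        {
            double sum = 0.0;
            for (int i = 1; i <= 1000000; i++) { sum += 1.0 / i; }
        });
        Console.WriteLine("Duration: {0:F2} sec", seconds);
    }
}
```

Passing the work as a delegate keeps the timing logic in one place, so each example only has to supply its own routine call.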

**Correlation**

NAG routine C06PKF calculates the circular convolution or correlation of two complex vectors of period n.

In this example two complex vectors of period n (n = 5000000) are built, and their correlation and circular convolution are calculated.

```
C:\C06PKF\C06PKF>csc c06pkf.cs
C:\C06PKF\C06PKF>set OMP_NUM_THREADS=1
C:\C06PKF\C06PKF>c06pkf
Duration: 5.74 sec
C:\C06PKF\C06PKF>set OMP_NUM_THREADS=2
C:\C06PKF\C06PKF>c06pkf
Duration: 2.98 sec
```

**Random Number Generation**

NAG routine G05YKF generates a quasi-random sequence from a log-normal distribution. It must be preceded by a call to one of the initialization routines G05YLF or G05YNF. The routine is passed the number N (N = 10000000) of quasi-random vectors to generate, each of dimension IDIM (IDIM = 4), together with the mean and the standard deviation of the underlying Normal distribution for each dimension (two arrays of length IDIM).

```
C:\G05YKF\G05YKF>csc g05ykf.cs
C:\G05YKF\G05YKF>set OMP_NUM_THREADS=1
C:\G05YKF\G05YKF>g05ykf
Duration: 3.52 sec
C:\G05YKF\G05YKF>set OMP_NUM_THREADS=2
C:\G05YKF\G05YKF>g05ykf
Duration: 2.06 sec
```

**Approximation**

NAG routine E02CAF forms an approximation to the weighted, least-squares Chebyshev series surface fit to data arbitrarily distributed on lines parallel to one independent coordinate axis. It determines a bivariate polynomial approximation of degree $k$ in $x$ and $l$ in $y$ to the set of data points $(x_{r,s}, y_s, f_{r,s})$, with weights $w_{r,s}$, for $s = 1, 2, \ldots, n$ and $r = 1, 2, \ldots, m_s$. That is, the data points are on lines $y = y_s$, but the $x$ values may be different on each line. The polynomial is represented in double Chebyshev series form.

The parameters were set as follows:

- N, the number of lines on which data points are given, was set to 1000.
- K, the required degree in x, was set to 100.
- L, the required degree in y, was also set to 100.

The fitting polynomial was also evaluated at the data points using the routine E02CBF.

```
C:\E02CAF\E02CAF>csc e02caf.cs
C:\E02CAF\E02CAF>set OMP_NUM_THREADS=1
C:\E02CAF\E02CAF>e02caf
Duration: 4.77 sec
C:\E02CAF\E02CAF>set OMP_NUM_THREADS=2
C:\E02CAF\E02CAF>e02caf
Duration: 2.48 sec
```

Fig 1: Timings for the SMP enabled NAG routines with one or two threads
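The timings can also be read as speedup and parallel efficiency. As a worked calculation, using the usual definitions $S = T_1 / T_2$ for the speedup and $E = S / 2$ for the efficiency on two threads (the symbols $S$, $E$, $T_1$ and $T_2$ are introduced here for illustration):

$$
\begin{aligned}
\text{C06PKF:}\quad & S = \frac{5.74}{2.98} \approx 1.93, & E \approx 0.96 \\
\text{G05YKF:}\quad & S = \frac{3.52}{2.06} \approx 1.71, & E \approx 0.85 \\
\text{E02CAF:}\quad & S = \frac{4.77}{2.48} \approx 1.92, & E \approx 0.96
\end{aligned}
$$

The correlation and surface-fitting examples come close to the ideal factor of two, while the quasi-random number example gains somewhat less. Each figure comes from a single run per thread count, so these values are indicative only.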

My experience of presenting these examples to different clients has been very positive, but also interesting! One reaction was: “Why are you using the Library in .NET and not directly in Fortran? This doesn’t make any sense, we are coding only in Fortran!”

Others were excited to learn that they could put their second processor core to use!

My conclusions are as follows:

- Never show a C# program to HPC programmers who love Fortran and think languages such as C are radical!
- The NAG Library for SMP and Multicore really is easy to use, and for current NAG Fortran Library users the transition really is painless. Users just need to take care of the number of threads used.