Thursday, 29 April 2010

Waiting for software and school children

I spent a few hours with colleagues at a major national laboratory earlier this week talking about national security, the role of software and school children. On the surface, these topics would seem to have little connection, but as I drove back to the office the connections started to form. I invite you to follow along.

In North America, Europe and other major economies throughout the world we spend huge sums of money in an attempt to detect and prevent national border intrusions and terrorist activities. In the process we've created a massive industry to x-ray, scan and monitor everything that moves in airports and elsewhere. Along the way we've put hundreds of thousands of people into jobs that apparently weren't needed 40 years ago, spent billions in taxes and, worst of all, thrown sand into the gears of both business productivity and leisure time by a process ostensibly designed to keep us safe. We've created a huge "tax" in time and money when the real problem is to predict, detect and prevent a few "outlier" events from occurring.

My question: where's the software? Isn't the promise of software to be able to predict and detect things on which we have mind-numbing amounts of data? In some respects, the situation in security is much like the situation in HPC: the hardware is constantly evolving and relatively easy to buy, while software is difficult and time-consuming to write, validate and document.

You were wondering where the children came in? Ah yes. While I had lunch in the lab's common dining area, two bus loads of primary school children on a field trip to the lab arrived to eat their lunch. No doubt they had spent the morning oohing and ahhing over particle accelerators and superconducting magnets. I couldn't help but think how we could get them interested in working on the software that is part of the fabric of modern society. Every time I go through the "remove coat, shoes, laptop, liquids, mobile phone and walk slowly through the scanner" dance, I wonder if we in software will find a solution or if we'll have to wait for them to do it for us.

Thursday, 22 April 2010

Plotting Lambert's W Function

For this post I wanted to come up with something a little easier on the eye than my previous offerings (Loading DTDs using DOM in Python, Colorised svn diffs and How do you like your Fortran 95 interfaces?).

Here you go:

[Figure: Riemann-surface plots of the Lambert W function]

I've been developing code for computing complex values of Lambert's W function, and I thought I'd share the Riemann-surface plots above, which are, from top to bottom, Re(W), Im(W) and abs(W) for the branches -1 (Blues), 0 (Greens) and 1 (Oranges). The route I've taken to get the plots has been a good learning experience for me.
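The algorithmic details of my Fortran code are a story for another post, but to give a flavour of what computing W involves, here's a rough Python sketch of the well-known Halley iteration for w*exp(w) = z, using the formula from Corless et al. (1996). To be clear, this is an illustration rather than my algorithm, and the simple starting guess is only adequate away from the branch points:

import numpy as np

def lambertw_halley(z, branch=0, tol=1e-12, max_iter=50):
    # Illustrative only: Halley iteration for w*exp(w) = z, started
    # from the asymptotic guess log(z) + 2*pi*i*branch
    z = np.asarray(z, dtype=complex)
    w = np.log(z) + 2j * np.pi * branch
    for _ in range(max_iter):
        ew = np.exp(w)
        f = w * ew - z                  # residual of w*exp(w) = z
        if np.all(np.abs(f) <= tol * (1.0 + np.abs(z))):
            break
        # Halley update (Corless et al., 1996)
        w = w - f / (ew * (w + 1.0) - (w + 2.0) * f / (2.0 * w + 2.0))
    return w

For example, lambertw_halley(1.0) converges to the omega constant, 0.567143...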

First approach

My initial idea was to write a mex wrapper to my algorithmic Fortran code, along the lines of what happens for the NAG Toolbox for MATLAB. I also wanted to see if I could provide a vectorized interface, as native MATLAB functions have.


Whew, it's not much fun debugging mex files, is it?


The main thing I learned from this experience was to use mxCreateNumericMatrix to create my function's return values: in this case the W value, the residual error and the fail flag.
/* prhs[2] is the array of input zs */
dims = mxGetDimensions(prhs[2]);

/* plhs[0]: the (complex) W values */
plhs[0] = mxCreateNumericMatrix(dims[0], dims[1],
                                mxDOUBLE_CLASS,
                                mxCOMPLEX);

if (nlhs >= 2)
{
  /* plhs[1]: the residual errors */
  plhs[1] = mxCreateNumericMatrix(dims[0], dims[1],
                                  mxDOUBLE_CLASS,
                                  mxREAL);

  if (nlhs >= 3)
  {
    /* plhs[2]: the integer fail flags */
    plhs[2] = mxCreateNumericMatrix(dims[0], dims[1],
                                    mxINT32_CLASS,
                                    mxREAL);
  }
}
Other approaches I had tried, such as populating local mxArray * variables that were only assigned back to the plhs arrays at the end (and only if requested), caused all sorts of upset in MATLAB when it exited my mex routine.

With the working code, I managed to get some good plots in a MATLABy way. I still felt a little discontented though, and then I remembered the advice of my old (and sadly departed) A-level statistics teacher: it's better to be able to solve a problem two ways than one.

Second approach

I thought back to Mike Croucher's PyNAG tutorials on calling the NAG C Library from Python, and decided it might be fun to call my algorithmic Fortran code from Python and use matplotlib to do the plotting. While I was at it I decided I'd see if I could get Fortran 2003's C Interoperability to help me make the call to Fortran from Python more straightforward; in the 'unassisted' case, calling Fortran from Python is very similar to calling Fortran from C, with the usual headaches that entails.

The primary hoop I had to jump through to allow C interoperability was to map Fortran character strings to arrays:
...
   USE, INTRINSIC :: iso_c_binding, ONLY : c_char
   ...
   CHARACTER (kind=c_char), DIMENSION (6), &
      INTENT (IN) :: in_str
   CHARACTER (kind=c_char), DIMENSION (10), &
      INTENT (OUT) :: out_str
   CHARACTER (len=SIZE(in_str)) :: local_in_str
   CHARACTER (len=SIZE(out_str)) :: local_out_str
   INTEGER :: i
   ...
   DO i = 1, SIZE(in_str)
      local_in_str(i:i) = in_str(i)
   END DO
   ...
!  Here's where all the real stuff happens. Operate on
!  local_in_str and local_out_str.
   ...
   DO i = 1, SIZE(out_str)
      out_str(i) = local_out_str(i:i)
   END DO

END
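On the Python side the call can then be made with ctypes. Here's a minimal sketch, assuming the interoperable routine above is compiled into a shared library libw.so and given BIND(C, name='process_str') - both names are hypothetical stand-ins:

import ctypes as ct

# Load the compiled Fortran (hypothetical library and routine names)
lib = ct.CDLL('./libw.so')
lib.process_str.restype = None
lib.process_str.argtypes = [ct.c_char_p, ct.c_char_p]

in_buf = ct.create_string_buffer(b'hello', 6)   # matches DIMENSION (6)
out_buf = ct.create_string_buffer(10)           # matches DIMENSION (10)
lib.process_str(in_buf, out_buf)
print(out_buf.raw)

Because the dummy arguments are interoperable c_char arrays rather than Fortran CHARACTER(len=...) strings, no hidden length arguments are passed, which is exactly what makes the ctypes call this simple.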
Then, with a similarly-vectorized and Pythonic wrapper around my Fortran call, I generated the plots above using:
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax = Axes3D(fig)
n = 100
lower = -1
upper = -lower
threshold = 0.001
X1, Y1 = np.meshgrid(np.linspace(lower, upper, n),
                     np.linspace(lower, -threshold, n))
X2, Y2 = np.meshgrid(np.linspace(lower, upper, n),
                     np.linspace(threshold, upper, n))
cmaps = {-1: cm.Blues,
         0: cm.Greens,
         1: cm.Oranges}

# Loop over some branches
for b in [-1, 0, 1]:
    w = nag_LambertW(X1 + 1j*Y1, b=b, fail=1)[0]
    ax.plot_surface(X1, Y1, w.imag,
                    rstride=1, cstride=1, cmap=cmaps[b])
    w = nag_LambertW(X2 + 1j*Y2, b=b, fail=1)[0]
    ax.plot_surface(X2, Y2, w.imag,
                    rstride=1, cstride=1, cmap=cmaps[b])

plt.show()
I had to do two separate surface plots for each branch to avoid matplotlib interpolating across the discontinuity at the branch cuts. Being a bit of a matplotlib newb, I'm now wondering
  • is there a better way of telling matplotlib to avoid discontinuities?
  • is it possible to apply a colour map continuously across two surface plots?
  • doesn't plt.title('Title') work with surface plots?

Thursday, 15 April 2010

New product launch = busy, but happy marketing department!

Today is an exciting day for NAG: the launch of the latest release of the NAG Library for SMP & multicore, Mark 22.

A new product launch (or release) is always a busy time for any marketing department. Months of planning go into preparing new literature and presentations, collating benchmark data, new advertising campaigns, exhibition stands and so on - and then comes the fun part: announcing it to the outside world.

As with all marketing activities, there's a plethora of best-practice guides on achieving a successful product launch. Our goal in the marketing department at NAG is to ensure that our news reaches every single user, potential user and relevant media outlet.

How do we achieve this goal?

The relatively recent emergence of new communication methods - blogging, microblogging and social media - has transformed how we communicate with our audience and has made this launch different from previous ones. What's so great about these new channels to market is that we can communicate and engage directly and instantaneously with those most important to us as an organisation. This is priceless.

Using these new methods in conjunction with more traditional marketing, such as press releases and email blasts, will serve to create some of the "buzz" that's needed to achieve early mindshare. However, even with the slickest of launch operations, if the product itself falls short of expectations and fails to impress, then all the marketing effort is wasted.

Thankfully at NAG, our products reflect our reputation for quality and correctness. All new numerical routines have to undergo stringent quality checks. This vital testing process results in a portfolio of world-class products that simply wouldn't reach launch if they weren't something we could be proud of.

The NAG Library for SMP & multicore launched on 15th April 2010.

Friday, 9 April 2010

Scratch

How early can you be taught to program? It's a thought that has struck me since my wife gave birth to our firstborn son a month ago. There is no denying that the thought processes that have to be learnt in order to program are invaluable and transferable abilities. Skills such as logical thinking, reasoning, planning and modularity are applicable in all kinds of jobs, not just ones involving programming.

Having only learnt to program three years ago, I wish I had had more of a chance to program in my early years. So if ever my son is interested in programming, I might start him off with a kids' programming language called Scratch that I heard about in a magazine.

Scratch is a pictorial drag-and-drop language that feels a bit like Lego in style. Different types of variables are different shapes and only fit in the right-shaped holes. It's not a very powerful language, but it's a good educational tool for youngsters: it is easy to use and understand, and they can create a lot of fun interactive programs very easily.

I'm all for encouraging young ones to learn useful skills, and who knows - these young Scratch prodigies may well be updating and maintaining the NAG Library in years to come.

Thursday, 1 April 2010

Easter egg or not?

Hi, I'm Marcin, a technical support engineer here at NAG, and I've recently encountered an interesting problem. I coauthored a recent paper on option pricing with the NAG Library (Option pricing in Excel). My coauthor Jeremy subsequently blogged about it (Extending, imitating and collaborating). Now it's time for the next step.

While working on a demonstration of NAG random number generators in Excel and their application in option pricing using the Monte Carlo method, I noticed a considerable discrepancy in results when using unscrambled Sobol quasirandom numbers. Well, to start with, how do you price a European option when the underlying asset follows geometric Brownian motion? Basically, for each random number you simulate what the price of the asset will be at, say, T = 1 year, moving in one step from the starting time point t = 0 to the end point t = T, and calculate the payoff; the (discounted) average value of all payoffs is our desired option price (Monte Carlo methods for option pricing). I'm not going to get into the details of the Monte Carlo method; that will be covered in a paper I'm currently working on. The idea is that this simulation works very well and gives good convergence to the result obtained using the closed-form formula, so no surprises here. The stock prices have a log-normal distribution, as expected. This is illustrated in the figure below - a histogram of the stock prices from 20000 simulations.

[Figure: histogram of stock prices from 20000 one-step simulations with Sobol numbers - log-normal]

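In outline, the one-step simulation looks something like this - an illustrative Python sketch using pseudorandom numbers and made-up parameter values, not the Excel/NAG demo itself:

import numpy as np

def european_call_mc(s0, k, r, sigma, t, n_sims):
    # One-step Monte Carlo price of a European call under
    # geometric Brownian motion
    z = np.random.standard_normal(n_sims)
    st = s0 * np.exp((r - 0.5 * sigma ** 2) * t + sigma * np.sqrt(t) * z)
    payoff = np.maximum(st - k, 0.0)
    return np.exp(-r * t) * payoff.mean()

np.random.seed(0)
print(european_call_mc(s0=100.0, k=100.0, r=0.05, sigma=0.2,
                       t=1.0, n_sims=20000))

To use quasirandom numbers instead, the standard normal draws would come from mapping the Sobol uniforms through the inverse normal CDF.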
One could, however, want to price Asian options, which are path-dependent. The payoff of such options depends not on the final price of the asset (as in European options), but on the average price of the asset throughout the life of the option. We therefore need to generate a path of stock prices, so we'll move from t = 0 to t = T not in one step as previously, but in many steps, say 252 (the average number of trading days per year). In practice we generate 252 random numbers per simulation, generate a path of the asset price and, at the end of the path, calculate the payoff from the average price. This is done for every Monte Carlo simulation, so as you've probably noticed we need 252 times more random numbers than in the case of the European option. The quantity is not an issue; the Sobol sequence is long enough. What happens is that for such a pathwise Monte Carlo simulation the distribution of stock prices is not log-normal; it doesn't actually resemble any distribution, as can be seen below.

[Figure: histogram of stock prices from the 252-step pathwise simulation with unscrambled Sobol numbers - not log-normal]

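In sketch form, the pathwise scheme is something like this (again illustrative Python with pseudorandom numbers and made-up parameters, pricing an arithmetic-average Asian call):

import numpy as np

def asian_call_mc(s0, k, r, sigma, t, n_steps, n_sims):
    # Pathwise Monte Carlo price of an arithmetic-average Asian call:
    # n_steps normals per path, i.e. n_steps times more numbers than
    # in the one-step European case
    dt = t / n_steps
    z = np.random.standard_normal((n_sims, n_steps))
    increments = (r - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z
    paths = s0 * np.exp(np.cumsum(increments, axis=1))
    payoff = np.maximum(paths.mean(axis=1) - k, 0.0)
    return np.exp(-r * t) * payoff.mean()

np.random.seed(0)
print(asian_call_mc(s0=100.0, k=100.0, r=0.05, sigma=0.2, t=1.0,
                    n_steps=252, n_sims=20000))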
Why is that? Why do the quasirandom numbers fail? The simulation using the same data, but pseudorandom numbers instead of quasirandom ones (exactly as in the sketch above), does return stock prices that are log-normally distributed (below).

[Figure: histogram of stock prices from the pathwise simulation with pseudorandom numbers - log-normal]

What is going on here? Well, I know the answer; it took me some time to realise, and it would have taken longer if not for the help of my colleagues Jacques and Robert. Do you, dear reader of this blog, know why it is like this? Is this just an Easter egg, or an interesting property of quasirandom numbers? Happy Easter!