Wednesday, 22 December 2010

ParaView, VTK files and endianness

Recently, I've been doing some work with ParaView, a rather nice open-source visualization application that's aimed at the display and analysis of large data sets. It has a wide range of visualization functionality, allows for interaction through its graphical user interface or through scripting, and has a distributed architecture. Executable versions are available for a variety of platforms, or you can download the source and compile it yourself. I'm currently running the 64-bit Windows distribution on my laptop, and have also been building the source on HECToR, with a view to trying the application out on that machine and using it for the analysis of some of the large data sets generated there.

ParaView uses the open-source Visualization Toolkit (VTK) for data processing and rendering; this consists of a core C++ class library and interpreted interface layers for Tcl/Tk, Java and Python. Much of VTK's functionality - specifically, its visualization techniques and modelling methods - is available from within ParaView, and it's also possible to extend the application to support other VTK classes by (for example) providing an XML description of the interface.

The first thing the user of any visualization system wants to do is to read their own data into it (of course, this will also be the last thing they ever do with the system, if it proves to be too recalcitrant). Doing this requires some understanding of the type of data that the system can process, and the translation of the different components of the user's data into those types. Generally speaking, it's most straightforward for the user to write their data to a file in a format which is supported by the system (the alternative approach - more useful when, for example, the user has a lot of data files in a particular format - is to extend the system to support the reading of that format).

ParaView can read files in several formats, including so-called VTK Legacy files (so named because the format was introduced in an earlier version of VTK; since then, it's been supplemented by a more flexible XML-based format). Its structure is pretty straightforward, and well-documented. Here's the first part of an example Legacy file, which contains scalar data values located on a regular 3D mesh:

# vtk DataFile Version 2.0
3D example
ASCII
DATASET STRUCTURED_POINTS
DIMENSIONS 61 61 101
ORIGIN 0.0 0.0 0.0
SPACING 0.05 0.05 0.05
POINT_DATA 375821
SCALARS volume_scalars float 1
LOOKUP_TABLE default

The coordinates of the nodes on this kind of mesh (which VTK calls a Structured Points dataset) are fully specified by the values given with the keywords DIMENSIONS, ORIGIN and SPACING on each axis; the other file keywords are described elsewhere, along with an account of the other types of mesh and the other data structures which VTK supports. More specifically here, the keyword ASCII specifies that the data values (which come next in the file) are written in that format. Selecting this option ensures that the entire file is human-readable and portable from one machine to another, but it might not be the best choice if the data file is very large (for the usual reasons of space and speed). Switching to BINARY makes the file more compact and allows it to be written and read more quickly.
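
As a concrete illustration, a header like the one above can be written from C with a handful of fprintf() calls before the data values are appended. This is a minimal sketch of my own, not part of VTK itself; the function name and parameters are invented:

#include <stdio.h>

/* Write a VTK Legacy header for a regular mesh of nx x ny x nz nodes with
   uniform spacing h along each axis; the data values follow this header.  */
void write_vtk_header( FILE *fp, int nx, int ny, int nz, double h )
{
   fprintf(fp, "# vtk DataFile Version 2.0\n");
   fprintf(fp, "3D example\n");
   fprintf(fp, "BINARY\n");                        /* or "ASCII" */
   fprintf(fp, "DATASET STRUCTURED_POINTS\n");
   fprintf(fp, "DIMENSIONS %d %d %d\n", nx, ny, nz);
   fprintf(fp, "ORIGIN 0.0 0.0 0.0\n");
   fprintf(fp, "SPACING %g %g %g\n", h, h, h);
   fprintf(fp, "POINT_DATA %d\n", nx * ny * nz);
   fprintf(fp, "SCALARS volume_scalars float 1\n");
   fprintf(fp, "LOOKUP_TABLE default\n");
}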

Here's where I ran into a (slight) problem: although ParaView read the ASCII version of my file and displayed it correctly, this broke when I switched to the binary version (more accurately, ParaView read the file in happily, but the values which it found there were wrong). After some poking around in the documentation, I realised the problem was that VTK - and hence ParaView - always expects the binary data in the Legacy file to be stored with the most significant byte first - also known as the big-endian representation. (To be strictly honest, I couldn't find this point specified unambiguously in the otherwise-excellent VTK file formats guide, but it was emphasised more explicitly elsewhere.) However, my file had been output by a program in which the data was stored with the least significant byte first - i.e., in the so-called little-endian fashion. The representation used by the program to store the data depends on both the operating system and the hardware architecture for which it's been compiled (for example, it's little-endian for Windows on x64 - which is what I'm currently using - but big-endian for Solaris on SPARC), and this representation is preserved when the data is output as a raw stream of bytes - as is done, for example, by the C fwrite() function.
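
If you're unsure which representation your own platform uses, a quick run-time check settles it; the snippet below is a minimal sketch of my own, separate from the program discussed in the rest of this post:

#include <stdio.h>

int main( void )
{
   unsigned int one = 1u;
   unsigned char *p = (unsigned char *)&one;

   /* On a little-endian machine the least significant byte is stored first.  */
   if( p[0] == 1 )
      printf("little-endian: swap bytes before writing VTK Legacy binary data\n");
   else
      printf("big-endian: data can be written as-is\n");
   return 0;
}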

Having realised the problem, the solution was clear: swap the order of the bytes around in each data value before writing it out. There are many ways to do this, but I found this document helpful. It proposes a framework for handling several data types which is intended to be portable across both little- and big-endian architectures (note also that the framework receives much attention here as alternatives which are possibly more robust, general and/or portable are suggested). Since my data is stored as floats, I only needed a small part of the framework; here's the C function I used:

float FloatSwap( float f )
{
   /* Interpret the float as an array of bytes (this assumes a 4-byte float).  */
   union
   {
      float f;
      unsigned char b[4];
   } dat1, dat2;

   dat1.f = f;
   /* Reverse the byte order.  */
   dat2.b[0] = dat1.b[3];
   dat2.b[1] = dat1.b[2];
   dat2.b[2] = dat1.b[1];
   dat2.b[3] = dat1.b[0];
   return dat2.f;
}

and here's how I used it when writing out the data values:

float *data, val;
int nx, ny, nz;                 /* mesh dimensions, set when the data is generated */
int i, j, k, index;
FILE *fp;

/* Generate the data values, open the file and write the header.  */
...

/* Output the data values.  */
index = 0;
for( k=0; k<nz; k++ )
   for( j=0; j<ny; j++ )
      for( i=0; i<nx; i++ ) {
         /* Swap the bytes in this data value, write it out.  */
         val = FloatSwap(data[index]);
         fwrite((void *)&val, sizeof(float), 1, fp);
         index++;
      }

fclose(fp);

Finally, here's the visualization which I created in ParaView. The dataset is the probability density function corresponding to the so-called 3dz2 atomic orbital, and it's visualized using volume rendering and an isosurface, which highlights its characteristic shape: two lobes and a torus (I once heard the profile of a rather overweight chemistry teacher being compared unflatteringly to this).

Thursday, 16 December 2010

Half-way through HECToR

HECToR, the UK national supercomputer service, had its third birthday in October. NAG, as many readers will know, provides the Computational Science and Engineering (CSE) support for the service, helping users with application problems, and with porting and tuning their codes to make them as efficient as possible. We also provide an extensive programme of training courses throughout the UK covering both basic and advanced topics. To date we have had over 900 attendees on these courses and delivered them in 16 locations. We are currently putting together our programme for next year, upgrading our course materials in response to the latest hardware upgrade (from a Cray XT6 to an XE6), and developing material for some new topics that we haven't addressed before.

A novel aspect of the HECToR service is the Distributed CSE Service (DCSE) which funds dedicated resources to work on specific codes. Those resources can come from within the research community itself, or from specialist teams (including our own HPC team), and to date we have funded more than 46 years of effort for 48 projects.

DCSE projects have addressed a wide range of issues, but two themes recur frequently:

  1. Adopting shared-memory techniques. This is necessary to share data between multiple cores on the same socket where there is not enough memory for each core to have its own copy, and also to make efficient use of the increased number of cores per node (a minimal sketch of this idea follows the list).
  2. More efficient I/O. Reading and writing data is often a major bottleneck in applications and parallelising the I/O and using libraries that compress data efficiently can deliver impressive performance improvements.
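
As a rough illustration of the first theme: a hybrid approach keeps a single copy of a large read-only array in each process and lets the OpenMP threads on the node share it, instead of running one MPI process per core with a private copy each. The sketch below is mine, not taken from any particular DCSE project; the names process_points, lookup_table, work and n are invented:

void process_points( const double *lookup_table, double *work, int n )
{
   int i;

   /* lookup_table and work are shared: one copy per process, visible to all
      of the OpenMP threads on the node, rather than one copy per core.  */
#pragma omp parallel for shared(lookup_table, work)
   for( i = 0; i < n; i++ )
      work[i] = 2.0 * lookup_table[i];   /* stand-in for the real computation */
}
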
In the last three years HECToR has gone from 2 cores per node to 24, and from a total of 11,328 cores (on the original XT4) to 44,544 (on the new XE6). However, memory per core has dropped from 3 GB to 1.33 GB, and clock speed has dropped from 2.8 GHz to 2.1 GHz. These changes mirror those that we are seeing more generally in our industry, namely more cores running more slowly and with less memory. Our experiences on HECToR demonstrate that software needs to be adapted to make efficient use of newer hardware. The good news is that this sort of software engineering work is often very cost-effective - many of the DCSE projects have saved more money in HECToR resources than they cost in labour. Hopefully this will lead to more and better scientific outputs in the future.

Thursday, 25 November 2010

Christmas Comes Early

Christmas came early for one little girl at NAG Ltd. Being in charge of office services has its ups and downs: hands down a blocked loo, or physically up a ladder changing light bulbs, are all part of my normal day. But a call from the local fire brigade certainly cheered me up. Since the fire service has stopped its annual building inspections we are now reliant on paying somebody to come in every year to carry out a risk assessment, and for some reason these people are always rather bland and never quite manage to make their (admittedly serious) work fun. Anyone who was here two years ago and was lucky enough to have training on the use of fire extinguishers would know that this has to have been the best training course ever; it was a laugh from start to finish. This time the request was for the crew to familiarise themselves with the layout of the building, so that if there was ever a real emergency they would have some background knowledge. They had promised to arrive in full rig, and although I had expected one or two officers, four was hitting the jackpot! We had a nice chat over a quick cuppa about fire exits (staff should use the designated fire exits and not the main entrance), the number of hydrants on site, lighting and so on, and then we did a tour of the building. Being only 5ft 1 ½” (5ft 6” in heels) I felt really small as they were all over 6ft, but they were a really nice bunch who do a fabulous job and hopefully will only ever be here in a more social capacity rather than a fully fledged work one.

Monday, 22 November 2010

Now you see it...

A few weeks ago, I presented the HECToR scientific visualisation training course to a group of researchers at the University of Reading. This course looks at the use of visualisation as a tool for the improved understanding of numerical data (as produced, for example, by calculations or simulations run on HECToR). We started by exploring the different types of data - characterizing each according to its structure, dependencies and dimensionality - before reviewing the different visualisation techniques (such as contouring, particle tracing, volume rendering) that are applicable to each type. Some examples of the techniques, used to display a variety of data types from several application areas, are given in the figures below.

Next, we briefly reviewed a few applications for performing visualisation, of which there are a plethora in both the public domain and commercial arenas. Some of these have comparable functionalities - thus, for example, Figure 1 was produced using IRIS Explorer, ADVISE was used for Figure 2, and Figure 3 was created with ParaView. The plot in Figure 4 (which illustrates the use of a routine from the NAG Toolbox for MATLAB®) was made using MATLAB.

Figure 4 can be used to demonstrate another aspect of the training course - namely, the highlighting of good and bad practice in visualisation. Depending on the resolution of the display medium for the figure, it is usually apparent that the blue solid curve is easier to see than the dashed green one. This may not necessarily be a significant effect, although you might be reminded of other instances you've seen in presentations where - for example - a yellow curve has been rendered more or less invisible by being displayed against a white background, or a blue curve against black. In general, it can be surprising how little attention is apparently paid to issues such as clarity and reproducibility in visualisations by users who've spent a lot of time generating and checking their results, only to see their impact lost or impeded because of a poorly-designed display.

As alluded to above, one of the things that should be taken into account when designing a visualisation is the use of colour. Indeed, Figure 3 also illustrates how care should be taken when selecting colours: many viewers will find its juxtaposition of the red surface against a green background difficult to look at (try reading green text on a red field - or vice versa - for a further illustration of this). The way in which we perceive colour is a complex subject, but one salient point is that, whilst our eyes see absolute colours, our brains perceive differences in colours - i.e., the appearance of a colour depends on its surroundings. This apparently surprising fact underpins the workings of a whole range of optical illusions, of which my current favourite is the so-called same colour illusion:

Looking at Figure 5, it can be difficult to see that the two orange circles are the same colour - although, bearing in mind the discussion above, most people would probably accept it. But what's much harder to believe (although it's also true) is that the squares which surround them are also the same colour as each other. The reason we perceive them as different is that they have different surroundings (and also because of the way in which our brains try and compensate for the effect of shadows).

Demonstrations of illusions such as this (and the examples of good and - mostly - bad practice in visualisation) made the training session quite a lively one. In fact, I suspect that - notwithstanding the undoubted fascination that can be exerted by the study of visualisation techniques and applications - this was the part of the presentation that lingered longest in the minds of many of the attendees. I could be wrong, though.

Wednesday, 17 November 2010

Fortran Fame

I was standing on the NAG booth here at SC10 yesterday, muttering to myself about the difficulty of maintaining an internet connection in the exhibition hall. The organisers claim that there's more bandwidth available at SC10 than is used by any but the five largest countries in the world - I think they're exaggerating because I can never get any of it.

Anyway, along came a dapper fellow wearing a baseball cap who started a conversation about the way standards were being implemented in the NAG Fortran compiler. I asked whether he was interested in using Fortran. "Well, I used to be", he said. Belatedly I looked down at his name badge. It turned out to be Tom Lahey, founding father of the Lahey Fortran compiler. What could I do but shake his hand and bow.

It's a hard life.

Every year in November NAG staff attend SC, formerly known as SuperComputing, an annual showcase where computer hardware and software vendors can display their goods and services, along with a huge technical agenda covering all aspects of HPC and scientific computing.

The show moves around different cities in the United States. This year SC10 is in New Orleans, Louisiana, and running from 14th to 21st November, with around 10,000 registered attendees. It's the 23rd outing since the first one which took place in Orlando, Florida in 1988. NAG has attended every single one of these events, and is one of just a handful of outfits to do so.

New Orleans was founded in 1718 by French speculators interested in making money from trade and finance, and named after the then French Regent, Philippe, Duke of Orléans. In the mid-eighteenth century it was ceded to the Spanish. They're the ones who actually created most of the characteristic architecture in the heart of the city now known as the French Quarter. In 1801 it returned to French control, only to be sold to the United States a couple of years later as part of the "Louisiana Purchase". In more recent times New Orleans was badly hit by Hurricane Katrina, and there's a nice museum detailing all the sad events of that calamity next door to St. Louis' Church in downtown New Orleans.

So, SC is always a great place to find out about the latest developments in computing technology, talk to NAG customers, meet old friends and make new ones, and this year is no exception. By the way, if it's not too late and you want to come to talk to us, the NAG booth is number 3131 in the exhibition hall.

The trouble is the parties. They start on the Sunday evening, with the "Exhibitors' Party". This is open to anyone with an X on their name badge, meaning that they're working one of the booths in the exhibition hall. The food is piled high, and there's enough drink to float the Queen Mary.

Monday evening is the "Opening Night Party", which is at the nearby Metropolitan Night Club. A jazz band plays the hits of Fats Domino, Louis Armstrong and other New Orleans greats. The food is piled high, and there's enough drink to float the Queen Mary.

Tuesday evening. Two big parties going on, one hosted by Cray and AMD which takes place in a ballroom at the Marriott Hotel, and one hosted by SGI and Intel, which is at Pat O'Brien's Bar on Bourbon Street in the French Quarter. In the interests of vendor neutrality, of course NAG staff must attend both parties. O'Brien's has another great jazz quartet playing, and is famous for "The Hurricane", a drink containing rum and a secret mix of fruits and spices. It's lethal, and guess what, there's enough of it to float the Queen Mary.

There are more parties to come. So, although we have the compensation of being able to learn more about what NAG users want in the way of fast numerical software, and finding out what our contemporaries are working on, the fact is, we've all got blisters on our feet from walking between the different parties.

It's a hard life. But someone's got to do it.

Thursday, 11 November 2010

Looking forward to meeting people with new ideas at Supercomputing.

I’m someone who enjoys talking with people at exhibition booths, and the International High Performance Computing event, which is being staged next week (http://sc10.supercomputing.org/), will provide good opportunities.

Here is a chance to spend time with many current and future users of NAG products and services as well as with NAG collaborators. Attendees at this conference and exhibition work in such a variety of different areas that the conversations are always fascinating and informative.

Meeting with people face-to-face inevitably triggers more new ideas, and results in innovative solutions being created faster, than any electronic communication method.

I look forward to meeting some of you there (NAG have Booth #3131 – right at the heart of the main hall).

Friday, 5 November 2010

Creative Programming

As I was being asked to write this blog post my brain awoke from its number-crunching stupor and I realised that I couldn't think of anything to write. The past few hours of sitting, programming in front of a computer, seemed to have wiped out the artistic half of my mind.

I know at least for myself that staring at a monitor seems to obliterate all of my thoughts until I've finished the click clacking and click clicking of my current task and I can finally put blessed ink to physical paper to plan my next step.

So in order to re-enliven my thinking and my creativity, I sought a nice picture of a stunning scene. There's nothing like the beauty of creation to awaken something deep in your heart.

As I was quickly sketching the above picture to get the other half of my brain warmed up to its full creative potential, I started wondering about creativity in programming.

Firstly there is the beauty of the written code itself, the ideas encapsulated within it and the intelligence of the design. Secondly there is the created product of what the non-technical viewer might see, such as a website, 3D artwork, mathematical art, 3D movies etc.

There is creativity in all that we do, and no less in programming. It is very satisfying to have finally cracked a problem successfully, and even more so if you can look at the flow and structure of your program and know that it is good and pleasing.

But despite the beauty that can be found in computer generated output or even in code (if you are able to comprehend it) I think I will always have to turn to a beautiful, fiery, ruby red sunset, or the playful sound of a nearby splashing stream as a way of awakening my heart.

(photo kindly contributed by Paul Wilson from his Flickr account)

Saturday, 30 October 2010

Comparing HPC across China, USA and Europe

In my earlier blog post today on China announcing the world's fastest supercomputer, I said I'd be back with more later on the comparisons with the USA, Europe and others. In this morning's blog, I made the point that the world's fastest supercomputer, in itself, is not world-changing. But leading supercomputers, critically matched with appropriate expertise in programming and using them, together with the vision to ensure use across basic research, industry and defence applications, can indeed be strategically beneficial to a nation - including real economic impact.

There are plenty of reports and studies describing the strategic impact of HPC within a given organisation or at national levels (some are catalogued by IDC here), so let's take it as a premise for the following thoughts.

With this in mind, there are some comparisons to be made between the approaches to supercomputing across the USA, Europe and China.

The USA has long enjoyed near-total dominance of the hardware technology underlying the leading supercomputers. The USA has invested repeatedly in ensuring that American supercomputer manufacturers have the technology to deploy the world's largest supercomputers. The last time the USA lost the public leadership crown of fastest supercomputer, a huge investment in American supercomputer technology followed. As well as national support for the development, the strong implicit requirement for USA organisations to "buy American" ensures a continued USA supercomputer manufacturing industry. As a result of the sheer size of the country, the USA has a large HPC R&D community. There is also significant usage of HPC in American industry and additional support for this through government initiatives like INCITE.

Despite the recent growth of Bull, and their highly-rated supercomputers and in-house HPC expertise, Europe has not seen it essential to have a home-grown supercomputer manufacturing industry (beyond the component level R&D which is strong in Europe). Europe has always highlighted its expertise in applied HPC software development and in software-related HPC R&D as its distinguishing strength on the international stage. European organisations have always bought supercomputers from around the world - and since they are likely to continue to enjoy access to products from around the world, have no need to develop independent supply from wholly European sources.

But the news of China's investment in hardware, software and people (and its stated ambition of independence of supply) should send a clear message to the USA, Europe and others that they cannot rely on their continued leadership of HPC. And thus the economic benefits of HPC might soon be driving Chinese growth rather than the European or American economies.

There is one other potentially killer advantage that China might have. All the predictions of the path to the next levels of supercomputer performance (Exascale) involve major changes in the technology - much greater levels of parallelism (we are seeing this now with GPUs), memory performance challenges, resilience, etc. As Dave Turek of IBM has said before, China is mostly not hindered by legacy code - they can start anew with the best HPC ideas and methods of today, looking to the future. In the USA and Europe, our obsession with "protecting our investment" in established applications means we first have to figure out how to get from yesterday's software technology to current methods, and then to the future. Is "protecting our investment" actually constraining our future?


[More on legacy code and revolutions vs evolution in a blog coming here soon ...]

Friday, 29 October 2010

Why does the China supercomputer matter to western governments?

There is a lot of fuss in the mainstream media (BBC, FT, CNET, even the Daily Mail!) the last few days about the world's fastest supercomputer being in China for the first time. And much ado on Twitter (me too - @hpcnotes).

But much of the mainstream reporting, twitter-fest, and blogging is missing the point I think. China deploying the world's fastest supercomputer is news (the fastest supercomputer has almost always been American for decades, with the occasional Japanese crown). But the machine alone is not the big news.

Imagine that China announced a new prototype passenger aircraft, half the cost of the latest Boeing or Airbus. It has 50% greater fuel efficiency too. And an order of magnitude greater predicted reliability statistics. That would be major news. Sure, it uses a lot of US-designed components too.

But what if China announced this new aircraft wasn't just a prototype. It was a commercially available product now. And they have the capacity to make lots of them - faster than Boeing or Airbus. And they have a plan to train huge numbers of future aircraft maintenance engineers, aerodynamic designers, structural engineers, etc. In other words, China can not only build a world-beating passenger aircraft, but it is building the capability to do so without US-designed components in the future. And it is building the expertise capacity to be a world leader in aircraft maintenance.

That would be very important.

And, while we are not quite there yet, that is where this China supercomputer news is going. American scientists and HPC professionals have been calling for rounded investment in people and software, not just hardware, for years. Europe has been proud of its relative HPC software expertise, but the recent IDC-led EU HPC strategic recommendations report shows that much more investment is needed.

The mass commentary talks about the Chinese hardware milestone. But public material from Chinese experts also talks about a plan to deploy several top supercomputers, to train huge numbers of HPC programmers, to invest in applications development and commercial use of HPC and to develop end-to-end nationally independent supercomputing technology.

If that happens, then China will have the ability to develop that super aircraft industry. And automobile. And the many household products that are designed with supercomputers. And materials science. And ...

Get the point? It's not the world's fastest supercomputer that matters most. It's not just national pride. It's the ambition and comprehensive plan behind the world's leading supercomputer that matters.




[More later today on the comparison with USA, EU and others]

Friday, 22 October 2010

Carbon Footprint

In these austere times I am always on the lookout for ways to reduce our expenditure without reducing our service to customers. NAG have just been awarded a matched-funding grant of £1000 from http://www.sustainableroutes.co.uk/ to help us reduce our carbon footprint, and we intend to use the grant (and some) to improve our video conferencing and cut down on some travelling costs and CO2 for meetings and training. Currently we are looking at some state-of-the-art video conference equipment which we hope to start trialling with our Manchester office. There are some other useful tips and grants available on the Sustainable Routes website so I would recommend taking a look.

Tuesday, 19 October 2010

Source-level debugging of Python in Emacs

To help me track problems in Python code (yes, even sometimes in my own...) I usually rely on good old print/trace debugging. Owing to Python's speed of interpretation and execution, this is a pretty convenient approach. Python does have its own interactive debugger though - pdb - for those odd occasions when it's desirable to poke about in a program while it's running. The debugging mode in Emacs even supports pdb by default, but there's a little snag: you probably don't have a pdb in your path, so M-x pdb will just fail. Solution? Add a pdb script to your path and make it executable:
#!/bin/sh
# Find the pdb module that ships with the python on your PATH.
pdb_path=`python -c "import pdb, sys; sys.stdout.write(pdb.__file__ + '\n')"`
# Run it under python, passing along any arguments (e.g. the script to debug).
exec python ${pdb_path} "$@"
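
Once the script is on your path and executable (chmod +x), M-x pdb should prompt with something like "Run pdb (like this): pdb"; append the name of the Python program you want to debug and you get the usual gud-mode conveniences, such as setting breakpoints from the source buffer. (The exact prompt wording may vary between Emacs versions.)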

Thursday, 7 October 2010

2010: A Retail Store Odyssey

Stanley Kubrick’s “2001: A Space Odyssey” was the epic 1968 science fiction film that explored human evolution, technology and artificial intelligence with both realism and surrealism, and it remains one of the top films of all time. In it, two astronauts battle the computer HAL for control of their spaceship and for their lives while investigating a series of strange monoliths left from an earlier civilization. For many years and for many people, the film has been symbolic of our struggle to master, and not be mastered by, computers. Kubrick and his co-author Sir Arthur C. Clarke were both brilliant and far ahead of their time. In many respects, they still are.


In 2010 we are certainly wrestling with computers that occasionally seem to get the better of us. In our world, computers are ubiquitous and the software behind them has a pervasive impact on our daily lives, but hardly in the way Kubrick and Clarke envisioned. Consider this absolutely mundane sequence of events, at least in the developed world. We hop into our car to go shopping for groceries. As we turn the key, one or more microprocessors start, employing sophisticated software to optimize efficiency, performance and emissions. Our cars talk to us, connecting phone calls routed through a cellular network. They entertain us with hundreds of channels from a satellite radio connection and give us visual and voice directions to where we want to go. All of this amazing hardware comes to life through the software that makes it work. And, of course, the software makes considerable use of mathematics to accomplish what it does for us. So, you may ask, what’s the reference to a “retail odyssey”? We haven’t even gotten to the grocery store yet.

In my view, the most amazing thing at the Sainsbury's I frequent in the UK, or the Dominick's near home, isn't the checkout where they let me scan my items and coupons and pay with my credit card, all with a few taps of the touch-screen. It's the realization that I've just walked through a store with literally thousands of unique items to meet my needs, each residing in a database linking the bar code on the package with a price, an inventory level, cost, supplier and even a "loyalty card" database that permits analysis of which shoppers bought which products and in which combinations. While you are pondering this miracle of modern technology, ponder this question: who set the price of the 2-pound package of Folgers coffee at the end of Aisle 2, and how did they do it?

The answer, if it’s not already obvious, is sophisticated software from companies like NAG partner DemandTec (NASDAQ: DMAN) whose demand management software is helping retailers and manufacturers worldwide optimize revenues, prices and inventories. We’ve worked with DemandTec since 2004, providing them sophisticated mathematical and statistical software to enable their application to help retailers manage demand.

One of the benefits of working with cutting edge companies like DemandTec is that we get to participate in events such as I have been involved with lately. NAG has partnered with DemandTec to sponsor the Chicago Regional scholarship competition in a national event called the DemandTec Retail Challenge. In it, high school seniors in teams of two get to apply their problem-solving and mathematical skills as pricing analysts managing an assortment of products in a grocery store. They set the price, order inventory and make decisions about promotions in a simulation-based competition with other teams in the Chicago area. The eventual winners will have maximized profitability and successfully communicated their approach to the problem to experts in the field. The winners earn a college scholarship and the right to compete in the national version of the contest at NASDAQ in New York City in January 2011. The national champions get a significant additional scholarship and the right to ring the closing bell for the trading day. For those of us at NAG it’s both a way of giving back to the community and helping the next generation apply their academic skills to real-world problems. From the conversations I’ve had with them thus far I suspect that the computer HAL would be no match for them.

Thursday, 30 September 2010

The Shoulders of Giants

Last week NAG held its 34th AGM and prior to the business part of the meeting we had the pleasure of hearing a lecture by Professor Nick Higham of the University of Manchester on the subject How and How Not to Compute the Exponential of a Matrix. The lecture was filmed and can be viewed here. Matrix functions are useful in a number of areas, in particular for describing the solutions of certain types of differential equations. NAG is working with Nick to include his algorithms in future marks of the Library.

One of the things I particularly enjoyed about Nick's talk was the way that he set the subject in its proper historical context, starting with the work of 19th-century mathematicians such as Sylvester and Cayley. Working at the cutting edge of computing technology, it's easy to forget how much of what we do depends on work that goes back hundreds of years. It wasn't always this way: Fermat, having claimed to have found a proof of what became known as his last theorem, added the waspish comment "and perhaps, posterity will thank me for having shown it that the ancients did not know everything".

A few years ago SIAM interviewed a number of prominent people who had been active in the early days of numerical analysis and scientific computing, including NAG's own Brian Ford. You can find transcripts of the interviews and related material on SIAM's website. Reading these transcripts is illuminating, particularly to a (relative) youngster like me, especially given the number of references to NAG's early history that they contain. I hope that SIAM keeps up this activity, and continues to collect personal reminiscences from prominent people in the field.

Thursday, 23 September 2010

HTML5: Possible implications for technical documentation

A New Version of HTML

As you may have heard, HTML5, a new version of HTML, the main markup language for the web, is under development. This is the first new version for quite some time: HTML 4.01, the current version, was released in December 1999. There was an XML version, XHTML 1.0 (released in January 2000 and revised in August 2002), but it did not contain any new features.

Much of the buzz around HTML5 concentrates on new features such as video (as an alternative to using Flash plugins etc.) and canvas (a JavaScript API for two-dimensional graphics), but in this article I want to look at the features that impact on the kind of technical, mathematical documents used by NAG.

SVG and MathML

SVG is an XML format for describing scalable vector graphics, using a similar graphics model to PostScript or PDF, but using XML syntax and CSS for styling, just as for HTML. It has been around for some years, but HTML5, for the first time, specifies how it works in the context of HTML (rather than XML). In a related move, Internet Explorer 9 will support SVG. (Other common browsers have supported it for some years.)

Many of the function plots in the NAG documentation are generated from gnuplot. Gnuplot can save to SVG as well as to the EPS and PNG outputs that we currently use. The remaining diagrams are currently stored as EPS (for use in the PDF versions of the documents), and EPS can usually be converted to SVG without any loss of information.

There are many advantages to using a scalable format for plots, especially for documents that are read on smaller devices such as tablets (or phones), but also for printing at higher resolution: being able to scale and zoom in as required, without the loss of quality inherent in scaling bitmap images, is a big win.

The other XML-derived language supported natively by the HTML5 parser is MathML. MathML should be familiar to most NAG users as it has been used in the XHTML version of our documentation for years. However, being able to use MathML in HTML should simplify the installation of this documentation, and the prominence given to MathML by its inclusion in HTML5 will hopefully encourage the remaining browser vendors, who do not natively support it, to implement mathematics rendering in their browsers.

Current State of (Future) Browsers

All the common desktop browsers are moving towards supporting HTML5 in the near future. The following sections list the state of SVG and MathML rendering in some common cases.

Internet Explorer 9 Beta

IE 9 finally adds SVG support to Microsoft's Internet Explorer browser. Design Science's MathPlayer may still be used to get very high quality rendering of MathML. There are currently some problems in interfacing MathPlayer and IE 9, and some of the markup that worked in IE 8 no longer works; however, this may be fixed before the full product comes out, or a few lines of IE-specific JavaScript, as used in the example below, may be used to work around the main problems.

Warning: IE 9 Beta, unlike all the other test releases of browsers mentioned here, will replace your existing Internet Explorer installation. If you wish to test IE 9 rendering without removing your IE 8 (or earlier) installation, make sure that you install the Platform Preview rather than the Beta. The Platform Preview releases use the IE 9 rendering engine but without all the normal browser menus and facilities, and do not replace your existing browser.

IE 9 Beta 1 was tested.

Firefox 4.0 Beta

Firefox 4 uses the new HTML5 parser, which automatically places MathML elements in the MathML namespace and SVG elements in the SVG namespace. The rendering of MathML and SVG in HTML5 is then essentially the same as has been available in earlier Firefox releases for XHTML documents, but is now also available in HTML.

Firefox 4 Beta 6 was tested.

WebKit Nightlies

A nightly release is a public binary release that is essentially just a snapshot of the current development build. As such it is expected to have bugs and unimplemented features. However these releases, which track the development state of the core rendering engine used by both the Chrome and Safari browsers, show that it has a reasonably well-developed SVG renderer, and that MathML support, while very new, is improving. (These WebKit browsers have very good CSS support, so until the native MathML is ready for use it will probably be possible to use a CSS rendering of MathML in these cases, as is done for the current NAG XHTML documentation.)

WebKit-r67637, from 16th September, was tested.

An Example

We do not yet have any definite plans or dates to move towards using HTML5; it depends a lot on the timing of the above test releases becoming mainstream (which is likely to be very soon) and on how long it takes for use of older browsers to die out. (This can take a long time: many people only change their browser when they get a new machine, and Internet Explorer 6, released in 2001 and replaced by IE 7 in 2006 and IE 8 in 2009, is still in widespread use, for example.)

As an experiment we have converted one example routine document into HTML5, using inline MathML and SVG. Note that in the example all links to external documents have been made to point back to the test document, so that it is self-standing. The document has 207 MathML fragments, which are essentially unchanged from the existing XHTML+MathML version of the document. The two plots shown in the example section, which were previously only available as monochrome scalable images in the PDF version of the document or as coloured PNG bitmaps in the XHTML version, are provided as scalable SVG which is included directly in the HTML file, not referenced using <img src=....

The document is reported as valid HTML5+SVG+MathML by the validator.nu online validator.

The document appears to be perfectly usable in Firefox 4 and IE 9. The MathML support in WebKit is still very new, and there also appear to be some problems with the SVG, which gets rendered with a black background in the Windows nightly build of 16th September. Note, however, that, as their name suggests, nightly builds are unstable test releases and should not be taken as indicative of the final product. The fact that the SVG was fully rendered apart from some colour problems, and that the MathML was recognized, albeit with some rendering problems, is I think a good sign that WebKit-based browsers (Safari, Chrome, and many mobile phone browsers) will support these documents in the near future.

Fortran Library Mark 22 Routine Document (D02AGF):
XHTML+PNG+MathML Mark 22 Documentation: d02agf.xml
PDF Mark 22 Documentation: d02agf.pdf
HTML5+SVG+MathML Experimental: d02agf.html

Conclusions

HTML5 will be mainstream in currently popular browsers very soon. It will take a few years before its use may be assumed, but it promises to be a big improvement for technical documents.

Both MathML and SVG allow the use of scalable formats, freeing the web from the use of inappropriate bitmap images. This simplifies document distribution and should improve quality both for printing and for use on a wider range of devices, as well as greatly increasing accessibility.

Monday, 13 September 2010

Do you want ice with your supercomputer?

“Would you like ice with your drink?” It’s a common question of course. One that divides people – few will think “I don’t mind” – most have a firm preference one way or the other. There are people who hate ice with their drink and those who freak if there is none. National stereotypes have a role to play – in the USA the question is not always asked – it’s assumed you want ice with everything. In the UK, you often have to ask specifically to get ice.

Yet the role of ice in making our drinks chilled is misleading. I once had a discussion with a leading American member of the international HPC community about this. “No ice”, he was complaining as we headed out of a European country, “they had no ice for the drink”.

“I don’t get this obsession with ice”, I chipped in. “What?!” He looked at me as if I were mad. “Why do you like your coke warm?”

“Ah, but that’s just it”, I replied. “I hate warm drinks – I really like my coke chilled. But surely, in this modern world over a century after the invention of the refrigerator, it’s not unreasonable to expect the fluid to be chilled – without the need to drop lumps of solid water into it?”

“Ah, fair point”, he conceded.

What has this got to do with supercomputing? Perhaps the common thread is that usually we just accept the habitual choices of ways to do things – and don’t often step back to think – “are those the only choices?”

Maybe we should step back a little more often and ask ourselves what we are trying to achieve with HPC – and are the usual choices the only ways forward? Or are there different ways to approach the problem that will deliver simpler, better or cheaper performance?

Perhaps your business/research goals mean you need to conduct more complex modelling or you need faster performance. Maybe the drive of computing technology towards many-core processors rather than faster processors is limiting your ability to achieve this. (I have had several conversations recently, where companies are buying older technology because their software won’t run on multicore).

The “ice or no ice” question might be whether or not to upgrade your HPC with the latest multicore processors. But what about the “just chill the fluid” option? Well, how about upgrading the software instead, or as well?

NAG has plenty of case studies to show where enhancements to software have achieved huge gains in performance or capability (e.g., www.hector.ac.uk/cse/reports).

Sometimes buying more compute power is the right answer. Sometimes, extracting more efficient performance from what you have is the answer. Bringing them together - a balance of hardware upgrades and software innovations - might well give you the best chance of optimising cost efficiency, performance and sustainability of performance.

Wednesday, 8 September 2010

Working on the ADVISE project

For the past three and a half years, my colleagues and I have been working on ADVISE, a TSB-funded collaborative research project which has been developing a new toolkit for visualization and analysis. Besides NAG, the partners in the project were VSNi and the University of Leeds. VSNi have expertise in statistics, as implemented in their GenStat product, while Leeds have an international reputation for their work in visualization research. As for NAG, we've had some success with IRIS Explorer, a popular visualization toolkit which allows users to construct applications by connecting modules together via a visual programming interface.

We retained that interface in ADVISE (see Figure 1) because it has proved to be a rather intuitive way to create, modify and interact with applications. Thus, in this figure, the user selects modules from the repository on the right and connects them together in the area on the left. The widgets for controlling one of the modules are in the pane on the right at the bottom, whilst messages from the system are displayed in the area at top right.

In a similar spirit of re-use, the visualization and analysis functionality encapsulated within the ADVISE modules has come from porting just about all of the modules from IRIS Explorer into the new environment, and from creating new modules that generate and process scripts of GenStat commands. We've used ADVISE to visualize and analyze a variety of data - see, for example, Figure 2, which is a display of some of the results from Christopher Goodyer's simulations of diffusion through skin.

So much for the re-use of old technology, but what's new in ADVISE? Well, its architecture makes use of recent technology developments in web services and distributed computing. This has several advantages, including the fact that it's easier to integrate applications built using ADVISE with the web (for example, running inside a browser), and that it's possible to connect ADVISE applications to other services (which could, for example, act as data sources).

To illustrate this ease of integration, Figure 3 shows a web-based application that's been created in ADVISE for the visualization of air-quality data. The window at the back shows the interface for selecting the location and duration of the data to be visualized, the next window shows that data displayed as a coloured elevated surface and the window in front shows the same data displayed as a 3D histogram. Widgets in the web page (linked to ADVISE modules) give the user a simple interface to the application - for example, allowing control over the type of display (surface or histogram), other parameters associated with the visualization, and selection of the next dataset to be displayed.

If you want to know more about the ADVISE project, or the system we produced, head over to our web page, which contains more pictures of visualizations created with the toolkit together with papers, posters and talk slides from throughout the life of the project. One picture you won't find over there, however, is the one below, which shows the whole ADVISE team (with the exception of Jungwook Seo and Colin Myers from Leeds) in all their glory at the end of the final project review meeting last month. There were (a lot of) other pictures taken at the same event, but this is the only one in which the project members aren't holding glasses.

Friday, 27 August 2010

NAG at Quant Congress, New York

Last month I was in New York with my sales colleagues Mike Modica and Rick Guido from NAG's US office, attending Quant Congress USA. This is a meeting which is devoted to the latest developments in financial derivatives, risk management and the associated use of numerical techniques; NAG has been associated with it for the past five years because our software has found extensive use in the quantitative analysis community. Each day of the conference program opened with a plenary session of two presentations, followed by around twenty talks divided across two streams. The meeting had nearly a hundred attendees, not all of whom attended both days (perhaps it's worth noting that, whilst the technical program undoubtedly had its own attractions and merits, the attendance might have been enhanced still further by the fact that the two days of the meeting saw heavy rain fall on the city - in the middle of an otherwise-sweltering week). NAG was one of five exhibitors at the meeting, and we found that the relatively small size of the event meant that everyone who wanted to view our stand - almost exclusively during the breaks from the talks - was able to.

We were kept busy by a steady stream of stand visitors throughout the event, which made for an interesting couple of days. Most of the queries and comments we received were focussed on the functionality of the NAG libraries, with a particular interest, of course, in those routines which have proved valuable in the finance industry. There was also a lot of interest in the work that NAG has been doing on numerical routines for GPUs, partly because of the speedup that has been observed by customers when running Monte Carlo simulations on GPUs - as compared to running on a single CPU - using NAG routines (see this post for more details).

We also had a lot of questions about the NAG Toolbox for MATLAB®, while some users - or prospective users - were interested in being able to call NAG routines from within Python. I had an interesting discussion on this latter topic with Professor Dennis Allison (who is famous for, amongst other things, being one of the founders of Dr Dobb's Journal). I found that providing answers to a few of the most technical queries about numerics in quantitative finance was somewhat beyond my abilities, notwithstanding the recent education I'd received in some aspects of this field from my colleague Marcin Krzysztofik (and our collaborations involving the NAG option pricing routines), but we were able to deal with all outstanding issues by email within a few days of the conference's end.

Whilst in New York, we also made a series of site visits, mainly in the financial sector. During one of these, I was told by the customer that they thought the technical support which they received from NAG was excellent - in fact, they said it was often better than what they received from their own in-house team. Since the support service is one of the things that NAG prides itself on (and which the customer pays for), this was good to hear, particularly as I felt it made up for some less gratifying - but still memorable - experiences I'd had earlier in the trip. These included celebrating my arrival in the city by falling off a barstool - apparently without any provocation at all (apart from the fact that the barroom singer was playing a cover of Tom Petty's "Free Fallin'" at the time). Any damage that my self-esteem may have suffered was, however, assuaged by the (re)discovery of this bar. Always nice to be where everybody knows your name, I think.

Thursday, 26 August 2010

Why quality has always been essential for NAG’s own internal process (and a holiday).

Quality is one of the things that matters a lot to our users.

That is certainly true, but another reality is that the NAG Library could not exist at all without the checks and tests that result in the best quality software. The NAG Library is particularly complex and involves many different people doing very different types of tasks and activities. The Library is created from intricate actions including:

- Investigating lots of numerical problems and potential approaches and solutions
- Designing hundreds of methods for the various chapters
- Producing example code, data and plots
- Creating installer and user notes
- Writing the detailed documentation and descriptions of routines
- Building the multiple different processor and compiler implementations of the Library
- etc…

The best approach to quality is essential, in making the NAG Library, precisely because it is this multifaceted combination of numerical mathematics, software engineering and evolved process. If it were not for the quality checks, at all points through the product organization, it would be impossible to get all the bits to mesh and work together.
NAG learnt this fact many, many years ago!

NAG quality methods are continuously scrutinised because of this internal need, as much as for users. Levels of peer review of documentation are increased, as new expert knowledge comes to the fore. The frequency of automated implementation testing carried out during the development process is escalated, as the availability of test platforms increases. And so on.

The results are self-evident. The quality and reliability of the Library never falters, even as new content is added and further software environments are supported. The NAG quality process benefits all parties.

On a much lighter note, I’ve just returned from our family holiday. My wife, our three young children and I flew to Athens, stayed for 3 days to revise the history, and then moved, about 100 nautical miles by ferry, to an island to explore and relax on beautiful beaches...

…in summary the vacation was made up of different sections, and was enjoyed by all five of us, because of the quality and detail that went into the planning and booking. Perhaps it was also made easier since we are all hardened travellers.

This leaves me with a question, both for people and organisations
– ‘Can we all manage just as well with a top quality process but only limited experience?’

I’d be interested to hear your view…

Friday, 20 August 2010

What's in a Name?

Newcomers to NAG often ask about the names of NAG routines. Certainly the names appear strange and unfamiliar at first, but to many NAG users the name encapsulates the heritage of quality and precision offered by NAG.

Originally the NAG Libraries were created by contributors from U.K. academia. Some of these wrote in Algol 60 and others in Fortran. It was agreed to have two libraries of identical content, so that every contribution in one language had a counterpart in the other. Thus both communities were appeased. Early printed documentation included both language versions.

It was clearly desirable to be able to distinguish one language version of an algorithm from the same algorithm implemented in the alternate language and in general this is still desirable if a single program needs to call both instances of the algorithm.

Equally the early developers saw that, as new algorithms were developed, replacement routines would be required with perhaps different calling sequences, so that any naming scheme chosen needed to accommodate the ongoing development of the libraries. Another desirable feature would be to naturally group by name routines that lay in the same field of numerical computation.

With these goals in mind the founders of NAG considered two lines of approach.

The first idea looked at the concept of 'meaningful' names, names which might convey the algorithm being used or the problem addressed. Even today this is difficult, but in the early 1970s standard Fortran did not permit more than 6 characters for any routine name. And NAG certainly did not want to have Fortran and Algol names so radically different from each other that the algorithmic connection was obscured. The problem of updating also needed to be solved.

The alternative looked much more appealing. At that time a modified SHARE classification system had been published and NAG decided to base its routine names on that. This gave rise to the names we now see. Each name is 6 characters long, the final letter denoting the language of implementation. In those days 'F' denoted Fortran and 'A' denoted Algol 60. Subsequently this would be extended as NAG produced both single and double precision libraries in Fortran, and as NAG produced other language versions such as C. The first characters, generally the first 3, indicated the broad subject (or chapter) area to which the algorithm belonged. (Characters in positions 2 and 3 are always digits.) Thus E04WDF is a routine in the Fortran Library that lies in the E04 chapter. This chapter deals with 'Minimizing or Maximizing a Function'. The letters 'WD' uniquely identify the algorithm.

Of course modern Fortran no longer has a restriction of only 6 characters for routine names and so it is perhaps worthwhile to consider whether the NAG naming scheme can usefully take advantage of this relaxation.

The debate rages within NAG. Certainly the old problems remain if we wish to have 'meaningful' names, since routines will change with time. Do we describe the problem to be tackled or the algorithm used? If the former, then for one problem NAG may wish to offer several techniques: in the C06 chapter, for example, we offer both Weeks' and Crump's methods for inverse Laplace transforms. The problem addressed is evidently not enough to identify an algorithm uniquely. Describing a routine by algorithm alone falls down heavily in the optimisation chapter, where 'Sequential Quadratic Programming', 'Quasi-Newton' and 'Modified-Newton' account for the vast majority of routines. We are thus led towards a combination of the two.

Looking in detail at the effect of combining problem and method into a meaningful name, we rapidly encounter the very powerful and flexible routines of the NAG optimisation chapter. Here one routine often addresses several related, but different, problems and has both sparse and dense counterparts. Meaningful names might indeed become very long if they are to convey any useful information and not deceive.

We saw an early attempt by NAG to provide long, meaningful names in its C Library. Users will be familiar with nag_opt_lin_lsq as the long name of the C routine e04ncc. This routine also solves convex quadratic programming (QP) problems, something the reader would not deduce from the name, yet this is probably its most important role.

Proponents of meaningful long names argue that the solution to this is to have separate routines with only a single purpose. Their opponents argue that it is easy for users to effectively choose their own names for routines, given the variety of options open to them: pre-processors, wrappers and aliasing for example.

What do you think? How many characters in a long name would you tolerate and how would you name NAG routines?

Wednesday, 18 August 2010

A day in the life of...

... a Numerical Analyst:

Wow, this could be fairly boring, but here goes...

I arrive at my desk, slightly dishevelled from my daily cycle to work, whip my laptop out of the drawer and unhibernate my old faithful friend. He and I have had some ups and downs over the past year, but I am happy to say that we are still on good terms.

Having gone through the usual email-checking routine, I ssh (remotely log on) to one of the local Linux work servers to pick up from where I finished the day before.

And then it's on to slogging at the current routine: improving its accuracy or speed, checking it works properly, documenting the code, screaming (under my breath) in frustration at the latest problem, or quickly writing a blog post to give my mind a break from the overflows and NaNs (think they are grandmas? Think again) of the numerical analyst's life.

But it's not all slog. One of the highlights for me is cracking problems: the simple answers, and the not-so-simple answers that, when found, bring a smile of victory to the lips. Another highlight is automating repetitive tasks and creating or finding useful tools to speed up the work cycle. As I read in a magazine a while back, if you find yourself doing something more than a few times, it's worth making a quick tool to do the job for you.

At the end of the day, it's back home to my lovely wife, and to feed and bathe my young son, only a Numerical Analyst in the back of my mind.

Friday, 30 July 2010

Hey, you sass that hoopy Fortran?

We have to work with lots of different Fortran compilers at NAG. So far at Mark 22 of our Fortran Library we have built using products from six distinct sources (GNU, IBM, Intel®, NAG, PGI and Sun). The list of what we build with naturally changes from Mark to Mark as different vendors fall by the wayside or as commercial considerations preclude us from creating a particular Library implementation.

With the six vendors listed above we are blessed that the base Fortran coverage is Fortran 95. At previous Marks we still supported compilers covering only Fortran 77 (g77 and pgf77 for example). This forced us to jump through all manner of hoops so that we could do some of the really useful stuff from Fortran 90—like, woah, dynamically allocating memory—in a Fortran 77 way.
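
To give a flavour of the kind of hoop I mean, here is a minimal sketch (purely illustrative, not actual Library code, and written in free form for readability) of the classic Fortran 77 workaround: because the routine cannot allocate its own memory, the caller has to declare a workspace array and pass it in along with its length, and the routine can do no more than check that enough workspace was supplied. The Fortran 90 version, by contrast, simply sizes an ALLOCATABLE array at run time.

    SUBROUTINE work_f77(n,work,lwork,ifail)
!      Fortran 77 style: workspace is supplied by the caller.
       IMPLICIT NONE
       INTEGER, INTENT (IN)            :: n, lwork
       INTEGER, INTENT (INOUT)         :: ifail
       REAL, INTENT (INOUT)            :: work(lwork)
       IF (lwork<n*n) THEN
          ifail = 1
          RETURN
       END IF
!      ... use work(1:n*n) ...
       ifail = 0
    END SUBROUTINE work_f77

    SUBROUTINE work_f90(n)
!      Fortran 90 style: the routine sizes its own workspace.
       IMPLICIT NONE
       INTEGER, INTENT (IN)            :: n
       REAL, ALLOCATABLE               :: work(:)
       ALLOCATE (work(n*n))
!      ... use work ...
       DEALLOCATE (work)
    END SUBROUTINE work_f90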

Some hoops still exist though, even with Fortran 90 code.


For example, we recently ran into the following:
    PROGRAM str_read
!      .. Implicit None Statement ..
       IMPLICIT NONE
!      .. Parameters ..
       INTEGER, PARAMETER              :: wp = KIND(0.0D0)
!      .. Local Scalars ..
       REAL (kind=wp)                  :: rval
       INTEGER                         :: ioerr
       CHARACTER (200)                 :: my_str
!      .. Executable Statements ..
       my_str = '1.0D400'
       READ (my_str,'(E16.0)',iostat=ioerr) rval
       PRINT *, 'IOERR = ', ioerr
    END PROGRAM str_read
Ignoring the fact that the code fragment is a little contextless, just imagine that we're trying to trap overflowing real values being read from a CHARACTER. Of the six compilers given above, I could test five at the time of writing: two return a nonzero ioerr from the READ, two carry on happily and set rval to Infinity, and one core dumps!

Conferring with the Fortran standard (e.g., the draft Fortran 95) we see the dreaded phrase 'The set of input/output error conditions is processor dependent' (my italics), so we can't even complain to the compiler vendors about this behaviour! We have good relationships with the vendors, so when our building process reveals a compiler bug we'll report it. But as you can imagine, it can sometimes be a battle to achieve what we want across many Fortran platforms in as simple and maintainable a way as possible.

The next level of complication comes from ensuring that what we do in Fortran is callable from non-Fortran environments. We usually take Microsoft Excel as a sufficiently-far-removed example. I won't go into details of the usual issues raised by cross-language programming; however, now we're at a base level of Fortran 95 we've been looking at pushing our envelope more and trying out some Fortran 2003 features, specifically C Interoperability.

M'colleague Nicolas has been working on a suite of image processing routines. In pure Fortran a NAG_IMAGE TYPE exists to hold the pixel values and other metadata for the image. To make this passable from outside Fortran we're using Fortran 2003. It's understandable with new features in the language that there should be a settling-in period, but it's no exaggeration to say that every compiler we tried did, at some point, fall over on our new code. Thanks to the hard work of the compiler developers on the various teams we're now at a stable state, but there are still some vendors who are lagging behind with Fortran 2003 and for which we have to exclude the image-processing code from our testing.
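
To give a rough idea of what the Fortran 2003 C-interoperability features buy us, here is a much-simplified sketch; the module, type and names below are hypothetical and are not the real NAG_IMAGE definition. A BIND(C) derived type has a layout the companion C compiler understands, so it can be passed across the language boundary.

    MODULE image_sketch_mod
!      A simplified, hypothetical sketch of Fortran 2003 C interoperability;
!      this is not the real NAG_IMAGE type.
       USE, INTRINSIC                  :: iso_c_binding, ONLY : c_int, c_ptr
       IMPLICIT NONE
       TYPE, BIND (C)                  :: image_sketch
          INTEGER (kind=c_int)         :: width, height, n_channels
          TYPE (c_ptr)                 :: pixels
       END TYPE image_sketch
    CONTAINS
       SUBROUTINE fill_image(img) BIND (C, name='fill_image')
!         Callable from C as: void fill_image(struct image_sketch *img);
          TYPE (image_sketch), INTENT (INOUT) :: img
!         ... populate the pixel data, e.g. via C_F_POINTER ...
       END SUBROUTINE fill_image
    END MODULE image_sketch_mod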

Looking forward (!) to Fortran 2008, I'm interested to see how the new special-function INTRINSICs (BESSEL_J*, BESSEL_Y*, ERF* and *GAMMA) will behave across different compilers. In theory these new INTRINSICs mean we'd be able to withdraw a sizeable chunk of our S Chapter. But I wonder how easy that will be?
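
Just as an illustration of what those intrinsics look like in use (a throwaway sketch, needing a compiler with the relevant Fortran 2008 support), something like the following would let us start lining the intrinsics up against our own special-function routines:

    PROGRAM f2008_intrinsics
!      .. Implicit None Statement ..
       IMPLICIT NONE
!      .. Parameters ..
       INTEGER, PARAMETER              :: wp = KIND(0.0D0)
!      .. Local Scalars ..
       REAL (kind=wp)                  :: x
!      .. Executable Statements ..
       x = 2.5_wp
       PRINT *, 'BESSEL_J0(x) = ', BESSEL_J0(x)
       PRINT *, 'BESSEL_Y1(x) = ', BESSEL_Y1(x)
       PRINT *, 'ERF(x)       = ', ERF(x)
       PRINT *, 'GAMMA(x)     = ', GAMMA(x)
       PRINT *, 'LOG_GAMMA(x) = ', LOG_GAMMA(x)
    END PROGRAM f2008_intrinsics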


(The images in this post are from now voyager, but are probably copyright PolyGram Filmed Entertainment.)

Friday, 23 July 2010

A Life Well-Lived: Erwin Ruppenthal (1958 – 2010)

There’s probably some unwritten rule that blogs are off-limits to memorials. If so, I’m happy to break it for our colleague Erwin (or Erv as he was known at home). He died early this week after a long struggle with brain cancer, at home, surrounded by his wife Heidi and sons Alex and Kurt. If we take a closer look at his life we can learn a lot about the value of work and about having a purpose in life greater than earning a paycheck.

Erwin taught me a number of lessons about work and I’ll tell you about some of them so that things he valued might live on in each of us, but first let me tell you a little about him. He came to NAG in early August 1990 to work in IT at our US office. What I’m told is that he was an earnest, hardworking German citizen in his early 30s whom most would describe as shy and self-effacing. It was nine years later that I encountered him on my first day as his boss.

He was a man of two countries and two cultures who embraced both. Following sports was a passion: American baseball, American football and European football featured prominently. In his office you could find emblems of the Chicago Bears and Chicago Cubs along with a Bayern Munich FC flag. I read in his obituary that he almost never missed a Naperville Central high school football game over many years. If you knew him, none of this loyalty and dedication would surprise you, because he brought the same qualities to work every day.

I’ve worked with few people who could equal his dedication to his work, our mission as a company and the customers we serve. His loyalty to serving our customers showed in the countless problems he solved late at night and early in the morning from home while monitoring our helpdesk inbox or remote web site. He had no hesitation in coming to me if he felt we weren’t being fair to a customer or living up to our promises. He confirmed in me that our first principle had to be to do the right thing by a customer and trust that they would return the same for us.

He taught me some powerful management lessons though I doubt he ever spent a day in business school. Early on, I would see something we were doing that needed change, blindingly obvious (at least to me). Erwin would sit patiently as I explained, asking questions and listening. At first I thought he was just humoring his boss so I’d go back a week later and repeat the process and come away with the same impression. At some point later I would notice that a change had been made with no fanfare. Erwin would take my idea, however half-baked, improve it and implement it. Therein I learned the value of patience, trust and that overused clichĂ©, empowerment. With Erwin, I mostly needed to plant a seed and get out of the way.

Erwin didn’t spend money frivolously in his personal life, as far as I could tell, but when he did advocate spending it at work, he pursued quality, durability and things which made for better customer service. We have (still) a truly ancient UNIX server that serves as an archive of customer data. Through two office moves and occasional disk failures Erwin kept it going reliably. His care for such things could be summed up by a short story of the first of those moves. The move was from one building to another in an office park in suburban Chicago separated by perhaps two football fields of lawn and street. It was late on a cold, moonlit Friday night in early January 2002. The movers had finished earlier in the evening and all of the staff had gone home. All that was left to do was to power down the servers, move them across the street and re-start them. They were already ancient and Erwin wasn’t about to trust them to the movers so he and I methodically shut everything down and loaded it all on a 4-wheel cart. A few minutes later we were pushing the cart gently down the middle of the deserted street when I jokingly wondered what would happen if a police car came by. He had prepared for everything but that. Even in the small things, he planned carefully and followed through. That server is still running.

I could tell you more but these things tell us the enduring values Erwin brought to work: loyalty to people and a purpose bigger than himself, patience, persistence and reliability. He was more than a co-worker for everyone he touched. He was a true colleague and a trusted friend. Erwin may be gone but his memory and his values live on in each of us.

Monday, 19 July 2010

Time Machines and Supercomputers

I found a Linpack App for the iPhone last week. Nothing special, just a bit of five minute fun. It seems a 3G model achieves about 20 MFLOPS. [Note 1]

What's that got to do with time machines? Well it got me thinking "I wonder when 20 MFLOPS was the performance of a leading edge supercomputer?" Actually, it was before the start of the Top500 list (1993), so finding out was beyond the research I was prepared to do for this blog.

So I thought instead about the first supercomputer I used in anger. As soon as I name it, if anyone is still reading this waffle, you will immediately fall into one of two camps: those who think I'm too young to be nostalgic about old supercomputers yet, and those who think I'm too old to be talking about modern supercomputers :-).

It was a Cray T3D.

You're still waiting for the time machine bit ... hang on in there.

My application on that T3D sustained about 25 GFLOPS, which is about the same as a high-end PC of recent years. What this means to me is that anyone who cares to apply the effort today with a high-end PC could get comparable results to that work of 15-20 years ago that needed the supercomputer.

Or, in other words, that supercomputer gave us a 15-20 year time advantage over everyone who didn't have supercomputers - or a few years over others with smaller supercomputers. [Note 2]

That is one of the key benefits of High Performance Computing - the ability to get a result before a competitor - you could say HPC is a time machine for simulation and modelling.

Now for the [Notes] - which actually contain the real story!

Note 1: It's not really true to say the iPhone 3G can do 20 MFLOPS - all we can say is that this particular App achieved 20 MFLOPS on this iPhone 3G. The result is a factor of both the software and the hardware. Better performance can come from optimising the application as much as from buying a more powerful phone.

Note 2: In fact, even with the same supercomputer, it would be hard for most people to replicate the results - simply because there was as much value in the software (physics, algorithms, performance engineering, implementation, etc.) and the associated validation and verification program as there was in the supercomputer.

The supercomputer offered us a time machine. But the attention to performance and scalability in the application enabled us to actually use that time machine to get results faster than others - even if those others used the same supercomputer. And the validation and verification effort meant that we could trust what our time machine was telling us.

Thursday, 8 July 2010

Fantasy Football – a classic Portfolio Optimisation problem

England out of the World Cup; German colleagues, customers and collaborators (actually not just Germans - Americans, Scots, …) and so-called friends all sending me mocking e-mails and texts.

“I hear OXO are making a new product. The packaging is white with a red cross and they're calling it the laughing stock.”

How could I channel my frustration? NAG Blog to the rescue.

In my last blog I promised to reflect on my early career at NAG, which isn’t that long ago compared with those of some of my colleagues.

My commute to the NAG office is longer than I’d like, but there is a positive side…

· Valuable thinking time
· Audio books - I recently enjoyed “No One Would Listen” by Harry Markopolos, the exclusive story of the Markopolos-led investigation into Bernie Madoff and his $65 billion Ponzi scheme.

Back to “thinking time”: on one commute into Oxford I’d been pondering the previous day’s internal training course, recalling the wise words of a senior technical colleague. That wise colleague was David (some of you will have met him). He had given an internal training course to the Sales and Marketing group. His presentation was a simple introduction to the topic of Optimisation, and he had touched on modern portfolio theory, even explaining the efficient frontier.

Some of you may be lucky enough to have heard this talk before….

Imagine an English ice cream manufacturer. He might run a very profitable business, but recognise that it is very dependent on long, sunny, dry summers. Wishing to diversify and minimise his risk, he chooses to start another business, thus protecting himself against a cold, rainy summer with an umbrella company.

Subsequently he went on to talk about stocks and shares and the benefits of diversification. This struck a chord with me. I learnt early on in my adult life the importance of diversification. Remember GEC, which became Marconi? Where did Marconi go wrong? As a result of share options I had a portfolio dominated by Marconi and consequently suffered a very large paper loss. Oh, why didn’t I meet David earlier? He went on to speak about how diversification can be achieved by holding a selection of stocks from a spread of geographic regions, industry sectors and so on.

Of course this whole training course was aimed at helping us understand NAG’s optimisation routines. For those of you wishing to learn more about them, you should refer to the NAG Library Manual and the appropriate chapter introduction.

Anyway, as I drove back into work that morning I had that light bulb moment. Fantasy football is like portfolio optimisation! Well, you can imagine how I sprinted into the office from the work car park eager to share my discovery with David.

Me: “Remember your lecture about optimisation and portfolio optimisation? Fantasy football is like this, isn’t it?”

David: “What is Fantasy Football?”

Me: “Well
· You have a set of rules.
· You have a limited pot of money that you can spend to pick a squad of 15 football players.
· You then have to pick X Goal Keepers, Y defenders, …
· You can only pick Z players from any one team.
· Players win points by
  o Keeping a clean sheet
  o Scoring a goal
  o Assisting a goal
· Players lose points by
  o Conceding goals
  o Getting red or yellow cards.”

Pause for thought.

Me: “These are constraints aren’t they?”

Patronising nod from David

Me: “And this is an Integer problem, isn’t it?”

Laughter. David: “Well, at least you were listening last week. Now, go and get on with your work.”

Me: “No, you don’t get it. You’re going to help me program this up and we’ll see how the NAG Optimisers perform! I bet they won’t beat me. I’ve won the office fantasy football competition two years in a row. I would be interested to see how it performs though.”

Two or three days pass. I find a way of getting all the historic data from “The Official Fantasy Game Of The Premier League” into an Excel spreadsheet.

A week later I manage to persuade David to code up one of NAG’s optimisation routines to optimise my portfolio of football players. What does this mean? Well, one is supplied with a list of players; each player has a value, and his previous season’s points total is listed. So one might choose a strategy which optimises a squad of 15 players based on a maximum spend of £100 million. A riskier approach might be to assume you will always have 11 fit players, so you pick the 4 cheapest players and then look to maximise your return from an optimal 11.
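
To make the analogy with portfolio optimisation concrete, the squad-selection problem can be written, very roughly, as a binary integer program along these lines (a sketch only; the model David actually coded may well differ in detail):

\[
\begin{aligned}
\text{maximise}\quad & \sum_i p_i x_i \\
\text{subject to}\quad & \sum_i c_i x_i \le B, \\
& \sum_{i \in \text{goalkeepers}} x_i = X, \quad \sum_{i \in \text{defenders}} x_i = Y, \ \dots \\
& \sum_{i \in \text{club } t} x_i \le Z \quad \text{for every club } t, \\
& x_i \in \{0, 1\},
\end{aligned}
\]

where \(x_i = 1\) if player \(i\) is picked for the squad, \(p_i\) is his points total from the previous season, \(c_i\) his price, and \(B\) the £100 million budget.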

Fortunately NAG supplies Visual Basic Declaration Statements and C Types with its DLLs, which makes the libraries straightforward for VB programmers to use, so they could readily code up one of NAG’s optimisers to pick a Fantasy Football side. Yes, NAG’s Libraries are easy to link to Excel, and we include a simple Portfolio Optimisation example on our website.



Let me explain this screen shot.

The two teams that were picked entirely by NAG’s Solvers were David’s “Too Hot” and Sven’s “Old NAG’s Best 15.”
· “Too Hot” was based on picking the four cheapest players (with the highest return) and then the “Optimised 11”
· “Old NAG’s Best 15” was based on optimising the entire 15, i.e. taking no account of the fact that only 11 players can be picked and used in each match.

In my next blog I’ll share with you how “Too Hot” and “Old NAG’s Best 15” finished the season.