Wednesday, 22 December 2010

ParaView, VTK files and endianness

Recently, I've been doing some work with ParaView, a rather nice open-source visualization application that's aimed at the display and analysis of large data sets. It has a wide range of visualization functionality, an interface that allows for interaction via its graphical user interface, or through scripting, and a distributed architecture. Executable versions are available for a variety of platforms, or you can download the source and compile it yourself. I'm currently running the Windows 64bit distribution on my laptop, and have also been building the source on HECToR, with a view to trying the application out on that machine, and using it for the analysis of some of the large data sets generated there.

ParaView uses the open-source Visualization Toolkit (VTK) for data processing and rendering; this consists of a core C++ class library and interpreted interface layers for Tcl/Tk, Java and Python. Much of VTK's functionality - specifically, its visualization techniques and modelling methods - is available from within ParaView, and it's also possible to extend the application to support other VTK classes by (for example) providing an XML description of the interface.

The first thing the user of any visualization system wants to do is to read their own data into it (of course, this will also be the last thing they ever do with the system, if it proves to be too recalcitrant). Doing this requires some understanding of the type of data that the system can process, and the translation of the different components of the user's data into those types. Generally speaking, it's usually most straightforward for the user to write their data to a file in a format which is supported by the system (the alternative approach - more useful when, for example, the the user has a lot of data files in a particular format - is to extend the system to support the reading of that format).

ParaView can read files in several formats, including so-called VTK Legacy files (so named because the format was introduced in an earlier version of VTK; since then, it's been supplemented by a more flexible XML-based format). Its structure is pretty straightforward, and well-documented. Here's the first part of an example Legacy file, which contains scalar data values located on a regular 3D mesh:

# vtk DataFile Version 2.0
3D example
ASCII
DATASET STRUCTURED_POINTS
DIMENSIONS 61 61 101
ORIGIN 0.0 0.0 0.0
SPACING 0.05 0.05 0.05
POINT_DATA 375821
SCALARS volume_scalars float 1
LOOKUP_TABLE default

The coordinates of the nodes on this kind of mesh (which VTK calls a Structured Point dataset) are fully specified by providing values for the parameters given by the keywords DIMENSION, ORIGIN and SPACING on each axis; the other file keywords are described elsewhere, along with an account of the other types of mesh and the other data structures which VTK supports. More specifically here, the keyword ASCII specifies that the data values (which come next in the file) are written in this format. Selecting this option ensures that the entire file is human-readable and portable from one machine to another, but it might not be the best choice if the data file is very large (for the usual reasons of space and speed). Switching to BINARY makes the file more compact and allows it to be written and read more quickly.

Here's where I ran into a (slight) problem: although ParaView read the the ASCII version of my file and displayed it correctly, this broke when I switched to the binary version (more accurately, ParaView read the file in happily but the values which it found there were wrong). After some poking around in the documentation, I realised the problem was that VTK - and hence, ParaView - always expects the binary data in the Legacy file to be stored with the most significant byte first - also known as the big-endian representation. (To be strictly honest, I couldn't find this point specified unambiguously in the otherwise-excellent VTK file formats guide, but it was emphasised more explicitly elsewhere). However, my file had been output by a program in which the data was stored with the least significant byte first - i.e., in the so-called little-endian fashion. The representation used by the program to store the data depends on both the operating system and hardware architecture for which it's been compiled (for example, it's little-endian for Windows on x64 - which is what I'm currently using - but big-endian for Solaris on SPARC), and this representation is preserved when the data is output as a raw stream of bytes - as is done, for example, by the C fwrite() function.

Having realised the problem, the solution was clear: swap the order of the bytes around in each data value before writing it out. There are many ways to do this, but I found this document helpful. It proposes a framework for handling several data types which is intended to be portable across both little- and big-endian architectures (note also that the framework receives much attention here as alternatives which are possibly more robust, general and/or portable are suggested). Since my data is stored as floats, I only needed a small part of the framework; here's the C function I used:

float FloatSwap( float f )
{    
   union
   {
      float f;
      byte b[4];
   } dat1, dat2;

   dat1.f = f;
   dat2.b[0] = dat1.b[3];
   dat2.b[1] = dat1.b[2];
   dat2.b[2] = dat1.b[1];
   dat2.b[3] = dat1.b[0];
   return dat2.f;
}

and here's how I used it when writing out the data values:

float *data, val;
FILE *fp;

/* Generate the data values, open the file and write the header.  */
...

/* Output the data values.  */
index = 0;
for( k=0; k<nz; k++ )
   for( j=0; j<ny; j++ )
      for( i=0; i<nx; i++ ) {
         /* Swap the bytes in this data value, write it out.  */
         val = FloatSwap(data[index]);
         fwrite((void *)&val, sizeof(float), 1, fp);
         index++;
      }

fclose(fp);

Finally, here's the visualization which I created in ParaView. The dataset is the probability density function corresponding to the so-called 3dz2 atomic orbital, and it's visualized using volume rendering and an isosurface, which highlights its characteristic shape: two lobes and a torus (I once heard the profile of rather overweight chemistry teacher being compared unflatteringly to this).

Thursday, 16 December 2010

Half-way through HECToR

HECToR, the UK national supercomputer service, had its third birthday in October. NAG, as many readers will know, provides the Computational Science and Engineering (CSE) support for the service, helping users with application problems, and with porting and tuning their codes to make them as efficient as possible. We also provide an extensive programme of training courses throughout the UK covering both basic and advanced topics. To date we have had over 900 attendees on these courses and delivered them in 16 locations. We are currently putting together our programme for next year, upgrading our course materials in response to the latest hardware upgrade (from a Cray XT6 to an XE6), and developing material for some new topics that we haven't addressed before.

A novel aspect of the HECToR service is the Distributed CSE Service (DCSE) which funds dedicated resources to work on specific codes. Those resources can come from within the research community itself, or from specialist teams (including our own HPC team), and to date we have funded more than 46 years of effort for 48 projects.

DCSE projects have addressed a wide range of issues, but two themes recur frequently:

  1. Adopting shared-memory techniques. This is necessary to share data between multiple cores on the same socket where there is not enough memory for each core to have its own copy, and also to make efficient use of the increased number of cores per node.
  2. More efficient I/O. Reading and writing data is often a major bottleneck in applications and parallelising the I/O and using libraries that compress data efficiently can deliver impressive performance improvements.
In the last three years HECToR has gone from 2 cores per node to 24, and from a total of 11,328 cores (on the original XT4) to 44,544 (on the new XE6). However memory per core has dropped from 3GB to 1.33G, and clock speed has dropped from 2.8GHz to 2.1GHz. These changes mirror those that we are seeing more generally in our industry, namely more cores running more slowly and with less memory. Our experiences on HECToR demonstrate that software needs to be adapted to make efficient use of newer hardware. The good news is that this sort of software engineering work is often very cost-effective - many of the DCSE projects have saved more money in HECToR resources than they cost in labour. Hopefully this will lead to more and better scientific outputs in the future.