Thursday, 16 December 2010

Half-way through HECToR

HECToR, the UK national supercomputer service, had its third birthday in October. NAG, as many readers will know, provides the Computational Science and Engineering (CSE) support for the service, helping users with application problems, and with porting and tuning their codes to make them as efficient as possible. We also provide an extensive programme of training courses throughout the UK covering both basic and advanced topics. To date we have had over 900 attendees on these courses and delivered them in 16 locations. We are currently putting together our programme for next year, updating our course materials in response to the latest hardware upgrade (from a Cray XT6 to an XE6), and developing material for some new topics that we haven't addressed before.

A novel aspect of the HECToR service is the Distributed CSE Service (DCSE), which funds dedicated resources to work on specific codes. Those resources can come from within the research community itself, or from specialist teams (including our own HPC team), and to date we have funded more than 46 years of effort for 48 projects.

DCSE projects have addressed a wide range of issues, but two themes recur frequently:

  1. Adopting shared-memory techniques. This is necessary to share data between multiple cores on the same socket where there is not enough memory for each core to have its own copy, and also to make efficient use of the increased number of cores per node.
  2. More efficient I/O. Reading and writing data is often a major bottleneck in applications, and parallelising the I/O and using libraries that compress data efficiently can deliver impressive performance improvements.

In the last three years HECToR has gone from 2 cores per node to 24, and from a total of 11,328 cores (on the original XT4) to 44,544 (on the new XE6). However, memory per core has dropped from 3GB to 1.33GB, and clock speed has dropped from 2.8GHz to 2.1GHz. These changes mirror those that we are seeing more generally in our industry, namely more cores running more slowly and with less memory. Our experiences on HECToR demonstrate that software needs to be adapted to make efficient use of newer hardware. The good news is that this sort of software engineering work is often very cost-effective - many of the DCSE projects have saved more money in HECToR resources than they cost in labour. Hopefully this will lead to more and better scientific outputs in the future.
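
For readers who like to see the arithmetic, here is a minimal Python sketch of how those hardware figures compare. The four numbers for each system are the ones quoted above; the dictionary layout and the helper names `ratios` and `mem_per_node_gb` are our own illustrative choices, and the per-node memory is simply derived as cores per node times memory per core rather than taken from a hardware specification.

```python
# Figures quoted in the post for the original XT4 and the new XE6.
xt4 = {"cores_per_node": 2, "total_cores": 11_328,
       "mem_per_core_gb": 3.0, "clock_ghz": 2.8}
xe6 = {"cores_per_node": 24, "total_cores": 44_544,
       "mem_per_core_gb": 1.33, "clock_ghz": 2.1}


def ratios(old, new):
    """Return new/old for every quantity (hypothetical helper)."""
    return {key: new[key] / old[key] for key in old}


def mem_per_node_gb(system):
    """Derived value (an assumption): cores per node times memory per core."""
    return system["cores_per_node"] * system["mem_per_core_gb"]


change = ratios(xt4, xe6)
for key, value in change.items():
    print(f"{key}: x{value:.2f}")

print(f"memory per node: {mem_per_node_gb(xt4):.1f} GB -> "
      f"{mem_per_node_gb(xe6):.1f} GB")
```

On these figures each node has twelve times as many cores, while each core has less than half the memory and three quarters of the clock speed. That is why a code that keeps a private copy of its data on every core tends to run out of memory before it runs out of cores.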
