The making of “1000x” – unbalanced supercomputing

I had a bit of a rant in my article published at HPCwire this week - “Chasing 1000x: The future of supercomputing is unbalanced”.

The gist of my rant is that the supercomputing community pays great attention to the next factor of 1000x in performance – and I firmly agree that the next 1000x is highly valuable to the HPC community and the wider economy. But we should give equal attention to 1000x in other areas, notably ease-of-use and growth of the user base. And, critically, we should give equal peer recognition to those pursuing ease-of-use and ease-of-access and promoting that growth, not reserve all our “visionary” accolades for those figuring out the details of the first exascale computers.

That said, I planted a deliberate pun in the title of the article itself. The obvious meaning, in the context of the article, is that the future of supercomputing is unbalanced with respect to the focus on performance versus community growth. But the double meaning should be readily recognizable to anyone actively watching or developing the HPC technology roadmaps. The future of supercomputers is unbalanced – i.e., the broad summary of the many technology roadmaps out there is that future supercomputers (at all scales) will be less balanced in some key performance attributes than current systems.

There may be other options but, based on the current evidence of technology development and the indications of acceptable power and cost constraints, the most likely future is as follows.

The peak calculating capacity of processors will keep growing exponentially – an apparition of the industry’s old friend Moore’s Law. (For most scientific/engineering use, this means floating point operations per second - or FLOPS.) The energy efficiency of this peak capacity will also dramatically improve - FLOPS/watt. However, while the performance and capacity of the data side of the system will continue to grow - they will not grow anywhere near as fast as the availability of FLOPS.

Thus, we can assume that future HPC systems - from desktop workstations to the fastest supercomputers - will be abundant in FLOPS, but moving data around the system will incur significant (relative) time and energy costs. Getting some data from memory (e.g. to do a calculation with) will cost much more time and energy than doing the calculation itself. The performance penalty (due to latency and bandwidth limits relative to the increased processor speeds) will mean that computers will appear significantly unbalanced compared to current systems.
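To make that imbalance concrete, here is a minimal roofline-style sketch in Python. The peak rates below are illustrative assumptions I have picked for the example, not figures for any specific machine, but the shape of the result is what matters: a low-intensity kernel can only ever use a tiny fraction of the available FLOPS.

```python
# Roofline-style sketch: is a kernel limited by FLOPS or by memory traffic?
# The hardware numbers below are illustrative assumptions, not real specs.

PEAK_GFLOPS = 1000.0   # assumed peak compute rate (GFLOP/s)
PEAK_GBYTES = 100.0    # assumed peak memory bandwidth (GB/s)

# Machine balance: bytes of memory traffic the hardware can supply per FLOP.
machine_balance = PEAK_GBYTES / PEAK_GFLOPS   # 0.1 bytes per FLOP

def attainable_gflops(arithmetic_intensity):
    """Attainable rate for a kernel performing `arithmetic_intensity`
    FLOPs per byte moved to/from memory."""
    return min(PEAK_GFLOPS, PEAK_GBYTES * arithmetic_intensity)

# Example: DAXPY (y[i] += a * x[i]) does 2 FLOPs per 24 bytes moved
# (load x[i], load y[i], store y[i]; 8 bytes each in double precision).
daxpy_intensity = 2.0 / 24.0
print(attainable_gflops(daxpy_intensity))   # ~8.3 GFLOP/s, under 1% of peak
```

As the FLOPS side of the ratio keeps growing faster than the bandwidth side, `machine_balance` shrinks, and ever more kernels end up pinned to the bandwidth ceiling rather than the compute ceiling.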

For more background on the architectural changes, there are many presentations worth looking at, including for example several by Jack Dongarra or this one by David Keyes.

The key message is that software implementations (including algorithms) required to get the best performance out of these future architectures are likely to be different to those in the bulk of current applications. This is because those current implementations make certain assumptions about the relative costs of FLOPS, bandwidth, latency, etc. (based on today's hardware) - and those assumptions will no longer be valid.
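One small illustration of what "different implementations" can mean in practice (my own toy example, not one from the article): the same arithmetic can often be reorganised to move less data. In the sketch below, two separate passes over an array read the data twice, while the fused version does the same FLOPs with roughly half the memory traffic - a trade that matters more as data movement gets relatively more expensive.

```python
# Illustrative sketch: same arithmetic, different data movement.

def two_pass(xs):
    """Two traversals: scale everything, then accumulate.
    The intermediate list is written out and read back in."""
    scaled = [2.0 * x for x in xs]        # pass 1: reads xs, writes scaled
    return sum(s + 1.0 for s in scaled)   # pass 2: reads scaled again

def fused(xs):
    """One traversal: each element is loaded once and fully used."""
    return sum(2.0 * x + 1.0 for x in xs)

data = [float(i) for i in range(1000)]
assert two_pass(data) == fused(data)   # identical result, less data moved
```

On today's hardware the difference between these two may be modest; on a machine where every byte moved costs far more than every FLOP, fusions and restructurings like this become the difference between using the hardware and wasting it.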

This imbalance will probably affect the performance of applications - at worst, without modification, current software implementations could run slower on future hardware! I am not suggesting this extreme will be a common occurrence. But it is very likely that, without effort, most current application software will fail to take full advantage of the performance potential of future HPC systems. Quite simply, due to the data movement challenges, it will not be able to use all of those FLOPS thrown at us by Moore's Law.

Thus, to continue achieving major performance improvements ("1000x"), the users of modeling and simulation, the developers of software applications, and the providers of HPC services and systems will all have to investigate the role of software innovation as an integral partner with the hardware speed race.

And the same applies to the core theme of my article in HPCwire (growth in the user base of HPC and improvements in ease-of-use will deliver as much to the economic good as chasing high-end performance alone, or more). Just like 1000x in performance, those goals of 1000x users and 1000x ease-of-use also require innovation in software to be an integral part of the approach.

