Technical computing futures part 2: GPU and manycore success

In my previous blog, I suggested that the HPC revolution towards GPUs (or similar many-core technologies) as the primary processor has a lot in common with the move from RISC to commodity x86 processors a few years ago. A new technology appears to offer cheaper (or better) performance than the incumbent, for some porting and tuning pain. Of course, I’m not the first HPC blogger to have made this observation, but I hope to follow it a little further.

In particular, my previous blog suggested the outcome might be: “at first the uptake is tentative ... but in a few years time, we might well look back with nostalgia to when GPU’s were not the dominant processor for HPC systems” – in other words, hard going initially, but GPU/many-core will “win” eventually. I even ended up with an ambitious promise for my next blog (i.e. this one): “an idea of what/who will emerge as the dominant solution ...

Continuing the basis of using the past to guess the future, my prediction is that the next steady state of HPC processors will be GPU-like/manycore technologies (for most of the FLOPS at least) and, just like the current steady state (x86), those few companies with the strongest financial muscle will eventually own the dominant market share. However, other companies will have pioneered many of the technologies that make that dominant market share possible, enjoying good market share surges in the process.

I can even have a go at predicting some of the path that might get us to the next steady state of HPC architecture. NVIDIA has already shown us that GPUs for HPC are sometimes a good solution – and importantly, that a good programming ecosystem (CUDA) really helps adoption. Over the last year or so, I’d say the HPC community has moved from “if GPUs can work in this case ...” to “how do I make GPUs work across my workload?

As Intel’s Knights processors bring us many-core but with a familiar x86 instruction set, we might learn that getting good performance across a broad range of applications is possible, but critically dependent on software tools and hard work by skilled parallel programmers. AMD’s Fusion with tighter links between CPU & GPU, could show that the nature of the integration between the many-core/GPU unit and the rest of the system (be it CPU, network, main memory etc) will affect not only maximum performance on specific applications, but maybe more importantly the ease of getting “good enough” performance across a range of applications.

I don't know of any GPU/many-core/accelerator announcements from IBM, but it’s always possible IBM will throw in another useful contribution before the dust settles. They were one of the first into many-core processors for HPC acceleration with Cell and they cannot be easily counted out of top end HPC solutions - e.g. the forthcoming Blue Waters (POWER7) and Sequoia (BG/Q) chart-toppers.
But back to my “winner” prediction. When the revolution settles into a new steady state of mostly GPU/many-core for HPC processors, there won’t be (can’t be) critical distinctions between the various products anymore for most applications. Whichever product we consider (whether GPU or x86-based or whatever), many-core is sufficiently different from few-core (e.g. 1-8 cores) to mean that the early winners have been those users who are easily able to move their key applications across to get step changes in cost and performance.

The big winners in the next stages of the GPU/manycore emergence will be those users who can move the bulk of their high-value-generating HPC usage to many-core processors with the most attractive transition (economy and speed) compared to their competitors.

So what about the dominant solution I promised? For the technology to be pervasive, first there must be greater commonality between offerings (I stop short of standardization) so that programmers have at least a hope of portability. Secondly, users need to be able to extract the available performance. Ideally these would mean a software method that makes many-core programming “good enough easily enough” is discovered – and if so, that software method will be the dominant solution, across all hardware.

Or, if the magic bullet is still not market ready, skilled parallel programmers will be the dominant solution for achieving competitive performance and cost benefits - just like it is for HPC using commodity x86 processors today.


Popular posts from this blog

Implied Volatility using Python's Pandas Library

C++ wrappers for the NAG C Library

ParaView, VTK files and endianness