Yet another parallel programming rant. Has the cluster market all but killed parallel tools development?
Years ago there was an ad campaign by the Wendy’s hamburger chain that asked the question “Where’s the beef?” The commercials were rather funny, and “Where’s the beef?” has become a way to ask “where is the substance?” or to call attention to the lack thereof. Since before GP-GPU, multi-core, and clusters, I have been asking a similar question about HPC development tools. In particular, “Where are the parallel programming tools?” This question has become fundamentally important to the future of computing, and the answer is not quite clear.
In the past, vendors learned quickly that in order to sell hardware you need software, and to create software you need tools. Every processor has a development platform of some sort. If your market is small, you may have to supply the environment yourself, which might look like a machine code assembler, a C compiler, and a debugger. On the other hand, if you sell into a large hardware market, you will be fortunate to have, in addition to your own tools, many software vendors that supply various software tools for your hardware. In the x86 world, for example, there are too many languages and vendors to list. There is also a huge pile of freely available software tools, of varying quality, from which to choose.
The HPC story has always been a bit different. Back in the day, when you purchased an “integrated” supercomputer (e.g. Cray, Convex), there was a set of sophisticated software tools and compilers that aided software development. These tools were usually part of the system purchase and represented some of the best optimization technology available. When parallel computers and eventually clusters entered the scene, the three key development tools were a compiler (Fortran or C), MPI, and, if you were lucky, a parallel debugger. In a way, you were “on your own.”
The change from expensive integrated system to a multi-sourced cluster created a drastic reduction in price, often a factor of 10 or more, but pretty much removed any incentive for commercial parallel programming tools from the component vendors or integrators. Basically, a compiler and MPI libraries (and print statements) were how it was done. It has been well over ten years since clusters hit the scene and with an estimated annual market size getting close to $10 billion, you would think there would be a large incentive to create parallel programming tools. There is some progress, but little has changed over the past two decades.
Admittedly, software tools are a small market and, by many standards, the HPC market is “not that big.” There is a small group of developers (academic, government, and commercial) who actually write and develop parallel software. Many of the top scientific applications have been ported to some form of parallel computer (MPI, multi-core, GP-GPU). The HPC sector is growing, and now the rest of the market is going parallel in a big way.
So, “where are the parallel HPC tools?” There are some. Intel certainly has a vested interest in parallel tools, and much of their software focus has been in this direction in recent years. There are other companies selling or providing tools in this area as well. However, you would be hard pressed to find “stand-alone” companies that exclusively sell HPC tools. I have run into many of these companies in the past, and none seem to have survived despite some reasonably good ideas. In a search for current vendors I found the equivalent of a parallel computing ghost town page. It has generally been my experience that parallel software tools are a tough battle, which is partially why the whole parallel computing market is littered with dead companies (this link is from the Internet Wayback Machine and may take some time to download).
I attribute the dearth of tools to three issues. The first is a lack of economic incentive (i.e. the market is too small and the grad students are cheap). The second is more subtle. In order to sell programming tools you need hard ROI numbers. A good compiler, debugger, or profiler can show a pretty quick return because it works in virtually all cases. Automatic parallelization tools or languages usually work in some cases, but are not as universal and can be a tough sell. And finally, parallel programming is a really hard problem, which is what makes the previous point an issue.
Of course, there are OpenMP, CUDA, OpenCL, MPI, Pthreads, etc., but these tend to move the applications closer to the hardware than before. When I think of parallel tools, I want an application that helps me parallelize my code or allows me to easily express or imply parallelism. And, yes, I know about UPC, Cilk, Co-array Fortran, Fortress, and Chapel, not to mention Linda, Haskell, Erlang, and everything else. There are several real challenges facing all these efforts. Perhaps the biggest issue is the “type” of parallelism in the market today. There is multi-core, cluster, GP-GPU, or any combination thereof, and no unified model for this situation. That said, my belief is that the GP-GPU will become an embedded “SIMD unit,” like the FPU, in future processors and will be handled in some fashion by the compiler.
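To make the “closer to the hardware” complaint concrete, here is a small sketch in Python (my choice for illustration only; none of the function names below come from any of the tools mentioned above). The serial version simply says what to compute. The hand-parallelized version forces the programmer to pick a worker count, cut the data into chunks, and combine the partial results. The example is about that bookkeeping, not speed; Python threads are assumed here only because they keep the sketch short.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_sum_of_squares(values):
    # The "what": no mention of workers, chunks, or hardware.
    return sum(v * v for v in values)


def chunked_sum_of_squares(values, workers=4):
    # The "how": the programmer chooses the decomposition by hand.
    n = len(values)
    size = max(1, (n + workers - 1) // workers)  # ceiling division, at least 1
    chunks = [values[i:i + size] for i in range(0, n, size)]

    # Farm each chunk out to a worker, then combine the partial sums.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(serial_sum_of_squares, chunks))
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1000))
    print(serial_sum_of_squares(data) == chunked_sum_of_squares(data))
```

Both functions return the same answer, but only the second one has to know how many workers exist and where the chunk boundaries fall. A tool that let me write the first and get the second is the kind of “beef” I am asking about.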
Make no mistake, the biggest impediment to better tools is the difficulty of parallel programming. It is a tough nut to crack, and there do not seem to be any real breakthroughs on the horizon. Meanwhile, the “parallel beef” issue is getting bigger. In my estimation it has caused the demise of many companies and is probably the biggest “hold back” for HPC today. We need more ideas and money applied to this problem because very soon a whole lot of other people are going to be asking “Where is the parallel beef?”