Parallel Programming: I Told You So

Will a new feature make C the answer to the parallel programmer's dreams?

Not being one to gloat (cough), I would like to take this opportunity to say, “I told you so.” In case you are wondering exactly what it was I said, I will recap.

I periodically bemoan the state of parallel programming. My experience with parallel programming stretches back to the late 1980s, when I was working on various parallel programming ideas, including Fortran conversion and logic programming. The lessons I learned were twofold: first, parallel computing is hard, and second, we need to rethink how we do things. This opinion is shared by virtually everyone who has worked in HPC over the years. The prognosis is not that great either. Take a look at Greg Pfister’s blog, where he writes posts such as “Parallelism Needs a Killer Application.” By the way, Greg’s other blog posts are worth reading.

In a similar vein, Mike Wolfe of The Portland Group (producers of excellent optimizing compilers) writes in HPCwire about the recent OpenCL fanfare. His article, entitled Compilers and More: OpenCL Promises and Potential, provides a fair assessment of OpenCL. I like to think of these articles as “buckets of cold water thrown on the masses” who have been whipped up by various media types. Every time I hear the words “parallel programming,” “solution,” “simple,” and “breakthrough” used in a sentence, I usually know it is some kind of sales pitch. In the case of OpenCL, I have read too many times that OpenCL is a simple, unifying way to program parallel GPUs and host processors. Of course, things like CUDA and OpenCL are welcome additions to any GPU programmer’s toolbox. They help quite a bit and are important technologies. Do they solve the general parallel programming challenge? Not really, and neither does anything else.

Before I move to the second issue, I want to state that my definition of the “parallel programming challenge” is to allow a programmer to write one program and have it run with reasonable efficiency on multi-core, GP-GPU, or clusters. A tall challenge if ever there was one. I don’t think it is impossible, but it may require rethinking how we currently do things.

I believe part of the rethinking we need to do is about problem expression and execution. For instance, I have been waving the functional programming flag for a while. In addition, I believe dynamic execution can solve a lot of problems. That is, the program will need to decide how to do things at run-time and not compile-time. The basic idea is to use a functional approach to naturally express the parallel parts of a program and create a queue of parallel work. When the program runs, the queue is executed depending on the type of resources available. If there is one core, the queue is resolved one task at a time; if there are many cores, the queue is executed in parallel. I explored this idea in a series of three articles I wrote several years ago (You Can’t Always Get What You Want, The Ignorance is Bliss Approach, and Explicit Implications of Cluster Computing).

Of course, I don’t claim to be the only one considering this approach to parallel computing. Indeed, I have always assumed there are those more adept than me working on variations of this idea. Recently, I was delighted to read that Apple has open-sourced a project called Grand Central Dispatch (GCD). Essentially, Apple believes GCD is a better way to manage multi-core parallelism than using threads and has included it in the latest version of Mac OS X (10.6, Snow Leopard).

What is GCD exactly? As always, a little background is needed. GCD is based on work queues and “C blocks.” Most people get the work queue idea, but C blocks are a somewhat new concept for most programmers. C blocks are not a standard feature of the language, but look like function definitions with one big difference — they can capture state from their surrounding context and save it (read-only) within the block for later execution. Like C function definitions, they can take arguments and declare their own variables internally.

When a block expression is evaluated, both a reference to the code within the block and a snapshot of the current state of local stack variables are created. The code is not executed; instead, a type-safe opaque value is created that can be assigned to variables, passed to functions, and otherwise treated like a normal language value. One way to use blocks is to place parallel blocks in a queue.

GCD implements two types of queues: dispatch queues and operation queues. Dispatch queues are a C-based mechanism for executing custom tasks. A dispatch queue executes tasks either serially or concurrently, but always in first-in, first-out (FIFO) order. A serial dispatch queue runs only one task at a time, waiting until that task completes before dequeuing and starting the next. By contrast, a concurrent dispatch queue starts as many tasks as it can without waiting for already-started tasks to finish.

Operation queues are more complex than the FIFO dispatch queues and take other factors into account when determining execution order. Primary among these factors is whether a given task depends on the completion of other tasks. You can configure dependencies when defining your tasks and use them to create complex execution-order graphs for your tasks in operation queues.

If you are a Mac user, you can play with GCD using libdispatch. The libdispatch project consists of a user-space implementation of the Grand Central Dispatch API. Implementing the full API requires C compiler support for blocks. Contributions to the project will be continually evaluated for possible inclusion in future releases of Mac OS X. The sources are available under the terms of the Apache License, Version 2.0, in the hope that they might serve as a launching point for porting GCD to other platforms.

If GCC gets block support, then libdispatch will probably become available to the Linux crowd as well. Apple’s motivation for opening up GCD is probably to gain broader support for what has become a key but non-standard technology in its kernel.

Here is my take on these developments. C is not a functional language, but blocks do retain state, and recasting a program using a “work queue” model offers a huge advantage over traditional methods: executing the queue can be decoupled from the program. Thus, a program can, in theory, run on one core, or one thousand, without having to change any code. Of course, it will run more slowly on one core, but it will still run. Or, consider a typical compute node with eight cores. Suppose a dynamic program is using all eight cores with a large number of jobs still remaining in the queue. Simple algorithms could be developed that allow remote cores (other nodes) or even heterogeneous cores (different types of nodes) to be used by the program.

Finally, I’m going to predict that this approach will gain in popularity if C blocks become available in GCC and other compilers. Unfortunately, it will not be due to anything I have written (or will write). Nor will it be due to the queue model, dynamic execution, multi-core, or the open availability of GCD. The biggest driver will be a small, but powerful, addition to the C language called C blocks.

Comments on "Parallel Programming: I Told You So"


You might want to take a look at this blog post – GCD Is Not Blocks, Blocks Are Not GCD – http://www.mikeash.com/?page=pyblog/gcd-is-not-blocks-blocks-are-not-gcd.html

You can use GCD without blocks. You can even compile libdispatch to not support blocks at all.


Looks like you are focusing primarily on native code executing on a single machine. I definitely think most native languages need some serious work in the parallel department. However, most VMs have had threading done well for a long time. Now we have higher-order languages like Scala where multi-threading is even simpler. For example, the systems that provide parallel processing on the Java VM number in the thousands.

Obviously GCD and parallel C (I like to call them C-Actors) are good things, but let’s not forget that other technologies out there have been working successfully on a single computer or 10,000 computers for over a decade.


I only have a tiny bit of knowledge of parallel programming. To me, it’s a bit hard to grasp. It looks like IPv6 in the networking world: powerful, but few people are willing to touch it, and the majority still work on IPv4. Same in the programming world: the majority still work in VB, Java, etc. I believe much improvement is needed to integrate this into those popular languages, or it will just remain theory.


Did you have a look at the ProActive programming library, http://proactive.inria.fr/ ?
It relies on a formal model which ensures similar behaviour whatever the object is, and provides easy-to-use Java APIs based on the Active Object concept.
