MPI on Multicore, an OpenMP Alternative?

No matter how you cut it, coding for multicore is really just parallel programming. Doug Eadline explains the differences between OpenMP and MPI, when it's smart to use existing code, and when it's time to rewrite an application to scale better on multicore systems.

No matter how you cut it, coding for multicore is really just parallel programming. Once you’ve realized that, it’s time to look at the options: whether your existing code base will scale, or whether you need to rewrite your code, and if so, how.

As stated in The Multicore Programming Challenge, parallel programming can be difficult. It moves the programmer closer to the hardware and further from their application space or problem. Fortunately, people like rocket scientists have been writing parallel software for quite some time in the HPC (High Performance Computing) sector.

As any good programmer knows, an existing code base can be valuable to current programming projects. First, the possibility of re-using existing code is a major incentive. Also, learning how someone else attacked a similar problem is very valuable.

In the HPC sector, most parallel programs are written using the Message Passing Interface (MPI). While MPI is normally used on large computing systems (clusters), it can also be used on a multicore processor. The “MPI proposition” may seem counter to conventional wisdom, as MPI was designed for distributed memory (i.e., each core/processor has its own private memory), whereas OpenMP was designed for shared memory.

The lazy assumption is that OpenMP is the better solution because it was designed for shared memory. However, the possibility of re-using an existing MPI code base is worth considering before you spend months re-inventing the software wheel. Ultimately, the question is about efficiency; namely, how does the performance of MPI compare to OpenMP on a multicore system?

The answer to this question is important. If I can re-use MPI codes that work well enough on multicore, then there is no need to (re)write my application using OpenMP. If, on the other hand, OpenMP or threads provide scaling benefits large enough to justify rewriting the code, then investing the time in re-coding might be in order.

Although your own applications are always the ultimate test of hardware, a comparison of the same program written in MPI and OpenMP would be interesting. Fortunately for us, the people at NASA (the rocket science guys) have an interest in such things as well. The venerable NAS Parallel Benchmarks (NPB) suite is now available in MPI, OpenMP, Java, and HPF versions.

This means a head-to-head comparison of MPI and OpenMP is possible. (I’ll leave the Java and HPF runs as an exercise for the reader.) Before we get to the main event, however, some background on how OpenMP and MPI differ may be helpful.

OpenMP and MPI Primer

Because native Pthreads programming can be cumbersome, a higher-level abstraction called OpenMP has been developed. As with all higher-level approaches, OpenMP trades flexibility for ease of writing code. At its core, OpenMP uses threads, but the details are hidden from the programmer.
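To see what OpenMP hides, here is a minimal C sketch that sums an array with raw POSIX threads. The names pthread_sum and struct chunk are hypothetical and chosen only for illustration. The point is the bookkeeping you have to write by hand: splitting the work, passing arguments through a struct, keeping per-thread partial sums to avoid a data race, and joining every thread.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64

/* Work description for one thread (hypothetical layout). */
struct chunk {
    const double *data;   /* shared, read-only input array */
    size_t begin, end;    /* half-open range [begin, end) for this thread */
    double partial;       /* per-thread result, so threads never write the same variable */
};

static void *sum_chunk(void *arg)
{
    struct chunk *c = arg;
    double s = 0.0;
    for (size_t i = c->begin; i < c->end; i++)
        s += c->data[i];
    c->partial = s;
    return NULL;
}

/* Split [0, n) into nthreads contiguous chunks, run them, then combine the partial sums. */
double pthread_sum(const double *data, size_t n, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    struct chunk work[MAX_THREADS];
    int started[MAX_THREADS];

    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;

    for (int t = 0; t < nthreads; t++) {
        work[t].data = data;
        work[t].begin = n * t / nthreads;
        work[t].end = n * (t + 1) / nthreads;
        work[t].partial = 0.0;
        started[t] = pthread_create(&tid[t], NULL, sum_chunk, &work[t]) == 0;
        if (!started[t])
            sum_chunk(&work[t]);   /* thread creation failed: do this chunk inline */
    }

    double total = 0.0;
    for (int t = 0; t < nthreads; t++) {
        if (started[t])
            pthread_join(tid[t], NULL);
        total += work[t].partial;
    }
    return total;
}
```

For example, calling pthread_sum(a, 1000, 4) on an array holding 1 through 1000 returns 500500. Compile with gcc -pthread.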

OpenMP is implemented as compiler directives. In Fortran they are placed in program comments; in C and C++ they are #pragma lines. Typically, computationally heavy loops are augmented with OpenMP directives that the compiler uses to automatically “thread the loop”. This approach has a distinct advantage: the original program can stay “untouched” except for the directives, and a simple recompilation without OpenMP support, which ignores the directives, produces a sequential (non-threaded) version. (Read the OpenMP Web site to get the complete picture.)
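For comparison, here is a minimal sketch of the same kind of loop written with OpenMP in C, where the directives are #pragma omp lines rather than comments. The names omp_sum and built_with_openmp are hypothetical. When built with gcc -fopenmp, the loop is threaded. When built without that option, GCC ignores the pragma (with -Wall it warns about it), and the same source produces a sequential program.

```c
#include <stddef.h>

/* Sum an array. With -fopenmp the loop is split across threads;
   without it the pragma is ignored and the loop runs sequentially. */
double omp_sum(const double *data, long n)
{
    double total = 0.0;
    /* reduction(+:total) gives each thread a private copy of total
       and adds the copies together when the loop finishes. */
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Report whether this build honors OpenMP directives.
   The compiler defines _OPENMP only when OpenMP is enabled. */
int built_with_openmp(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}
```

Compared with the hand-written Pthreads version, the chunking, the per-thread partial sums and the joins are all generated by the compiler from the single reduction directive.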

For those who don’t follow software trends, but instead rely on the crack Linux Magazine columnists to provide them with all the important advances, GCC 4.2 (and later) supports OpenMP. This is important for the open source crowd, because OpenMP was only available in commercial compilers before GCC 4.2 was released.

GCC 4.2 has not found its way into all distributions, so you may need to download and build it from source if you want to play along with this article. Of course if you have a commercial compiler, it probably already has OpenMP support.

For gcc and gfortran, OpenMP programs can be compiled by including the -fopenmp option. In order to test this new capability, I found an OpenMP version of the ubiquitous matrix multiplication program. I built two versions of the program, one with OpenMP enabled and one without:

$ gfortran -fopenmp -o matmult_omp matmult.f
$ gfortran -o matmult matmult.f
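The matmult.f program used here is Fortran, and its source is not shown. As an illustration only, here is a C analogue of an OpenMP-threaded matrix multiply. The function matmult below is hypothetical and is not the code that was timed.

```c
#include <stddef.h>

/* C = A * B for n x n matrices stored in row-major order. */
void matmult(const double *a, const double *b, double *c, int n)
{
    /* Rows of C are independent, so the outer loop is divided among threads.
       i is private as the parallel loop index; j, k and sum are private
       because they are declared inside the loop body. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[(size_t)i * n + k] * b[(size_t)k * n + j];
            c[(size_t)i * n + j] = sum;
        }
    }
}
```

The same two-build pattern applies: gcc -fopenmp -O2 -c matmult.c for the threaded version and gcc -O2 -c matmult.c for the sequential one.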

Then I ran the sequential version on an Intel Core 2 Duo system (two cores):

$ time ./matmult

real    0m9.079s
user    0m8.988s
sys     0m0.012s

The OpenMP version was run as well. Note that an environment variable called OMP_NUM_THREADS tells OpenMP binaries how many threads to use. If it is not defined, one thread per CPU (core) is used. Ultimately, however, the program itself can set or limit the number of threads. The OpenMP results for two cores are shown below:

$ time ./matmult_omp

real    0m4.967s
user    0m9.783s
sys     0m0.018s
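A program can also check how many threads it actually gets. This sketch uses the standard OpenMP routines omp_get_max_threads and omp_get_num_threads; the name threads_used is hypothetical. The num_threads clause takes precedence over OMP_NUM_THREADS for that one parallel region. When built without -fopenmp, the function simply reports one thread.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Return how many threads a parallel region actually ran with.
   requested <= 0 means "use the default", which OMP_NUM_THREADS
   controls (or, if it is unset, the runtime's own default). */
int threads_used(int requested)
{
    int used = 1;   /* sequential build: exactly one thread */
#ifdef _OPENMP
    int n = requested > 0 ? requested : omp_get_max_threads();
    #pragma omp parallel num_threads(n)
    {
        #pragma omp single
        used = omp_get_num_threads();   /* one thread records the team size */
    }
#else
    (void)requested;
#endif
    return used;
}
```

Running the same binary as OMP_NUM_THREADS=1 ./prog and OMP_NUM_THREADS=2 ./prog should change what threads_used(0) reports in an OpenMP build.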

The OpenMP version reduced the wall clock time by forty-five percent. Astute readers may be wondering why the user time is almost double the real time. This is because two cores were used: the reported user time is the sum of the CPU time spent on every core your application uses. As we will see below, with eight cores the user time can be much higher than the real time.
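The arithmetic behind these figures can be written as a small C sketch; struct timing_summary and summarize are hypothetical names. Here busy_cores is user time divided by real time, which estimates how many cores were kept busy on average.

```c
/* Derived figures from two runs timed with the shell's time command (seconds). */
struct timing_summary {
    double reduction_pct;  /* percent drop in wall clock (real) time */
    double speedup;        /* sequential real time / parallel real time */
    double efficiency;     /* speedup divided by the number of cores */
    double busy_cores;     /* parallel user time / parallel real time */
};

struct timing_summary summarize(double seq_real, double par_real,
                                double par_user, int cores)
{
    struct timing_summary s;
    s.reduction_pct = 100.0 * (seq_real - par_real) / seq_real;
    s.speedup = seq_real / par_real;
    s.efficiency = s.speedup / cores;
    s.busy_cores = par_user / par_real;
    return s;
}
```

With the matmult times above, summarize(9.079, 4.967, 9.783, 2) gives a reduction of about 45.3%, a speedup of about 1.83, an efficiency of about 0.91 and about 1.97 busy cores. The same user/real ratio for the eight-core cg.B OpenMP run below (563.287 s / 71.735 s) is about 7.85.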

In contrast to OpenMP, MPI uses a software library to send data from one process to another. Each process has its own memory space, so MPI is basically a message-copying methodology. In addition, MPI makes no distinction about where a process runs: it can run on the same machine or on another machine. Timing the eight-way OpenMP and MPI versions of the same program (cg.B) gives the following results (OpenMP is run first):

$ time bin/cg.B

real    1m11.735s
user    9m23.287s
sys     0m2.012s

$ time mpirun -np 8 bin/cg.B.8

real    1m16.138s
user    0m0.000s
sys     0m0.004s

In the first case, OpenMP shows a real time of about one minute and a user time of about 9 minutes 23 seconds. That is nearly eight times the real time, indicating that all eight cores were kept busy. In the second case, the MPI run shows a comparable real time, but zero user time. This result is easily understood in terms of how MPI jobs are run. The mpirun command starts the separate MPI processes and then waits until they are finished. Their CPU time is not charged to mpirun, so time reports essentially no user time. An OpenMP job, however, is a single process. The CPU time of all its threads is charged to that process, which is what time reports.

The Process View

While we are talking about OpenMP and MPI, there’s one big difference between these programming methods in terms of the OS process space. OpenMP programs run as a single process and the parallelism is expressed as threads. This behavior can be viewed quite clearly when using an eight core server (two quad-core processors). For instance, examining a running OpenMP program using top shows only a single process running. (See Figure One)

Figure One: OpenMP program (cg.B) running on eight cores.

In contrast to OpenMP, MPI starts one process per core, here by means of the mpirun -np 8 ... command. This situation is shown in Figure Two, where an MPI version of the same program is running. Note that there are now eight processes. The processor (core) loads, however, are about the same for both.

Figure Two: MPI program (cg.B.8) running on eight cores.
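The difference in process counts can be reproduced without OpenMP or MPI. This C sketch uses stand-ins rather than either library: POSIX threads for the OpenMP-like case and fork() for the MPI-like case. A real mpirun launches its processes in its own way. All names below are hypothetical.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 16

static pid_t thread_pids[MAX_WORKERS];

/* Count how many different values appear among the first n entries. */
static int count_distinct(const pid_t *pids, int n)
{
    int distinct = 0;
    for (int i = 0; i < n; i++) {
        int seen = 0;
        for (int j = 0; j < i; j++)
            if (pids[j] == pids[i]) { seen = 1; break; }
        if (!seen)
            distinct++;
    }
    return distinct;
}

static void *record_pid(void *arg)
{
    int id = *(int *)arg;
    thread_pids[id] = getpid();   /* every thread reports its process ID */
    return NULL;
}

/* OpenMP-like: n threads inside one process. */
int distinct_pids_threads(int n)
{
    pthread_t tid[MAX_WORKERS];
    int ids[MAX_WORKERS], started[MAX_WORKERS];
    if (n > MAX_WORKERS) n = MAX_WORKERS;
    for (int i = 0; i < n; i++) {
        ids[i] = i;
        started[i] = pthread_create(&tid[i], NULL, record_pid, &ids[i]) == 0;
        if (!started[i])
            record_pid(&ids[i]);          /* fall back to running inline */
    }
    for (int i = 0; i < n; i++)
        if (started[i])
            pthread_join(tid[i], NULL);
    return count_distinct(thread_pids, n);
}

/* MPI-like: n separate processes, each with its own PID. */
int distinct_pids_processes(int n)
{
    pid_t pids[MAX_WORKERS];
    int started = 0;
    if (n > MAX_WORKERS) n = MAX_WORKERS;
    for (int i = 0; i < n; i++) {
        pid_t p = fork();
        if (p == 0)
            _exit(0);                     /* child: a real worker would compute here */
        if (p < 0)
            break;                        /* fork failed: stop launching */
        pids[started++] = p;
    }
    for (int i = 0; i < started; i++)
        waitpid(pids[i], NULL, 0);        /* reap every child */
    return count_distinct(pids, started);
}
```

Here distinct_pids_threads(4) returns 1, while distinct_pids_processes(4) returns 4. This mirrors the single entry in Figure One and the eight entries in Figure Two.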

One final and subtle point: in OpenMP, communication is through shared memory, which means threads access the same memory locations directly. MPI programs on SMP systems also communicate through shared memory, but processes send messages by copying data from their private memory into shared memory. The receiving process typically copies it from there into its own private memory.

Obviously, sharing memory locations seems more efficient than sending copies of memory locations to other processes, but it all depends. In the MPI process model, each process has exclusive access to all of its own memory. For some programs this may be more efficient, because it is better to copy data (send a message) than to wait for access to shared memory. On the other hand, in the OpenMP model, threads can share access to all memory in the process space. In this case, some programs may be much more efficient, as the large overhead of copying memory is avoided.
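The difference between sharing and copying can be sketched in C with similar stand-ins: POSIX threads for shared memory, and fork() plus a pipe for message passing. Neither is MPI or OpenMP, and all names are hypothetical. A thread's write is directly visible to the main thread. A forked child changes only its private copy of memory, so it must send the value as a message.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_value = 0;

static void *thread_writer(void *arg)
{
    (void)arg;
    shared_value = 42;   /* same address space: the main thread will see this */
    return NULL;
}

/* Shared-memory style: the main thread reads what the worker thread wrote. */
int value_via_thread(void)
{
    pthread_t t;
    shared_value = 0;
    if (pthread_create(&t, NULL, thread_writer, NULL) != 0)
        return -1;
    pthread_join(t, NULL);   /* join also orders the write before our read */
    return shared_value;
}

/* Message-passing style: the child's write lands in its own copy of memory,
   so the value has to be copied to the parent through a pipe.
   *parent_copy_after receives the parent's own shared_value afterwards. */
int value_via_process(int *parent_copy_after)
{
    int fd[2];
    shared_value = 0;
    if (pipe(fd) != 0)
        return -1;
    pid_t p = fork();
    if (p < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (p == 0) {                                  /* child process */
        close(fd[0]);
        shared_value = 42;                         /* modifies the child's copy only */
        ssize_t w = write(fd[1], &shared_value, sizeof shared_value);  /* the "message" */
        (void)w;
        _exit(0);
    }
    close(fd[1]);                                  /* parent process */
    int received = -1;
    if (read(fd[0], &received, sizeof received) != (ssize_t)sizeof received)
        received = -1;
    close(fd[0]);
    waitpid(p, NULL, 0);
    *parent_copy_after = shared_value;             /* still 0 in the parent */
    return received;
}
```

value_via_thread() returns 42. value_via_process(&x) also returns 42, but only because the value was copied through the pipe, and x is left at 0 because the parent's own copy was never touched.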

Looking at the Numbers

An eight-core Intel server (two quad-core Clovertown processors) was used to run the tests. The OpenMP tests used gcc/gfortran version 4.2. The MPI tests used LAM version 7.1.2. The OpenMP and MPI suites have six programs in common; each of these was run five times and the results averaged (Class B problem sizes were used). The results are given in Mops (millions of operations per second) in Table One, along with the percent difference.

Test   OpenMP, Mops (gcc/gfortran 4.2)   MPI, Mops (LAM 7.1.2)   Difference   Faster
CG       790.6                            739.1                     7%        OpenMP
EP       166.5                            162.8                     2%        OpenMP
FT      3535.9                           2090.8                    69%        OpenMP
IS        51.1                            122.5                   139%        MPI
LU      5620.5                           5168.8                     9%        OpenMP
MG      1616.0                           2046.2                    27%        MPI

Table One: Results for the OpenMP/MPI benchmarks. The Faster column marks the winning version for each test.
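The Difference column appears to measure the faster result's margin over the slower one, relative to the slower result. Here is a short C sketch of that calculation; pct_faster is a hypothetical name.

```c
/* Margin of the faster result over the slower one, as a percentage
   of the slower result (inputs in Mops; higher is faster). */
double pct_faster(double a, double b)
{
    double hi = a > b ? a : b;
    double lo = a > b ? b : a;
    return 100.0 * (hi - lo) / lo;
}
```

pct_faster(790.6, 739.1) is about 6.97, which the table lists as 7% for CG. For IS the formula gives about 139.7, which the table lists as 139%.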

Tests CG and EP are about the same. Indeed, EP is a good check: there is very little communication, so both methods should produce similar results. OpenMP is the clear winner on FT, but MPI does surprisingly better on the latency-sensitive IS benchmark. Of the remaining two tests, OpenMP does better on LU, while MPI does better on MG. Overall, the comparison is a bit of a draw.

The results are clear on one point: there is no definitive winner in this match-up. This may come as a surprise to those who assume that OpenMP would easily beat MPI on a multicore machine (or any SMP machine, for that matter). MPI may well be good enough to stand toe-to-toe with OpenMP for many applications.

In only one case (FT) did OpenMP run away from MPI. In two other cases (IS and MG), MPI was the clear winner, and taking the time to convert such code to OpenMP would actually result in a performance loss. The story is far from over; more benchmarks using other hardware and commercial compilers are in order.

Other Things to Consider

Getting back to our question, “Do I need to re-code my MPI programs for these multicore thingies?” The answer is a resounding maybe not. MPI may just be good enough in many cases. Again, more data, including results for your own application, are needed before making firmer recommendations.

Another important question to ask is how scalable your application is. As more processors are added, parallel execution will always hit a point of diminishing returns. This situation means that creating more threads or processes will not improve performance and it may actually hurt performance. The size of your data set may also come into play. One of the advantages of distributed MPI programs is the ability to distribute large data sets over many processors thereby solving problems that would never fit in an SMP memory space.
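One standard way to reason about diminishing returns is Amdahl's law. If a fraction s of the run time is serial, the best possible speedup on n cores is 1 / (s + (1 - s)/n). A minimal C sketch follows. The 5% serial fraction in the example is hypothetical. The formula also ignores communication and memory-bandwidth costs, which is why adding cores can even slow a real program down.

```c
/* Amdahl's law: upper bound on speedup with n cores when
   serial_fraction of the run time cannot be parallelized. */
double amdahl_speedup(double serial_fraction, int n)
{
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n);
}
```

With s = 0.05, amdahl_speedup gives about 5.9 on 8 cores and about 9.1 on 16. It can never exceed 20, however many cores are added.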

If you’re considering writing a new application from scratch, the choice between OpenMP and MPI involves other considerations. OpenMP is designed for shared memory (SMP) machines. As multicore designs advance, the number of cores in an SMP machine will continue to grow, but OpenMP is not designed to run across multiple machines the way MPI is.

If you want your application to be portable across clusters and SMP machines, MPI might be the best solution. If, however, you do not envision using more than eight or sixteen cores, then OpenMP is probably one of your best choices, provided the benchmarks point in that direction. From a conceptual standpoint, those with experience in both paradigms report that OpenMP and MPI have a similar learning curve and level of nuance. There are no shortcuts or free lunches with OpenMP, or with MPI for that matter.
