Round Two of the OpenMP-MPI Smack-Down

Ready for the HPC MMA battle? Of course I mean a Memory and Messages Assessment

If you have read any of my past columns, you will notice I like to test assumptions and try the obvious. I often find that things do not work the way people expect. As more and more cores show up in processors, one of my burning questions is: "How does the performance of an MPI program on an SMP node compare to that of a similar OpenMP program on the same node?"

The question is important. Nodes may have 24 or even 48 cores in the near future. Most codes use fewer than 32 cores, so why bother with a cluster? (That is the subject of another column.) Some may assume that a threaded OpenMP approach is a slam-dunk in a shared memory (SMP) environment. Of course, I like to test this idea, because rewriting code is not the best use of our time.

In a previous column, I had an opportunity to test OpenMP and MPI on a brand new 8-core SMP machine. My interest was to see how well MPI codes worked on an SMP platform. The machine was an 8-way Intel server with two Clovertown processors (4 cores per socket). I understood that many things other than the programming model (i.e., threads or messages) could affect the results, but I just wanted to get a feel for what would happen. The results were rather interesting (see the column), and there was no clear winner. The assumption that OpenMP should "blow away" MPI on an SMP machine did not hold up.

Recently, I had access to a new 12-core Intel Xeon machine (dual X5670 processors at 2.93 GHz) with 48 GB of DDR3 memory and the Intel 5520 chipset. It came preloaded with Red Hat 5, running kernel 2.6.18-128.el5. For programming software I used the Red Hat gcc/gfortran 4.1.2 compilers (with OpenMP support) and Open MPI version 1.2.7, the "stock" versions that came with the install. I decided it was time to get another data point (or points) for my MPI vs. OpenMP tests. This time I had a different CPU, memory architecture, MPI library, and compiler (although I am not sure how different the compiler is).

As I did previously, I used the NAS benchmark suite (version 3.2). You can find a description of the tests on the website. The NAS suite includes the same programs written in both MPI and OpenMP, so an "apples to apples" comparison is possible, although it should not be taken as an exact comparison because it is always possible to optimize for a given programming model.
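For anyone who wants to reproduce the tests, the NPB suite is driven by make. The commands below are a sketch assuming NPB 3.2 unpacked into its standard NPB3.2-MPI and NPB3.2-OMP directories, with a config/make.def already set up for your compilers and MPI; check the suite's README for the details of your version.

```shell
# MPI version: build the CG test for class B and 8 processes, then run it
cd NPB3.2-MPI
make cg CLASS=B NPROCS=8          # produces bin/cg.B.8
mpirun -np 8 bin/cg.B.8

# OpenMP version: build CG for class B, then run it with 8 threads
cd ../NPB3.2-OMP
make cg CLASS=B                   # produces bin/cg.B
OMP_NUM_THREADS=8 bin/cg.B
```

Each run reports its MOPS total at the end, which is the number used in the tables below.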

The results of the NAS suite are reported in MOPS (Million Operations Per Second); the higher, the better. With the exception of IS (integer sort), the results are really floating-point operations per second and represent performance on various math kernels used in aerodynamics. Each test was run three times, and the average is reported.

The results are in Tables One and Two below. The winning score for each test is marked with an asterisk (*), and the percent difference between the two scores is given. I first ran the tests at the B level, which sets the problem size. I used eight cores because most of the NAS MPI tests work best with a power of two for the number of processes. (Some tests require a square power of two: 4, 16, etc., and were not run.) There were actually 12 cores available, but trying to use all of them would further reduce the number of possible MPI benchmarks.

Test    MPI      OpenMP   Percent
CG      4342*     3297      32%
EP       273*      269       2%
FT      8359      9808*     17%
IS       518*      420      23%
LU     14443     16441*     14%
MG     11470*    11131       3%

Table One: MPI and OpenMP results for NAS suite B level tests on eight cores (Million Operations per Second; higher is better; * marks the winner).
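For reference, the "Percent" column is the winning score's advantage over the losing score, i.e., 100 * (winner - loser) / loser, rounded to the nearest whole number. A quick sanity check of the CG row:

```shell
# Percent difference as reported in the tables:
# 100 * (winner - loser) / loser, rounded to the nearest integer.
# CG, class B: MPI scored 4342 MOPS, OpenMP scored 3297 MOPS.
awk 'BEGIN { printf "CG: %.0f%%\n", 100 * (4342 - 3297) / 3297 }'
# prints "CG: 32%"
```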

I then ran the C level (bigger problem size) and found that two tests did not run, leaving four tests with good data. As the problem size got bigger, there was no change in the leaders, but some of the differences changed quite a bit.

Test    MPI      OpenMP   Percent
CG      3638*     2910      25%
EP       276*      267       4%
IS       501*      299      68%
LU     13995     15697*     12%

Table Two: MPI and OpenMP results for NAS suite C level tests on eight cores (Million Operations per Second; higher is better; * marks the winner).

Comparing with my previous results, we see an interesting flip. First, CG went from being 7% faster with OpenMP (previous results) to a hefty 32% faster using MPI (current results). FT still works best using OpenMP, but the gap is now much smaller. Similarly, IS is still way ahead using MPI, but the gap is narrowing, while LU and EP show about the same differences as before. Finally, the OpenMP version of MG is working much better and has gained quite a bit of ground on the MPI version. Also note the overall improvement in performance over the Clovertown results.

As in my previous column, I conclude with "it all depends." There are many variables: compilers, processors, memory architecture, and, of course, your code. The golden rule of HPC, "test your codes," certainly still holds, because many assumptions do not.

I also wanted to mention that the need for better MPI performance on SMP nodes has been recognized by both the Open MPI and MPICH2 teams. Each can now employ KNEM, a Linux kernel module that enables high-performance intra-node MPI communication for large messages within a single multi-core node. This can only mean one thing: more testing.
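As a sketch of how you might check for KNEM support, the commands below assume a KNEM-aware Open MPI (newer than the 1.2.7 used here; the MCA parameter name is from the Open MPI 1.5 era and may differ in your build, so verify with ompi_info first):

```shell
# Is the KNEM kernel module loaded? (it creates /dev/knem)
ls -l /dev/knem

# Does this Open MPI build know about KNEM? (parameter names vary by version)
ompi_info --param btl sm | grep -i knem

# Ask the shared-memory BTL to use KNEM for a run
mpirun --mca btl_sm_use_knem 1 -np 8 bin/cg.B.8

# MPICH2 side (build-time option): configure --with-nemesis-local-lmt=knem
```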

Comments on "Round Two of the OpenMP-MPI Smack-Down"


Did you happen to use numactl?

In my testing, it pairs more naturally with MPI, as each MPI process can be bound to a particular node (-N x) and then have all memory allocations be local (-l); on the Nehalem — and likely any NUMA — systems this can be a big win by ensuring processes stay close to their memory. As MPI has much more explicit ‘my’ memory vs. ‘your’ memory than OpenMP, this type of binding is easy to implement.
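A minimal sketch of the per-process binding described above, assuming Open MPI (which exports OMPI_COMM_WORLD_LOCAL_RANK to each rank) and the two NUMA nodes of a two-socket X5670 box. The hypothetical wrapper script only prints the numactl command it would run, so you can check the rank-to-node mapping before swapping echo for exec:

```shell
#!/bin/sh
# Hypothetical wrapper: launch as   mpirun -np 8 ./numabind.sh ./cg.B.8
# Maps each local MPI rank onto one of two NUMA nodes (-N) and forces
# local memory allocation (-l). Dry run: replace 'echo' with 'exec'
# to actually bind.
RANK=${OMPI_COMM_WORLD_LOCAL_RANK:-0}
NODE=$(( RANK % 2 ))
echo numactl -N "$NODE" -l "$@"
```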
