Forget the glossy data sheets and single-number benchmarks. Get the right information to make the right decisions.
Today’s HPC practitioners face a multitude of choices, and some of them are big: which brand to buy, which processor for their servers, which storage systems to use. Add to this the growing number of choices in network technology. Should one use high-speed Ethernet (10 GigE or even GigE), InfiniBand™, or some other network technology? While each has its place in the market, InfiniBand is quickly becoming the default choice for new high performance computing cluster deployments. In fact, on the Top500 list, InfiniBand has risen from the tenth most popular interconnect to the second most popular over the last five years (GigE is still the most popular). There are several key reasons why InfiniBand is a smart choice in HPC, and why asking the right questions is so important.
Of course, we have all witnessed the discussion about network latency and bandwidth, but there is more, much more, to consider. Four key questions come to mind when evaluating an interconnect such as InfiniBand. We will discuss these below.
What Are All The Inter-process Communications (IPC) Numbers?
Check the latency, the fastest time in which a single byte can be sent, and the bandwidth, the maximum data rate (MBytes/sec), but don’t forget the often-overlooked N/2 number as well. The N/2 packet size is the size of the packet that achieves half the single-direction bandwidth of the interconnect (it is a measure of how fast the throughput curve rises). The smaller the number, the more bandwidth (speed) small packets will see. An example is shown in Figure One below.
Figure One: Definition of N/2 rate
The important aspect of the N/2 number is its relationship to message size. If your applications send messages in the N/2 region, then variation in the N/2 parameter can greatly affect performance: because the slope of the throughput curve is steep there, a small variation in message size can result in a large change in throughput.
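The N/2 behavior can be made concrete with a simple latency-plus-bandwidth transfer model. In this toy model the time to send an n-byte message is latency + n/peak_bandwidth, which makes N/2 exactly the latency-bandwidth product. The sketch below uses hypothetical numbers, not measurements of any particular adapter:

```python
# Toy model of interconnect throughput (an illustration, not vendor data):
# transfer time for an n-byte message = latency + n / peak_bandwidth,
# so achieved throughput(n) = n / (latency + n / peak_bandwidth).

def throughput(n_bytes, latency_s, peak_bw):
    """Achieved throughput (bytes/s) for an n-byte message."""
    return n_bytes / (latency_s + n_bytes / peak_bw)

def n_half(latency_s, peak_bw):
    """Message size reaching half of peak bandwidth.
    For this simple model it is exactly latency * peak_bandwidth."""
    return latency_s * peak_bw

# Hypothetical figures: 1.3 microseconds latency, 1900 MB/s peak.
lat, bw = 1.3e-6, 1900e6
n2 = n_half(lat, bw)
print(f"N/2 is about {n2:.0f} bytes")
print(f"throughput at N/2: {throughput(n2, lat, bw) / 1e6:.0f} MB/s")
```

Because the throughput curve rises steeply around N/2, shifting a message size slightly above or below this point moves it between the low- and high-throughput regions of the curve, which is why applications whose messages fall in this region are so sensitive to the N/2 parameter.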
Another number to check is the messaging rate, defined as the number of messages transmitted in a period of time. The messaging rate grows in importance as the number of cores in HPC servers increases: the more cores, the more messages that must be sent through a single interface. A poor messaging rate will leave cores waiting on communication and hurt performance.
As an example, consider the industry-leading QLogic QLE7280 PCIe HCA. It has an OSU message rate of 26 million messages per second, an HPCC Random Ring latency (128 cores) of 1.1 microseconds, and a maximum throughput of 1950 MBytes/sec. From a performance perspective, these numbers are extremely impressive, but they do not tell the whole story. Other issues, in addition to IPC, should be considered as well.
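The message-rate figure invites a quick back-of-envelope check: will a single adapter keep all the cores in a node fed with small messages? In the sketch below, only the 26 million messages per second comes from the adapter specification quoted above; the core count and per-core demand are hypothetical workload assumptions:

```python
# Back-of-envelope check: can one adapter sustain the aggregate small-message
# demand of all cores in a node?

adapter_msg_rate = 26e6    # messages/s (QLE7280 OSU message rate, quoted above)
cores_per_node = 16        # hypothetical many-core node
per_core_demand = 1.2e6    # hypothetical messages/s each core wants to send

aggregate_demand = cores_per_node * per_core_demand
headroom = adapter_msg_rate / aggregate_demand

print(f"aggregate demand: {aggregate_demand / 1e6:.1f} M msg/s")
# headroom > 1 means the adapter keeps up; < 1 means cores wait on the network
print(f"adapter headroom: {headroom:.2f}x")
```

If the headroom falls below 1, adding cores to the node no longer adds performance for message-intensive codes, which is exactly the scenario a poor messaging rate creates.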
Is There File-System and Industry Standards Support?
In many cluster designs, all file-system traffic is sent over a secondary NFS network. This network is in most cases Ethernet and uses standard kernel IP protocols. Huge performance gains can be realized by using parallel file systems (e.g., Lustre, HP SFS, and IBM GPFS) that are directly supported by a high-speed interconnect.
Specifically, NFS over GigE was long the preferred method for compute clusters to share data through a common file system. However, NFS has two critical problems: bandwidth and scalability limitations. To overcome this shortfall, parallel file systems were developed. By providing support for native parallel file systems in the software stack, large gains in performance can be obtained. Indeed, by working directly with the file-system protocols, the same IPC advantages mentioned above can be delivered to end users requiring fast I/O. Interoperability can be assured by checking that your InfiniBand vendor supports the InfiniBand Trade Association (IBTA) 1.2 specification. In addition, the open industry software stack maintained by the OpenFabrics Alliance provides tools and drivers for the latest file systems and interconnects.