Previously we looked at the throughput performance of bcache by running IOzone on a common SATA disk, an Intel X25-E SSD, and bcache using the SSD to cache a single drive. This article explores the IOPS performance of the same configurations, hoping to find areas where bcache might shine.
In a recent article I presented two new patch concepts (bcache and flashcache) for improving disk performance by using SSDs as a caching mechanism for hard drives. Strictly speaking, the caching uses one block device to cache another block device, but in practice that means using an SSD to cache hard drives. I’ve been waiting for some time for a patch that boosts performance with a cache that is larger than the disk’s onboard cache and faster than the disk itself. SSDs fit that bill pretty well, especially given their fantastic read performance compared to disks.
The second article in the series examined the throughput performance of the bcache patches applied to the 2.6.34 kernel. The performance was not what you might hope for, but that was expected (clear as mud, right?): these are early patches that have not yet been tuned for performance or for particular workloads. Consequently, really good performance was not anticipated, although there were a few bright spots in the throughput exploration.
This article is the next in the series and examines the IOPS performance of the four storage configurations:
- Single SATA II disk (7,200 rpm 500GB with 16MB cache)
- Single Intel X25-E SLC disk (64GB)
- Bcache combination that uses the Intel X25-E as a cache for the SATA drive and uses the CFQ (Completely Fair Queuing) IO Scheduler that is the default for most distributions
- Bcache combination that is the same as the previous but uses the NOOP IO Scheduler for the SSD that many people think could help SSD performance.
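For reference, on Linux the IO scheduler can be inspected and switched per device at run time through sysfs. A sketch follows; the device name sdb for the SSD is an assumption, so substitute the actual cache device:

```shell
# Show the available schedulers; the active one appears in brackets
# (device name sdb is an assumption)
cat /sys/block/sdb/queue/scheduler
# Switch the SSD to the NOOP scheduler (requires root)
echo noop > /sys/block/sdb/queue/scheduler
```

The change takes effect immediately and lasts until reboot, which makes it convenient for back-to-back benchmark runs with different schedulers.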
The details of the configurations are below as are the details of the benchmarks and tests run. Once again, we will be using our good benchmarking techniques in this article.
For measuring IOPS performance I’m going to use IOzone. It has switches that turn measurements into operations per second (IOPS).
IOzone and IOPS
IOzone is one of the most popular IO benchmarks. It’s open-source and is written in very plain ANSI C (a compliment, not an insult). It is capable of single-thread, multi-threaded, and multi-client testing. The basic concept of IOzone is to break up a file of a given size into records; records are written or read in some fashion until the file size is reached. While IOzone is more commonly used for measuring throughput performance, it can also measure operations per second (IOPS). More specifically, it can measure sequential read and write IOPS as well as random read and random write IOPS.
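As a concrete illustration of the record model, the number of IO operations in one full pass over a file is simply the file size divided by the record size. For example, a 16GB file accessed in 8KB records (a quick sketch in shell arithmetic):

```shell
# 16GB file, 8KB records: operations in one sequential pass
file_kb=$((16 * 1024 * 1024))   # 16GB expressed in KB
rec_kb=8                        # record size in KB
echo $((file_kb / rec_kb))      # prints 2097152
```

At a fixed record size, throughput and IOPS are two views of the same measurement: throughput equals IOPS multiplied by the record size.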
For this article, IOzone runs four specific tests and computes IOPS for each. The four tests are:
- Write
This is a fairly simple test that simulates writing a new file. Because new metadata must be created for the file, writing a new file is often slower than rewriting an existing one. The file is written using records of a specific length (either specified by the user or chosen automatically by IOzone) until the total file length has been reached.
- Read
This test reads an existing file, one record at a time, until the entire file has been read.
- Random Read
This test reads a file with accesses made to random locations within the file. Reads are done in record-size units until the total amount read equals the file size. The performance of this test is affected by many factors, including the OS cache(s), the number of disks and their configuration, disk seek latency, and the disk cache, among others.
- Random Write
The random write test measures performance when writing a file with accesses made to random locations within the file. The file is first opened and extended to the total file size, and data is then written in record-size units to random locations within it.
For IOzone the system specifications are fairly important. In particular, the amount of system memory matters because it has a large impact on caching effects. If the problem sizes are small enough to fit (at least partially) into the system or file system cache, results can be skewed, even for IOPS testing. Comparing results from a system where cache effects are large to results from one where they are not is comparing the proverbial apples to oranges. For example, running the same problem size on a system with 1GB of memory and on a system with 8GB will produce very different results.
As with the throughput tests, the IOPS tests used a file size that is twice the size of memory. The goal is to push the file size out of what could be cached by Linux. However, the actual record size within the file is under control of the user so we can effectively prescribe the size of each IO operation when performing the IOPS testing.
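Along the same lines, Linux offers a way to flush the page cache explicitly between runs. Whether the tests here did this is not stated, so treat it as an optional extra precaution rather than part of the method:

```shell
# Flush dirty pages to disk, then drop the page cache, dentries, and
# inodes (requires root; the drop_caches interface exists since 2.6.16)
sync
echo 3 > /proc/sys/vm/drop_caches
```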
For this article a total file size of 16GB was used. Within this 16GB file, five record sizes were tested: 8KB, 32KB, 64KB, 128KB, and 512KB. These sizes were chosen because the run times for smaller record sizes were much longer and, following our good benchmarking practice of running each test 10 times, would have resulted in very long benchmark times (weeks). You might laugh at the larger record sizes, but there are likely applications that depend on how quickly they can read or write 512KB records (I quit saying “never” with respect to application IO; I’ve seen some truly bizarre patterns, so “never” has been removed from my vocabulary).
The command line for the first record size (8KB) is,
./iozone -Rb spreadsheet_output_8K.wks -O -i 0 -i 1 -i 2 -e -+n -r 8K -s 16G > output_8K.txt
The command line for the second record size (32KB) is,
./iozone -Rb spreadsheet_output_32K.wks -O -i 0 -i 1 -i 2 -e -+n -r 32K -s 16G > output_32K.txt
The command line for the third record size (64KB) is,
./iozone -Rb spreadsheet_output_64K.wks -O -i 0 -i 1 -i 2 -e -+n -r 64K -s 16G > output_64K.txt
The command line for the fourth record size (128KB) is,
./iozone -Rb spreadsheet_output_128K.wks -O -i 0 -i 1 -i 2 -e -+n -r 128K -s 16G > output_128K.txt
The command line for the fifth record size (512KB) is,
./iozone -Rb spreadsheet_output_512K.wks -O -i 0 -i 1 -i 2 -e -+n -r 512K -s 16G > output_512K.txt
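The five record sizes can also be run from a single loop rather than five separate command lines. A sketch, assuming (as in the commands above) that the iozone binary sits in the current directory:

```shell
#!/bin/sh
# Run the IOPS tests for all five record sizes
# (assumes ./iozone exists, as in the individual commands)
for r in 8K 32K 64K 128K 512K; do
    ./iozone -Rb "spreadsheet_output_${r}.wks" -O -i 0 -i 1 -i 2 -e -+n \
        -r "$r" -s 16G > "output_${r}.txt"
done
```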
The tests were run on the same system as the previous tests. The system highlights are:
- GigaByte MAA78GM-US2H motherboard
- An AMD Phenom II X4 920 CPU
- 8GB of memory (DDR2-800)
- Linux 2.6.34 kernel (with bcache patches only)
- The OS and boot drive are on an IBM DTLA-307020 (20GB drive at Ultra ATA/100)
- /home is on a Seagate ST1360827AS
- There are two Seagate ST3500641AS-RK drives for testing, each with a 16MB cache. Only the second Seagate drive, /dev/sdc, was used for the file system. Since the version of bcache I used could not yet cache a block partition, the whole device was used for the file system.
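For illustration only, creating a file system on the whole device might look like the following. The article does not state the file system type or mount point, so ext4 and /mnt/test here are assumptions:

```shell
# Assumption: ext4 and /mnt/test are illustrative choices, not
# necessarily what the original tests used
mkfs.ext4 /dev/sdc           # file system on the whole device, no partition table
mkdir -p /mnt/test
mount /dev/sdc /mnt/test
```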