
Bcache Testing: Large Files and a Wrap-Up

This month we have been testing a new kernel patch named bcache that uses SSDs as a cache for block devices (with hard drives being the typical device). This article wraps up the testing with an investigation of the throughput of large files, and summarizes all the testing to date (and there is a lot of it).

Introduction

This is the fourth article in the testing series on a new kernel patch called bcache. The really cool thing about bcache, and the reason I’ve spent so much time on it, is that it takes one block device and allows it to be a cache for a second block device. What makes this special is that it is intended to take SSDs and make them cache devices for hard drives or RAID arrays. Conceptually, you could take a few inexpensive SSDs that cost around $100-$150 each, create a RAID-0 array, and use it to cache a RAID-6 or even a RAID-5 array. I have waited a long time for the ability to cache hard drives using something like an SSD or even a ramdisk to really improve storage performance when performance is the most critical item (and, honestly, who doesn’t like performance?).

In the first performance article it was found that there are some workloads where bcache helped throughput performance compared to a single disk, such as record rewrite, random write (not a lot of improvement, but it is noticeable), and strided read (some improvement).

In the second article we also saw that there are some IOPS workloads that can benefit from bcache. For example, sequential write IOPS (in the case of 128KB records) and sequential read IOPS (usually larger record sizes, particularly 128KB) both saw some reasonable improvements from bcache. But we also saw that bcache hurt random IOPS performance (both read and write).

In the third article we saw that metadata performance is one area where bcache does better than a plain uncached disk. For file create/close operations, bcache was about 10% faster than a plain disk. For file utime operations (changing a file’s last access and modification times), bcache also improved performance relative to a plain disk, by about 10% for a single process. However, for file stat performance, bcache actually hurt performance relative to the plain disk.

Based on these three sets of results you might say that bcache is something of a mixed bag: it can help certain workloads but also hurt others. You may be disappointed that it’s not helping all workloads, but remember that the patch is new and still undergoing development. For a brand-new patch that affects data it did quite well, in that no data was lost and a good portion of the basic functionality is in place. At the same time, some of the features that would help performance for some workloads, such as write-through caching, aren’t there (yet).

So cut the patch some slack, but don’t lose heart and don’t sweep bcache under the rug. Test, test, and ask for features! (And write patches if you are able.)

In this last article about bcache testing, I want to examine the influence of bcache on throughput performance when the file is larger than the SSD. Prior throughput testing used files that could easily fit on the SSD itself, and I’m curious how bcache behaves when the file is, in essence, “out of cache” (the cache in this case being the SSD). Then, to close out the series, I want to summarize the bcache testing to this point.

Large File Performance

In the first article that tested bcache, the file size was only 16GB, while the SSD used for testing is 64GB. So, theoretically, the data could simply reside on the SSD and you could get amazing performance. But SSDs are usually much smaller than hard drives, so I became curious what happens to throughput performance when the file size is greater than the size of the caching device (the SSD).

The analogy is that when doing storage benchmarks, it is always a good idea to understand how performance behaves as a function of memory size. The reason is that, many times, the OS will “cache” the data, giving a false impression of the actual performance. By running benchmarks that exceed the memory of the system, you reduce the major cache effects of the OS. For bcache, running benchmarks with files larger than both the system memory and the SSD is a way to test the impact of cache effects on performance.
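
As a quick illustration, the sizes involved can be checked from the shell before choosing a benchmark file size. The SSD’s device node is not named in this article, so /dev/sdd below is an assumption for illustration.

# Total system memory, in gigabytes
free -g

# Size of the SSD in bytes (the benchmark file should exceed this)
blockdev --getsize64 /dev/sdd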

Rather than repeat all of the prior tests (throughput, IOPS, and metadata), I decided to examine just the impact of large files on throughput performance. As with previous articles, I used IOzone to test throughput performance, but I changed the file size to 128GB, twice the size of the SSD. However, because the file is so large and I wanted to test in a reasonable amount of time, I chose only a single record size of 16MB.

The tests were run on the same system as the previous tests. The highlights of the system are:

  • GigaByte GA-MA78GM-US2H motherboard
  • An AMD Phenom II X4 920 CPU
  • 8GB of memory (DDR2-800)
  • Linux 2.6.34 kernel (with bcache patches only)
  • The OS and boot drive are on an IBM DTLA-307020 (20GB drive at Ultra ATA/100)
  • /home is on a Seagate ST1360827AS
  • There are two drives for testing. They are Seagate ST3500641AS-RK disks with 16 MB cache each. These are /dev/sdb and /dev/sdc.

Only the second Seagate drive, /dev/sdc, was used for the file system. Since the version of bcache I used could not yet cache a partition, I used the whole device (/dev/sdc) for the file system.
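
For reference, the sketch below shows how a cache/backing device pair is assembled with the bcache tools that were eventually merged into the mainline kernel. The exact interface of the early patch tested in this series differed, and the SSD’s device name (/dev/sdd) and the cache set UUID placeholder are assumptions for illustration.

# Format the SSD as a cache device and the hard drive as a backing device
# (make-bcache is from bcache-tools; /dev/sdd is a hypothetical SSD device)
make-bcache -C /dev/sdd
make-bcache -B /dev/sdc

# Register both devices with the kernel (udev often does this automatically)
echo /dev/sdd > /sys/fs/bcache/register
echo /dev/sdc > /sys/fs/bcache/register

# Attach the backing device to the cache set using the cache set's UUID
echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach

# The combined device appears as /dev/bcache0 and can then hold a file system
mkfs.ext4 /dev/bcache0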

The IOzone command line for the tests is:

./IOzone -Rb spreadsheet_output_16M.wks -s 128G -r 16M > output_16M.txt

Using our good benchmarking skills, the test was run 10 times so that an average and a standard deviation could be computed.
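
A minimal sketch of that procedure is shown below. It reuses the IOzone command given above; the awk step is a generic mean/standard deviation calculation that assumes the throughput values have already been extracted from the IOzone output, one number per line, into a file named results.txt (a hypothetical name for illustration).

# Run the same IOzone test 10 times, saving each run's output separately
for i in $(seq 1 10); do
    ./IOzone -Rb spreadsheet_output_16M_${i}.wks -s 128G -r 16M > output_16M_${i}.txt
done

# Compute the average and (population) standard deviation, one value per line
awk '{ s += $1; ss += $1 * $1; n++ }
     END { m = s / n; printf "mean = %.2f  stddev = %.2f\n", m, sqrt(ss / n - m * m) }' results.txt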

Only two configurations were tested. The first is the disk alone and the second is bcache using the CFQ IO Scheduler.
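
For completeness, the IO scheduler for a block device can be selected through sysfs; a minimal sketch for the test drive used here:

# Show the available schedulers for /dev/sdc (the active one is in brackets)
cat /sys/block/sdc/queue/scheduler

# Select the CFQ IO scheduler for /dev/sdc
echo cfq > /sys/block/sdc/queue/scheduler

The following section presents the results from the tests.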

Large File Throughput Results

Since the tests are fairly simple, rather than presenting them in graphical form, the tables below just list the averages and the standard deviations for the IOzone tests. Table 1 presents the averages and standard deviations, while Table 2 presents the percent differences between the average performance of the single disk and bcache.

Table 1 – IOzone Throughput Performance (Kilobytes per second) for the two configurations for a 128GB file and a record size of 16MB.

Test            Disk Alone Avg (KB/s)  Disk Alone Std Dev  bcache Avg (KB/s)  bcache Std Dev
Write           87,952.40              14,789.91           86,902.20          13,620.60
Rewrite         87,571.60              14,741.85           86,462.30          13,702.96
Read            88,124.80              14,562.14           87,046.30          13,552.86
Reread          88,126.70              14,562.08           87,046.30          13,553.83
Random Read     82,992.40              13,253.44           81,911.60          12,423.58
Random Write    81,315.60              13,964.03           78,147.20          11,662.78
Backward Read   85,298.10              13,633.95           84,250.50          13,686.90
Record Rewrite  1,474,354.10           20,533.19           1,457,201.90       15,424.32
Stride Read     84,844.20              13,535.86           83,770.30          12,591.25
fwrite          87,882.80              11,185.42           86,402.90          11,629.78
frewrite        88,528.30              11,565.37           86,091.20          11,617.80
fread           88,146.20              11,071.60           87,610.80          11,621.44
freread         88,156.10              11,072.58           86,619.10          11,504.18

The values are fairly close to each other, so bcache doesn’t offer any apparent performance benefit for files larger than the SSD for the tests run here (throughput testing using IOzone). However, to get a better feel for the differences, Table 2 below lists the percent difference between the disk-alone and bcache averages for the workloads tested. Positive values indicate that bcache is faster and negative values indicate that the disk alone is faster.
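
The percent difference in Table 2 is simply (bcache - disk) / disk x 100. For example, for the write test, using the averages from Table 1:

# Percent difference for the write test
awk 'BEGIN { d = 87952.40; b = 86902.20; printf "%.2f%%\n", (b - d) / d * 100 }'
# prints -1.19%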

Table 2 – IOzone Throughput Performance differences for bcache versus the plain disk for a 128GB file and a record size of 16MB.

Test            Percent Difference
Write           -1.19%
Rewrite         -1.27%
Read            -1.22%
Reread          -1.23%
Random Read     -1.30%
Random Write    -3.90%
Backward Read   -1.23%
Record Rewrite  -1.16%
Stride Read     -1.27%
fwrite          -1.68%
frewrite        -2.75%
fread           -0.61%
freread         -1.74%

Notice that all the values are negative, indicating that bcache hurt throughput performance for every test run. However, the differences are within the standard deviation, so it’s difficult to draw any firm conclusions from the comparison. The one observation that can be made is that bcache didn’t make an appreciable difference in throughput performance for any workload tested here when the file size exceeded the size of the SSD.
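
To make that concrete for the write test: the gap between the averages is about 1,050 KB/s, while the standard deviations are roughly 14,790 KB/s and 13,621 KB/s, so the gap sits well inside the run-to-run noise:

# Is the difference in the write averages smaller than either standard deviation?
awk 'BEGIN { diff = 87952.40 - 86902.20;
             verdict = (diff < 14789.91 && diff < 13620.60) ? "within noise" : "significant";
             print diff, verdict }'
# prints: 1050.2 within noise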

There may be many reasons for the small differences in performance. Depending upon the workload, bcache should pull data from the disk into the SSD. Since the performance is so close to that of the disk alone, it is likely that the data was largely written to or read from the disk directly, so the small performance deficit is a result of the overhead of bcache.

Bcache Testing Summary

This has been a long series of articles about the performance of bcache, but I think it is important to really dig into a technology that many people, including myself, have been awaiting for some time. In many ways this series of articles hasn’t been about the performance of bcache alone, although that is important, but rather about a patch representing a new attempt to improve storage performance by mixing a small amount of very fast but more expensive storage with traditional slower, larger, and cheaper spinning disks.

Recall that we put small amounts of RAM on our hard drives to improve performance without driving up the cost too much. Now we have a new storage technology, SSDs, that is becoming more mainstream and fits neatly between RAM and hard drives in terms of performance and price ($/GB). Being able to use SSDs to cache data to/from hard drives could potentially boost performance for various workloads.

The testing data presented in the last several articles shows that there are some workloads where bcache works well and many workloads where it doesn’t work well. But remember that bcache is still in early development so I’m not surprised that it doesn’t have good performance in certain areas.

When the throughput performance of bcache was examined it was found that there are some workloads where bcache helped. For the record rewrite test, bcache gave almost a 50% boost in performance compared to the single disk for larger block sizes (particularly at 16MB).

However, bcache didn’t appreciably change the write performance for the write and rewrite tests; the difference in averages was well within the standard deviation of the tests. For random writes, bcache was about 7-10% faster than a plain disk, but again the difference is within the standard deviation.

For the read and re-read tests, the performance difference between both bcache configurations (CFQ and NOOP IO Schedulers) and the single disk is very small, well within the standard deviation of the tests. For the random read test, bcache gave about a 7.5% increase in performance, which decreased as the block size increased. But the most spectacular performance difference was the stride read case, where bcache gave about a 21% improvement in performance over a single disk (for the 4MB block size case).

From the second article, which covered IOPS performance, we also saw that there are some IOPS workloads that can benefit from bcache. For example, sequential write IOPS performance improved a fair amount: bcache with the NOOP IO Scheduler at the 128KB record size (which aligns with the bcache block size) was about 28% faster than the uncached disk, and at the 8KB record size bcache was about 9% faster. On the other hand, at the 64KB record size, performance was about 14% worse for bcache compared to the plain disk.

For sequential read IOPS, when the NOOP IO Scheduler was used, performance at the 8KB record size was about 15% better than the disk alone, and at a record size of 128KB (again aligned with the bcache block size) it was 38% better than the uncached disk. However, at 64KB, the NOOP IO Scheduler produced about 7% worse performance.

On the other hand, random IOPS performance with bcache did not do well. In general, bcache hurt random IOPS (both read and write). For small record sizes, performance degraded by as much as 38-45% compared to the single disk. However, poor random performance was expected, since the bcache wiki mentions that its random performance is not likely to be very good.

In the third set of tests, the metadata tests, it was found that bcache did help improve performance for certain workloads. For example, file create/close performance increased by a reasonable amount when using either the CFQ IO Scheduler or the NOOP IO Scheduler, but this was limited to a single process (NP=1). The file create/close performance of bcache with the CFQ IO Scheduler was about 10% better than the plain disk, while the performance with the NOOP IO Scheduler was about 17% better. However, as the number of processes increases, the performance gain from bcache diminishes to the point where, for NP=4, bcache actually makes performance worse than the plain disk.

For file stat performance, bcache made performance worse across the tests conducted. For NP=1, bcache with CFQ is about 9% worse than the disk alone, and bcache with NOOP is about 0.75% worse (a fairly small difference).

For NP=1 (one process), the file utime performance of bcache with the CFQ IO Scheduler is about 8% better than the plain disk, while with the NOOP IO Scheduler it is about 12% better. But as the number of processes increases to NP=4 (four processes), the benefits of bcache diminish: with the NOOP IO Scheduler, file utime performance for NP=4 is actually about 2% worse than the plain disk, and with the CFQ IO Scheduler it is about 3.8% worse.

So is bcache ready for prime time? The author of bcache readily admits that it’s not, and I agree. Any new patch that has the potential to affect data needs to be tested and vetted a great deal before being accepted into the mainline kernel. Bcache is still under development and represents something new for Linux, meaning that performance can change fairly rapidly. Moreover, bcache is still missing some features that could improve performance, such as a write-through cache. Despite the newness of the patch, all the tests thrown at it over the last four articles passed without problems.

Another reason I presented so much benchmark data on bcache is that you have the opportunity to contribute to the kernel and influence its direction and performance. By testing bcache over a variety of workloads and providing feedback to the author, you can really influence the direction of new technology in the kernel. Keep testing and keep providing feedback!
