Now that we've completed our initial examination of the basics of RAID levels (including Nested RAID) it's time to turn our attention to RAID functionality on Linux using software. In this article we will be discussing mdadm -- the software RAID administration tool for Linux. It comes with virtually every Linux distribution and has some unique features that many hardware RAID cards don't offer.
We’ve spent a great deal of time examining various RAID levels such as RAID-0, 1, 5, and 6, and Nested RAID levels such as RAID-10, 50, 51, 61, and 60, or even the more complicated RAID-100 or RAID-160. In all of these discussions we have assumed there was a RAID “controller” that performed the various RAID operations. This includes sending chunks of data to the appropriate disks, computing parity, hot-swapping, disk fail-over, and checking read transactions to determine if the read was successful and, if not, declaring that disk as “down”, plus other important tasks related to RAID. All of these tasks require some sort of computation and have to be performed by a RAID controller.
You really have two options for RAID controllers: (1) a dedicated RAID controller, such as those on add-in RAID cards, or (2) software RAID that uses the CPU for RAID chores. In the first case a dedicated RAID controller, typically on an add-in card but sometimes on the motherboard, performs the necessary RAID computations. These controllers usually rely on a dedicated low-power processor, often a real-time processor such as a PowerPC. With an add-in card, you plug the drives you want in the RAID array directly into the card.
With software RAID, all the RAID functions run on the CPU. Virtually every Linux distribution comes with software RAID in the form of md (Multiple Device), a driver within Linux that provides virtual devices created from one or more underlying block devices. Because it ships with the distribution, you have access to the source (plus the price isn’t too bad either, and it has a great deal of functionality). In addition, since all RAID functions are in software, you can create Nested RAID configurations as needed (I smell a Triple Lindy coming).
One thing you have to be very careful about is what is commonly called a fakeRAID card or fakeRAID controller. FakeRAID is not hardware RAID because there is no dedicated RAID controller. Rather, these cards use a standard disk controller chip on an add-in card or motherboard, with some specialized firmware and drivers. At boot-time they run a special application that allows users to configure disks attached to the fakeRAID as a RAID group. But the RAID processing is really handled by the drivers, which run on the host CPU (so the CPU provides the computational power). Consequently, it’s really a software RAID solution and not hardware RAID (hence the name “fakeRAID”).
There is a great deal of discussion and grinding of teeth within the Linux community about fakeRAID. One point of the discussions is that the vendors of fakeRAID don’t tell customers that what they are actually buying is not a RAID card with a dedicated RAID controller, but rather a simple card with a disk controller coupled with drivers that use the CPU for RAID processing (false advertising). Plus there is the additional problem of developing and supporting drivers for Linux to allow these fakeRAID cards to be used. Moreover, there is a strong argument that it is probably better to use software RAID that comes with Linux (md) since it is already part of Linux and can arguably give you better performance.
However, if you want to use software RAID that comes with Linux in the kernel (md), you still need some tools to control/manage/monitor the software RAID arrays. That’s where mdadm comes in. This article will do a brief examination of mdadm and some of its options.
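Before diving into the modes, it helps to confirm that mdadm is installed and to see what the md driver is already doing on your system. The commands below are standard mdadm/md checks; the exact version string and array contents will of course depend on your system.

```shell
# Confirm mdadm is installed and check its version:
mdadm --version

# The md driver reports all active software RAID arrays here;
# an otherwise idle system simply shows no active arrays:
cat /proc/mdstat
```

Reading /proc/mdstat is a quick, non-destructive way to see array state, sync progress, and member devices at a glance, and it works without root privileges.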
Introduction to mdadm
Mdadm is a software tool, primarily written by Neil Brown, that allows you to create, assemble, report on, grow, and monitor RAID arrays that use software RAID in Linux. According to the documentation, there are seven modes of operation:
- Assemble
- Build
- Create
- Follow or Monitor
- Grow
- Manage
- Misc
We’ll walk through these various modes of operation but the focus of this article is an introduction and not an in-depth HOWTO. You can find those types of articles on the web.
The first step in using mdadm, or any RAID for that matter, is to PLAN your RAID configuration carefully. Personally I like to work backwards starting with the purpose of the storage. Will it be used for a database? Will it be used for /home? Will it be used for high-speed scratch? Will it be used for data that requires a very high degree of reliability? Understanding the intent of the storage is really the key to creating the RAID configuration you need/want. Once you determine the storage use case, you need to develop an idea of how much I/O performance you need (throughput and IOPS) and the general ratio of read and write performance. You should also develop an idea of how much data redundancy you need for the storage.
Once you have an idea of the performance and redundancy of the array you can select the RAID configuration you think you might need. I would suggest you select a few candidate RAID configurations and then do some more reading/research on each one and select the RAID configuration that seems to be best. During this research be sure to examine the redundancy as well as the performance of the various RAID configurations and compare them to your estimations. But also be sure to examine the capacity and storage efficiency of each level. You may love the performance and storage efficiency of RAID-10 but the data redundancy may not be enough for you. Or you may love the data redundancy of RAID-61 but you may not be willing to give up the performance or, perhaps more importantly, you may not be willing to have such low storage efficiency (especially if this is for your home system).
But just choosing the RAID configuration you want is not the end of your planning. You need to also consider a number of other things. Perhaps the most important is whether you will need to grow or shrink the storage. This is important because you are likely to use LVM (Logical Volume Manager) either on top of Linux software RAID or underneath it. This forces you to carefully consider how to build both LVM and software RAID and how you expand either one or both. I would recommend walking through the expansion steps to make sure you understand how to do it (you could even pass along your ideas to someone else to have another pair of eyes examine them).
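As a concrete illustration of the LVM-on-top-of-md layering, the sketch below uses standard LVM commands. The device names (/dev/md0, /dev/md1), volume group, and logical volume names are hypothetical, the sizes are arbitrary, and these commands must be run as root on disks whose contents you can safely erase.

```shell
# Layer LVM on top of an existing md array (hypothetical /dev/md0):
pvcreate /dev/md0                      # make the md array an LVM physical volume
vgcreate vg_data /dev/md0              # create a volume group on it
lvcreate -L 100G -n lv_home vg_data    # carve out a logical volume for /home

# Later growth: add a second md array to the volume group,
# then extend the logical volume into the new space.
pvcreate /dev/md1
vgextend vg_data /dev/md1
lvextend -L +50G /dev/vg_data/lv_home
```

Walking through a sequence like this on paper, before you build anything, is exactly the kind of expansion rehearsal suggested above.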
One other thing you should consider before implementing your well formulated and thought-out RAID plan is the file system that sits on top of the storage. Based on your usage model for the storage, select one or two candidate file systems. Then do some research on each one to find out what problems or limitations exist, and also how you can optimize each file system for better performance (we’re all performance junkies at heart). There are a number of articles on the web that discuss tuning file systems with Linux software RAID.
Assuming that you have done your careful planning (including a backup solution), let’s move on to the first “mode” of mdadm, Creating a RAID array.
Creating a RAID array
Mdadm allows you to create a RAID array using Linux block devices. During creation, a per-device superblock is written to each member of the array (these superblocks allow the array to be assembled correctly later). Using the “create” mode is the most common method for building the array and is recommended if you are just starting to use mdadm.
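You can inspect those per-device superblocks directly with mdadm's examine option, which is a useful sanity check when reassembling an array. The member device name below is hypothetical, and these commands generally require root.

```shell
# Read the md superblock from a member device (hypothetical /dev/sdb1):
mdadm --examine /dev/sdb1

# By contrast, --detail reports on an assembled array as a whole:
mdadm --detail /dev/md0
```

The distinction is worth remembering: --examine looks at a component device's superblock, while --detail looks at the assembled array.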
The basic mdadm command for creating a RAID array is fairly simple with the following generic command and typical options.
mdadm --create [md-device] --chunk=X --level=Y --raid-devices=Z [devices]
where the options are as follows:
-c, --chunk= Specify chunk size in kibibytes. The default is 64.
-l, --level= Set raid level, options are: linear, raid0, 0, stripe, raid1, 1, mirror, raid4, 4, raid5, 5, raid6, 6, raid10, 10, multipath, mp, faulty
-n, --raid-devices= Specify the number of active devices in the array.
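Putting those options together, here is what a typical invocation might look like for a three-disk RAID-5 array. The device names are hypothetical, the command must be run as root, and it will destroy any existing data on the member disks.

```shell
# Create a RAID-5 array named /dev/md0 from three disks,
# with a 64 KiB chunk size (hypothetical devices -- adjust to your system):
mdadm --create /dev/md0 --chunk=64 --level=5 --raid-devices=3 \
    /dev/sdb /dev/sdc /dev/sdd

# Watch the initial resync progress as the array builds:
cat /proc/mdstat
```

After the create command returns, the array is usable immediately, but performance is reduced until the initial resync shown in /proc/mdstat completes.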