Destined to become the default file system for the more popular Linux distributions, ext4 is out of experimental mode and gearing up for production environments. Here's what you need to know.
If you have spent enough time around Linux, it’s almost certain you know about the ext2 and ext3 file systems, and you have probably heard of ext4. Get ready to hear some more.
On October 11, 2008, the “experimental” label for ext4 was removed. While this doesn’t necessarily mean that you should change all of your file systems over to ext4 immediately, it does mean that you should consider using ext4 moving forward. With the “experimental” label gone and openSUSE (among others) considering it for the default file system in a late-2009 release, it’s a good time to review ext4 so you have a solid working knowledge of what it is and what features it brings to the table.
A Bit of Linux History
If you go back to the Linux days of yore, you might remember that early distributions used the Minix File System. MinixFS (my abbreviation) let Linux get up and running quickly because a new file system did not have to be developed, but it had a few limitations. It used 16-bit offsets internally, resulting in a maximum file size limit of 64 Megabytes (MB), and it only allowed file names of up to 14 characters. It’s pretty obvious that this wasn’t an ideal file system, so Remy Card and others quickly began work on the Extended File System (ext).
ext was added to Linux 0.96c. It could handle file systems of up to 2GB and file names of up to 255 characters. But there were still some issues with the file system, so work began on a second-generation version of ext, called ext2. It quickly became the most popular file system in Linux, with a 4TB file system limit, a 2GB maximum file size, file names of up to 255 characters, and up to 10^18 files. To this day you can still use ext2, and it’s likely to be used for many years to come.
But just like everything in Linux, ext2 did not stand still. Stephen Tweedie evolved ext2 into ext3 by adding, among other things, a journaling capability. Journaling improves the reliability of the file system and largely eliminates the need to check the file system after an unclean shutdown. In addition to journaling, the ability to resize the file system while it was online was added. Also, directories, which ext2 stored as simple linear lists, gained HTree indexing (a hashed variant of a B-tree), allowing a larger number of files in a single directory.
Ext3 quickly became, arguably, the most popular file system in Linux. One attribute that contributed to its popularity is that you could upgrade from ext2 to ext3 very easily (basically, the upgrade just added a journal to the existing ext2 file system), so you didn’t lose any data in the process. Adding a journal to ext2 increased its reliability and also significantly reduced the need for a file system check (fsck) after an unclean unmount.
However, ext3 still has limitations that people are not happy with. The biggest complaints are that file systems are limited to 16TB and that performance is not on par with other file systems such as XFS and JFS. The first complaint, the file system size limit, is perhaps the bigger one, given that you can buy 1.5TB SATA drives today and will soon be able to buy 2TB drives. It’s pretty easy to build a simple RAID array in your home system that hits the 16TB limit. But there are other disadvantages as well.
Enter ext4, Stage Left
In 2006, the uber Linux developer Theodore Ts’o, who was at the time the ext3 maintainer, began work on ext4. Unlike ext3, which added some features to ext2 while keeping ext2’s on-disk format and overall approach, ext4 is a fork of ext3 with deep code changes to its data structures, aimed at making it a better file system: faster, more reliable, with more features and better code. Ext4 brought ext3 into the world of 64 bits, allowing individual files of up to 16TB (assuming 4KB blocks) as well as file systems of up to 1 Exabyte (EB) by using 48-bit block numbers. One EB is the same as 1,048,576 Terabytes (TB).
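These limits fall out of simple arithmetic: multiply the number of addressable blocks by the block size. The sketch below assumes 4KB blocks, as the text does, plus 32-bit block numbers for ext3 and for block positions within an ext4 file, and 48-bit block numbers for an ext4 file system. It uses binary units (TiB, EiB), which is what the 1,048,576 figure implies. Treat it as a back-of-the-envelope check, not a list of every constraint the on-disk format imposes.

```python
# Back-of-the-envelope size limits, assuming 4KB blocks and binary units.
BLOCK_SIZE = 4 * 1024  # bytes per block (the 4KB assumed in the text)
TiB = 2**40
EiB = 2**60


def max_bytes(block_number_bits: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes addressable when block numbers are unsigned integers of the given width."""
    return (2**block_number_bits) * block_size


ext3_fs = max_bytes(32)    # ext3: 32-bit block numbers for the whole file system
ext4_file = max_bytes(32)  # ext4: 32-bit logical block numbers within one file
ext4_fs = max_bytes(48)    # ext4: 48-bit physical block numbers

print(f"ext3 file system limit: {ext3_fs // TiB} TiB")    # 16 TiB
print(f"ext4 maximum file size: {ext4_file // TiB} TiB")  # 16 TiB
print(f"ext4 file system limit: {ext4_fs // EiB} EiB")    # 1 EiB
print(f"1 EiB = {EiB // TiB:,} TiB")                      # 1,048,576 TiB
```

The same formula explains both the ext3 ceiling that people complain about and the jump ext4 makes: only the width of the block number changes.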
While past predictions about how much memory (640KB) and storage we would need have been wrong, it is likely that our home machines won’t get to 1 EB for a long time. But just in case, ext4 is set to eventually go to full 64 bits, although the surgery to get there is likely to be deep enough to require some fundamental changes in the file system.
From the perspective of many, one of the most positive features of ext4 is that it is backward compatible with ext2 and ext3, allowing you to take the ext2 or ext3 file systems, change a few options, and mount them as ext4 file systems. The existing data is not lost and ext4 will use the new data structures only on new data (pretty nifty feature if you ask me).
Additionally, there is a nice upgrade capability that lets you convert an ext2 or ext3 file system to ext4 without a loss of data (but, as always, back up your data just in case). However, ext4 has limited forward compatibility with ext3. That is, you can’t always take an ext4 file system and mount it as ext3, because some of ext4’s data structures, such as extents, are ones that ext3 doesn’t understand.
The hard work that went into ext4 added new features such as extents, journal checksumming, multi-block allocation, delayed allocation, faster fsck, on-line defragmentation, and more subdirectories per directory (up to 64,000, versus 32,000 in ext3). Let’s look at a few of these:
Extents describe how the blocks that store a file’s data are laid out on the drive.
Ext3 (and ext2) use indirect block mapping to keep track of the blocks used by a particular file: the inode points, directly or through indirect blocks, to every single block the file uses. For example, a 100MB file occupies 25,600 4KB blocks, so ext3 has to keep track of all 25,600 blocks and the order they go in.
Ext4 allows the blocks for a particular file to be stored as an extent, which is just a contiguous set of blocks. So instead of tracking every block, the file system only has to store two pieces of information per extent: the starting block and how many contiguous blocks the extent contains. Extents also help reduce file fragmentation, which improves performance, because the data is stored in contiguous blocks. They also speed up file deletion, because there is much less metadata to change.
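To make the difference concrete, here’s a small Python model of both bookkeeping schemes for the 100MB file from the example. It is a simplification: real ext4 keeps its extents in a tree rooted in the inode, and each extent in this sketch also records where it starts within the file (its logical block) so the pieces can be put back in order. The 32,768-block cap per extent reflects ext4’s limit for an initialized extent (128MB with 4KB blocks); the physical block numbers are made up.

```python
from typing import List, Tuple

BLOCK_SIZE = 4 * 1024   # 4KB blocks, as in the text
MAX_EXTENT_LEN = 32768  # ext4's cap on an initialized extent, in blocks

# (logical start within the file, physical start on disk, length in blocks)
Extent = Tuple[int, int, int]


def block_map(physical_blocks: List[int]) -> List[int]:
    """ext2/ext3-style bookkeeping: one entry for every block of the file."""
    return list(physical_blocks)


def to_extents(physical_blocks: List[int]) -> List[Extent]:
    """Collapse runs of contiguous physical blocks into extents."""
    extents: List[Extent] = []
    for logical, phys in enumerate(physical_blocks):
        if extents:
            log0, phys0, length = extents[-1]
            # Grow the current extent if this block continues it and it isn't full.
            if phys == phys0 + length and length < MAX_EXTENT_LEN:
                extents[-1] = (log0, phys0, length + 1)
                continue
        extents.append((logical, phys, 1))
    return extents


if __name__ == "__main__":
    n_blocks = (100 * 1024 * 1024) // BLOCK_SIZE       # 25,600 blocks
    contiguous = list(range(1000, 1000 + n_blocks))    # hypothetical layout
    print(len(block_map(contiguous)))  # 25600
    print(to_extents(contiguous))      # [(0, 1000, 25600)]

    # The same file split into two runs, e.g. because another file was in the way.
    split = list(range(1000, 13800)) + list(range(50000, 62800))
    print(to_extents(split))  # [(0, 1000, 12800), (12800, 50000, 12800)]
```

Even when the file can’t be stored in one piece, two extents still replace 25,600 individual entries.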
One of the big developments in ext3 was the implementation of a journal. A journal is just a record of the changes that are about to be made to a file system (e.g., writes, deletes, and metadata updates). The file system then “plays” this journal to commit the changes. If there is a crash, the journal, which is stored on disk, is simply “replayed”, bringing the file system back to a consistent state. But don’t forget that the journal itself is stored on the disk, so it is subject to disk failures too.
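The replay idea can be sketched in a few lines. This is a toy model, not ext3’s actual journal format: the “disk” is a Python dict, and a transaction counts only once its commit record has been logged. On replay, committed transactions are applied and anything that never reached its commit record is ignored, which is what lets the file system come back in a consistent state.

```python
from typing import Dict, List, Set, Tuple

Disk = Dict[int, str]  # toy "disk": block number -> contents


class Journal:
    """Hypothetical in-memory journal: logged block writes plus commit records."""

    def __init__(self) -> None:
        self.writes: List[Tuple[int, int, str]] = []  # (transaction id, block, new contents)
        self.commits: Set[int] = set()                # transactions whose commit record was logged

    def log(self, tx_id: int, changes: Dict[int, str], commit: bool = True) -> None:
        for block, data in changes.items():
            self.writes.append((tx_id, block, data))
        if commit:  # a crash before this point leaves the transaction uncommitted
            self.commits.add(tx_id)


def replay(journal: Journal, disk: Disk) -> Disk:
    """After a crash, apply only the committed transactions, in log order."""
    for tx_id, block, data in journal.writes:
        if tx_id in journal.commits:
            disk[block] = data
    return disk


if __name__ == "__main__":
    j = Journal()
    j.log(1, {1: "new"})                # fully committed
    j.log(2, {2: "new"}, commit=False)  # simulated crash before the commit record
    print(replay(j, {1: "old", 2: "old"}))  # {1: 'new', 2: 'old'}
```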
Journal checksumming creates a checksum of the journal data so that ext4 can tell if the area of the disk where the journal is kept is failing or becoming corrupt. This improves reliability, and it can also improve performance: because the checksum lets ext4 detect an incomplete transaction during replay, journal commits can be faster than in ext3.
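Here’s a minimal sketch of the checksum idea, assuming CRC-32 from Python’s `zlib` as a stand-in; ext4’s journal layer defines its own checksum fields and algorithms, which this does not reproduce. The checksum is computed over a transaction’s blocks when it is logged and stored with the commit record. At replay time it is recomputed, and a mismatch flags a damaged or incomplete transaction instead of replaying it blindly.

```python
import zlib
from typing import List


def commit_checksum(blocks: List[bytes]) -> int:
    """Running CRC-32 over all of a transaction's journal blocks (stand-in algorithm)."""
    crc = 0
    for block in blocks:
        crc = zlib.crc32(block, crc)
    return crc


def transaction_is_valid(blocks: List[bytes], stored_crc: int) -> bool:
    """At replay time, recompute the checksum and compare it with the stored one."""
    return commit_checksum(blocks) == stored_crc


if __name__ == "__main__":
    tx = [b"inode 12 update", b"bitmap update"]
    crc = commit_checksum(tx)
    print(transaction_is_valid(tx, crc))  # True
    damaged = [b"inode 12 update", b"bitmap upd\x00te"]
    print(transaction_is_valid(damaged, crc))  # False
```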
Ext3 allocates blocks for a file one at a time (typically using 4KB blocks). For very large files, the associated function that does the allocation will have to be called thousands of times. ext4 uses “multi-block allocation” which allows multiple blocks (hence the name) to be allocated during one function call. This can greatly improve the performance of ext4 relative to ext3, particularly for large files.
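To see why batching matters, here’s a hypothetical bitmap allocator that simply counts how many times it’s called. It is not ext4’s multi-block allocator, which also weighs locality and free-space fragmentation; it only contrasts one call per 4KB block with one call for the whole request, using the 100MB file (25,600 blocks) from the extents discussion. For simplicity, it searches forward from a cursor and never reuses freed space.

```python
from typing import List


class ToyAllocator:
    """Hypothetical bitmap allocator that counts how often it is invoked."""

    def __init__(self, total_blocks: int) -> None:
        self.free = [True] * total_blocks
        self.cursor = 0  # next search starts here (toy: never wraps around)
        self.calls = 0

    def alloc_one(self) -> int:
        """ext3-style: one block per call."""
        self.calls += 1
        return self._take_run(1)[0]

    def alloc_many(self, count: int) -> List[int]:
        """Multi-block style: one call returns `count` contiguous blocks."""
        self.calls += 1
        return self._take_run(count)

    def _take_run(self, count: int) -> List[int]:
        """Find and mark the first run of `count` free blocks at or after the cursor."""
        start, run = self.cursor, 0
        for i in range(self.cursor, len(self.free)):
            if self.free[i]:
                run += 1
                if run == count:
                    blocks = list(range(start, start + count))
                    for b in blocks:
                        self.free[b] = False
                    self.cursor = start + count
                    return blocks
            else:
                start, run = i + 1, 0
        raise RuntimeError("no contiguous run of that size")


if __name__ == "__main__":
    n_blocks = 25_600
    per_block = ToyAllocator(32_768)
    for _ in range(n_blocks):
        per_block.alloc_one()
    batched = ToyAllocator(32_768)
    batched.alloc_many(n_blocks)
    print(per_block.calls, batched.calls)  # 25600 1
```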
In ext3 and other traditional file systems, blocks are allocated as soon as a write function needs them. But in reality, they may not be needed right away, because the data may sit in cache for some time. Delayed allocation defers allocating blocks until the data is actually written to disk. This can improve the performance of ext4, because in the meantime the allocator sees more of the data and can choose blocks that minimize fragmentation. It also pays off when combined with extents and multi-block allocation.
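Here’s a toy illustration of why waiting helps. Two files are written in alternating 4-block chunks (a made-up workload). With immediate allocation, each write grabs the next free blocks, so the two files end up interleaved on disk. With delayed allocation, the writes just accumulate, as if in the page cache, and each file’s blocks are allocated together at flush time. The real ext4 writeback path is far more involved; this only shows the effect on the number of extents per file.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Write = Tuple[str, int]  # (file name, number of blocks written)


def count_extents(blocks: List[int]) -> int:
    """Number of contiguous runs in a file's block list."""
    return sum(1 for i, b in enumerate(blocks) if i == 0 or b != blocks[i - 1] + 1)


def immediate_allocation(writes: List[Write]) -> Dict[str, List[int]]:
    """Allocate blocks at write() time, in the order the writes arrive."""
    next_free = 0
    layout: Dict[str, List[int]] = defaultdict(list)
    for name, n_blocks in writes:
        layout[name].extend(range(next_free, next_free + n_blocks))
        next_free += n_blocks
    return dict(layout)


def delayed_allocation(writes: List[Write]) -> Dict[str, List[int]]:
    """Buffer the writes, then allocate each file's blocks together at flush time."""
    pending: Dict[str, int] = defaultdict(int)
    for name, n_blocks in writes:
        pending[name] += n_blocks  # no blocks chosen yet; the data just sits in cache
    next_free = 0
    layout: Dict[str, List[int]] = {}
    for name, total in pending.items():  # flush: each file's full size is now known
        layout[name] = list(range(next_free, next_free + total))
        next_free += total
    return layout


if __name__ == "__main__":
    workload = [("a.log", 4), ("b.log", 4)] * 5
    for strategy in (immediate_allocation, delayed_allocation):
        layout = strategy(workload)
        print(strategy.__name__, {name: count_extents(b) for name, b in layout.items()})
    # immediate_allocation {'a.log': 5, 'b.log': 5}
    # delayed_allocation {'a.log': 1, 'b.log': 1}
```

With immediate allocation each file ends up in five pieces; with delayed allocation each is a single extent, which is exactly the kind of layout extents and multi-block allocation are built to exploit.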