Get your wetsuit on; we’re going data diving. We ran throughput benchmarks with IOzone on a common SATA disk, an Intel X25-E SSD, and bcache, using the SSD to cache a single drive.
Caching is a concept used throughout computing. CPUs have several levels of cache, disk drives have cache, and the list goes on. Adding a small amount of high-speed storage in front of a large amount of slower storage can dramatically improve performance. Enter two new kernel patches, bcache and flashcache, that leverage the power of SSDs.
One of the coolest file systems in user space has got to be GlusterFS. Its unique architecture allows it to be configured for specific storage requirements and scenarios: it can be used as a high-performance parallel file system, a cloud-based file system, or even a simple NFS server. And it does all of this in user space. Could GlusterFS represent the future of file system development for Linux?
Have you ever wanted to look inside a .tar.gz file without expanding it? Have you ever wanted to just dump files into a .tar.gz file without having to organize them and periodically tar and gzip the data? This article presents another REALLY useful user-space file system, archivemount. It allows you to mount archives such as .tar.gz files as a file system and interact with them using normal file and directory tools.
User-space file systems are among the coolest storage options in Linux. They allow really creative file systems to be developed without having to run the kernel gauntlet. This article presents one of them, SSHFS, which lets you mount a remote file system over ssh (sftp).
Have you been looking for open-source storage management tools that are easy to use and provide a graphical representation of your storage? Alas, there are no comprehensive tools, but there are graphical tools that you can pair with command-line wizardry, particularly for LVM.
It’s common knowledge that Linux has a fair number of file systems. Some of these are underappreciated and can be very useful outside their “comfort zone”. OCFS2, a clustered file system initially contributed by Oracle, can be a great back-end file system for general shared storage needs.
Having file systems in the kernel has its pros and cons. Writing file systems in user space has its own pros and cons, but FUSE (Filesystem in Userspace) lets you create some pretty amazing results. This article takes a brief look at user-space file systems and FUSE.
In a recent walkthrough we outlined the steps for converting an existing server into a NAS box. That article assumed that you had already installed Linux on the server and would maintain that installation yourself (updates, security, etc.). This article examines an alternative: a dedicated NAS distribution called Openfiler that lets you easily create a stand-alone NAS box that can be administered over the web.
If you blinked, you might have missed the announcement of the new 2.6.34 kernel. Things have been moving very quickly in file systems and storage in recent kernels, so it’s a good idea to review the kernels from 2.6.30 to 2.6.34 and see what has changed.
Standalone Network Attached Storage (NAS) servers provide file-level storage to heterogeneous clients, enabling shared storage. This article presents the basics of NAS units (NFS servers) and how you can create one from an existing system.
Mmmm… bacon. This article examines two mechanisms to prevent data loss: write barriers and checksumming. Both become increasingly important as drive caches grow larger. Pay attention: this can save your data bacon.
In the last article we introduced the SMART capabilities of hard drives (who knew your drives were SMART?). This article looks at smartmontools, an application for examining SMART attributes and triggering self-tests.
Did you know your drive was SMART? That is, Self-Monitoring, Analysis, and Reporting Technology. SMART can be used to gather information about your hard drives, including their health status, and it can be paired with other tools to help predict drive failure.
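To make the checksumming idea concrete, here is a minimal Python sketch that records a CRC32 (from the standard `zlib` module) for each block when data is written, then recomputes and compares on read. The 4 KiB block size is a hypothetical choice. The sketch illustrates detection only. It is not how any particular file system implements checksums, and it does not model write barriers, which concern write ordering rather than detection.

```python
from __future__ import annotations

import zlib

BLOCK_SIZE = 4096  # hypothetical block size chosen for this sketch


def checksum_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Compute a CRC32 checksum for each fixed-size block of data."""
    return [zlib.crc32(data[i:i + block_size]) for i in range(0, len(data), block_size)]


def find_corrupt_blocks(data: bytes, stored: list[int], block_size: int = BLOCK_SIZE) -> list[int]:
    """Return indices of blocks whose current CRC32 no longer matches the stored value."""
    current = checksum_blocks(data, block_size)
    return [i for i, (old, new) in enumerate(zip(stored, current)) if old != new]


if __name__ == "__main__":
    original = bytes(range(256)) * 64           # 16 KiB of sample data = 4 blocks
    sums = checksum_blocks(original)            # "write" time: record checksums
    damaged = bytearray(original)
    damaged[5000] ^= 0xFF                       # flip bits inside block 1 (bytes 4096..8191)
    print(find_corrupt_blocks(bytes(damaged), sums))  # -> [1]
```

A real file system stores the checksums separately from the data they protect, so that a single bad write cannot silently corrupt both.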
Sometimes you just have to get excited about what you can buy, hold in your hand, and use in your home machines. Let’s look at some cool storage technology that the average desktop user can tackle.
Did you ever see one of those terrible sci-fi movies involving a killer octopus? Ceph, while named after just such an animal, is not a creature about to eat an unlucky spring breaker but a new parallel distributed file system. The client portion of Ceph just went into the 2.6.34 kernel, so let’s learn a bit more about it.
Metadata performance is perhaps the most neglected facet of storage performance. In previous articles we’ve looked into how best to improve metadata performance without much luck. Could that be a function of the benchmark? Hmmm…
In the last couple of articles we have talked about using strace to help examine the IO profile of applications (including MPI applications; think HPC). But strace output can contain hundreds of thousands of lines. In this article we talk about using a tool called strace_analyzer to help sift through the strace output.
One sorely neglected aspect of storage is analyzing and understanding the IO patterns of applications. This article examines some techniques for IO profiling an application and illustrates what information you can gain.
POSIX IO is becoming a serious impediment to IO performance and scaling. POSIX is one of the standards that made portable programs possible, and POSIX IO is the portion of the standard covering IO. But as the world of storage evolves, with rapidly growing capacities and performance, it is time for POSIX IO to evolve or die.
When you’re hot, you’re hot. And SSDs are hot right now. Let’s review recent developments in SSD hardware and see where the technology is headed. Prepare to drool over new hardware!
Turning from metadata performance to throughput performance, we examine the impact of journal size on ext4 when the journal is disk-based. Dig into the numbers and see what you can do to improve throughput.
Over the past couple of weeks we ran the numbers on metadata performance for ramdisk-based and hard drive-based journals for ext4. Now let’s compare and contrast the two journal devices and see what trends emerge.
Previously, we examined how journal size affects metadata performance, as measured by fdtree, when the journal is on a separate disk. In this follow-up we repeat the same test but put the journal on a ramdisk, which should boost performance. Or does it?
Recently we saw that the location of the journal device, unfortunately, didn’t make much difference to ext4 metadata performance. But can the size of the journal have an impact? This is the first in a series of articles examining journal size and performance.
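The internals of strace_analyzer aren't described here. The following is only a hypothetical Python sketch of the general idea: scan strace output line by line, count each system call, and total the bytes returned by successful read/write-style calls. The regular expression assumes the default strace line format, optionally prefixed by a PID (from `-f`) or a timestamp (from `-tt`). It deliberately skips lines it doesn't recognize, including `<unfinished ...>`/`resumed` pairs, so it will undercount calls that were interrupted that way.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Matches lines such as:  read(3, "abc"..., 4096) = 4096
# optionally preceded by a PID (strace -f) and/or a timestamp (strace -tt).
LINE_RE = re.compile(
    r"^(?:\d+\s+)?(?:\d{2}:\d{2}:\d{2}(?:\.\d+)?\s+)?(\w+)\(.*\)\s+=\s+(-?\d+)"
)

# Calls whose positive return value is a byte count.
IO_CALLS = {"read", "write", "pread64", "pwrite64"}

SAMPLE = [
    'open("data.bin", O_RDONLY) = 3',
    'read(3, "abc"..., 4096) = 4096',
    'read(3, "xyz"..., 4096) = 1024',
    'write(1, "done", 4) = 4',
    'close(3) = 0',
]


def summarize(lines):
    """Count calls per syscall and total bytes moved by successful read/write-style calls."""
    calls = defaultdict(int)
    io_bytes = defaultdict(int)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # signals, exit status, unfinished/resumed lines, etc.
        name, ret = m.group(1), int(m.group(2))
        calls[name] += 1
        if name in IO_CALLS and ret > 0:
            io_bytes[name] += ret
    return dict(calls), dict(io_bytes)


if __name__ == "__main__":
    calls, io_bytes = summarize(SAMPLE)
    print(calls)     # {'open': 1, 'read': 2, 'write': 1, 'close': 1}
    print(io_bytes)  # {'read': 5120, 'write': 4}
```

For a real trace you could pass an open file, for example `summarize(open("trace.txt"))` on output written with `strace -o trace.txt`. Newer systems typically show `openat` rather than `open`, and the call counts simply reflect whatever names appear.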
In the quest for more performance, two new standards for SATA and SAS aim to double current throughput to 6 Gbps. While the standards may sound like a nice boost, don’t expect individual hard drives to get faster: a single drive can’t saturate even the current 3 Gbps links.
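The arithmetic behind that caution is easy to sketch. SATA and SAS at 3 and 6 Gbps use 8b/10b encoding, which puts ten bits on the wire for every data byte. A link's usable bandwidth is therefore roughly its line rate divided by ten, before protocol overhead. The Python snippet below assumes a hypothetical 120 MB/s sustained rate for a single hard drive. Substitute your own drive's figure.

```python
def usable_mb_per_s(line_rate_gbps: float) -> float:
    """Approximate payload bandwidth of an 8b/10b-encoded link, in MB/s (10^6 bytes/s)."""
    # 8b/10b sends 10 line bits per data byte, so bytes/s = bits/s / 10.
    # Protocol overhead (framing, commands) is ignored here.
    return line_rate_gbps * 1e9 / 10 / 1e6


HDD_SUSTAINED_MB_S = 120  # hypothetical sustained rate for one hard drive

if __name__ == "__main__":
    for gbps in (3.0, 6.0):
        link = usable_mb_per_s(gbps)
        print(f"{gbps:.0f} Gbps link: ~{link:.0f} MB/s usable; "
              f"one drive at {HDD_SUSTAINED_MB_S} MB/s uses {HDD_SUSTAINED_MB_S / link:.0%} of it")
```

The faster link matters mainly when several devices share it (expanders, port multipliers) or when a fast SSD sits on the other end.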
In the never-ending quest for more performance, we examine three different journaling device options for ext4 with an eye toward improving metadata performance. Who doesn’t like speed?
It’s the end of the year, and that means it’s time to either make predictions for the coming year or review the highlights of the past one. This article looks at the cool things that happened in storage over the past year and perhaps hints at some things to come.
Everyone loves a shiny new kernel. The latest one, 2.6.32, was released on Dec. 3, and it has some nice updates and fixes for file systems and IO in general. But there is a very important change to the CFQ IO scheduler that you need to understand.
The SuperComputing Conference/Exhibition is always a great conference for learning about storage trends in the HPC world. This year the alert attendee could spot two emerging trends: smaller companies developing innovative storage solutions and the rise of flash storage units.
Cloud Storage, while perhaps not the best label ever invented, holds promise for the massive storage requirements looming on the horizon, and it does so at a very good price/performance ratio. This article takes a quick look at the concepts and challenges of Cloud Storage.
iSCSI is one of the hottest topics in storage because it allows you to create centralized SANs using TCP/IP networks rather than Fibre Channel (FC) networks. Get a handle on the main iSCSI concepts and terminology.
The last article covered the anatomy of SSDs and the origins of some of their characteristics. In this article, we break down tuning storage and file systems for SSDs, with an eye toward improving performance and helping overcome some of the platform’s limitations.
SSDs (Solid-State Drives) are a hot topic right now for a number of reasons, not least their power-to-performance ratio. But to better understand SSDs, you should first get a grip on how they are constructed and on the features and limitations of these drives.
A fairly common Linux storage question: which is better for data striping, RAID-0 or LVM? Let’s take a look at these two tools and see how they handle data striping.
The last article was a quick overview of the four IO schedulers in the Linux kernel. This article takes a closer look at the Completely Fair Queuing (CFQ) scheduler and how you can tune it.
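Both RAID-0 and LVM striping spread consecutive fixed-size chunks across devices in round-robin order. Below is a simplified Python sketch of that address mapping. It assumes equal-sized devices and a hypothetical 64 KiB chunk size. Real md and LVM implementations add metadata offsets and handle details, such as md's treatment of unequal device sizes, that are not modeled here.

```python
from __future__ import annotations


def stripe_location(logical_offset: int, chunk_size: int, num_disks: int) -> tuple[int, int]:
    """Map a byte offset on a striped volume to (disk index, byte offset on that disk)."""
    chunk_index = logical_offset // chunk_size   # which chunk of the volume holds the byte
    within_chunk = logical_offset % chunk_size   # position of the byte inside that chunk
    disk = chunk_index % num_disks               # chunks rotate across the disks
    stripe_row = chunk_index // num_disks        # number of full stripes before this chunk
    return disk, stripe_row * chunk_size + within_chunk


if __name__ == "__main__":
    CHUNK = 64 * 1024  # hypothetical 64 KiB chunk size
    for off in (0, CHUNK, 2 * CHUNK + 100):
        print(off, stripe_location(off, CHUNK, num_disks=2))
```

A large sequential request spanning several chunks touches every disk, which is where striping gets its throughput gain. The chunk size controls how soon a request crosses to the next disk.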
The Linux kernel has several different IO schedulers. This article provides an introduction to the concept of schedulers and what options exist for Linux.
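Most disk IO schedulers build on the "elevator" idea: reorder pending requests by sector so that the disk head sweeps in one direction instead of seeking back and forth. The toy Python sketch below orders requests with a one-way sweep (often called C-LOOK). It is an illustration of the concept only, not the code of any Linux scheduler. Real schedulers also merge adjacent requests and add deadlines or fairness to avoid starving requests.

```python
from __future__ import annotations


def elevator_order(head: int, pending: list[int]) -> list[int]:
    """Order pending request sectors with a simple one-way elevator (C-LOOK).

    Requests at or beyond the current head position are served in ascending
    order; the head then jumps back to the lowest remaining sector and
    continues upward.
    """
    ahead = sorted(s for s in pending if s >= head)
    behind = sorted(s for s in pending if s < head)
    return ahead + behind


if __name__ == "__main__":
    print(elevator_order(50, [10, 95, 60, 30, 70]))  # -> [60, 70, 95, 10, 30]
```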
We finish off our IOzone performance exploration of the major Linux file systems, this time adding ext2, jfs, xfs, btrfs, and reiserfs. Let’s take a look at the numbers.
One of the hottest topics in the enterprise storage world is deduplication. We take a look at the technology behind the concept and discuss where it fits best in your storage strategy.
While metadata performance is important, another critical metric for measuring file systems is throughput. We put three Linux file systems through their paces with IOzone.
More performance: we add five file systems to our previous benchmark results to create an “uber” article on file system metadata performance. We follow the “good” benchmarking guidelines presented in a previous article and examine the good, the bad, and the interesting.
Backups are a technology, and a process, that everyone (everyone!) needs to consider. This article looks at some online backup options for Linux that suit users across the spectrum from home to enterprise class.
Linux comes with software-based RAID that can be used either to boost performance or to add a degree of data protection. This article gives a quick introduction to Linux software RAID and walks through creating a simple RAID-1 array.
Benchmarking has become synonymous with marketeering to the point that it is almost useless. This article takes a look at a very important paper that demonstrates how bad things have become and makes recommendations for improving the situation.
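At its core, block-level deduplication splits data into chunks, fingerprints each chunk with a hash, and stores only one copy per fingerprint. The Python sketch below assumes fixed 4 KiB chunks and SHA-256 fingerprints from the standard `hashlib` module. Production systems often use variable-size (content-defined) chunking and must also handle hash collisions, persistence, and reference counting, none of which is shown.

```python
from __future__ import annotations

import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


def dedupe(data: bytes, chunk_size: int = CHUNK_SIZE) -> tuple[dict[str, bytes], list[str]]:
    """Split data into fixed-size chunks and keep each unique chunk once, keyed by SHA-256.

    Returns (store, recipe): the unique chunks, and the ordered list of chunk
    hashes needed to rebuild the original data.
    """
    store: dict[str, bytes] = {}
    recipe: list[str] = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy of each chunk
        recipe.append(digest)
    return store, recipe


def rebuild(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Reassemble the original data from the chunk store and the recipe."""
    return b"".join(store[d] for d in recipe)


if __name__ == "__main__":
    data = b"A" * 4096 * 3 + b"B" * 4096  # four chunks, only two distinct
    store, recipe = dedupe(data)
    print(len(recipe), len(store))  # -> 4 2
    assert rebuild(store, recipe) == data
```

The ratio of recipe entries to stored chunks is the deduplication ratio. Data with many repeated blocks, such as backups or VM images, benefits most.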
Jeff Layton talks to Valerie Aurora, file system developer and open source evangelist, about a wide range of subjects including her background in file systems, ChunkFS, the Union file system and how the developer ecosystem can chip in.
There is a new distributed file system in the staging area of the 2.6.30 kernel called POHMELFS. Sporting better performance than classic NFS, it’s definitely worth a look.
Ramdisks can offer a level of performance that is simply amazing. More than just a tool for benchmarking, they are the basis of new devices that deliver ultra-high performance.
Who knew that compression could be so useful in file systems? SquashFS, typically used for embedded systems, can be a great fit for laptops, desktops and, yes, even servers.
The 2.6.30 kernel is chock full of next-gen file systems. One such example is NILFS, a new log-structured file system that dramatically improves write performance.
Need details on your file system’s data? FS_scan allows you to dig deep into your storage, giving you the ability to perform trend analysis on the results.
The vast amount of data being stored today naturally leads to files sitting unused for longer and longer periods of time. A new app, agedu, can quickly tell you what data on your file system is lying fallow.
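agedu's own algorithm isn't described here. As a rough illustration of the question it answers, this Python sketch walks a directory tree and totals the bytes in files whose last-access time (`st_atime`) is older than a cutoff. It assumes access times are meaningful on your file system. Mounts using `noatime` don't update them, and `relatime` updates them only occasionally, so the results can understate how recently data was read.

```python
from __future__ import annotations

import os
import time


def stale_bytes(root: str, days: float, now: float | None = None) -> tuple[int, int]:
    """Return (bytes in files not accessed for `days` days, total bytes) under root."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    stale = total = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                # lstat: a symlink counts as the link itself, not its target
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total += st.st_size
            if st.st_atime < cutoff:
                stale += st.st_size
    return stale, total


if __name__ == "__main__":
    stale, total = stale_bytes(".", days=365)
    print(f"{stale} of {total} bytes not accessed in the last year")
```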
Destined to become the default file system for the more popular Linux distributions, ext4 is out of experimental mode and gearing up for production environments. Here’s what you need to know.
End the struggle over cluster software stack configuration. Caos NSA is a distribution that focuses on making things simple, easy to install and upgrade, and easy to manage.
NFS frees you from proprietary file systems and, coupled with InfiniBand, is the only standard file system that can be used for high-performance distributed processing.
While strace is often used for troubleshooting and debugging, you can also use it to start examining the I/O patterns of your serial codes.
Getting the most out of your cluster is always important. But how exactly is that done? Do you really need to dissect your code and analyze every instruction to get optimal performance? Do you need to build custom kernels? Not necessarily. By testing some basic assumptions, you may be able to eke ten-node performance out of an eight-node cluster. Here’s how.