
Storage Monitoring via SystemTap

With storage becoming increasingly complex, being able to monitor what's happening with your servers has taken on a critical role. To truly understand what is happening with your storage you may need to monitor what is happening within the kernel.

For some time, I’ve been looking for a simple storage management/monitoring tool for Linux that would cover most of the common hardware. I just want something that allows me to configure/provision storage and to monitor what is going on in the system. As part of this, I would like the tool to have a GUI capability. Yes, I admit it, I like to use GUIs when possible or necessary, but I don’t rely on them completely since CLIs (Command Line Interfaces) can save the day when you can’t use a GUI. Still, a GUI can be extremely helpful for getting a snapshot of the status of your system and for seeing whether there are any immediate problems.

So far I haven’t met with much luck. I have found a simple, but very effective, LVM (Logical Volume Management) GUI, system-config-lvm. The tool can be very useful and is very simple to use as you can read here. But this only satisfies a portion of what I need. It doesn’t help with the management of RAID cards or iSCSI partitions or the underlying physical devices. It only works once the storage partitions appear on the server. Moreover, it lacks any sort of monitoring.

One of the keys to effectively running Linux systems, even desktops, is monitoring. The ability to watch, understand, and even perhaps debug (or at least log) what is going on with the system is very important. Monitoring is particularly important for storage since we’re talking about data. We want to ensure that our data remains safe and also that our access to the data (I/O) doesn’t become a bottleneck.

Compounding the issue, perhaps in both good ways and bad, is the fact that Linux is so flexible and configurable. For example, you have several choices for the I/O scheduler, and each of these has several “knobs” that can be adjusted if you want or need to. RAID cards and media controllers have options that can be tuned. LVM, if you are using it (and if you are not using it, you should seriously think about why), can also be configured, but not really after the volumes have been created. File systems also have options that can be adjusted during formatting, when the file system is mounted, and sometimes even while the file system is running. So how do we decide what parameters (values) to use to ensure that storage is not a problem, considering that some of these tuning options are really buried within the kernel?
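As a small illustration of one of these knobs, here is a minimal Python sketch that reports which I/O scheduler each block device is using. It assumes the usual sysfs layout, where /sys/block/<device>/queue/scheduler lists the available schedulers on one line with the active one in square brackets (for example, "noop deadline [cfq]"). The function names parse_scheduler and list_schedulers are my own, made up for this example.

```python
from pathlib import Path


def parse_scheduler(text):
    """Split a sysfs scheduler line such as 'noop deadline [cfq]'.

    Returns (active, available). The active scheduler is the bracketed
    entry; it is None if no entry is bracketed.
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def list_schedulers(sys_block="/sys/block"):
    """Map each block device name to its parsed scheduler line."""
    result = {}
    for path in sorted(Path(sys_block).glob("*/queue/scheduler")):
        try:
            # path is <sys_block>/<device>/queue/scheduler
            result[path.parent.parent.name] = parse_scheduler(path.read_text())
        except OSError:
            continue  # device vanished or file unreadable; skip it
    return result


if __name__ == "__main__":
    for device, (active, available) in list_schedulers().items():
        print(f"{device}: active={active} available={', '.join(available)}")
```

This only reads the current setting. It says nothing about whether that setting is the right one for your workload, which is exactly the question that deeper monitoring is supposed to help answer.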

We need an effective monitoring tool that allows us to see what is going on in the system, including the kernel. Fortunately, years ago, a number of kernel hackers developed a deep monitoring capability for Linux called SystemTap. It is a very flexible tool that you can use to monitor almost anything you want within Linux, including the kernel. Let’s take a look at SystemTap and what it can possibly do for us.

SystemTap

SystemTap is an infrastructure that allows you to gather information about your Linux system even while it is running. It is really a method for gathering data with the ability to watch certain events within the kernel and even watch certain lines of the kernel code. SystemTap has a scripting language that allows you to gather information and/or take action based on some sort of event that you define. These scripts are then compiled into a kernel module and added via the “insmod” command.

Traditionally SystemTap has been used to help debug problems and has been used by developers to watch the execution of their kernel code. But you can easily monitor your running system in a general sense with SystemTap. Having the ability to execute a script based on an event (action) sounds a great deal like monitoring. For example, you could have a script that collects information when a certain event happens and sends it to a log file. Then a user-space application could grab that information and plot it. You could also have a script that sends out an SNMP message, email, text, or page if an event happens or if a certain policy is activated.

Using SystemTap is very easy. You can either write your own script or use one of the pre-canned scripts written by SystemTap developers. If you write your own script, just use your favorite editor and create a file with the .stp extension. Then you run the SystemTap command, stap, which translates the script into a kernel module and loads it. Remember that you are creating a kernel module, so it is entirely possible to write really bad or even destructive scripts that will crash your system.
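To make that workflow concrete, here is a rough Python sketch that writes a small .stp file and builds the stap command line for it. The embedded SystemTap script is my own illustration, not one of the pre-canned scripts: it counts read system calls per process name for five seconds, prints the ten busiest processes, and exits. The file name read_counts.stp and the helpers write_script and stap_command are hypothetical names chosen for this example.

```python
import tempfile
from pathlib import Path

# Illustrative SystemTap script: count read() system calls per process
# name for 5 seconds, print the 10 busiest names, then exit.
STP_SCRIPT = """\
global reads

probe syscall.read {
    reads[execname()]++
}

probe timer.s(5) {
    foreach (name in reads- limit 10)
        printf("%s: %d\\n", name, reads[name])
    exit()
}
"""


def write_script(directory, name="read_counts.stp"):
    """Write the script into `directory` and return its path."""
    path = Path(directory) / name
    path.write_text(STP_SCRIPT)
    return path


def stap_command(script_path):
    """Build the command line that asks stap to compile and load the script."""
    return ["stap", str(script_path)]


if __name__ == "__main__":
    # Only writes the file and shows the command; it does not run stap.
    script = write_script(tempfile.mkdtemp())
    print(" ".join(stap_command(script)))
```

The sketch stops short of running anything on purpose. To actually load the module you would run the printed command yourself (or hand the list to subprocess.run), which normally requires root or membership in the SystemTap groups, plus the kernel debugging information that stap needs in order to build the module.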
