Just last week we walked you through a new tool, agedu, that allows you to get a snapshot view of your file system. agedu produces a very nice graphical display that provides an overview of the size of your data and its age (based on either change time or access time). However, there are times when you need or want more detail on the data that’s sitting in your storage. This time around we’ll look at a new tool, FS_scan, that does precisely that.
FS_scan allows you to recursively scan a directory tree to get a detailed view of your data. In particular, it will tell you the dates and ages of your files, the average ages of the files in a given directory, and it will tell you the oldest files in the directory tree. It also produces a CSV file that you can open in a spreadsheet. With this information you can get a very detailed view of the state of your storage with the ability to do a trend analysis of the resulting data (i.e. How fast is it changing? How often are files accessed? How often is data modified?).
Let’s dive in and see what our data’s doing.
When You Just Need More Details
Remember that when talking about the data on your storage there are three dates (or three ages) that need to be considered: (1) the date last accessed, or the access age, (2) the date last modified, or the modify age, and (3) the date last changed, or the change age. Because all three dates or ages can be important, it is difficult to quantify how data is being used from a single number. agedu is a great tool for getting a quick glimpse of the access age or change age of the file system being examined, but it is only a glimpse of the state of the file system. If you want to create a more detailed report, or monitor the file system over time for a trend analysis, then you need more detailed information than agedu can provide at this time.
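As a rough illustration of the three ages, the sketch below computes them in days for a single path. It is only a sketch, written to run on Python 3: the function name file_ages_in_days and the constant SECONDS_PER_DAY are made up for this example and are not part of FS_scan or agedu.

```python
import os
import time

SECONDS_PER_DAY = 86400.0  # illustrative constant, not from FS_scan


def file_ages_in_days(path, now=None):
    """Return (access_age, modify_age, change_age) for path, in days."""
    if now is None:
        now = time.time()
    st = os.stat(path)
    # Each timestamp is in seconds since the epoch, so the age is simply
    # the difference from "now", converted from seconds to days.
    return tuple((now - stamp) / SECONDS_PER_DAY
                 for stamp in (st.st_atime, st.st_mtime, st.st_ctime))


if __name__ == "__main__":
    access_age, modify_age, change_age = file_ages_in_days(".")
    print("access: %.1f days, modify: %.1f days, change: %.1f days"
          % (access_age, modify_age, change_age))
```

Passing now explicitly is optional; it just lets every file in a scan be measured against the same reference time.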
One option for getting more detailed information is to use the stat command in Linux. It can be used to get the status of files or even the file system. For example, the output from stat looks like the following:
$ stat *
File: `~storage002.html'
Size: 11472 Blocks: 24 IO Block: 4096 regular file
Device: 811h/2065d Inode: 3220767 Links: 1
Access: (0600/-rw-------) Uid: ( 1000/laytonjb) Gid: ( 1000/laytonjb)
Access: 2009-05-24 17:19:52
Modify: 2009-05-24 17:19:52
Change: 2009-05-24 17:19:52
File: `storage002.html'
Size: 11285 Blocks: 24 IO Block: 4096 regular file
Device: 811h/2065d Inode: 3220766 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/laytonjb) Gid: ( 1000/laytonjb)
Access: 2009-05-24 17:13:13
Modify: 2009-05-24 16:02:27
Change: 2009-05-24 16:02:27
Or you can get a glimpse of the file system status using the “-f” option.
$ stat -f *
File: "~storage002.html"
ID: f11c91747fe09927 Namelen: 255 Type: ext2/ext3
Block size: 4096 Fundamental block size: 4096
Blocks: Total: 37263886 Free: 33094551 Available: 31216553
Inodes: Total: 9396224 Free: 9106153
File: "storage002.html"
ID: f11c91747fe09927 Namelen: 255 Type: ext2/ext3
Block size: 4096 Fundamental block size: 4096
Blocks: Total: 37263886 Free: 33094551 Available: 31216553
Inodes: Total: 9396224 Free: 9106153
Both options provide useful information. The first option, stat, gives the access, modify, and change dates for each file, as well as the uid, gid, size, and permissions. The second option, stat -f, gives additional information including the file system type and the fundamental block size. However, if you want to use the stat command to gather detailed information, you will have to run these commands across the whole directory tree, parse the output, and assemble it into a usable form.
Python has a nice module, called the os module, that can easily walk a file system and gather virtually all of the same information that the stat command produces. Even better, this module is part of the Python standard library, so it is available wherever a distribution ships Python. This can easily form the basis of a tool to walk a file system and gather detailed file information.
Python Modules to the Rescue
One of the functions in the os module is called “walk” (os.walk). This function allows you to easily walk a directory tree (i.e. examine the files recursively in a directory tree) and get information on the directories and the files. The simple example below is a modified version of one in the Python 2.6.2 documentation.
#!/usr/bin/python
import os
from os.path import join, getsize
for root, dirs, files in os.walk('.'):
    print root, "consumes",
    print sum(getsize(join(root, name)) for name in files),
    print "bytes in", len(files), "non-directory files"
This quick code snippet displays the number of bytes taken by non-directory files in each directory under the starting directory (the current working directory). This simple snippet can form the basis of a script that can walk through a directory tree and gather information about the files. A quick note: this code snippet does not have any exception handling, and exceptions are definitely possible (for example, when a file cannot be read or disappears while the tree is being walked).
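One possible way to add that exception handling is sketched below. It is written for Python 3 and is not the FS_scan code; the names report_walk_error and directory_sizes are hypothetical. It relies on two things in the standard library: the onerror argument of os.walk, which is called with the OSError raised when a directory cannot be listed, and the fact that os.path.getsize raises OSError for files it cannot stat (a dangling symbolic link, for instance).

```python
import os
import sys


def report_walk_error(err):
    # os.walk calls this with the OSError raised while listing a directory.
    sys.stderr.write("skipping %s: %s\n" % (err.filename, err.strerror))


def directory_sizes(top="."):
    """Yield (directory, total_bytes, file_count) for each directory under top."""
    for root, dirs, files in os.walk(top, onerror=report_walk_error):
        total = 0
        counted = 0
        for name in files:
            path = os.path.join(root, name)
            try:
                total += os.path.getsize(path)
                counted += 1
            except OSError as err:
                # The file vanished, is a broken link, or is not accessible.
                sys.stderr.write("skipping %s: %s\n" % (path, err.strerror))
        yield root, total, counted


if __name__ == "__main__":
    for root, total, counted in directory_sizes("."):
        print("%s consumes %d bytes in %d non-directory files"
              % (root, total, counted))
```

Files that cannot be examined are reported on standard error and left out of the totals, so one bad entry does not stop the whole scan.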
With the ability to walk a directory tree, you can open the files in the directory and gather statistics on each file. The os module also has a function called os.fstat that takes an open file descriptor and can give you most of the information that the stat command produces. Taking the previous example and extending it a bit results in the following example.
#!/usr/bin/python
import os
from os.path import join, getsize
for root, dirs, files in os.walk('.'):
    print root, "consumes",
    print sum(getsize(join(root, name)) for name in files),
    print "bytes in", len(files), "non-directory files"
    for file in files:
        # Full path to the file
        fileloc = root + "/" + file
        # os.fstat works on an open file descriptor
        FILE = os.open(fileloc, os.O_RDONLY)
        junk = os.fstat(FILE)
        size = junk[6]     # size in bytes
        atime = junk[7]    # access time, seconds since the epoch
        mtime = junk[8]    # modify time, seconds since the epoch
        ctime = junk[9]    # change time, seconds since the epoch
        uid = junk[4]      # owner's user id
        gid = junk[5]      # owner's group id
        print " File: %s size: %s atime: %s mtime: %s ctime: %s" % (file,size,atime,mtime,ctime)
        os.close(FILE)
In the second for loop, the full path to the file (fileloc) is created from the root of the directory tree (root) and the file name (file). Notice that the os.fstat function returns a tuple-like object of attributes. For example, it returns the access time (atime), the modify time (mtime), and the change time (ctime), which are all in seconds since the epoch. There are other attributes as well, including the size in bytes (size), the uid (uid), and the gid (gid).
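The numeric indexes used above also have names on the returned object (st_size, st_atime, st_mtime, st_ctime, st_uid, st_gid), and the epoch seconds can be turned into readable dates with the time module. The sketch below shows one way this might look on Python 3. The function name describe_file and the date format are choices made for this example only, and it uses os.lstat on the path rather than opening the file and calling os.fstat, which is a substitution and not what the listing above does.

```python
import os
import time

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # illustrative format, local time


def describe_file(path):
    """Return size, owner ids, and readable dates for path."""
    st = os.lstat(path)  # like os.stat, but does not follow symbolic links

    def readable(seconds):
        # Convert seconds since the epoch into a local date string.
        return time.strftime(DATE_FORMAT, time.localtime(seconds))

    return {
        "size": st.st_size,  # same value as index 6
        "uid": st.st_uid,    # same value as index 4
        "gid": st.st_gid,    # same value as index 5
        "atime": readable(st.st_atime),
        "mtime": readable(st.st_mtime),
        "ctime": readable(st.st_ctime),
    }


if __name__ == "__main__":
    info = describe_file(".")
    print("size: %(size)s atime: %(atime)s mtime: %(mtime)s ctime: %(ctime)s"
          % info)
```

Working from the path avoids opening every file, which also means files that cannot be opened for reading can still be examined as long as their directory is accessible.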
The previous example serves as a quick introduction to what you can do with Python using the modules in the standard library. In particular, the os module has many functions that are useful for getting detailed information about files.
FS_scan: A Tool For Detailed File System Information