
Data Replication Using rsync

Having just discussed replication in Linux -- what it is, how it can be used and how it's not the same as a backup -- it's time to tackle a simple example of one of the replication tools: rsync. You will be surprised how easy it is to use rsync to replicate data to a second storage pool.

As with the “everythinglinux” tutorial, let’s go over the options in the script to make sure we understand everything.

  • --verbose: When I start using something new, I like to use the verbose option so I can see what is happening. This option is fairly self-explanatory: rsync produces extra output (“verbose”) to help explain what it is doing.
  • --progress: The progress option tells rsync to print information showing the progress of the transfer. While it may be pointless to use this option in a script where the output is sent to a file, I like to use it if I suspect there is a problem because I can identify sources of slowdowns. But once I’m happy with a process I usually remove this option.
  • --compress: This is my favorite option because it compresses the data traffic from the sender to the receiver, saving network bandwidth and probably reducing the elapsed time for the replication.
  • --rsh=/usr/bin/ssh: This option tells rsync what to use for the remote shell. In this case, I chose to use “ssh” by giving it the full path. Alternatively, you could use the option -e /usr/bin/ssh, but I used the “old school” option of “--rsh=”.
  • --recursive: This option tells rsync to recursively go through the source tree (a great option).
  • --times: This option tells rsync to preserve the time stamps on the files and directories on the replicated storage pool.
  • --perms: This option tells rsync to preserve the file permissions on the data in the replicated storage pool.
  • --links: This option tells rsync to copy symbolic links as symbolic links.
  • --delete: This option is a little more controversial. It tells rsync to delete files on the receiver that no longer exist on the sender. This option can be dangerous: if you don’t pay attention to which direction you are running rsync (confusing sender and receiver), you can lose data.
  • --exclude “*bak”: I left this option in from the example because I wanted to illustrate that you can exclude files during the rsync process. In this particular example it will exclude any file whose name ends in “bak”, such as files with a “.bak” extension. However, I didn’t need it for my simple example.

For the particular example in this article I didn’t need all of these options but I wanted to leave them in since they help illustrate some of what you can do with rsync.

The final two parts of the rsync command tell rsync what data to synchronize on the rsync sender and where to put it on the rsync receiver. In this particular example, the command is copying everything under the directory /home/laytonjb/Documents (don’t forget it is doing this recursively) and copying to my home server (IP address is 192.168.1.8) into the directory /data/laytonj/rsync_test. Sometimes, in the vernacular of rsync, the first part is referred to as the source and the second is referred to as the destination.
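Putting the options, the source and the destination together, a script along the following lines would match the description above. It is a reconstruction, not the original runit_rsync: the names RSYNC, SRC, DEST and replicate are illustrative, and the trailing slash on the source and the explicit root@ login are assumptions. Extra arguments are passed through to rsync, so the same function can be used for a preview.

```shell
#!/bin/sh
# Hypothetical reconstruction of runit_rsync from the options described above.
# RSYNC, SRC, DEST and replicate are illustrative names, not from the original.

RSYNC="${RSYNC:-rsync}"

# Assumption: a trailing slash, so the *contents* of Documents are copied
# rather than a Documents/ directory being created on the receiver.
SRC="/home/laytonjb/Documents/"

# Assumption: logging in as root on the home server.
DEST="root@192.168.1.8:/data/laytonj/rsync_test"

# Run rsync with the options from the list above.
# Any arguments given to replicate are passed to rsync as extra options.
replicate() {
    "$RSYNC" "$@" \
        --verbose \
        --progress \
        --compress \
        --rsh=/usr/bin/ssh \
        --recursive \
        --times \
        --perms \
        --links \
        --delete \
        --exclude "*bak" \
        "$SRC" "$DEST"
}
```

Adding `replicate "$@"` as the last line makes the file behave like a normal script. Because --delete removes files on the receiver, it can be worth calling `replicate --dry-run` first: with that option rsync reports what it would transfer and delete without changing anything.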

Before running the command I made sure that the directory existed on the destination (192.168.1.8). After that, I just ran the script for the first time.

root@laytonjb-laptop:~# ./runit_rsync
root@192.168.1.8's password:
building file list ...
6241 files to consider
00354107.pdf
     1291267 100%    4.90MB/s    0:00:00 (xfer#1, to-check=6240/6241)
01-IntroToCUDA.ppt
    12302848 100%    5.11MB/s    0:00:02 (xfer#2, to-check=6239/6241)
145213.pdf
     1809831 100%    2.61MB/s    0:00:00 (xfer#3, to-check=6238/6241)
_.pdf
      961611 100%    1.04MB/s    0:00:00 (xfer#4, to-check=6237/6241)
AIAA-2007-512-524.pdf
     3015861 100%    1.98MB/s    0:00:01 (xfer#5, to-check=6236/6241)
AIAA-2009-601-842
     4164087 100%    3.33MB/s    0:00:01 (xfer#6, to-check=6235/6241)
CFD_SC07.ppt

...

Number of files: 6241
Number of files transferred: 5938
Total file size: 680013393 bytes
Total transferred file size: 680013393 bytes
Literal data: 680013393 bytes
Matched data: 0 bytes
File list size: 177740
File list generation time: 4.146 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 554389880
Total bytes received: 132474

sent 554389880 bytes  received 132474 bytes  3734157.27 bytes/sec
total size is 680013393  speedup is 1.23
root@laytonjb-laptop:~#

I cut out some of the output since there are so many files (6,241). At the very end is the summary data of the rsync operation. Remember that this is the first time the rsync operation happened so it will transfer all of the data as specified in the script. The next time rsync is invoked it will only copy the parts of the files that have changed.

One other quick observation is that I ran the script as root. Consequently, when I tried to connect to the destination server (192.168.1.8) it did so as root so I needed to use root’s password.

To illustrate what happens when rsync is run again after some files have changed, I edited two files in the source directory tree and re-ran the same script. Below is the output from rsync.

root@laytonjb-laptop:~# ./runit_rsync
root@192.168.1.8's password:
building file list ...
6241 files to consider
FEATURES/STORAGE066/
FEATURES/STORAGE066/notes.txt
        1816 100%    0.00kB/s    0:00:00 (xfer#1, to-check=73/6241)
FEATURES/STORAGE066/storage066.html
        4082 100%    3.89MB/s    0:00:00 (xfer#2, to-check=72/6241)

Number of files: 6241
Number of files transferred: 2
Total file size: 680014698 bytes
Total transferred file size: 5898 bytes
Literal data: 2398 bytes
Matched data: 3500 bytes
File list size: 177740
File list generation time: 0.069 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 178691
Total bytes received: 112

sent 178691 bytes  received 112 bytes  32509.64 bytes/sec
total size is 680014698  speedup is 3803.15

Notice that only two files are listed as being transferred (these are the two files that were modified). This is the beauty of rsync – it only transfers the data that has been changed. This can make replication much, much easier.
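The “speedup” figures in the two summaries can be checked by hand. The numbers are consistent with speedup being the total file size divided by the bytes that actually crossed the network (sent plus received). The small function below is only a sketch to redo that arithmetic; its name is made up for this example and it assumes awk is installed.

```shell
#!/bin/sh
# speedup TOTAL_SIZE SENT RECEIVED
# Hypothetical helper: total size divided by bytes on the wire, two decimals.
speedup() {
    awk -v total="$1" -v sent="$2" -v recv="$3" \
        'BEGIN { printf "%.2f\n", total / (sent + recv) }'
}

# First run: every file is new, so the gain comes from --compress alone.
speedup 680013393 554389880 132474   # prints 1.23

# Second run: only the changed parts of two files are sent.
speedup 680014698 178691 112         # prints 3803.15

# Second run: literal data plus matched data equals the transferred file size.
echo $((2398 + 3500))                # prints 5898
```

In the second run most of the 178,691 bytes sent is the file list (177,740 bytes), not file data, which is why the speedup is so large.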

At this point I know the specific rsync command works correctly. I could then make a script using this command sending the output to a log file. I could also put this script in a cron job so that it runs every so often (perhaps every hour). But ideally I would like to run it only when the laptop is first plugged into my home network. Without getting crazy about writing a script to check if I’m on my home network, that the home server is up, etc., I could just put the script in rc.local. So when the laptop boots, the script will be run and my data will be replicated onto my home server.
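A logging wrapper for cron or rc.local could be as small as the sketch below. The log path, the script location and the function name are assumptions made for this example. It also assumes that SSH keys are set up so that no password prompt appears, since nobody is there to type one when the job runs unattended.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command and append its output to a log file.
# LOG and the /root/runit_rsync path below are assumed, not from the article.

LOG="${LOG:-/var/log/runit_rsync.log}"

# run_logged COMMAND [ARGS...]
# Appends a start line, the command's output and its exit status to $LOG.
run_logged() {
    status=0
    {
        echo "=== started: $(date) ==="
        "$@" || status=$?
        echo "=== exit status $status: $(date) ==="
    } >>"$LOG" 2>&1
    return "$status"
}

# Example call at the end of the wrapper (assumed script location):
#   run_logged /root/runit_rsync
#
# Example crontab entry to run the wrapper at the top of every hour,
# assuming it is saved as /root/runit_rsync_logged:
#   0 * * * * /root/runit_rsync_logged
```

For the rc.local approach, the same wrapper can be called from rc.local instead of from cron.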

Summary

Data replication can be a very important technology for making sure that you have an up-to-date copy of your data somewhere in case you lose your primary source of data. In a previous article I mentioned that there are two tools in Linux for data replication: DRBD and rsync. In this article I showed a very simple example of using rsync.

The simple example was just replicating the data from my laptop (primary data storage) to my home server (secondary data storage). More precisely, I was replicating a specific directory from my laptop, recursively, to my home server. I used a simple example from a popular rsync tutorial as a starting point. I modified it slightly but I left in options that I don’t normally use so that you can see what options are available for rsync (the man pages for rsync are HUGE so perhaps this simple example will help jump start your understanding of rsync).

Before anyone gets really upset that I’m stating that rsync is a replication tool and replication is different from backups, let me also say that you can use rsync for backups. An example is here where the author talks about how to use rsync to make daily, weekly, and monthly backups. However, there are some important differences between using rsync for backups and using a true backup utility such as Amanda that you should understand before using rsync as your production backup tool. But if you play your cards correctly, you can use the same tool for backups as for data replication.

Rsync is a great, easy-to-use utility for data replication. It has some really great features such as compression, recursion, and the ability to use different remote shells or sockets, and it detects differences in files between rsync operations and transmits only the changes. Not a bad utility if you ask me – not bad at all.
