Creating a Backup Server with rsync and Ubuntu

Keep your files and database safe without extra backup cost

Keeping backups of your files is essential for recovering from disasters such as a server crash or a break-in. Backing up the database matters just as much as backing up files. Server administrators also often forget to back up the configuration files, which have to be restored whenever the server is reinstalled from scratch.

Experience: I got an email from our customer support team saying that a user was not able to log in. I checked my database and found that the account had been merged with another user's account because of buggy code. In the end, I had to go through two backups of my database, retrieve the data and fix the user details manually. Thank goodness I take a backup every hour!!!

Now, to the point. There are many tools for backing up files (here "files" includes database dumps, code, CSS, JS, images, etc.). Below are some of the solutions I considered before turning to rsync. In this post, the client is the system whose files we back up, and the server is the system where the backup is stored.

  1. scp: It copies files from the source to the destination over a secure connection. The major drawback is that it copies every file in full without checking whether it has changed, i.e. it cannot make a differential backup.
  2. Bacula: Bacula is so complex to configure that I eventually gave up. Both the client and the backup server have to be configured.
  3. unison: Unison compares the files on both the server and the client (it is a two-way synchronization tool). It is somewhat slow at making the backup. Both unison and rsync rely on the rsync algorithm to transfer only the differences between files.
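To see why rsync and unison can send only the changed parts of a file, here is a simplified Python sketch of the rolling checksum at the heart of the rsync algorithm. It is illustrative only, not rsync's actual code: the function names and block size are hypothetical, and SHA-256 stands in for the strong checksum (rsync negotiates its own). The weak checksum of each fixed-size block of the old file goes into a table; a window then slides over the new file one byte at a time, its checksum is updated in constant time instead of being recomputed, and every weak match is confirmed with the strong hash.

```python
import hashlib

M = 1 << 16  # checksums are kept modulo 2**16, as in the rsync algorithm


def weak_checksum(block):
    """Weak checksum (a, b) of a block of bytes."""
    a = sum(block) % M
    # each byte is weighted by its distance from the end of the block
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, block_size):
    """Slide the window one byte: drop out_byte on the left, add in_byte on the right."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_size * out_byte + a) % M
    return a, b


def find_block_matches(old, new, block_size):
    """Return (offset_in_new, block_index_in_old) for every position where a
    full block of `old` reappears in `new`."""
    table = {}
    for index in range(len(old) // block_size):
        block = old[index * block_size:(index + 1) * block_size]
        table.setdefault(weak_checksum(block), []).append(
            (index, hashlib.sha256(block).digest()))

    matches = []
    if len(new) < block_size:
        return matches
    a, b = weak_checksum(new[:block_size])
    for offset in range(len(new) - block_size + 1):
        if offset > 0:
            a, b = roll(a, b, new[offset - 1], new[offset + block_size - 1], block_size)
        for index, strong in table.get((a, b), []):
            # weak checksums can collide, so confirm with the strong hash
            if hashlib.sha256(new[offset:offset + block_size]).digest() == strong:
                matches.append((offset, index))
    return matches


print(find_block_matches(b"hello world, this is a test", b"XXhello world", 4))
# prints [(2, 0), (6, 1)]
```

In the rsync scheme, matched blocks are sent as short references to data the receiver already has, and only the bytes between matches cross the network. Unlike rsync, this sketch keeps scanning after a match instead of jumping ahead by a whole block, to keep the code short.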

rsync

rsync is a tool (and the protocol it speaks) for making backups either remotely or locally. It is best suited to making differential backups over the internet, because it transfers only the parts of files that have changed. One interesting thing I learned while using rsync is that you do not need to specify whether the path to back up is a directory or a file. Here, I'll show you how to copy files from the client to the server. First, install rsync on both the client and the server:

sudo apt-get update
sudo apt-get install rsync

rsync's own protocol (used when talking to an rsync daemon) does not encrypt data, so to transfer files securely we run rsync over SSH (Secure Shell). The basic rsync command used to back up files over the network looks like this:


rsync -avz /client/directory/to/backup server_user@192.168.0.100:/server/backup/directory

The options used in the command above are:

a: archive is a quick way of saying you want recursion and want to preserve almost all attributes of the source files.

v: This option increases the amount of information you are given during the transfer (rsync works silently without it).

z: With this option, rsync compresses the file data as it is sent to the destination machine, which reduces the amount of data being transmitted.
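Compression pays off mainly for compressible data such as SQL dumps, code and logs; files that are already compressed (JPEG images, gzip archives) barely shrink. The Python sketch below uses the standard zlib module to illustrate that general point. It is an assumption-laden illustration, not a measurement of rsync: rsync's -z uses its own compression settings (and newer versions can negotiate other algorithms), so real numbers will differ, and the sample data is made up.

```python
import os
import zlib


def compression_ratio(data):
    """Compressed size as a fraction of the original size (zlib level 6)."""
    return len(zlib.compress(data, 6)) / len(data)


# Text-like data, such as a SQL dump, repeats a lot and compresses well.
sql_like = b"INSERT INTO users (id, name) VALUES (1, 'example');\n" * 2000
# Random bytes stand in for files that are already compressed, such as JPEGs.
random_like = os.urandom(100000)

print("text-like: %.3f" % compression_ratio(sql_like))
print("random:    %.3f" % compression_ratio(random_like))
```

If most of what you back up is images, -z costs CPU time on both machines for little saving; for database dumps and logs it usually reduces the data sent considerably.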

The first argument is the client directory to back up. In the second argument, server_user is the user name on the server, 192.168.0.100 is the server's IP address, and the path after the ":" is the backup directory on the server. Run this command in your terminal, replacing these placeholders with your actual directories, IP address and user name. You will be prompted for server_user's password; enter it and you will see the files being transferred! The same command can also be used locally, with two local paths:


rsync -avz /directory/to/backup /backup/directory
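Although rsync does not care whether the source is a file or a directory, it does care about a trailing slash on a source directory: /directory/to/backup copies the directory itself into the destination, while /directory/to/backup/ copies only its contents. The hypothetical helper below is plain Python and does not call rsync; it models that rule so you can work out where a source will land before running a command. It assumes the destination is an existing directory.

```python
import posixpath


def rsync_target(source, destination):
    """Path under `destination` where rsync places `source`.

    Models rsync's rule: a trailing slash on the source means "copy the
    contents", no trailing slash means "copy the directory (or file) itself".
    Assumes `destination` already exists as a directory.
    """
    if source.endswith("/"):
        return destination.rstrip("/") or "/"
    return posixpath.join(destination, posixpath.basename(source))


print(rsync_target("/directory/to/backup", "/backup/directory"))
print(rsync_target("/directory/to/backup/", "/backup/directory"))
```

This is why the script later in this post, which lists its sources without trailing slashes, ends up with apache2, db, php.ini and so on as separate entries inside its backup directory.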

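The local command never touches the network, so there is nothing to intercept. For the remote command, rsync 2.6.0 and later already use ssh as the default remote shell when the destination is written as host:path; rsync sends data unencrypted only when it talks directly to an rsync daemon (host::module or rsync:// syntax). It is still good practice to name the remote shell explicitly with -e, which is also how you pass ssh options such as a non-standard port: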


rsync -avze ssh /client/directory/to/backup server_user@192.168.0.100:/server/backup/directory

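Where,

e: specifies the remote shell to use. It takes the next argument as its value, which is why e comes last in -avze and is followed by ssh.

ssh: the Secure Shell client, used here as the remote shell.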
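When rsync is driven from a script, it is safer to build the command as a list of arguments than as one shell string, because no shell then has to re-split it. The hypothetical helper below (its name and the example values are placeholders) assembles the same command as above, with -e "ssh -p PORT" for a server whose SSH daemon listens on a non-standard port. The -e value is a single argument; rsync splits it into the ssh command and its options itself.

```python
import shlex


def build_rsync_command(source, user, host, destination, port=22):
    """Argument list for pushing `source` to user@host:destination over ssh."""
    return [
        "rsync", "-avz",
        "-e", "ssh -p %d" % port,  # one argument; rsync splits it itself
        source,
        "%s@%s:%s" % (user, host, destination),
    ]


if __name__ == "__main__":
    cmd = build_rsync_command("/client/directory/to/backup", "server_user",
                              "192.168.0.100", "/server/backup/directory")
    # Show the equivalent shell command instead of running it
    print(" ".join(shlex.quote(part) for part in cmd))
```

Running it prints rsync -avz -e 'ssh -p 22' /client/directory/to/backup server_user@192.168.0.100:/server/backup/directory. To actually execute the command, pass the list to subprocess.run.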

Each time you run this command in your terminal, you will be prompted to enter the password for server_user. But when the backup runs from a cron job or any other automated process, we have to avoid the password prompt. This can be done by generating an SSH key pair with ssh-keygen using an empty passphrase and installing the public key on the remote machine (for example with ssh-copy-id). Note: Please read SSH for rsync before proceeding. Following is the bash script I use to back up our live server (the client) to my own system (the backup server); the script runs on my system and pulls the files from the client.


#!/bin/bash
# Pulls the paths listed below from the live server (the client) into a
# local backup directory over ssh, appending rsync's output to a log file.

# Configurations
clientSshPort="22"
clientUser="root"
clientIp="109.74.192.113"
client="$clientUser@$clientIp"

backupDirectory="/extra/rsync_simulate/"
logFilePath="/jddata/backupLogLive.log"

# Directories (or single files) on the client to back up
clientBackupDirectoryList=(
	"/var/log/apache2"
	"/var/backup/db"
	"/var/www/grabhouse.com/grbhstatic"
	"/var/www/staticImages"
	"/var/www/yii_framework"
	"/var/www/grabhouse.com/blog"
	"/var/www/grabhouse.com/protected/runtime"
	"/etc/php5/apache2/php.ini"
)

# Log the start time of the sync
now="$(date +'%Y_%m_%d_%H:%M:%S')"
echo "Syncing started at $now"
echo "Syncing started at $now" >> "$logFilePath"

# Quote the array expansion so each entry reaches rsync as a single argument
for directory in "${clientBackupDirectoryList[@]}"; do
	echo "$directory"
	# 2>&1 sends rsync's error messages to the log along with its normal output
	/usr/bin/rsync -avz -e "ssh -p $clientSshPort" --delete "$client:$directory" "$backupDirectory" >> "$logFilePath" 2>&1
done

# Log the end time of the sync
now="$(date +'%Y_%m_%d_%H:%M:%S')"
echo "Syncing ended at $now" >> "$logFilePath"
echo "Syncing ended at $now"
echo "----------------------------------------------------------------------------------" >> "$logFilePath"
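Because the script above passes --delete, it is worth previewing what a run would remove before trusting it with real data. rsync's -n (--dry-run) option performs the comparison without changing anything, and with -v it reports each file it would delete on a line beginning with "deleting ". The Python sketch below runs such a dry run and collects those lines. The function names are hypothetical, and parsing rsync's human-readable output is a convenience rather than a stable interface.

```python
import subprocess


def parse_deletions(rsync_output):
    """Return the paths that rsync -v reported on 'deleting ...' lines."""
    prefix = "deleting "
    return [line[len(prefix):] for line in rsync_output.splitlines()
            if line.startswith(prefix)]


def preview_deletions(source, destination, ssh_port=22):
    """Dry-run a --delete sync and list what it would remove from destination."""
    command = ["rsync", "-avzn", "--delete", "-e", "ssh -p %d" % ssh_port,
               source, destination]
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True,
                            check=True)  # raise CalledProcessError if rsync fails
    return parse_deletions(result.stdout)
```

For example, preview_deletions("root@109.74.192.113:/var/www/staticImages", "/extra/rsync_simulate/") would list what the real run would delete from the local copy of staticImages, without touching it.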

You may have noticed the --delete parameter in the rsync command. It deletes from the backup any file that has been deleted on the client, so the backup stays an exact mirror of the client. Keep in mind that this also means a file deleted by mistake on the client disappears from the backup at the next run. It is also bad practice to keep the configuration in the same file as the script, so the following version splits them into two files: one for the script and one for the configuration. This separation is also the best fit if you want to back up multiple clients. The script is written in Python. script.py


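#!/usr/bin/env python3
# Backs up each client in config.conf with rsync over ssh, logging to config['logFile'].

import datetime
import os
import subprocess


def timestamp():
    return datetime.datetime.now().strftime('%Y/%m/%d %H:%M:%S')


if __name__ == "__main__":
    # Look for config.conf next to this script: cron does not start the
    # script from its own directory, so a bare relative path would fail.
    configPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), "config.conf")
    config = {}
    with open(configPath) as configFile:
        # Python 3 replacement for execfile(): run the config as Python code
        exec(compile(configFile.read(), configPath, "exec"), config)

    with open(config['logFile'], "a") as logFile:
        logFile.write("Rsync started at - " + timestamp() + "\n\n")

        for value in config["clients"].values():
            for fileValue in value["files"]:
                # Each client's files go to its own 'location' from config.conf
                command = ["rsync", "-avz", "-e", "ssh -p " + value['port'], "--delete",
                           value['user'] + "@" + value['ip'] + ":" + fileValue,
                           value['location']]
                # run() does not raise when rsync fails, so one unreachable
                # client does not stop the rest; errors go to the log too.
                result = subprocess.run(command, stdout=subprocess.PIPE,
                                        stderr=subprocess.STDOUT,
                                        universal_newlines=True)
                logFile.write(result.stdout)
                if result.returncode != 0:
                    logFile.write("rsync exited with code %d for %s\n"
                                  % (result.returncode, fileValue))

        logFile.write("\nRsync ended at   - " + timestamp() + "\n")
        logFile.write("---------------------------------------------------------------------------------------\n")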

The configuration file, config.conf, looks like this:


clients={
        1:{
                'port':'6983',
                'ip':'192.168.0.123',
                'user':'root',
                'files':[
                        "/var/backup",
                        "/var/www",
                        "/var/log",
                 ],
                'location':'/var/backup/website',
        },
        2:{
                'port':'6983',
                'ip':'192.168.0.124',
                'user':'root',
                'files':[
                        "/var/backup",
                        "/var/www",
                        "/etc/apache2",
                        "/etc/elasticsearch",
                        "/var/log",
                ],
                'location':'/var/backup/crawler',
        }
}
logFile="/var/backup/backupLog.log"
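Since config.conf is ordinary Python that the script executes, a mistake such as a missing key only shows up partway through a backup run. A small check like the hypothetical one below, run before any rsync call, catches missing keys and also two clients that share a location, which together with --delete would let one client's files wipe out another's. The required key names mirror the sample config above; the function itself is an assumption, not part of the original script.

```python
REQUIRED_KEYS = ("port", "ip", "user", "files", "location")


def validate_clients(clients):
    """Return a list of human-readable problems found in the clients dict."""
    problems = []
    seen_locations = {}
    for name, client in clients.items():
        missing = [key for key in REQUIRED_KEYS if key not in client]
        if missing:
            problems.append("client %s is missing %s" % (name, ", ".join(missing)))
            continue
        if not client["files"]:
            problems.append("client %s has no files to back up" % name)
        # Treat /a/b and /a/b/ as the same location
        location = client["location"].rstrip("/")
        if location in seen_locations:
            problems.append("clients %s and %s share location %s"
                            % (seen_locations[location], name, location))
        else:
            seen_locations[location] = name
    return problems
```

The script could call validate_clients(config["clients"]) right after loading the configuration and stop, writing the problems to the log, if the returned list is not empty.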

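So the final task in making the backup system work is to set up the cron job. I prefer adding the job to /etc/crontab, whose entries include the user to run the command as (root below). You can also use the crontab -e command, in which case you leave out the user field. The entry below runs the script at 30 minutes past every hour; since it calls script.py directly, make sure the file is executable (chmod +x /var/backup/script.py).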


30 * * * * root /var/backup/script.py >> /var/backup/cronLog.log 2>&1
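Entries in /etc/crontab have six fields before the command: minute, hour, day of month, month, day of week, and the user to run as. The Python sketch below splits the line above into those fields and, for the simple "fixed minute, every hour" pattern used here, works out the next run time. The function names are hypothetical, and it handles only that one pattern; it is an illustration of the fields, not a cron parser.

```python
from datetime import datetime, timedelta

FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week", "user")


def parse_system_crontab_line(line):
    """Split an /etc/crontab line into its named fields plus the command."""
    parts = line.split(None, len(FIELDS))  # the command keeps its own spaces
    entry = dict(zip(FIELDS, parts[:len(FIELDS)]))
    entry["command"] = parts[len(FIELDS)]
    return entry


def next_hourly_run(minute, now):
    """Next time a '<minute> * * * *' schedule fires, strictly after `now`."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(hours=1)
    return candidate


if __name__ == "__main__":
    entry = parse_system_crontab_line(
        "30 * * * * root /var/backup/script.py >> /var/backup/cronLog.log 2>&1")
    print(entry["user"], entry["command"])
    print(next_hourly_run(int(entry["minute"]), datetime(2024, 1, 1, 10, 45)))
```

With the sample time of 10:45, the next run is 11:30 the same day; at 10:15 it would be 10:30.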

Voila!!!