fs-mirror.py

This is a small Python script that asynchronously mirrors the given directories in near real time.

How this works

The script watches a set of directories for changes and copies each new or modified file to another directory.

Invoke it like this:

python fs-mirror.py /tmp/mirrored /home/quake/important /home/quake/projects

Every new or newly written (modified) file in /home/quake/important or /home/quake/projects is copied to /tmp/mirrored/home/quake/important or /tmp/mirrored/home/quake/projects respectively. The directory structure is preserved.

Simple example:

What happens on /home/quake/… | What's done by fs-mirror
/home/quake/important/new-file is created | /home/quake/important/new-file is copied to /tmp/mirrored/home/quake/important/new-file
/home/quake/important/new-directory/ is created | Nothing is copied (and there is no /tmp/mirrored/home/quake/important/new-directory directory yet)
/home/quake/important/new-directory/some-file is created | /tmp/mirrored/home/quake/important/new-directory is created and /home/quake/important/new-directory/some-file is copied to /tmp/mirrored/home/quake/important/new-directory/some-file
/home/quake/important/new-directory/some-file is modified | /home/quake/important/new-directory/some-file is copied to /tmp/mirrored/home/quake/important/new-directory/some-file
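
The mapping from a watched path to its place under the mirror root can be sketched as a small function. This is only an illustration written for Python 3 with the standard library; the name mirror_path is hypothetical and does not appear in the script, which builds the target by concatenating the mirror root and the source path.

```python
import os


def mirror_path(mirror_root, src):
    """Return where `src` would land under `mirror_root` (hypothetical helper)."""
    absolute = os.path.abspath(src)
    # Strip the leading separator: os.path.join would otherwise discard
    # mirror_root when the second argument is an absolute path.
    return os.path.join(mirror_root, absolute.lstrip(os.sep))


if __name__ == "__main__":
    print(mirror_path("/tmp/mirrored", "/home/quake/important/new-file"))
```

Because the full source path is kept under the mirror root, two watched directories with the same base name cannot collide on the mirror.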

What it needs

To run it you must install pyinotify (http://pyinotify.sourceforge.net/) and have a reasonably recent Linux kernel (inotify has been available since 2.6.13).

Limitations

  • only files are copied, not directories (creating a directory does not create it on the mirror until a file is written inside it)
  • file removal and renaming events are not watched (mirrored files may remain even though the originals no longer exist)
    • if you create a file, then delete it and then create a directory with the same name, that directory is completely ignored when copying files, because a file with the directory's name still exists on the mirror
  • some race conditions can happen
    • after a directory is created inside a watched directory, there is a short delay before a watch is set up on the new one. A file created in that window is not caught
    • if you modify a file twice in quick succession, two copy workers can each get an order to copy the same file and may perform the copies in the reverse of the order in which they were queued
  • not all files might be copied (inotify does not guarantee delivery of every event, although in practice delivery is close to 100%)
  • small changes to big files (like logs) trigger copying the whole file over and over
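
One way the new-directory race might be narrowed is to scan a directory right after its watch is added and queue whatever is already inside. The sketch below is an assumption about how such an extension could look, not part of the script: enqueue_existing_files is a hypothetical name, and it is written for Python 3, where the Queue module is called queue.

```python
import os
import queue


def enqueue_existing_files(directory, files_to_copy):
    """Queue every file already present under `directory`; return the count.

    Hypothetical helper: meant to be called right after a watch has been
    added on `directory`, to pick up files created before the watch existed.
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            files_to_copy.put(os.path.join(dirpath, name))
            count += 1
    return count


if __name__ == "__main__":
    pending = queue.Queue()
    found = enqueue_existing_files(os.getcwd(), pending)
    print("queued %d file(s)" % found)
```

A file may then be queued twice (once by the scan, once by its own event), which costs an extra copy but does not lose data.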

Coolness

  • it's small and quite reliable (and readable)
  • unobtrusive (you get smart mirroring without modifying anything in your applications)
  • easy to extend (to remove some of the limitations)
  • easy to change the number of threads copying files (NUM_COPY_THREADS)
  • works well with sshfs to back up your important files in near real time, but still asynchronously
  • you can rsync the directories from time to time to eliminate the small differences that may arise from the limitations of this script and of the inotify architecture
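
Before running such a reconciliation pass it can be useful to see how far the mirror has drifted. The following is a rough Python 3 sketch using only the standard library; find_out_of_sync is a hypothetical helper, and it assumes the same layout as the script (mirror root followed by the absolute source path). It only reports differences and does not replace rsync.

```python
import filecmp
import os


def find_out_of_sync(source_dir, mirror_root):
    """List files under source_dir whose mirror copy is missing or differs."""
    stale = []
    for dirpath, _dirnames, filenames in os.walk(source_dir):
        for name in filenames:
            src = os.path.join(dirpath, name)
            # Same layout as the script: mirror root + absolute source path.
            trg = mirror_root + os.path.abspath(src)
            if not os.path.isfile(trg):
                stale.append(src)
            elif not filecmp.cmp(src, trg, shallow=False):
                # shallow=False compares file contents, not just metadata.
                stale.append(src)
    return sorted(stale)
```

Files that exist only on the mirror (because the original was deleted or renamed) are not reported by this sketch; detecting them would need a second walk over the mirror tree.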

THE CODE

import os
from sys import argv
from pyinotify import WatchManager, Notifier, EventsCodes, ProcessEvent
from shutil import copyfile
from Queue import Queue
from threading import Thread

NUM_COPY_THREADS = 4
WATCHED_DIRS = argv[2:]
MIRROR_ROOT = argv[1]

print "Preparing to go"

filesToCopy = Queue()

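# copy one file to the same path under MIRROR_ROOT
def copy(src):
    # abspath so that relative watch directories also land under MIRROR_ROOT;
    # plain concatenation because os.path.join would drop MIRROR_ROOT
    # when given an absolute path
    trg = MIRROR_ROOT + os.path.abspath(src)
    try:
        os.makedirs(os.path.dirname(trg))
    except OSError:
        # usually means the target directory already exists
        pass
    copyfile(src, trg)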

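# worker run by each copying thread
def CopyFileWorker():
    while True:
        path = filesToCopy.get()
        try:
            copy(path)
        except (IOError, OSError):
            # e.g. the file vanished before it could be copied;
            # skip it so that the worker thread stays alive
            pass
#        filesToCopy.task_done()
# not available in Python 2.4

# start the copying threads
for i in range(NUM_COPY_THREADS):
    t = Thread(target=CopyFileWorker)
    t.setDaemon(True)
    t.start()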

# event dispatcher
class MirrordProcessEvent(ProcessEvent):
    # a file opened for writing was closed: queue it for copying
    def process_IN_CLOSE_WRITE(self, event):
        filesToCopy.put(os.path.join(event.path, event.name))

    # a new subdirectory appeared: start watching it as well
    def process_IN_CREATE(self, event):
        if event.is_dir:
            new_dir = os.path.join(event.path, event.name)
            self.wm.add_watch(new_dir, self.wm.mymask)

# pyinotify magic
wm = WatchManager()
wm.mymask = EventsCodes.IN_CLOSE_WRITE | EventsCodes.IN_CREATE
wpe = MirrordProcessEvent()

notifier = Notifier(wm, wpe)
wdd = wm.add_watch(WATCHED_DIRS, wm.mymask, rec=True)
wpe.wm = wm

print "Ready"

while True:
    try:
        notifier.process_events()
        if notifier.check_events():
            notifier.read_events()
    except KeyboardInterrupt:
        break

notifier.stop()
#filesToCopy.join()
# not available in Python 2.4
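
The commented-out task_done() and join() calls were added to the Queue class in Python 2.5. On an interpreter that has them, shutdown can wait until every queued copy has finished. The sketch below shows the idea in Python 3 with the standard library only; run_copy_workers is a hypothetical wrapper, not a drop-in replacement for the script above, and it assumes the same mirror layout (mirror root followed by the absolute source path).

```python
import os
import queue
import shutil
import threading


def run_copy_workers(paths, mirror_root, num_threads=4):
    """Copy every path under mirror_root and return once all are processed."""
    files_to_copy = queue.Queue()

    def worker():
        while True:
            src = files_to_copy.get()
            try:
                trg = mirror_root + os.path.abspath(src)
                os.makedirs(os.path.dirname(trg), exist_ok=True)
                shutil.copyfile(src, trg)
            except OSError as exc:
                # e.g. the source vanished; report it and keep the worker alive
                print("skipped %s: %s" % (src, exc))
            finally:
                # always mark the item as handled, or join() would never return
                files_to_copy.task_done()

    for _ in range(num_threads):
        threading.Thread(target=worker, daemon=True).start()
    for path in paths:
        files_to_copy.put(path)
    files_to_copy.join()  # blocks until task_done() was called for every item
```

Calling task_done() in a finally clause matters here: a failed copy that skipped it would leave join() waiting forever.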
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License