[TAG] Keeping indices of filesystems to handle backup archives

René Pfeiffer lynx at luchs.at
Wed Jul 11 20:06:59 MSD 2007


Hello!

So, here's my question about that index problem I mentioned in the
answer to Ben's question about backups.

Imagine you have two backup servers. Server A keeps a fairly recent
copy of the live servers. Server B archives data from server A so that
server A only has to hold the recent backups and can save space. Of
course this means that server B keeps accumulating files and
directories. To keep this growth in check, one could delete files
according to a mathematical distribution. There's a tool
called fileprune which does just that
(http://www.spinellis.gr/sw/unix/fileprune/). I found it by browsing
through an old issue of ;login:. fileprune deletes data according to a
Gaussian, exponential or Fibonacci distribution. The problem is that
fileprune needs to read the metadata of the entire tree into memory
before it can decide which files to delete. Backup storage may have
millions of files and directories.
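
An exponential scheme does not necessarily need the whole tree in
memory, though. The following Python sketch is a hypothetical
illustration, not fileprune's actual algorithm: it puts each file into
an age bucket whose width doubles with age and keeps only the newest
file per bucket. Because it decides as entries arrive, it holds just one
entry per bucket and can consume a directory walk as a stream. The names
`bucket_of` and `prune_stream` are made up for this example, and ages
are assumed to be non-negative numbers of days.

```python
import math
from typing import Dict, Iterable, Iterator, Tuple


def bucket_of(age_days: float) -> int:
    """Bucket k covers ages in [2**k - 1, 2**(k+1) - 1) days (assumes age >= 0)."""
    return int(math.floor(math.log2(age_days + 1)))


def prune_stream(entries: Iterable[Tuple[str, float]]) -> Iterator[str]:
    """Yield names to delete, keeping only the newest entry in each age bucket.

    Memory grows with the number of buckets (logarithmic in the oldest age),
    not with the number of files, so entries can come straight from a walk.
    """
    kept: Dict[int, Tuple[str, float]] = {}
    for name, age in entries:
        bucket = bucket_of(age)
        current = kept.get(bucket)
        if current is None:
            kept[bucket] = (name, age)
        elif age < current[1]:
            # The new entry is newer: keep it, discard the one kept so far.
            kept[bucket] = (name, age)
            yield current[0]
        else:
            yield name
```

Fed with backups aged 0 to 6 days, it keeps the ones aged 0, 1 and 3
days and yields the others for deletion, whatever order they arrive in.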

I'd like to ask the filesystem directly "which files have an access time
older than X?" and get an answer. In the database world you have
indices for that. (Most) filesystems don't have such things (at least
not exported to userspace), so you would have to maintain one
yourself. This could be done with the Linux kernel's Inotify API, which
reports the changes made in a specific filesystem tree. I tried it and
it works, but I have no idea whether I catch every modification when
rsync or other tools come along (I am going to test this with higher load
as soon as my load is lower).
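
For the Inotify route, here is a minimal Python sketch. Python's standard
library has no Inotify binding, so it assumes Linux with glibc and calls
inotify_init1() and inotify_add_watch() through ctypes; the helper names
are hypothetical. Two properties bear on the "do I catch everything?"
question: a watch covers a single directory, not the subtree below it, so
new subdirectories have to be watched as they appear (anything created in
them before the watch is added is missed), and the kernel's event queue
is bounded, so under heavy load it can overflow and report IN_Q_OVERFLOW
instead of the lost events. The sketch treats an overflow as a signal to
fall back to a full rescan.

```python
import ctypes
import ctypes.util
import os
import struct
from typing import Iterator, Tuple

# Event bits from <sys/inotify.h>.
IN_MODIFY = 0x00000002
IN_ATTRIB = 0x00000004
IN_MOVED_FROM = 0x00000040
IN_MOVED_TO = 0x00000080
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200
IN_Q_OVERFLOW = 0x00004000
WATCH_MASK = IN_MODIFY | IN_ATTRIB | IN_MOVED_FROM | IN_MOVED_TO | IN_CREATE | IN_DELETE

# struct inotify_event header: int wd; uint32_t mask, cookie, len; then the name.
_EVENT = struct.Struct("iIII")
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)


def _check(ret: int) -> int:
    """Turn a -1 return from libc into an OSError carrying errno."""
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def inotify_open() -> int:
    """Create a non-blocking inotify descriptor (IN_NONBLOCK equals O_NONBLOCK)."""
    return _check(_libc.inotify_init1(os.O_NONBLOCK))


def add_watch(fd: int, directory: str, mask: int = WATCH_MASK) -> int:
    """Watch one directory; its subdirectories need watches of their own."""
    return _check(_libc.inotify_add_watch(fd, os.fsencode(directory), mask))


def read_events(fd: int) -> Iterator[Tuple[int, int, str]]:
    """Yield (watch descriptor, mask, name) for the events from one read()."""
    try:
        buf = os.read(fd, 64 * 1024)
    except BlockingIOError:
        return  # nothing queued right now
    offset = 0
    while offset < len(buf):
        wd, mask, _cookie, name_len = _EVENT.unpack_from(buf, offset)
        offset += _EVENT.size
        # The name field is NUL-padded to name_len bytes.
        name = os.fsdecode(buf[offset:offset + name_len].rstrip(b"\0"))
        offset += name_len
        if mask & IN_Q_OVERFLOW:
            raise RuntimeError("inotify queue overflowed: rebuild the index by full scan")
        yield wd, mask, name
```

A real watcher would poll the descriptor with select(), keep a map from
watch descriptor to directory path, and call add_watch() whenever an
IN_CREATE event names a new directory.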

Another option is to check whether existing filesystems offer similar
functionality. I believe Reiser4 went in this direction. Yet another
way is to walk the filesystem tree separately in order to maintain a
metadata index.
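
Walking the tree separately could look roughly like this Python sketch.
It stores path, atime, mtime and size in an SQLite table with an index
on atime, so that "which files were accessed before X?" becomes an
indexed query instead of a full tree walk. The table layout and the
function names are assumptions for illustration. The walk uses
os.scandir() with an explicit stack, so it keeps the directories still
to visit in memory rather than the metadata of every file. Note that
atime is only as useful as the mount options allow: with noatime it is
never updated, and with relatime only under certain conditions.

```python
import os
import sqlite3
from typing import Iterator, List

SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path  TEXT PRIMARY KEY,
    atime REAL NOT NULL,
    mtime REAL NOT NULL,
    size  INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS files_atime ON files(atime);
"""


def walk_files(root: str) -> Iterator[os.DirEntry]:
    """Iterative scandir walk yielding regular files, not following symlinks."""
    stack = [root]
    while stack:
        with os.scandir(stack.pop()) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    yield entry


def rebuild_index(db: sqlite3.Connection, root: str) -> None:
    """Replace the table contents with a fresh scan, in one transaction."""
    db.executescript(SCHEMA)
    with db:
        db.execute("DELETE FROM files")
        for entry in walk_files(root):
            st = entry.stat(follow_symlinks=False)
            db.execute(
                "INSERT INTO files VALUES (?, ?, ?, ?)",
                (entry.path, st.st_atime, st.st_mtime, st.st_size),
            )


def accessed_before(db: sqlite3.Connection, cutoff: float) -> List[str]:
    """Paths whose atime is older than cutoff (seconds since the epoch)."""
    rows = db.execute("SELECT path FROM files WHERE atime < ? ORDER BY atime", (cutoff,))
    return [path for (path,) in rows]
```

A periodic rebuild_index() run plus accessed_before() with a cutoff of,
say, 90 days ago would list pruning candidates; an Inotify watcher could
keep the table current between full scans.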

Do you have some more ideas besides writing a new filesystem?

Just being curious,
René.
