There is no glory in backups. I regularly work across six or seven different systems, and I hate carrying a hard drive or USB flash drive around. Most of the time, the systems I work on are housed remotely around the world.
So how do I get all my files in sync?
In the past, I rsynced files to and from a central server. This is great in theory, but too many times that server went down for one reason or another. I also lost track of what I had synced up and down, which left me always wondering which parts of the file system were the newest.
The solution might have been NFS, but mounts go stale too often, it still requires a central server, and it offers no secure transfer mode out of the box.
The solution might have been Dropbox. It was close, but Dropbox wasn't encrypted at rest, many companies that host servers blocked it, and I found that the Linux daemon choked on large directory moves. It also didn't work out on micro servers, where there just isn't enough disk space.
S3 is promising: Amazon-backed storage, you pay only for what you use, with encryption in transit and at rest. The problem is that it just doesn't feel like a Unix file system, and commands I have used for decades no longer work the same way.
The solution turns out to be a FUSE virtual file system for S3. Two open-source projects stand out: s3fs and YAS3FS.
Both are written in Python using the boto library. The major difference between the two is that YAS3FS caches files and metadata locally and s3fs calls the Amazon S3 servers for each request.
I’ve been using yas3fs for the past two months. It feels like a real filesystem. When the S3 bucket changes, it notifies the other subscribed servers (also running YAS3FS) so they can invalidate their caches, using Amazon’s SQS service.
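A mount looks roughly like the sketch below. The bucket name, mount point, topic ARN, and cache path are placeholders, and flag names can vary between versions, so check `yas3fs --help` against your install:

```shell
# Mount an S3 bucket path as a local directory, with a local cache.
# --topic points at the SNS topic used to broadcast change notifications;
# --new-queue creates a fresh SQS queue for this mount to listen on.
yas3fs s3://mybucket/shared /mnt/shared \
    --topic arn:aws:sns:us-east-1:123456789012:yas3fs-shared \
    --new-queue \
    --cache-path /var/cache/yas3fs

# Unmount when done.
fusermount -u /mnt/shared
```

Each machine mounting the same bucket with its own queue picks up the others' changes without polling S3.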
As with most open-source software, it is 95% there. I’ve had to fork it to fix bugs and add a few features: encryption, failover plugins, time functions, and fixes for timing issues. Most importantly, I added some functionality to make sure rsync runs happily.
There are, of course, issues no FUSE layer can solve. S3 provides no way to append data to an object; instead, the file has to be completely re-uploaded on every change.
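The only workaround is read-modify-write: fetch the whole object, append locally, and upload the rebuilt object. A minimal sketch of that pattern, using a plain dict as a stand-in for an S3 bucket (with boto you would issue real GET/PUT calls instead):

```python
def append_to_object(store, key, data):
    """'Append' to an object store that only supports whole-object
    GET and PUT: read the current object, concatenate, re-upload."""
    existing = store.get(key, b"")   # GET the current object (empty if new)
    store[key] = existing + data     # PUT the full, rebuilt object
    return len(store[key])           # total bytes now stored under the key

# Example with an in-memory stand-in for a bucket.
bucket = {}
append_to_object(bucket, "log.txt", b"first line\n")
append_to_object(bucket, "log.txt", b"second line\n")
print(bucket["log.txt"])  # b'first line\nsecond line\n'
```

Appending one line to a gigabyte file therefore costs a gigabyte of upload, which is why write-heavy append workloads sit poorly on any S3-backed filesystem.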
Since I made my own dog food, I should eat it. Now, any file I edit trickles down to every other machine I use. The data is all held in a redundant system, encrypted at rest and in transit. And everything feels like I am still on a Linux system from 1996.
POSTSCRIPT: While I was traveling recently, my credit card got locked out for a seemingly fraudulent charge. I called my bank to resolve it when I returned. It happened to be a $0.25 charge from Amazon Web Services, for using the file system…
1 thought on “Backing up S3 with YAS3FS”
What changes did you need to make rsync happy?