How I back up my data
For a long time, I didn’t have backups at all. Most of what I cared about was in the cloud already, on GitHub or various Google services. However, in the past eighteen months or so, I’ve begun to accumulate lots of files that I don’t really want to store in plaintext on Google Drive or the like. This, combined with a general reluctance to rely on Google services for my data backups, led me to look for a more robust backup solution.
The general guidance for backups is to have three copies of important data, in two forms of storage, with one offsite copy. The solution I ended up with is closer to a 4-2-2 solution, with four copies of my backup repository, two forms of storage, and two offsite copies (on different cloud storage providers).
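To make the arithmetic concrete, here is a small sketch that counts copies, storage media, and offsite copies for the setup described later in this post. The `Copy` class and the "disk"/"cloud" labels are assumptions made for illustration; they are not part of any backup tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    medium: str    # assumed labels: "disk" or "cloud"
    offsite: bool


def summarize(copies):
    """Return (total copies, distinct storage media, offsite copies)."""
    media = {c.medium for c in copies}
    offsite = sum(1 for c in copies if c.offsite)
    return (len(copies), len(media), offsite)


def satisfies_3_2_1(copies):
    """Check the 3-2-1 guideline: >= 3 copies, >= 2 media, >= 1 offsite."""
    total, media, offsite = summarize(copies)
    return total >= 3 and media >= 2 and offsite >= 1


# The four copies of the backup repository described in this post.
COPIES = [
    Copy("local hard drive", "disk", False),
    Copy("backup hard drive", "disk", False),
    Copy("Backblaze B2", "cloud", True),
    Copy("Google Drive", "cloud", True),
]
```

Calling `summarize(COPIES)` gives the 4-2-2 breakdown, and `satisfies_3_2_1(COPIES)` confirms that it is a superset of the usual guideline.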
What I want
I have four main requirements for a backup solution:
- Ignore files, defined by a pattern
There are a lot of files that I don’t want to back up, and they’re scattered through my file system: build artifacts, node_modules folders, etc. I can’t make a list of all of them, and I don’t want to have to manually exclude folders every time I start a new project. It is essential that whatever solution I use offer a .gitignore-esque list of files to exclude from the backup. Without this feature, my backups will grow to multiple hundreds of gigabytes very quickly, given the size of Rust build artifacts.
- Back up to a local hard disk
In order to fulfill the 3-2-1 backup guideline, I need to back up to a local hard disk as well as a cloud service.
- Back up to a cloud service
To protect against a fire in my apartment or a similar catastrophe that destroys all of my storage in a given location, I need to back up to a cloud service as well as to a local hard disk.
- Provide snapshots or file history
Things change over time, and I want to be able to track that. This isn’t a hard requirement, but it’s definitely something I want.
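To illustrate the first requirement, here is a minimal sketch of pattern-based exclusion using only the Python standard library. The patterns are hypothetical examples, and this is a simplification: real tools have their own pattern syntax, which is richer than plain `fnmatch` globs.

```python
import fnmatch
import os
from pathlib import PurePosixPath

# Hypothetical patterns, in the spirit of a .gitignore file.
EXCLUDE_PATTERNS = ["node_modules", "target", "*.pyc"]


def is_excluded(relative_path, patterns=EXCLUDE_PATTERNS):
    """Return True if any component of the path matches any pattern."""
    parts = PurePosixPath(relative_path).parts
    return any(
        fnmatch.fnmatch(part, pattern)
        for part in parts
        for pattern in patterns
    )


def iter_backup_files(root, patterns=EXCLUDE_PATTERNS):
    """Yield files under root, skipping excluded directories and files."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if not is_excluded(d, patterns)]
        for name in filenames:
            if not is_excluded(name, patterns):
                yield os.path.join(dirpath, name)
```

The point of matching on path components is that a single `node_modules` entry covers every project, including ones created after the list was written.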
The first stage of my research was looking at turnkey backup software that would provide a client, let me back everything up, and be generally hassle-free. I looked at several options, including:
Unfortunately, none of these services fulfilled all of my requirements. Most of them didn’t provide a way to exclude files from a backup. Some didn’t allow you to back up to a local hard disk. Very few provided snapshots. As a result, I had to roll my own solution.
With the pre-built software ruled out, I started looking for building blocks instead. Searching landed me at rclone, a tool for mirroring files to remote storage.
rclone fulfills all but one of my requirements: it doesn’t generate snapshots of a backup. I wrote a Python script that backed up all of the files I cared about to a combination of a local hard disk, a Backblaze B2 bucket, and a Google Drive folder. This worked well, but I still wanted snapshots.
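A script along these lines might look like the following sketch. It is not the actual script: the source path, the exclude file, and the remote names (`b2:`, `gdrive:`) are hypothetical, and the remotes would need to be configured in rclone beforehand. It relies on `rclone sync SOURCE DEST` and rclone's `--exclude-from` option, which reads exclusion patterns from a file.

```python
import subprocess

# Hypothetical paths and remote names, for illustration only.
SOURCE = "/home/me"
EXCLUDE_FILE = "/home/me/.backupignore"
DESTINATIONS = [
    "/mnt/backup/home",        # local hard disk
    "b2:example-bucket/home",  # Backblaze B2 remote
    "gdrive:backups/home",     # Google Drive remote
]


def build_sync_command(source, destination, exclude_file):
    """Build an rclone command that mirrors source to destination."""
    return ["rclone", "sync", source, destination, "--exclude-from", exclude_file]


def mirror_everywhere(source=SOURCE, destinations=DESTINATIONS,
                      exclude_file=EXCLUDE_FILE, dry_run=True):
    """Mirror source to every destination; only print commands when dry_run is True."""
    commands = [build_sync_command(source, d, exclude_file) for d in destinations]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the run if rclone reports a failure.
            subprocess.run(command, check=True)
    return commands
```

Calling `mirror_everywhere(dry_run=False)` would run the three syncs in order. Because `sync` makes each destination match the source, a file deleted locally disappears from the mirrors on the next run, which is exactly why this approach offers no history.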
Why two cloud storage providers?
I opted to go with both Backblaze B2 and Google Drive for some extra redundancy. I was already paying for a large amount of storage in Google One for my Pixel backups and Google Photos storage, so I figured it would be fine to store my backup repository there as well. However, I am extremely reluctant to host my only offsite backup on Google services, given their lack of customer support and algorithms that occasionally ban people automatically, with no recourse. As a second line of defense against this happening, I chose to back up to Backblaze B2 as well.
In order to get snapshotting, I needed another piece of software. After some research, I came across restic, which seemed to provide everything I wanted. It would even encrypt the backups, so that a breach of my Google or Backblaze accounts wouldn’t immediately leak my data.
I set up a restic repository on my local hard drive, and modified my Python script to invoke restic to back up my files. Once that was done, I used rclone to sync the repository to a backup hard drive, Backblaze B2, and Google Drive. This worked really well! Backups were really fast, and rclone synchronized the repository without any hiccups.
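The two-step flow can be sketched roughly as follows. Again, this is an illustration under assumptions rather than the real script: the repository path, backup paths, exclude file, and remote names are all hypothetical. It uses `restic -r REPO backup PATH --exclude-file FILE` to create a snapshot and `rclone sync` to copy the repository to each mirror; restic is assumed to get its repository password from the `RESTIC_PASSWORD` environment variable or a similar mechanism.

```python
import subprocess

# Hypothetical locations, for illustration only.
REPO = "/data/restic-repo"
BACKUP_PATHS = ["/home/me"]
EXCLUDE_FILE = "/home/me/.backupignore"
MIRRORS = [
    "/mnt/backup/restic-repo",        # backup hard drive
    "b2:example-bucket/restic-repo",  # Backblaze B2 remote
    "gdrive:backups/restic-repo",     # Google Drive remote
]


def build_backup_command(repo, paths, exclude_file):
    """Build the restic command that takes a new snapshot of paths."""
    return ["restic", "-r", repo, "backup", *paths, "--exclude-file", exclude_file]


def build_sync_commands(repo, mirrors):
    """Build one rclone command per mirror to copy the whole repository."""
    return [["rclone", "sync", repo, mirror] for mirror in mirrors]


def plan_backup(repo=REPO, paths=BACKUP_PATHS,
                exclude_file=EXCLUDE_FILE, mirrors=MIRRORS):
    """Return the commands in order: snapshot first, then sync to each mirror."""
    return [build_backup_command(repo, paths, exclude_file)] + build_sync_commands(repo, mirrors)


def run_plan(plan):
    """Run each command in order, stopping at the first failure."""
    for command in plan:
        subprocess.run(command, check=True)
```

Running `run_plan(plan_backup())` would take the snapshot and then push the repository out. Since the repository is already encrypted by restic, the mirrors only ever see encrypted data, and the exclusion list is applied once, at snapshot time.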
I don’t have the space to do this in my apartment right now, but in the future I want to set up a server with enough storage space to act as a NAS. Once that’s done, I can move the restic repository off of my local hard drive and onto the NAS, and allow the NAS to synchronize things. This will make backups faster, at least on my local machine, and also move me away from using a single external hard drive as my local backup.
Another thing I might investigate is switching to Duplicacy instead of restic. Duplicacy promises to be friendlier for backing up multiple computers, which isn’t something I really need to do right now, since I’m only backing up my desktop. If I start doing more development on my laptop, it might be worth looking into.