It goes without saying that data backup is important. I think most people are aware of this on some level, whether or not they actually do it. With the advent of Dropbox, Google Drive et al., backup and folder syncing have become synonymous for many. Folder syncing's key advantage, I think, is that it happens automatically: as long as users get into the habit of putting important files into the right folder, they can trust their service's helper app to send those files to the cloud.
Sync vs Backup
For system-wide backups, folder-syncing services start to become a little unwieldy. It's impractical to try putting every file you want backed up into one giant folder (especially if your service of choice pretends symlinks don't exist, Google Drive…). Here, something like Time Machine on macOS, or services like BackBlaze and CrashPlan, are much better suited to the task. These are all easy to set up, but come with disadvantages. With Time Machine, you generally don't get an offsite backup. Additionally, if you use it with a hard drive that isn't always connected, it's not nearly as automatic as backups need to be. Online backup services mostly alleviate these issues. Depending on your upload speed, though, there might be an uncomfortable delay before large files are backed up. This is especially true for laptops, or computers that aren't left on for extended periods of time.
For my own needs, and peace of mind, I've settled on a combination of software and services: BackBlaze, Time Machine, Carbon Copy Cloner, and Resilio Sync, all running on a 'Hackintosh' server with ~10.5TB of storage. I'll go into more detail on why I use each of these below,¹ but first, my requirements.
I have a pretty diverse set of file management needs, which is why I've given my backup solution this much thought.
- I manage a lot of video files, from short film shoots etc. with QuothMe. Since any loss would be catastrophic, these need to be backed up as soon as possible, faster than BackBlaze can manage over my internet connection.² I need to be able to access these files on my MacBook Pro, but I don't want to keep them on it.
- For my PhD work and software development, I have large numbers of small files that live on my laptop and change frequently. I want this stuff backed up in as close to real time as possible, since losing even a few hours' work would be a pain.
- There's also the usual stuff, like my personal photo and video library, which I'd like to keep on my laptop and also access from my iPad and phone.³ Most of the time, this doesn't grow too quickly.
- I also keep my girlfriend's laptop (a MacBook Air) backed up, though fortunately her data mostly comprises text documents and photos.
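To put a rough number on why footage can't wait for BackBlaze, here is a back-of-the-envelope upload-time estimate. It's a sketch that assumes the 20 Mbit/s upstream from footnote 2 is fully saturated, with no protocol overhead, throttling or competing traffic, so real uploads will take longer; the 100 GB shoot size is a hypothetical example, not a measured figure.

```python
# Back-of-the-envelope upload time. Assumes the upstream link is fully
# saturated and ignores protocol overhead, throttling and other traffic.


def upload_hours(size_gb: float, upload_mbit_per_s: float) -> float:
    """Hours needed to upload size_gb gigabytes (decimal GB, 10^9 bytes)."""
    megabits = size_gb * 1000 * 8  # GB -> MB -> Mbit
    seconds = megabits / upload_mbit_per_s
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical 100 GB shoot over a 20 Mbit/s upstream connection.
    print(f"{upload_hours(100, 20):.1f} hours")  # 11.1 hours
```

Even in this best case, a 100 GB shoot would take roughly eleven hours to reach BackBlaze, which is why the footage needs a local second copy first.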
Services I use, and ones I don't
I'll start with a quick summary of the services I do use:
- BackBlaze is incredibly good value, and works very well. With a couple of tweaks to the exclusions list, it backs up everything I need, including Time Machine disk images.
- Resilio Sync is a fantastic piece of software. It's basically like a personal Dropbox service, but with the ability to sync arbitrary folders in-place. There are no direct limitations on storage space, but the caveat here is that you need to supply your own devices to sync to; there's no server provided by default. It also has a great mobile app, which is very useful as long as you always have a machine running somewhere.
- Time Machine has the obvious advantage of being a first-party service, and its hourly backups and support for network drives make it pretty perfect for my needs.
- Carbon Copy Cloner is a very reliable bit of software. It can mirror drives and folders to other drives or disk images, and run those tasks on a schedule. Its 'SafetyNet' feature, which sets aside files that would otherwise be overwritten or deleted on the destination, is also great for peace of mind.
And why I don't use some others:
- Dropbox has plenty of issues. Check out Marco Arment's Twitter feed, or listen to ATP (the Accidental Tech Podcast), and it won't take long to hear one of the many reasons it's sketchy. That aside, it just isn't flexible enough for my current setup anyway.
- Google Drive's main issue for me has been its CPU usage. For example, the Wi-Fi at my university blocks certain ports, so the Google Drive app can't connect. Google Drive reacts to this by pegging a CPU core at 100% indefinitely, with no warning other than my laptop heating up and its battery draining quickly. Not the kind of thing that's nice to notice midway through a lecture. Resilio Sync doesn't work on this Wi-Fi network either, for what it's worth, but at least it simply stops syncing rather than actively causing problems.
The central component of my setup is a Skylake-based Hackintosh running macOS Server, which sits in a cupboard and runs 24/7. It's essentially the lowest-spec Hackintosh I could build with modern components: 8GB of RAM, a Skylake i3 CPU, two 5TB hard drives and a 500GB SSD, in a MicroATX case. I wanted the equivalent of a Mac Mini with current-generation, low-power components, in a case that can hold a few 3.5-inch hard drives internally. It's still running El Capitan (I built it in mid-2016 and haven't updated to Sierra), but it's been extremely reliable; it hasn't crashed once.
I had a few reasons for wanting the server to run macOS rather than Linux (probably Ubuntu, which would have been my second choice). I wanted to be able to run BackBlaze on it, set up Time Machine and Resilio Sync easily, and regularly run a few macOS utilities and scripts I've written.
Finally, here's a summary of what actually goes on, software-wise:
- Resilio synchronises any folders that contain rapidly changing files, such as my software projects, to the SSD in my server. I use the SSD here to avoid constantly spinning up the hard drives.
- I also use Resilio to make any folders I want to access remotely available on my iPad or phone: for example, my raw photo library, my documents folder, and folders containing final renders of video projects.
- Time Machine backs up my laptop and my girlfriend's hourly to one of the 5TB hard drives in the server.
- BackBlaze runs constantly, backing up the SSD and that same 5TB hard drive.
- Carbon Copy Cloner runs a scheduled job at 4am every day to mirror the first 5TB drive onto the second (with SafetyNet enabled, so nothing is truly deleted while space permits).
- I import video projects directly into a folder on the server, and I sometimes⁴ also run this job manually after a large import, to make sure there's a second copy of that data as soon as possible.
- The server's SSD and first 5TB drive are also shared on the local network via SMB, so I can work with video projects etc. without having to copy the files to my laptop.
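For anyone curious what a "mirror with SafetyNet" job does conceptually, here is a minimal Python sketch. It is not how Carbon Copy Cloner works internally; it only illustrates the idea under simple assumptions: files are compared by size and modification time, and anything that would be overwritten or deleted on the destination is moved into a timestamped safety folder instead. The function `mirror_with_safetynet` and the `_SafetyNet` folder name are hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path


def _needs_copy(src: Path, dst: Path) -> bool:
    """True if dst is missing or differs from src in size or whole-second mtime."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def _archive(path: Path, dst_root: Path, safety_root: Path) -> None:
    """Move a destination file into the safety folder, keeping its relative path."""
    target = safety_root / path.relative_to(dst_root)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(path), str(target))


def mirror_with_safetynet(src_root: Path, dst_root: Path,
                          safety_name: str = "_SafetyNet") -> None:
    """Make dst_root mirror src_root, archiving replaced or deleted files.

    Hypothetical illustration only: empty directories are not removed and
    the safety folder is never pruned.
    """
    stamp = datetime.now().strftime("%Y-%m-%d_%H%M%S")
    safety_top = dst_root / safety_name
    safety_root = safety_top / stamp
    dst_root.mkdir(parents=True, exist_ok=True)

    # 1. Copy new or changed files, archiving the old destination copy first.
    for src in src_root.rglob("*"):
        if not src.is_file():
            continue
        dst = dst_root / src.relative_to(src_root)
        if _needs_copy(src, dst):
            if dst.exists():
                _archive(dst, dst_root, safety_root)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 keeps mtime for the next comparison

    # 2. Archive destination files that no longer exist in the source,
    #    skipping anything already inside the safety folder.
    for dst in list(dst_root.rglob("*")):
        if not dst.is_file() or safety_top in dst.parents:
            continue
        if not (src_root / dst.relative_to(dst_root)).exists():
            _archive(dst, dst_root, safety_root)
```

Unlike this sketch, CCC also prunes old SafetyNet contents when the destination runs low on free space, which is what "while space permits" refers to.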
It might seem a little complex, or over-engineered, but this combination means that all of my data is backed up at least twice, both onsite and offsite. It also means all my data is available anywhere, whenever I need it, and that the most important things continue to get backed up whenever I'm on another network and Time Machine isn't available.⁵
This setup has been working really well for me for almost a year now. I regularly check that everything is running as planned, and every so often I pull down random files from BackBlaze or my backup drive and check that all is well. I think the key aspect of this, and of any effective backup setup, is that it all happens automatically. Once the planning and initial setup were complete, I haven't needed to intervene, nor have I had to actually trigger any backups. If a backup requires plugging something in, pressing a button, or anything else that doesn't just happen automatically, it's probably not as effective as you'd want it to be in a real emergency.
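The occasional spot check of random files could be partly automated. Here is a minimal Python sketch, assuming a restored or mirrored copy of a folder is reachable as a local path (for example a BackBlaze restore, or the mirror drive mounted over SMB). It samples random files and compares their SHA-256 hashes; `spot_check` and its default sample size are hypothetical choices, not part of any of the tools above.

```python
import hashlib
import random
from pathlib import Path
from typing import List, Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large video files needn't fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def spot_check(original_root: Path, backup_root: Path, sample_size: int = 20,
               seed: Optional[int] = None) -> List[str]:
    """Compare a random sample of files; return sorted relative paths that are missing or differ."""
    files = [p for p in original_root.rglob("*") if p.is_file()]
    rng = random.Random(seed)
    sample = rng.sample(files, min(sample_size, len(files)))
    problems = []
    for original in sample:
        rel = original.relative_to(original_root)
        copy = backup_root / rel
        if not copy.is_file() or sha256_of(copy) != sha256_of(original):
            problems.append(str(rel))
    return sorted(problems)
```

An empty list means every sampled file matched; anything returned is worth restoring and inspecting by hand.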
1. I do also use iCloud for a few things, but it's mostly redundant and just makes some things easier to access across iOS and macOS.
2. Currently 75 Mbit/s down, 20 Mbit/s up.
3. I use iCloud Photo Library for JPEG copies of all my photos, but here I'm referring to the raw files.
4. If I have to wipe the source memory cards for re-use.
5. Since Resilio seems to break on my university's Wi-Fi, I usually just tether to my phone instead.