File storage and backup for photographers

Eric Cheng Pictures
I spend way too much time thinking about data storage and backup. I've been a [professional photographer](/photo) for nearly 10 years, and have accumulated over 10 terabytes of pictures, video, and project data. I have finally implemented a storage and backup scheme that I'm happy with. It took a long time to set up, but I now have direct access to all of my media, and I take comfort in knowing that it is securely backed up.

### Can I back up to the cloud?

A lot of normal people (non-photographers) are starting to store their pictures exclusively in the cloud, and while there are some great cloud storage services out there that cater to photographers, none of them are really suitable for storing or backing up multiple terabytes of data. Also, uploading to the cloud is slow. A fairly fast DSL or cable connection will probably allow you to sustain upload speeds of 200KB/s (I'm being generous). At that speed, uploading 1TB takes about 2 months. Uploading 10TB would take about 19 months, and a mainstream ISP will likely throttle you before allowing you to use that much data. So cloud backup is out.
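
The upload estimates above are simple arithmetic. Here is a minimal Python sketch of the calculation, assuming decimal units (1TB = 10^12 bytes, 1KB = 10^3 bytes) and a perfectly constant upload rate, which a real connection will rarely deliver. The function name `upload_days` is hypothetical.

```python
SECONDS_PER_DAY = 86_400
KB = 10**3   # decimal kilobyte (assumption)
TB = 10**12  # decimal terabyte (assumption)


def upload_days(size_bytes, rate_bytes_per_second):
    """Days needed to upload size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    for terabytes in (1, 10):
        days = upload_days(terabytes * TB, 200 * KB)
        print(f"{terabytes}TB at 200KB/s: {days:.0f} days")
```

Protocol overhead, throttling, and interrupted transfers would all push the real numbers higher.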

### Simplicity

Over the years, I've had crazy backup schemes involving multiple computers, multiple software products and services, and custom scripts, all requiring coordinated (but automated) execution to keep my data safe. All of these schemes required that I create flow charts to track how data moved during backups; without that documentation, I would eventually have forgotten how things worked.

I'm sick of these crazy schemes, and have finally settled on something much simpler.

### Kinds of data

As someone who collects pictures and video files, I think of my data as living in two different categories. I've had to simplify my thinking a lot in order to categorize data this way, but that's OK. It's got to be rough to be simple.

1. **Working data:** all system files, applications, documents, email, and project data (including temporary or intermediate files and final output files)

2. **Raw data:** pictures, video, audio and other media generated by cameras or capture devices

The main difference between the two is that Raw data doesn't really change after it is captured. After a single photography trip, I might have 300GB of pictures and video, which I consider to be Raw data. Over time, I may then create 50GB of additional project data (e.g., slideshows, produced videos, edited pictures saved as TIFs). I consider all of this to be Working data. Even if it hasn't changed in a long time, I may decide at any time to re-open and tweak a project, which means it needs to be backed up again. It also means that I might accidentally screw up a project, so saving multiple versions of Working data is desirable.

### Backup requirements

- **Working data** should be continuously, incrementally backed up in a versioned manner so I can roll back to a prior state for any given file

- **Raw data** should be backed up in a versioned manner as well, but doesn't need continuous backup. I can kick this off manually, but need to have the discipline to do so regularly.

All data also needs to be stored offsite, so I don't lose everything if there is a fire or flood.

### So here's how I've implemented backup:

My main machine is a mid-2010 Mac Pro. Inside, I have:

- 1 x 500GB Samsung 840 Series SSD (boot, applications, fast data)

- 4 x Seagate Barracuda 4TB 7200RPM 3.5" drives in a RAID 0 stripe (16TB volume)

Mac Pro drive configuration

I am currently using 10TB of the 16TB available space, which gives me 6TB of growing room. 6TB should last a long time... unless I suddenly get a RED camera and start shooting RAW video. :) About 9TB of this data is picture / video data (Raw data), and 1TB is Working data.

I connect a Sans Digital TowerRAID TR5UT+B, which is a 5-bay, USB 3.0/eSATA box that features hardware RAID. The box has 5 x 3TB Seagate Barracuda drives in it, configured as a concatenated array[^1] (15TB volume). Accessed over a single eSATA port (port-multiplied), this setup sustains around 90 MB/s, but when using something like rsync, I see transfer speeds between 30 and 70 MB/s. You can also configure the box to use RAID 0[^2] or RAID 5[^3], if you so desire.

[^1]: In theory, a concatenated array, which the box supports via switches, results in the loss of only a single drive's worth of data if a drive fails. In practice, I've never had to deal with a failure in this kind of array, so I'm just guessing.

[^2]: A RAID 0 stripe is an option as well; I see 130 MB/s from a RAID 0 stripe over a single eSATA port, and real-world rsync speeds of 80 MB/s. This is much faster than using a concatenated array, but you lose the entire set if a drive fails instead of losing only a single drive's worth of data.

[^3]: You should think of RAID 5 as a way to not lose your data if 1 drive fails, but I wouldn't assume that you can rebuild the RAID successfully if you have a lot of data. If a drive fails, copy the data off as soon as possible and start over. Considering that RAID 5 performance degrades a lot once a drive fails (by up to 80%, according to stuff I've read on the internet), this may take a long, long time. In my opinion, it's safest to assume the entire volume is toast when a single drive fails, so multiple backups are necessary. I much prefer newer, proprietary RAID implementations like Synology Hybrid RAID, which are dynamically expandable and allow for 2-drive redundancy.

**For the 1TB of Working data, I use Crashplan** for incremental backups to two locations:

1. a Mac Mini, which has a 3TB drive attached to it via USB 3.0 (backup set includes entire boot drive, as well as Working data)

2. Crashplan Cloud (Working data only; no system or applications, nor Raw data). The initial backup seed is still in progress: the Crashplan app tells me it will take 5 months to upload 1TB, so I will likely mail in a drive to seed the backup (a service they offer).

**I back up the 9TB of Raw data to the TowerRAID using a custom rsync script** that supports incremental snapshots (a modified copy of Mike Rubel's script). It took about 40 hours to do the initial backup (9.3TB over 40 hours is an average of roughly 65MB/s), but successive backups take less than an hour. I keep 4 daily snapshots, 3 weekly snapshots, and 3 monthly snapshots. I may add a semi-yearly or yearly snapshot as well. For those of you who are more technical, the script uses hard links for files that have not changed, which means that I can effectively copy those files to a snapshot without using any additional drive space. Only files that have changed are actually copied to the backup during each incremental backup process.
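
To make the hard-link idea concrete, here is a minimal Python sketch of a single snapshot pass. It is not the rsync-based script described above; it only illustrates the principle. The function names `unchanged` and `snapshot` are hypothetical, and the "unchanged" test (same size and same modification time, to the second) is an assumption in the spirit of a quick check, not a content comparison.

```python
import os
import shutil


def unchanged(path_a, path_b):
    """Quick check: same size and same modification time (whole seconds)."""
    a, b = os.stat(path_a), os.stat(path_b)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def snapshot(source, previous, target):
    """Build a new snapshot of `source` in `target`.

    Files that match their counterpart in the `previous` snapshot are
    hard-linked (no extra space used); everything else is copied.
    Pass previous=None for the very first snapshot.
    """
    for dirpath, _dirnames, filenames in os.walk(source):
        rel_dir = os.path.relpath(dirpath, source)
        os.makedirs(os.path.normpath(os.path.join(target, rel_dir)), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            new = os.path.normpath(os.path.join(target, rel_dir, name))
            old = None
            if previous is not None:
                old = os.path.normpath(os.path.join(previous, rel_dir, name))
            if old and os.path.isfile(old) and unchanged(src, old):
                # Same data as last time: add another name for the same blocks.
                os.link(old, new)
            else:
                # New or modified file: copy the data and preserve its mtime.
                shutil.copy2(src, new)
```

Hard links only work within a single filesystem, so `previous` and `target` need to live on the same backup volume. Files deleted from the source simply do not appear in the new snapshot, while older snapshots keep their copies.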

I actually back up my entire computer, including both Working and Raw data, to the TowerRAID. Why not? I have the space to do so, and it doesn't take that much more time.
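
The retention scheme mentioned earlier (4 daily, 3 weekly, and 3 monthly snapshots) can be sketched as a selection over snapshot dates. This is one possible interpretation, not necessarily how the rsync script rotates its snapshots: it keeps the newest few snapshots, plus the newest snapshot in each of the most recent ISO weeks and calendar months. The function name `snapshots_to_keep` is hypothetical.

```python
from datetime import date, timedelta


def snapshots_to_keep(dates, daily=4, weekly=3, monthly=3):
    """Return the set of snapshot dates to retain; the rest can be deleted."""
    ordered = sorted(set(dates), reverse=True)  # newest first
    keep = set(ordered[:daily])                 # the most recent snapshots

    def keep_newest_per_bucket(bucket_of, count):
        seen = set()
        for d in ordered:
            bucket = bucket_of(d)
            if bucket in seen:
                continue          # an even newer snapshot already covers it
            if len(seen) == count:
                break             # enough buckets retained
            seen.add(bucket)
            keep.add(d)

    # One snapshot per ISO (year, week), then one per (year, month).
    keep_newest_per_bucket(lambda d: tuple(d.isocalendar())[:2], weekly)
    keep_newest_per_bucket(lambda d: (d.year, d.month), monthly)
    return keep


if __name__ == "__main__":
    last = date(2024, 3, 31)
    history = [last - timedelta(days=n) for n in range(90)]
    for d in sorted(snapshots_to_keep(history)):
        print(d.isoformat())
```

Because the same snapshot can satisfy more than one rule (the newest one is daily, weekly, and monthly at once), the number retained is usually smaller than 4 + 3 + 3.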

### Why snapshots?

Why use a crazy snapshot script to version files instead of just cloning a drive using SuperDuper!? Recently, two of my photographer friends discovered that they had some corrupted pictures. Both their master copies and their backups were affected: once a master file was corrupted, every subsequent backup copied the corrupted version over the good one. Luckily, both of them had very old backups that they used to restore good versions of the files. With versioned backups, the backup will notice that the file is different (potentially corrupted) and make a new version. It keeps the old version, so you can always go back.
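
Since Raw data is not supposed to change after capture, any Raw file that differs between two snapshots deserves a look. Here is a minimal Python sketch that lists such files by comparing SHA-256 digests. It is an illustration under assumptions, not part of the backup script described in this article: the function names are hypothetical, and it assumes both snapshots are plain directory trees.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large media files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(old_snapshot, new_snapshot):
    """Relative paths present in both snapshots whose contents differ."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(old_snapshot):
        for name in filenames:
            old = os.path.join(dirpath, name)
            rel = os.path.relpath(old, old_snapshot)
            new = os.path.join(new_snapshot, rel)
            if not os.path.isfile(new):
                continue  # deleted or renamed; not a content change
            if os.path.samefile(old, new):
                continue  # hard-linked: same data on disk, nothing to compare
            if file_digest(old) != file_digest(new):
                changed.append(rel)
    return sorted(changed)
```

A differing digest does not say which version is the good one; it only flags files to inspect before the older snapshot ages out.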

### Other notes:

1. For much of my active data, I work out of Dropbox, which is a fantastic cloud sync service. All data in Dropbox is instantly backed up, versioned, and accessible to any device. It works very well, and nearly everyone I know uses the service.

2. I use SuperDuper! to maintain a bootable clone of my machine's boot disk. If the drive fails, I want to be able to boot up and be productive immediately. I do this every once in a while, but am not too rigorous about doing it frequently. If you're a Windows person, try Acronis True Image instead.

3. I actually have two of the TowerRAID boxes, each with 5 x 3TB drives installed. One is configured as a concatenated array (as described above), and the other as a RAID 0 stripe. One is stored offsite, and the other lives at home. I back up regularly to the box at home, and periodically swap it out with the one that is stored offsite.

There is a full list of all of the hardware referred to in this article over at my page. Full disclosure: I get referral fees for many of the items on that page. Feel free to click through from there if you'd like to, but don't feel obligated to do so.

Backups in the field are another topic, which I'll write about at a later date.

What do you use to back up your data? I'm very interested in how other photographers, or people with large data sets, keep their data secure.