This document provides an example of how to set up a system to store large amounts of mostly write-once / read-often data, with reasonably simple backup and recovery, at a decent price.
There are many technologies in the storage space and the ease of use, cost, and reliability vary greatly. Here we'll attempt to articulate why these particular technologies were used versus others available.
MergerFS is a union filesystem which can make several drives look as if they were merged (or pooled) together as one. It also allows policies to be applied to different filesystem functions, which makes it very nice for individuals storing large amounts of media.
- https://github.com/trapexit/mergerfs#faq
- Why not mhddfs?
mhddfs is no longer maintained and has some known stability and security issues (see below). MergerFS provides a superset of mhddfs' features and should offer the same or better performance.
- Why not aufs?
While aufs can offer better peak performance, mergerfs offers more configurability and is generally easier to use. mergerfs, however, doesn't offer the same overlay features (which tend to result in whiteout files being left around the underlying filesystems).
- Why not LVM/ZFS/BTRFS/RAID0 drive concatenation/striping?
With simple JBOD / drive concatenation / striping / RAID0, a single drive failure will lead to full pool failure. mergerfs provides similar pooling without the catastrophic failure and general lack of recovery: a drive can fail and the data on all other drives will continue to be accessible.
CrashPlan is a proprietary cross-platform backup solution which offers unlimited backups, data deduplication, compression, and encryption via an easy-to-use GUI.
The current consumer CrashPlan client (version 4.8.0) is Java-based and can be CPU- and memory-heavy with large collections of files, but the price for unlimited, cross-platform backup makes it a very nice service. It can also be used without Code42's cloud service, so it is safe to use without a service contract. The downside is that it is completely proprietary and lacks features power users may want. Some of those flaws, however, can be worked around.
Let's assume we have the following hardware setup.
- 5 drives
- boot/os drive: /dev/sda
- data drives: /dev/sd{b,c,d,e}; labeled DATA-DRIVE{0,1,2,3}
- The sizes of the data drives are unimportant.
New drives should probably go through a burn-in test before being used. Note, however, that such a test on modern large drives can take a very long time (days). If you are using existing drives with data already on them, you can skip this step.
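As a rough illustration of what a burn-in pass might look like, the sketch below only prints the commands rather than running them. It assumes the `badblocks` (e2fsprogs) and `smartctl` (smartmontools) tools and the example device names from the hardware list; the `burnin_commands` helper is hypothetical. Note that `badblocks -w` is a destructive write test and erases everything on the drive.

```shell
#!/bin/sh
# Dry-run helper (hypothetical name): prints burn-in commands for each
# device given as an argument. Nothing is executed against the drives.
# WARNING: `badblocks -w` overwrites the whole drive when actually run.

burnin_commands() {
    for dev in "$@"; do
        # Destructive write-mode surface test, showing progress, verbose.
        echo "badblocks -wsv $dev"
        # Start the drive's built-in extended SMART self-test.
        echo "smartctl -t long $dev"
        # Afterwards, review SMART attributes and the self-test log.
        echo "smartctl -a $dev"
    done
}

# Device names taken from the example hardware; adjust to your system.
burnin_commands /dev/sdb /dev/sdc /dev/sdd /dev/sde
```

Review the printed commands and run them by hand, as root, one drive at a time, only on drives that hold no data you need.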
Read the storage device setup guide for specifics.
Neither MergerFS nor CrashPlan cares about the filesystem type. EXT4 is probably the best filesystem right now for this setup, especially given that many recovery tools understand EXT{2,3,4} filesystems. BTRFS is also a possibility, though its data recovery tools are very immature at the moment.
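If the drives still need to be formatted, the labels used in this guide can be assigned when the filesystem is created. The sketch below prints one `mkfs.ext4 -L` command per device without running anything. The partition names (`/dev/sdb1` and so on) are an assumption for illustration, as is the `label_commands` helper; follow the storage device setup guide for the actual partitioning.

```shell
#!/bin/sh
# Dry-run helper (hypothetical name): prints a mkfs.ext4 command for each
# device, labeling them DATA-DRIVE0, DATA-DRIVE1, ... in argument order.
# WARNING: running the printed commands destroys existing data.

label_commands() {
    i=0
    for dev in "$@"; do
        echo "mkfs.ext4 -L DATA-DRIVE$i $dev"
        i=$((i + 1))
    done
}

# Assumed partition names, one partition per data drive.
label_commands /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
```

After formatting, `lsblk -o NAME,LABEL,FSTYPE` is one way to confirm that each label landed on the intended device.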
# mkdir /mnt/data{0,1,2,3}
# mkdir /storage
# <file system> <mount point> <type> <options> <dump> <pass>
LABEL=DATA-DRIVE0 /mnt/data0 auto defaults,nobootwait,errors=remount-ro 0 2
LABEL=DATA-DRIVE1 /mnt/data1 auto defaults,nobootwait,errors=remount-ro 0 2
LABEL=DATA-DRIVE2 /mnt/data2 auto defaults,nobootwait,errors=remount-ro 0 2
LABEL=DATA-DRIVE3 /mnt/data3 auto defaults,nobootwait,errors=remount-ro 0 2
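With more drives, typing these entries by hand gets error-prone. A minimal sketch of generating them, assuming the same label and mount point naming scheme as above (the `fstab_data_lines` helper is hypothetical); it only prints to standard output and does not touch /etc/fstab:

```shell
#!/bin/sh
# Prints one fstab entry per data drive, numbered from 0.
# Usage: fstab_data_lines <number-of-drives>

fstab_data_lines() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        printf 'LABEL=DATA-DRIVE%s /mnt/data%s auto defaults,nobootwait,errors=remount-ro 0 2\n' "$i" "$i"
        i=$((i + 1))
    done
}

fstab_data_lines 4
```

Review the output before appending it to /etc/fstab. As far as I know, `nobootwait` was specific to older Ubuntu releases; on systemd-based systems `nofail` is the usual counterpart, so check which one your distribution expects.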
Read the fstab setup guide for details on the options.
Add an entry for the pool to /etc/fstab:
/mnt/data* /storage fuse.mergerfs defaults,allow_other,direct_io,moveonenospc=true 0 0
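Once the entry is in place, it is worth confirming that the pool actually mounts before relying on it. The sketch below checks a mounts table for a `fuse.mergerfs` filesystem at a given mount point. The `is_mergerfs_mount` helper is hypothetical, and it assumes a Linux system where /proc/mounts lists the mount point in the second field and the filesystem type in the third.

```shell
#!/bin/sh
# Returns success if <mountpoint> appears in the mounts table with type
# fuse.mergerfs. The optional second argument overrides the table to read
# (defaults to /proc/mounts), which is handy for testing.

is_mergerfs_mount() {
    mountpoint=$1
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$mountpoint" '
        $2 == mp && $3 == "fuse.mergerfs" { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$mounts_file" 2>/dev/null
}

# Typically run after `mount /storage` (as root), which reads /etc/fstab.
if is_mergerfs_mount /storage; then
    echo "/storage is a mergerfs pool"
else
    echo "/storage is not mounted as mergerfs"
fi
```

If the check passes, `df -h /storage` should report the combined capacity of the data drives.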
Look at the mergerfs readme for other options and further details.
Follow the CrashPlan setup guide. Create a "backup set" for each type of data you wish to back up (photos, music, etc.) and for each source drive (/mnt/data{0,1,2,3}).