Flashback: a Backup-strategy on Linux
Requirement
Everyone needs backups, because data-loss can be:
-
expensive (if it included crypto-currency).
-
annoying & a liability (if it included forgotten passwords).
-
irreplaceable (if it included photographs), & embarrassing if any were taken from the inside of one's trouser-department.
What I want from a backup is:
-
Portability.
-
I'll always want to have a copy with me.
-
Multiple independent copies.
-
Replication is the only solution to the variety of misfortunes to which flash-memory devices are prone … & not the kind that RAID offers; it merely gathers one's eggs into a larger basket.
-
Loss of the physical flash-memory device:
-
Hardware-failure:
-
-
Voltage-spikes when connected (though hopefully you're using a surge-arrester).
-
Static electricity when disconnected.
-
Breakage (perhaps from leverage of a connected USB thumb-drive).
-
Repeated insertion (USB connectors are rated for a limited number of insertions).
-
-
File-system corruption:
-
-
Perhaps because it was removed before unmounting, or from power-failure.
-
Bit-rot (though over longer durations than the typical period between backups).
-
Having said that, they're actually remarkably robust, & can sometimes survive being processed by a washing-machine.
-
-
Robustness.
-
It may become damp or hot, or be sat on by some slack-jawed oaf.
-
Cheapness.
-
I'll want to rotate several independent backup-devices into service.
-
Capacity.
-
It must have sufficient space to record an ever expanding volume of data.
-
Encryption.
-
Protecting one's data isn't just about preventing its loss; letting someone else read plain-text from the flash-memory device they found on a bus, may be worse.
Full Backup
For many years I have performed full backups from my Linux OS, using a Bash-script which:
-
used tar to archive a full copy of the personal files under my home-directory,
-
used gpg to encrypt the archive,
-
recorded the encrypted archive under a filename which included a time-stamp,
-
stored these backups on removable flash-memory devices which were rotated about every week.

Even without explicit compression (gpg typically compresses the data before encrypting it), each archive was still less than 1 GB, despite my having worked in IT all my adult life (I don't consider third-party video-files, audio-files, or ebooks to be personal, & I don't take photographs with a resolution that only an owl might appreciate). As a result, each backup could be rapidly recorded & each storage-device could record several such files. Regrettably, my wife's almost Catholic belief that every file is sacred & that deleting one is murder, has resulted in each of her backup-files taking a significant portion of eternity to create & acquiring its own gravitational field.
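The full-backup script described above might look something like the following minimal sketch. It isn't the original script: the function-names, the filename-format & the example paths are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a full backup: archive the master directory with tar,
# encrypt it with gpg, & record the result under a time-stamped filename.
set -euo pipefail

# Build a time-stamped filename of the form "backup-YYYYmmddTHHMMSS.tar.gpg".
archive_name() {
	printf 'backup-%s.tar.gpg\n' "$(date +%Y%m%dT%H%M%S)"
}

# Usage: full_backup MASTER_DIRECTORY DESTINATION_DIRECTORY
full_backup() {
	local master=$1 destination=$2
	local target
	target="$destination/$(archive_name)"

	# tar writes the archive to stdout; gpg prompts for a passphrase & encrypts it symmetrically.
	tar --create --file=- --directory="$(dirname "$master")" "$(basename "$master")" |
		gpg --symmetric --output "$target"

	printf 'Wrote %s\n' "$target"
}

# Example (both paths are assumptions):
# full_backup "$HOME/Documents" /run/media/user/Backup
```

Because the whole archive is encrypted as a single file, nothing inside it can be inspected without decrypting all of it, which is the limitation the incremental strategy later works around.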
Cloud-storage / Backup-service
One could upload one's data to cloud-storage or a dedicated backup-service (which might additionally include file-versioning), but:
-
It's going to be an ongoing expense.
-
You're going to need a zero-knowledge service because your backup contains sensitive data & the only person you can trust with it … is yourself.
-
If it's a dedicated backup-service, then there'll probably be some proprietary s/w to perform the backup-process, but this may not support Linux.
Since uploading via ADSL is much slower than downloading, for general-purpose cloud-storage you're also going to want an ssh-daemon & rsync-daemon to be available, to minimise the uploaded data.
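As a rough sketch of what minimising the upload could look like, rsync can be run over ssh so that only the differences are sent. The remote account, host & path below are placeholders, & the helper-functions are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: synchronise a local directory to remote storage over ssh,
# transferring only the differences.

# Print the rsync command-line, one argument per line, so that it can be inspected first.
# (Assumes neither argument contains a newline.)
# Usage: upload_command LOCAL_DIRECTORY REMOTE_DESTINATION
upload_command() {
	local source=${1%/}/ destination=$2	# A trailing slash makes rsync copy the directory's contents.
	printf '%s\n' rsync --archive --compress --partial --delete --rsh=ssh "$source" "$destination"
}

# Run the command built above.
# Usage: upload LOCAL_DIRECTORY REMOTE_DESTINATION
upload() {
	local -a command
	mapfile -t command < <(upload_command "$@")
	"${command[@]}"
}

# Example (the remote account & path are assumptions):
# upload "$HOME/Documents" user@storage.example.org:backup/
```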
Incremental Backup
-
Some of my backups now take too long to create & the file-size is too large.
-
The solution is Incremental backup. One doesn't store multiple full backups on each storage-device, just one. This single copy is then repeatedly updated with any changes which have been applied to the master copy of one's data since the previous backup.
-
The next problem is how to determine which backup-files are outdated & must be updated, when each file's time-stamp & contents are hidden in an encrypted archive. It would be very time-consuming to decrypt the whole archive, update those files which were outdated, then re-encrypt it.
-
The solution is to move the encryption of the archive down beneath the file-system. The file-system sits on a LUKS storage-volume. Once access to this encrypted storage-volume has been granted, each file's time-stamp & contents are exposed for comparison with master copies.
-
The original backup-strategy was non-destructive (each backup-file was an independent copy of the master) & one could retrieve old copies of specific files as required. Regrettably the incremental strategy is destructive (old versions of files are overwritten with the latest). Whilst one could in principle access an old version from one of the other backup-devices in the rotation, it doesn't give one many choices, so a more sophisticated strategy is required.
-
The solution is Btrfs, which can rapidly create snapshots of a file-system. Before each incremental backup, a read-only snapshot is taken of the incumbent backup, & time-stamped. Each snapshot only absorbs space when files in the incumbent backup change; so initially it takes approximately none.
-
Synchronisation
-
The process of synchronising the backup with the master copy is performed by rsync, which minimises the number of changes required. As files in the backup are overwritten, the snapshots absorb space in the file-system as they record the unaltered versions of those files. This is known as a "Reverse Incremental" backup.
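One cycle of the snapshot-then-synchronise process described above can be sketched as follows. This is not the project's makefile: the directory-layout (a subvolume called "current" beside a "snapshots" directory) & the function-names are assumptions, whilst the rsync-options follow the behaviour described in the foot-notes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of one reverse-incremental cycle on the mounted Btrfs file-system.
set -euo pipefail

# Name a snapshot after the current time, in the form "snapshots/YYYYmmddTHHMMSS".
snapshot_path() {
	printf 'snapshots/%s\n' "$(date +%Y%m%dT%H%M%S)"
}

# Usage: incremental_backup MASTER_DIRECTORY MOUNT_POINT
# Assumes MOUNT_POINT/current is a Btrfs subvolume & MOUNT_POINT/snapshots is a directory.
incremental_backup() {
	local master=${1%/}/ mount_point=$2

	# 1. Freeze the incumbent backup in a read-only snapshot, which initially shares all its data.
	#    (Prefix with sudo if your user lacks the permission.)
	btrfs subvolume snapshot -r "$mount_point/current" "$mount_point/$(snapshot_path)"

	# 2. Update the incumbent backup in place: follow symlinks, delete files which have
	#    vanished from the master, & skip anything matching the exclusion-globs.
	rsync --archive --copy-links --delete \
		--exclude-from="$mount_point/exclusions.txt" \
		"$master" "$mount_point/current/"
}

# Example: incremental_backup "$HOME/Documents" mount-point
```

An old version of a file can then be retrieved by copying it out of the relevant time-stamped directory under "snapshots".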
Set-up
CAVEAT: almost every command requires superuser-privileges (hence the use of sudo) … mark seven times and cut once.
$ sudo cryptsetup --verbose --verify-passphrase --type=luks2 luksFormat /dev/device-name; # Create an encrypted storage-volume on the device & define its passphrase. [1]
$ sudo cryptsetup --verbose open --type=luks2 /dev/device-name decrypted; # Supply the passphrase & give the decrypted storage-volume a new device-name. [2]
$ sudo mkfs.btrfs --csum=blake2 --label=Backup /dev/mapper/decrypted; # Create & label a file-system on the storage-volume, referencing the newly mapped device-name. [3]
$ mkdir mount-point; # Create a mount-point of arbitrary path.
$ sudo mount /dev/mapper/decrypted mount-point; # Mount the storage-volume.
$ sudo chown user:group mount-point; # Whilst the filesystem was created by root, the owner can be less privileged.
$ chmod 0600 makefile; # Permit only the owner to read the makefile, & hence to run make. [4]
$ mv makefile exclusions.txt mount-point/; # Move downloaded files into the mounted filesystem. [5]
$ sudo umount mount-point; # Unmount the encrypted volume.
$ sudo cryptsetup close decrypted; # Close the mapped device.
Use [6]
$ sudo cryptsetup --verbose open --type=luks2 /dev/device-name decrypted; # Decrypt & map to a new device-name. [2]
$ sudo mount /dev/mapper/decrypted mount-point; # Mount the decrypted device into the filesystem. [7]
$ cd mount-point/;
$ make backup scrub; # Create a backup & optionally check the file-system.
$ cd -; # Exit the mounted directory.
$ sudo umount mount-point; # Disconnect the device from the filesystem.
$ sudo cryptsetup close decrypted; # Close the mapped device; the data on the storage-device remains encrypted.
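
The manual steps above could be wrapped in a small script which always unmounts & closes the device, even when the backup fails part-way. This is only a sketch under assumptions: the function-names are hypothetical, & it presumes that the makefile lives in the root of the mounted file-system, as in the set-up.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the manual steps; the device-name, mount-point
# & mapping-name are supplied by the caller.
set -euo pipefail

# Undo whatever was set up, in reverse order; harmless if a step never happened.
# Usage: cleanup MOUNT_POINT MAPPING
cleanup() {
	local mount_point=$1 mapping=$2
	if mountpoint --quiet "$mount_point"; then sudo umount "$mount_point"; fi
	if [ -e "/dev/mapper/$mapping" ]; then sudo cryptsetup close "$mapping"; fi
}

# Usage: run_backup DEVICE MOUNT_POINT [MAPPING]
run_backup() {
	local device=$1 mount_point=$2 mapping=${3:-decrypted}
	local undo

	# Register the clean-up before anything is opened, so that it runs on success or failure.
	printf -v undo 'cleanup %q %q' "$mount_point" "$mapping"
	trap "$undo" EXIT

	mkdir -p "$mount_point"	# Re-create the mount-point if it was deleted.
	sudo cryptsetup --verbose open --type=luks2 "$device" "$mapping"
	sudo mount "/dev/mapper/$mapping" "$mount_point"

	# Run make from outside the mounted directory, so that it can be unmounted afterwards.
	make --directory="$mount_point" backup scrub
}

# Example: run_backup /dev/device-name mount-point
```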

Foot-notes
-
-
I've selected a luks2 storage-volume, which provides more scope for features than the default luks storage-volume.
N.B.: the luks2-header takes 2 MB.
-
CAVEAT: replace device-name with that to which one's flash-memory device was mapped by the OS, which can be observed by issuing "sudo journalctl --follow;" before connecting the storage-device. If the storage-device has been partitioned, one could alternatively create the storage-volume on one of the partitions.
-
CAVEAT: this will overwrite the incumbent file-system (or partition), & (obviously) any data stored in it.
-
-
The decrypted LUKS storage-volume must be mapped to a new device-name so that it can be referenced.
-
I've chosen a 256-bit checksum-algorithm rather than the default crc32c, to reduce the chance of hash-collisions.
-
-
makefile:
-
You'll need to edit the path defining your master directory, which defaults to "~/Documents/" (CAVEAT: which is user-dependent).
rsync is used to implement the incremental backup of your master directory, & has been requested to:
-
Follow symlinks; which can be included under the master directory, to reference external files (e.g. /etc/fstab, /etc/hosts & ~/.bashrc) to be included in the backup, without polluting the makefile with user-specific requirements.
-
delete files from the backup which no longer exist in the master; though they will continue to exist in any snapshots.
CAVEAT: if any user other than the owner were able to read the makefile, they could run "make backup" & obliterate the backup. This scenario is unlikely anyway, since the device can't be decrypted without the password, so I can't accidentally connect the wrong storage-device (assuming I don't reuse passwords) & destroy it.
-
preserve file-permissions.
-
exclude files matching any of the list of globs defined below.
-
-
-
-
exclusions.txt:
-
a list of globs, each defining a set of files to exclude from the backup (those which are temporary, volatile, or which can be regenerated), e.g.:
-
Vim:
-
swap-files.
-
Thunderbird email-client:
-
creates a symlink called "lock", with an unconventional target consisting of an IP-address & port-number … rsync doesn't like that.
-
Chromium Browser:
-
creates symlinks called "SingletonCookie" & "SingletonLock" with unconventional targets.
-
temporary files created during compilation.
-
-
-
On the KDE Plasma5 desktop of my Linux-distribution, the "Removable Devices" widget (which pops up from the system-tray when a device is connected) can perform the cryptsetup, mount, & umount commands; there's probably something similar on other desktop-environments.
-
You might need to re-create the mount-point, if it was deleted after set-up.
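To make the foot-note on "exclusions.txt" concrete, the following sketch writes a sample file. Every glob in it is a guess at what such a file might contain, rather than a copy of the project's own sample, & the function-name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: write a sample "exclusions.txt" for rsync's --exclude-from option.
# Every glob below is an assumption; lines starting with '#' are treated as comments by rsync.

# Usage: write_exclusions DIRECTORY
write_exclusions() {
	cat > "$1/exclusions.txt" <<'EOF'
# Vim swap-files.
.*.sw?
# Thunderbird's lock symlink.
.thunderbird/*/lock
# Chromium's singleton symlinks.
SingletonCookie
SingletonLock
# Temporary files created during compilation.
*.o
EOF
}

# Example: write_exclusions mount-point
```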
The Easy Way.
If all the command-line stuff looks rather daunting …
-
Download the Flashback-project, which includes:
-
formatForBackup.bash
-
a Bash-script which performs the formatting of the storage-device (a more rigorous version of the above code).
-
A makefile:
-
which performs the backup.
-
A sample "exclusions.txt":
-
referenced by the makefile, to determine the set of files to exclude from the backup.
-
-
Run formatForBackup.bash (as your normal unprivileged user-id), to format the storage-device.
-
Disconnect the storage-device, reconnect it, & accept the OS' offer to mount the storage-device, which will require the encryption-passphrase.
-
Ideally, run "chmod 'go-r' makefile;".
-
Move the makefile & "exclusions.txt" into the mounted directory.
-
Move your current working directory to the mounted directory.
-
make backup;
-
Exit the mounted directory & ask the OS to unmount the storage-device.
Conclusion
The backup-strategy outlined above:
-
facilitates the rapid creation of backups, by means of an incremental strategy utilising rsync, which deletes files from the backup which no longer exist in the master copy & preserves file-permissions.
-
provides the simplicity of a full backup by using space-efficient Btrfs-snapshots.
-
can be checksummed by Btrfs.
-
is symmetrically encrypted by LUKS.
Enhancements
Automating the backup-process sounds convenient.
One could schedule (perhaps using cron or a systemd-timer) a regular backup (perhaps in the small hours), then just forget it until required.
Regrettably, there are flies in the ointment:
-
You must still physically rotate several flash-memory devices into service.
N.B.: When a storage-device is rotated back into service, it may contain relatively old versions of your data & synchronisation will be more lengthy.
-
You must still ensure the process completes without error; perhaps via an emailed report.
-
You need to automatically supply the passphrase; perhaps by adding a key-file (see "cryptsetup luksAddKey;") in your filesystem as an alternative to the passphrase designed for interactive use.
The first issue doesn't have an obvious solution, so given that the advantages seem tenuous, I've not progressed it further.
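For completeness, the key-file idea could be sketched as below; I've not adopted it, so treat it as an untested outline. The function-names, the key-file's size & its location are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: create a key-file & enrol it in a spare LUKS key-slot,
# so that a scheduled job can open the storage-volume without an interactive passphrase.
set -euo pipefail

# Write 4096 random bytes to a new key-file, readable only by its owner.
# Usage: make_keyfile PATH
make_keyfile() {
	local keyfile=$1
	( umask 0377 && head -c 4096 /dev/urandom > "$keyfile" )
}

# Prompts for an existing passphrase, then adds the key-file's contents as a further key.
# Usage: enrol_keyfile DEVICE KEYFILE
enrol_keyfile() {
	sudo cryptsetup luksAddKey "$1" "$2"
}

# Open the storage-volume non-interactively, using the key-file.
# Usage: open_with_keyfile DEVICE KEYFILE [MAPPING]
open_with_keyfile() {
	sudo cryptsetup open --type=luks2 --key-file="$2" "$1" "${3:-decrypted}"
}

# Example (the key-file's path is an assumption):
# make_keyfile "$HOME/.backup.key"
# enrol_keyfile /dev/device-name "$HOME/.backup.key"
# open_with_keyfile /dev/device-name "$HOME/.backup.key"
```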