Flashback: a Backup-strategy on Linux

Requirement

Everyone needs backups, because data-loss can be:

What I want from a backup is:

Portability.

I'll always want to have a copy with me.

Multiple independent copies.

Replication is the only solution to the variety of misfortunes to which flash-memory devices are prone … & not the kind that RAID offers; it merely gathers one's eggs into a larger basket.

Loss of the physical flash-memory device:

  • Micro-SD cards are not only small, but (cf. USB thumb-drives) there's no facility to connect a fob.

  • Theft (particularly an SD-card left in the laptop which some Herbert decided to pinch).

Hardware-failure:

  • Voltage-spikes when connected (though hopefully you're using a surge-arrester).

  • Static electricity when disconnected.

  • Breakage (perhaps from leverage of a connected USB thumb-drive).

  • Repeated insertion (USB connectors are rated for a limited number of insertions).

File-system corruption:

  • Perhaps because it was removed before unmounting, or from power-failure.

  • Bit-rot (though over longer durations than the typical period between backups).

Having said that, they're actually remarkably robust, & can sometimes survive being processed by a washing-machine.

Robustness.

It may become damp or hot, or be sat on by some slack-jawed oaf.

Cheapness.

I'll want to rotate several independent backup-devices into service.

Capacity.

It must have sufficient space to record an ever expanding volume of data.

Encryption.

Protecting one's data isn't just about preventing its loss; letting someone else read plain-text from the flash-memory device they found on a bus may be worse.

Full Backup

For many years I have performed full backups from my Linux OS, using a Bash-script which:

Even without explicit compression (encryption typically achieves this as a side-effect), each backup-file (despite my having worked in IT all my adult life) was still less than 1 GB (I don't consider third-party video-files, audio-files, or ebooks to be personal, & I don't take photographs with a resolution that only an owl might appreciate). As a result, each backup could be rapidly recorded & each storage-device could record several such files. Regrettably, my wife's almost Catholic belief that every file is sacred & that deleting one is murder, has resulted in each of her backup-files taking a significant portion of eternity to create & having its own gravitational field.

Cloud-storage / Backup-service

One could upload one's data to cloud-storage or a dedicated backup-service (which might additionally include file-versioning), but:

Incremental Backup

Some of my backups now take too long to create & the file-size is too large.

The solution is Incremental backup. One doesn't store multiple full backups on each storage-device, just one. This single copy is then repeatedly updated with any changes which have been applied to the master copy of one's data since the previous backup.

The next problem is how to determine which backup-files are outdated & must be updated, when each file's time-stamp & contents are hidden in an encrypted archive. It would be very time-consuming to decrypt the whole archive, update those files which were outdated, then re-encrypt it.

The solution is to move the encryption of the archive down beneath the file-system. The file-system sits on a LUKS storage-volume. Once access to this encrypted storage-volume has been granted, each file's time-stamp & contents are exposed for comparison with master copies.

The original backup-strategy was non-destructive (each backup-file was an independent copy of the master) & one could retrieve old copies of specific files as required. Regrettably the incremental strategy is destructive (old versions of files are overwritten with the latest). Whilst one could in principle access an old version from one of the other backup-devices in the rotation, it doesn't give one many choices, so a more sophisticated strategy is required.

The solution is Btrfs: a file-system with CoW (copy-on-write), which can rapidly create snapshots of a file-system. Before each incremental backup, a read-only snapshot is taken of the incumbent backup, & time-stamped. Each snapshot only absorbs space when files in the incumbent backup change; so initially it takes approximately none.
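
The snapshot step might be sketched as follows. This is an illustration under assumed names, rather than an extract from the makefile: the subvolume "current", the directory "snapshots/", & the functions run & snapshot_backup are all hypothetical. By default it merely prints the command; set DRY_RUN=0 to execute it.

```shell
#!/usr/bin/env bash
# Sketch: take a read-only, time-stamped Btrfs snapshot of the incumbent backup.
# ASSUMPTIONS (hypothetical names): the backup lives in a subvolume called
# "current" & snapshots are collected under "snapshots/" on the same file-system.

# Print the command instead of running it, unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# snapshot_backup MOUNT_POINT [STAMP]
# STAMP defaults to the current date & time.
snapshot_backup() {
    local mount_point=$1
    local stamp=${2:-$(date +%Y-%m-%dT%H:%M:%S)}
    # -r requests a read-only snapshot.
    run btrfs subvolume snapshot -r "$mount_point/current" "$mount_point/snapshots/$stamp"
}
```

For this to work, "current" would need to have been created with btrfs subvolume create, & "snapshots/" with mkdir; depending on one's set-up the command may also need sudo.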

Synchronisation

The process of synchronising the backup with the master copy is performed by rsync, which minimises the number of changes required. As files in the backup are overwritten, the snapshots absorb space in the file-system as they record the unaltered versions of those files. This is known as a "Reverse Incremental" backup.
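
Since each snapshot retains the unaltered versions of files, an old version can be copied back out of it. The following is only a sketch of how that might look; the directory "snapshots/" & the functions run, list_snapshots & restore_version are hypothetical, & by default the commands are merely printed (set DRY_RUN=0 to execute them).

```shell
#!/usr/bin/env bash
# Sketch: retrieve an old version of a file from a time-stamped snapshot.
# ASSUMPTION (hypothetical layout): snapshots are collected under "snapshots/".

# Print the command instead of running it, unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# list_snapshots MOUNT_POINT
# -s restricts the listing to snapshots.
list_snapshots() {
    run sudo btrfs subvolume list -s "$1"
}

# restore_version MOUNT_POINT STAMP RELATIVE_PATH DESTINATION
restore_version() {
    local mount_point=$1 stamp=$2 relative_path=$3 destination=$4
    # --archive preserves permissions & time-stamps;
    # --reflink=auto shares data-blocks where the file-system permits.
    run cp --archive --reflink=auto "$mount_point/snapshots/$stamp/$relative_path" "$destination"
}
```

For example, restore_version mount-point 2024-01-02T03:04:05 notes.txt ~/notes.old would print (or, with DRY_RUN=0, perform) the copy of that day's version.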

Set-up

CAVEAT: almost every command requires superuser-privileges (hence the use of sudo) … mark seven times and cut once.


$ sudo cryptsetup --verbose --verify-passphrase --type=luks2 luksFormat /dev/device-name;  # Create an encrypted storage-volume on the device & define its passphrase. [1]
$ sudo cryptsetup --verbose open --type=luks2 /dev/device-name decrypted;  # Supply the passphrase & map the decrypted storage-volume to a new device-name. [2]
$ sudo mkfs.btrfs --csum=blake2 --label=Backup /dev/mapper/decrypted;  # Create & label a file-system on the storage-volume, referencing the newly mapped device-name. [3]
$ mkdir mount-point;  # Create a mount-point of arbitrary path.
$ sudo mount /dev/mapper/decrypted mount-point;  # Mount the storage-volume.
$ sudo chown user:group mount-point;  # Whilst the filesystem was created by root, the owner can be less privileged.
$ chmod 0600 makefile;  # Permit only the owner to read the makefile, & hence to run make. [4]
$ mv makefile exclusions.txt mount-point/;  # Move the downloaded files into the mounted filesystem. [5]
$ sudo umount mount-point;  # Unmount the encrypted volume.
$ sudo cryptsetup close decrypted;  # Close the mapped device.

Use [6]


$ sudo cryptsetup --verbose open --type=luks2 /dev/device-name decrypted;  # Decrypt & map to a new device-name. [2]
$ sudo mount /dev/mapper/decrypted mount-point;  # Mount the decrypted device into the filesystem. [7]
$ cd mount-point/;  # Enter the mounted directory.
$ make backup scrub;  # Create a backup & optionally check the file-system.
$ cd -;  # Exit the mounted directory.
$ sudo umount mount-point;  # Disconnect the device from the filesystem.
$ sudo cryptsetup close decrypted;  # Remove the mapping; the data on the device remains encrypted.
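
The sequence above could be wrapped in a single function, so that the mapping is always closed even if the mount or the backup fails. This is a minimal sketch rather than a tested tool: the functions run & backup_session are hypothetical, the mapping-name "decrypted" follows the commands above, & by default it only prints what it would do (set DRY_RUN=0 to execute).

```shell
#!/usr/bin/env bash
# Sketch: open, mount, back up, unmount & close in one step.
# ASSUMPTIONS: the function-names are hypothetical; "decrypted" is the
# mapping-name used in the commands above.

# Print the command instead of running it, unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# backup_session DEVICE MOUNT_POINT
backup_session() {
    local device=$1 mount_point=$2 mapping=decrypted status=0
    # Give up immediately if the storage-volume can't be opened.
    run sudo cryptsetup --verbose open --type=luks2 "$device" "$mapping" || return
    if run sudo mount "/dev/mapper/$mapping" "$mount_point"; then
        # Run make from within the mounted file-system; remember any failure.
        run make --directory="$mount_point" backup scrub || status=$?
        run sudo umount "$mount_point" || status=$?
    else
        status=$?
    fi
    # Always attempt to close the mapping, whatever happened above.
    run sudo cryptsetup close "$mapping" || status=$?
    return "$status"
}
```

Called as backup_session /dev/device-name mount-point, it returns a non-zero status if any step failed.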

Foot-notes

  1.

    • I've selected a luks2 storage-volume, which provides more scope for features than the default luks storage-volume.

      N.B.: the luks2-header takes 2 MB.

    • CAVEAT: replace device-name with that to which one's flash-memory device was mapped by the OS, which can be observed by issuing sudo journalctl --follow; before connecting the storage-device.

      If the storage-device has been partitioned, one could alternatively create the storage-volume on one of the partitions.

    • CAVEAT: this will overwrite the incumbent file-system (or partition), & (obviously) any data stored in it.

  2. The decrypted LUKS storage-volume must be mapped to a new device-name so that it can be referenced.

  3. I've chosen a 256-bit checksum-algorithm rather than the default crc32c, to reduce the chance of hash-collisions.

  4. makefile:

    You'll need to edit the path defining your master directory, which defaults to "~/Documents/" (CAVEAT: which is user-dependent).

    rsync is used to implement the incremental backup of your master directory, & has been requested to:

    • Follow symlinks; which can be included under the master directory, to reference external files (e.g. /etc/fstab, /etc/hosts & ~/.bashrc) to be included in the backup, without polluting the makefile with user-specific requirements.

    • Delete files from the backup which no longer exist in the master; though they will continue to exist in any snapshots.

      CAVEAT: if any user other than the owner were able to read the makefile, they could run make backup & obliterate the backup.

      This scenario is unlikely anyway, since the device can't be decrypted without the password, so I can't accidentally connect the wrong storage-device (assuming I don't reuse passwords) & destroy it.

    • Preserve file-permissions.

    • Exclude files matching any of the list of globs defined below.

  5. exclusions.txt:

    A list of globs, each defining a set of files to exclude from the backup (those which are temporary, volatile, or which can be regenerated), e.g.:

    • Vim: swap-files.

    • Thunderbird email-client: creates a symlink called "lock", with an unconventional target consisting of an IP-address & port-number … rsync doesn't like that.

    • Chromium Browser: creates symlinks called "SingletonCookie" & "SingletonLock" with unconventional targets.

    • Object-files: temporary files created during compilation.

  6. On the KDE Plasma5 desktop of my Linux-distribution, the "Removable Devices" widget (which pops-up from the system-tray when a device is connected), can perform the cryptsetup, mount, & umount commands; there's probably something similar on other desktop-environments.

  7. You might need to re-create the mount-point, if it was deleted after set-up.
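
To make the foot-notes on the makefile & exclusions.txt more concrete, here is a sketch of an rsync invocation honouring the four requests listed there, together with a few sample globs. It is not the downloadable makefile: the functions write_sample_exclusions & sync_backup, the particular globs, & the additional --recursive & --times options are all assumptions. By default the command is only printed; set DRY_RUN=0 to execute it.

```shell
#!/usr/bin/env bash
# Sketch of an rsync invocation matching the requests described in the foot-notes.
# ASSUMPTIONS (hypothetical, not taken from the makefile): the function-names,
# the sample globs, & the additional --recursive & --times options.

# write_sample_exclusions FILE
# Write a few sample globs, one per line, to the named exclusions-file.
write_sample_exclusions() {
    cat > "$1" <<'EOF'
.*.sw?
lock
SingletonCookie
SingletonLock
*.o
EOF
}

# sync_backup MASTER_DIR BACKUP_DIR EXCLUSIONS_FILE
# Prints the command unless DRY_RUN=0.
sync_backup() {
    local master=${1%/} backup=${2%/} exclusions=$3
    local -a cmd=(
        rsync
        --recursive --times            # Assumed: descend directories & keep time-stamps.
        --copy-links                   # Follow symlinks, copying the files they reference.
        --perms                        # Preserve file-permissions.
        --delete                       # Delete files which no longer exist in the master.
        --exclude-from="$exclusions"   # Skip files matching any of the listed globs.
        "$master/" "$backup/"          # Trailing slashes: copy the contents of the master.
    )
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

Running rsync's own --dry-run option against a scratch directory first is a cheap way to check what --delete would remove before trusting it with a real backup.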

The Easy Way.

If all the command-line stuff looks rather daunting …

Conclusion

The backup-strategy outlined above:

Enhancements

Automating the backup-process sounds convenient. One could schedule (perhaps using cron or a systemd-timer) a regular backup (perhaps in the small hours), then just forget it until required. Regrettably, there's a fly in the ointment:

The first issue doesn't have an obvious solution, so given that the advantages seem tenuous, I've not progressed it further.