Scrubbing my data - BTRFS
It's no secret that I use BTRFS, so I have a fair amount of data stored on this filesystem. With most popular filesystems you have no way of knowing whether the data you read back is the same as what was originally written. A few modern filesystems support a function known as scrubbing.
Scrubbing is:
btrfs scrub is used to scrub a btrfs filesystem, which will read all data and metadata blocks from all devices and verify checksums. Automatically repair corrupted blocks if there’s a correct copy available.
On the most common filesystems [1], data is written and read but never validated. Between writing and reading, the data can sometimes change. This could be due to a media fault, a firmware bug or user error (dd to the wrong volume, anyone?).
Before this technology was available, it was always a worry that my most precious stored family photos and videos could at any time become silently corrupted without my knowing!
Now, thanks to BTRFS (or ZFS if you're more into the BSDs), you never have to suffer not knowing about data corruption, and as a bonus, if you have mirrored volumes, it can be repaired too.
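For anyone who hasn't run one, a scrub comes down to two subcommands: one to start it and one to check on it. Here is a minimal sketch, assuming a filesystem mounted at /mnt/data (a made-up path). The `run` wrapper and the `DRY_RUN` variable are only there for illustration and are not part of btrfs; by default the commands are printed rather than executed, since a real scrub needs root.

```shell
# DRY_RUN=1 (the default here) only prints each command.
# Set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Illustrative helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Scrub one mounted btrfs filesystem in the foreground, then show a summary.
scrub_fs() {
    local mountpoint="$1"
    # -B: stay in the foreground until the scrub finishes
    # -d: print statistics for each device separately
    run btrfs scrub start -B -d "$mountpoint" || return 1
    run btrfs scrub status "$mountpoint"
}

# /mnt/data is a hypothetical mount point
scrub_fs /mnt/data
```

Without `-B` the scrub runs in the background, and `btrfs scrub status` can be used to check on its progress.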
My BTRFS Volumes
I currently have the following volumes in BTRFS:
- 32GB root SD card of the Raspberry Pi 4 hosting this site
- 7TB external spinning disk attached to the RPI4
- 256GB SSD boot volume on my main desktop PC
- 2TB SSD home volume on my main desktop PC
- 3TB + 2TB external spinning disks connected via USB3 to my main desktop PC
- 500GB NVMe LUKS-encrypted volume on my Precision 5510
- 1TB external LUKS-encrypted 2.5" USB2 backup disk, sometimes attached to the Precision 5510
Backing them up
All of these systems have their root/home volumes backed up at least once via BTRFS send. The desktop PC also performs daily snapshots with restic to a cloud storage system.
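A one-off backup with btrfs send boils down to taking a read-only snapshot and piping it to btrfs receive on another BTRFS filesystem. The sketch below assumes hypothetical paths (/home, /snapshots, /mnt/backup) and is not the exact set of commands I use; the `DRY_RUN` switch is an illustration so the commands can be previewed without root.

```shell
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

# Take a read-only snapshot of a subvolume and copy it in full
# to another btrfs filesystem mounted at $dest.
send_full() {
    local subvol="$1" snapshot="$2" dest="$3"
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ btrfs subvolume snapshot -r $subvol $snapshot"
        echo "+ btrfs send $snapshot | btrfs receive $dest"
        return 0
    fi
    # btrfs send only accepts read-only snapshots, hence -r
    btrfs subvolume snapshot -r "$subvol" "$snapshot" || return 1
    btrfs send "$snapshot" | btrfs receive "$dest"
}

# Hypothetical paths, for illustration only
send_full /home /snapshots/home-first /mnt/backup
```

The destination has to be a BTRFS filesystem too, because btrfs receive recreates the snapshot as a subvolume there.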
RPI:
Daily snapshots are sent to its locally attached 7TB volume.
Occasionally I btrfs send the 7TB volume to the 3+2 spanned external disks on the PC. I'm not too fussed if I lose the 7TB volume. It is mainly backups and some unimportant media.
PC:
I wrote a little bash script to btrfs snapshot and send incremental backups: hourly for the last 24 hours, daily for the last month, and monthly until the target reaches a certain % full. These go to the 2+3TB external disks. Restic performs the cloud backups.
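The script itself isn't reproduced here, but its two core ideas can be sketched: send only the difference against a parent snapshot, and work out which old snapshots fall outside the retention window. This is a simplified sketch under assumptions, not the real script: the paths and snapshot names are hypothetical, names are assumed to sort chronologically, and the hourly/daily/monthly tiers and the % full check are left out in favour of a plain "keep the newest N" rule.

```shell
# DRY_RUN=1 (the default here) only prints the send pipeline.
DRY_RUN="${DRY_RUN:-1}"

# Send only the differences between an older (parent) snapshot and a new one.
# The parent must already exist on the destination filesystem.
send_incremental() {
    local parent="$1" snapshot="$2" dest="$3"
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ btrfs send -p $parent $snapshot | btrfs receive $dest"
        return 0
    fi
    btrfs send -p "$parent" "$snapshot" | btrfs receive "$dest"
}

# Read snapshot names on stdin and print all but the newest $1 of them.
# Assumes names sort chronologically, e.g. home-2024-01-31T13.
prune_candidates() {
    local keep="$1"
    sort | awk -v keep="$keep" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }'
}

# Hypothetical paths and names, for illustration only
send_incremental /snapshots/home-2024-01-31T12 /snapshots/home-2024-01-31T13 /mnt/backup
printf '%s\n' home-2024-01-31T13 home-2024-01-31T11 home-2024-01-31T12 | prune_candidates 2
```

Each name printed by `prune_candidates` could then be removed with `btrfs subvolume delete`, taking care never to delete the snapshot that will serve as the parent of the next incremental send.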
Precision:
The same bash script on a systemd timer sends backups to my external disk when it's connected.
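Because a timer fires whether or not the disk is plugged in, the backup needs some kind of guard. One way to do that, sketched here as an assumption rather than as what my script actually does, is to check that the backup path is a mount point before doing anything. The function name and the /mnt/backup path are hypothetical; `mountpoint` comes from util-linux.

```shell
# Run a backup command only when the backup disk is mounted.
backup_if_mounted() {
    local target="$1"
    shift
    # mountpoint -q exits 0 only if the path is a mount point
    if mountpoint -q "$target"; then
        "$@"
    else
        echo "skip: $target is not mounted"
    fi
}

# /mnt/backup and the echoed command are hypothetical placeholders
backup_if_mounted /mnt/backup echo "backup would run here"
```

Skipping quietly when the disk is absent keeps the timer from logging a failure every time the laptop is away from its backup disk.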
Scrubbing it all
Not long ago I was running my home volumes on mirrored spinning disks. In the modern era these were beginning to feel slow and one drive started failing. BTRFS scrub would fix errors for me then:
You can see that a few errors were corrected because another copy of the block was available. The whole 800+GB of data took 1 hour 44 minutes to scrub. Compare that to today:
In this screenshot the bottom left terminal is a scrub of my new 2TB SSD. Bottom right is the 256GB boot volume. Top left is a snapshot of a backup of the 7TB volume on the 2+3TB spanned disks. And top right is the 7TB volume itself, connected to the RPI4.
[1] Ext2, Ext3, Ext4, XFS, NTFS, APFS, HFS+