bitrot, checksum, parity, safety
For a few years now I’ve been checksumming important data (photos, music) to see if and when bits flip. The rate to expect is on the order of 1 kb per 1 TB over some months, if we can believe a recent study at CERN. I do quite regularly observe errors, probably more than CERN does, since I have no ECC memory and use consumer-grade storage.
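To make the checksumming step concrete, here is a minimal sketch of one way to do it. It is not necessarily the tool I use: the function names (sha256_of, build_manifest, changed_files) and the choice of SHA-256 are assumptions made for illustration. It walks a nested directory tree, records one digest per file, and later reports the files whose content no longer matches.

```python
import hashlib
import os

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large photos or albums don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest

def changed_files(root, manifest):
    """Return relative paths that are missing or whose content no longer matches."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha256_of(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

A checksum like this only tells you that a file changed. It cannot tell a deliberate edit from a flipped bit, and it cannot repair anything, so the manifest should live somewhere other than next to the data it describes.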
I’ve been looking at the options for going further than mere checksumming without requiring multi-drive storage systems with ECC RAM (because that is simply not workable in laptops). So ZFS in its usual multi-drive form isn’t an option, although I just discovered you can set a number of copies per dataset (the copies property), which works on a single drive as well, and self-healing does take advantage of it. Bcachefs, a new filesystem developed for flash drives, apparently has had error-correcting codes too since last autumn, although I could not work out whether it is already available (Linux only, it seems).
I’ve found a utility called rsbep, which can produce Reed-Solomon metadata and even has a poorzfs.py tool to mount a directory and handle the creation and error detection for you. The author does admit it’s quite hacky and not too fast, so I’ve been hesitant to try it (and I suspect poorzfs.py does not work on Windows). The advantage is that the overhead is low compared to keeping a second copy around on the same drive (say, 5% for almost 5% redundancy). I should add that I am aware of par(2) and the parchive tools, but those appear to work only on single, flat directories, while I’m dealing with not-so-flat music and photo archives.
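To illustrate how a small amount of extra data can repair damage rather than merely detect it, here is a toy sketch. It uses plain XOR parity, which is much weaker than the Reed-Solomon codes rsbep produces and is not how rsbep or par2 work internally. The block size and all function names are made up for the example. Per-block checksums locate the damaged block, and the parity block rebuilds it.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen for illustration

def split_blocks(data, size=BLOCK_SIZE):
    """Cut data into fixed-size blocks, zero-padding the last one."""
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    if blocks and len(blocks[-1]) < size:
        blocks[-1] = blocks[-1].ljust(size, b"\x00")
    return blocks

def xor_blocks(blocks, size=BLOCK_SIZE):
    """XOR equally sized blocks together into one block."""
    acc = 0
    for block in blocks:
        acc ^= int.from_bytes(block, "big")
    return acc.to_bytes(size, "big")

def make_recovery_data(data, size=BLOCK_SIZE):
    """Return (per-block checksums, parity block, original length)."""
    blocks = split_blocks(data, size)
    sums = [hashlib.sha256(b).digest() for b in blocks]
    return sums, xor_blocks(blocks, size), len(data)

def repair(data, sums, parity, length, size=BLOCK_SIZE):
    """Rebuild at most one damaged block; raise ValueError otherwise."""
    blocks = split_blocks(data, size)
    if len(blocks) != len(sums):
        raise ValueError("file length changed; this sketch cannot handle that")
    bad = [i for i, b in enumerate(blocks) if hashlib.sha256(b).digest() != sums[i]]
    if len(bad) > 1:
        raise ValueError("more than one damaged block; XOR parity cannot repair this")
    if bad:
        index = bad[0]
        others = blocks[:index] + blocks[index + 1:]
        rebuilt = xor_blocks(others + [parity], size)
        if hashlib.sha256(rebuilt).digest() != sums[index]:
            raise ValueError("parity block is damaged too; repair failed")
        blocks[index] = rebuilt
    return b"".join(blocks)[:length]
```

With one parity block per group of 20 data blocks the overhead would be about 5%, but each group would survive only a single bad block. Reed-Solomon codes get far more repair capacity out of the same overhead, which is why the real tools use them.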
Salvation is near though, at least temporarily, until I’ve upgraded to Ubuntu 16.04 and sacrifice a drive to go with the ZFS option. A Windows utility, apparently from the nineties, called ICE ECC does what I need: it generates correction metadata for an arbitrary file hierarchy. Armed with a hex editor, I have observed flawless detection of some flipped bits, as well as resilience against bit flips in the correction file itself (by default it has 400-500% header redundancy). So I am checksumming as we speak. The next test is to see how well the tool works in Linux/Wine.
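The hex editor test can also be scripted. The sketch below is an assumption about how one might automate it and is not part of ICE ECC: flip_bit and file_digest are hypothetical helper names. Run it on a copy of a file, never on the original.

```python
import hashlib

def flip_bit(path, byte_offset, bit=0):
    """Flip a single bit in place, as one would by hand in a hex editor."""
    with open(path, "r+b") as handle:
        handle.seek(byte_offset)
        original = handle.read(1)
        if not original:
            raise ValueError("offset is past the end of the file")
        handle.seek(byte_offset)
        handle.write(bytes([original[0] ^ (1 << bit)]))

def file_digest(path):
    """SHA-256 of the whole file, read in one go (fine for small test files)."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()
```

Record file_digest before the flip, call flip_bit, and then let the verification tool of choice check the file. Flipping the same bit a second time restores the original content.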
It’s nice to have a filesystem-independent tool, hopefully OS-independent too if it works in Wine. That way I can just keep the metadata around and migrate to any future drive, filesystem, or operating system (provided Wine will help out). Not bad, until such functionality becomes standard in filesystems everywhere. It seems such a cheap and obvious way to protect data.