The main Virtual Machine Server was seeing hardware errors, and ZFS scrubs were repairing a non-zero number of bytes. One of the (cheap) Hitachi Ultrastar 2TB disks was starting to fail, after only 1.5 years. Smartctl was showing 53 recent errors.
  pool: vmstorage
 state: ONLINE
  scan: scrub repaired 1.75M in 5h12m with 0 errors on Tue Oct 1 07:12:22 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        vmstorage                                       ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            ata-Hitachi_HUA723020ALA641_YFG31Y3A-part2  ONLINE       0     0     0
            ata-Hitachi_HUA723020ALA641_YFG4GJ8A-part2  ONLINE       0     0     0
/var/log/messages was showing stuff like this, over and over:
Sep 15 03:03:42 dellt3600 smartd[19459]: Device: /dev/sdc [SAT], 831 Currently unreadable (pending) sectors
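The drive's own SMART counters can be pulled with smartctl; /dev/sdc is the device name from the smartd log line above:

smartctl -H /dev/sdc          # overall health self-assessment
smartctl -A /dev/sdc          # attributes, including pending/reallocated sector counts
smartctl -l error /dev/sdc    # the ATA error log, which is where the recent error count comes from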
Looking in /dev/disk/by-id/*, the failing drive has a serial number of YFG4GJ8A.
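One quick way to confirm which /dev/sdX name maps to which serial number is to list the by-id symlinks:

ls -l /dev/disk/by-id/ | grep sdc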
So, the setup for these commands became:
export DISK_GOOD=/dev/disk/by-id/ata-Hitachi_HUA723020ALA641_YFG31Y3A-part2
export DISK_BAD=/dev/disk/by-id/ata-Hitachi_HUA723020ALA641_YFG4GJ8A-part2
export DISK_REPLACE=/dev/disk/by-id/ata-Hitachi_HUA723020ALA641_YGJ0JSYA-part2
The command to remove the failing, but not yet failed, drive from the mirror:
zpool detach vmstorage $DISK_BAD
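Before powering down, it's worth confirming the detach took; zpool status should now show vmstorage with only the single remaining disk:

zpool status vmstorage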
(At this point, I shut down the machine and had to swap disks, since there was only room for two 3.5″ HDDs).
After reboot, the command to add the new disk into the mirror:
zpool attach vmstorage $DISK_GOOD $DISK_REPLACE
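Resilver progress can be watched with zpool status, e.g.:

zpool status -v vmstorage
watch -n 60 zpool status vmstorage    # live view, refreshed every minute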
Resilvering 1.25TB took 4h56m.
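Once the resilver completes, a scrub forces a full read pass over the new mirror and confirms everything is clean:

zpool scrub vmstorage
zpool status vmstorage    # the scan line should end with 0 errors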
Note: if you “pre-partition” your ZFS disks (like I do), then you also need the whole-disk (“root”) device path to run parted on:
export DISK_REPLACE_ROOT=/dev/disk/by-id/ata-Hitachi_HUA723020ALA641_YGJ0JSYA
parted $DISK_REPLACE_ROOT
Use ‘unit s’ to create the partitions with exactly the same sector counts as the drive being replaced.
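A minimal sketch of that parted session, assuming a GPT label on the new disk; the Start/End sectors for each mkpart come from the ‘print’ output on the surviving (good) disk:

parted /dev/disk/by-id/ata-Hitachi_HUA723020ALA641_YFG31Y3A
(parted) unit s
(parted) print      # record the exact Start/End sectors of each partition
(parted) quit

parted $DISK_REPLACE_ROOT
(parted) unit s
(parted) mklabel gpt
(parted) mkpart     # answer the prompts, re-using the sectors recorded above (repeat per partition)
(parted) quit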
Just recording the replacement drive: $42 – HGST/Hitachi Ultrastar 7K3000 2TB 7200RPM Enterprise Grade SATA III. For the record, the drive arrived new, with 0 hours of power-on time. Vendor was DBSKY.