Monitoring the Health of ZFS Volumes on MacOS

I have a little Plex Media server that I share with friends (obviously all legitimate content). It runs on a ZFS pool spread across four drives housed in an eSATA enclosure.

I chose this Mediasonic non-RAID enclosure so I could mount the drives as local devices and use ZFS. macOS has filesystem change notification hooks that don't work over a NAS. With local storage, changes can be picked up without polling.

Redundancy only helps if you replace a failing drive while the pool still has redundancy left, so I've created two Launch Agents that regularly check the status of this pool and notify me of anomalous events by Pushover.

Actually Checking ZFS Health

To check the health of a ZFS pool, you need to run two commands regularly.

The first is zpool scrub:

$ sudo zpool scrub tank

Obviously, replace tank with your own pool name.
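If you have more than one pool, a small loop can scrub them all. This is a sketch rather than the setup used in this post. It assumes zpool lives at /usr/local/bin/zpool (override it with the ZPOOL variable), and scrub_all_pools is a hypothetical helper name. It relies on `zpool list -H -o name`, which prints one pool name per line without a header.

```shell
#!/bin/bash
# Assumption: OpenZFS on OS X installs zpool here; override via $ZPOOL.
ZPOOL=${ZPOOL:-/usr/local/bin/zpool}

# Hypothetical helper: start a scrub on every imported pool.
scrub_all_pools() {
  local pool
  # -H drops the header and uses tabs; -o name prints only the pool name.
  "$ZPOOL" list -H -o name | while read -r pool; do
    echo "Starting scrub of $pool"
    # Redirect stdin so zpool can't consume the loop's list of pool names.
    "$ZPOOL" scrub "$pool" < /dev/null
  done
}
```

Run it as root (for example with sudo), just like zpool scrub itself.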

This command starts a scrub, which reads every block in the pool and verifies it against its checksum, repairing damaged data from redundant copies where it can. It's something like fsck for ZFS.

Scrubbing is a low-priority operation that often takes many hours, sometimes a couple of days, to complete. It only slightly degrades pool performance, but it should be run at least monthly; I run mine once per week.

Then you need to run the following, perhaps daily:

$ zpool status -v

This shows the current state of each pool and its devices, including the progress and speed of any running scrub.

The usual output will look something like this:

  pool: tank
 state: ONLINE
  scan: scrub in progress since Sun Feb 10 10:41:31 2019
	155G scanned out of 5.01T at 15.5M/s, 90h57m to go
	0 repaired, 3.02% done
config:

	NAME                    STATE     READ WRITE CKSUM
	tank                    ONLINE       0     0     0
	  mirror-0              ONLINE       0     0     0
	    media-C1C20B59308F  ONLINE       0     0     0
	    media-D764718EC5F2  ONLINE       0     0     0
	  mirror-1              ONLINE       0     0     0
	    media-FE8414724FF3  ONLINE       0     0     0
	    media-FC6D18EAB7CA  ONLINE       0     0     0
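To pull a running scrub's progress out of this output in a script, you can match its `% done` figure. This is a minimal sketch that assumes the output format shown above. Newer OpenZFS releases word the scan lines differently but should still include a `% done` figure. scrub_progress is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print the completion percentage of any scrub in
# progress, or nothing if no scrub is running. Reads `zpool status` on stdin.
scrub_progress() {
  local status
  status=$(cat)
  if printf '%s\n' "$status" | grep -q 'scrub in progress'; then
    # "0 repaired, 3.02% done" -> "3.02% done" -> "3.02"
    printf '%s\n' "$status" | grep -Eo '[0-9.]+% done' | cut -d% -f1
  fi
}
```

For example, `zpool status | scrub_progress` would print 3.02 for the output above.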

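Pools and devices that haven't reported any problems show up as ONLINE. Those that need attention show up as DEGRADED, FAULTED, UNAVAIL, REMOVED, or OFFLINE. ZFS bases this on what it observes itself, such as missing devices and the errors counted in the READ, WRITE, and CKSUM columns; it doesn't consult SMART data.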
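Because NAME and STATE are the first two columns of the device table, a short awk filter can list exactly which pools or devices are unhealthy, which is handy for putting into a notification. This is a sketch under the assumption that device names contain no spaces. unhealthy_devices is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print "NAME STATE" for every pool or device whose
# state is not ONLINE. Reads `zpool status` output on stdin.
unhealthy_devices() {
  # Skip header lines such as " state: DEGRADED" (first field ends in ':'),
  # then match the second column against the unhealthy states.
  awk '$1 !~ /:$/ && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ { print $1, $2 }'
}
```

Usage: `zpool status | unhealthy_devices`. On a pool with one failed disk this would print the pool, its mirror, and the failed device, each with its state.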

Running Things Regularly

Rather than remembering to run these commands myself, I wrote a script and a couple of launchd jobs.

The script depends on the pushover gem:

$ gem install pushover
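Where the gem puts its `pushover` executable depends on your Ruby setup, so /usr/local/bin/pushover may not exist on your machine. A sketch for finding it, assuming the standard RubyGems layout: it checks your PATH first, then the EXECUTABLE DIRECTORY reported by `gem environment`. Gems installed with `--user-install` land elsewhere, so this won't find them. find_pushover is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print the full path of the pushover executable.
find_pushover() {
  local bindir
  # 1. Already on the PATH?
  if command -v pushover >/dev/null 2>&1; then
    command -v pushover
    return 0
  fi
  # 2. Look in RubyGems' executable directory, e.g.
  #    "  - EXECUTABLE DIRECTORY: /usr/local/bin"
  bindir=$(gem environment | awk -F': ' '/EXECUTABLE DIRECTORY/ { print $2; exit }')
  if [ -n "$bindir" ] && [ -x "$bindir/pushover" ]; then
    echo "$bindir/pushover"
    return 0
  fi
  echo "pushover executable not found" >&2
  return 1
}
```

Use whatever path this prints in place of /usr/local/bin/pushover in the script.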

You'll need a Pushover.net account, plus a one-time $5 license for the Pushover app on your mobile device. Totally worth it. You'll also need to register an application on the site to get an API token.

Here’s the body of the script:

/usr/local/bin/zpool-status.sh

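#!/bin/bash

# Pushover credentials: replace with your own user key and application token
USER_KEY=your-pushover-user-key
APP_KEY=your-pushover-app-token

# Full paths, since launchd's default PATH doesn't include /usr/local/bin
STATUS=$(/usr/local/bin/zpool status -v)

# Alert if any pool or device reports an unhealthy state
if printf '%s\n' "$STATUS" | grep -Eq 'DEGRADED|FAULTED|UNAVAIL|REMOVED|OFFLINE'; then
  echo "ZFS is degraded!"
  /usr/local/bin/pushover -u "$USER_KEY" -a "$APP_KEY" -p h "ZFS is degraded!"
fi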

Replace the keys with your own.
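If the gem gives you trouble, you can call Pushover's HTTP message API directly with curl, as the commenter below did. This is a minimal sketch. The endpoint and the token, user, message, and priority fields come from Pushover's API, where priority 1 means high priority. pushover_send and the PUSHOVER_USER/PUSHOVER_TOKEN variables are hypothetical names, and their defaults are placeholders.

```shell
#!/bin/bash
# Placeholders: replace with your own user key and application token.
PUSHOVER_USER=${PUSHOVER_USER:-your-pushover-user-key}
PUSHOVER_TOKEN=${PUSHOVER_TOKEN:-your-pushover-app-token}

# Hypothetical helper: send a high-priority Pushover message.
pushover_send() {
  local message=$1
  # --form-string sends each field verbatim (no @file or <file expansion).
  curl -s \
    --form-string "token=${PUSHOVER_TOKEN}" \
    --form-string "user=${PUSHOVER_USER}" \
    --form-string "priority=1" \
    --form-string "message=${message}" \
    https://api.pushover.net/1/messages.json
}
```

In the status script, `pushover_send "ZFS is degraded!"` would replace the /usr/local/bin/pushover line.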

Note that the script above doesn't explicitly check the status of scrub operations. My assumption, which may or may not be warranted, is that irreparable errors will show up as degraded volumes. If you know better, please let me know in the comments!
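One option that may cover that gap is `zpool status -x`, which only reports pools that have problems (including, as far as I know, ONLINE pools with permanent data errors) and prints `all pools are healthy` otherwise. This is a sketch, not something this setup uses. It assumes zpool lives at /usr/local/bin/zpool (override with ZPOOL), and pools_healthy is a hypothetical helper name.

```shell
#!/bin/bash
# Assumption: OpenZFS on OS X installs zpool here; override via $ZPOOL.
ZPOOL=${ZPOOL:-/usr/local/bin/zpool}

# Hypothetical helper: succeed if every pool is healthy; otherwise print
# the `zpool status -x` report and fail, so the caller can send an alert.
pools_healthy() {
  local report
  report=$("$ZPOOL" status -x)
  if [ "$report" = "all pools are healthy" ]; then
    return 0
  fi
  printf '%s\n' "$report"
  return 1
}
```

Usage: `if ! report=$(pools_healthy); then ...send an alert...; fi`. Any output other than the healthy message counts as a failure, so "no pools available" (for example, if the enclosure didn't mount) triggers an alert too.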

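To run the status script and the scrub on a schedule on macOS, you need to install them as launchd jobs. I've used Launch Agents here. Because zpool scrub needs root, /Library/LaunchDaemons, whose jobs run as root even when nobody is logged in, may be a better home for them.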

Here are those files.

/Library/LaunchAgents/com.mostlydev.zpool-scrub.plist

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>Disabled</key>
	<false/>
	<key>Label</key>
	<string>com.mostlydev.zpool-scrub</string>
	<key>ProgramArguments</key>
	<array>
		<string>/usr/local/bin/zpool</string>
		<string>scrub</string>
		<string>tank</string>
	</array>
	<key>StartCalendarInterval</key>
	<array>
		<dict>
			<key>Day</key>
			<integer>1</integer>
			<key>Hour</key>
			<integer>3</integer>
			<key>Minute</key>
			<integer>0</integer>
		</dict>
	</array>
</dict>
</plist>

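This starts a scrub at 3 AM on the 1st of every month. Note that launchd treats any calendar key you leave out as a wildcard, so without a Minute key the job would fire every minute from 3:00 to 3:59. To scrub weekly instead, replace the Day key with Weekday (0 is Sunday).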

And, to check status, once every hour on the hour:

/Library/LaunchAgents/com.mostlydev.zpool-status.plist

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>Disabled</key>
	<false/>
	<key>Label</key>
	<string>com.mostlydev.zpool-status</string>
	<key>Program</key>
	<string>/usr/local/bin/zpool-status.sh</string>
	<key>StartCalendarInterval</key>
	<array>
		<dict>
			<key>Minute</key>
			<integer>0</integer>
		</dict>
	</array>
</dict>
</plist>

Make sure the script is executable and that the plist files are owned by root:wheel:

$ sudo chmod +x /usr/local/bin/zpool-status.sh
$ sudo chown root:wheel /Library/LaunchAgents/com.mostlydev.zpool*

Then load them up and you’re all set to go.

$ sudo launchctl load -w /Library/LaunchAgents/com.mostlydev.zpool-scrub.plist
$ sudo launchctl load -w /Library/LaunchAgents/com.mostlydev.zpool-status.plist
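To confirm the jobs are loaded and see whether their last run succeeded, you can read the `launchctl list` output, which has three columns: PID (or `-` when not running), last exit status, and label. This is a sketch. job_last_exit is a hypothetical helper name, and because the jobs were loaded with sudo, it should be run as root to see them.

```shell
#!/bin/bash
# Hypothetical helper: print the last exit status launchd recorded for a
# job label, or nothing if no job with that label is loaded.
job_last_exit() {
  local label=$1
  # Columns are PID, Status, Label; match the label exactly.
  launchctl list | awk -v label="$label" '$3 == label { print $2 }'
}
```

For example, `job_last_exit com.mostlydev.zpool-status` should print 0 after a successful run. A nonzero value suggests the last run failed; 127 usually means a command wasn't found, such as a missing pushover executable.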

One thought on “Monitoring the Health of ZFS Volumes on MacOS”

  1. I struggled to use the pushover gem. I am not familiar with the gem system, but I couldn't find the pushover executable. The provided path did not work for me.

    I used just curl instead, like described here:
    https://pushover.net/faq#library-shell

    thanks for the tutorial!

    For an easy creation of the scheduling I used Lingon X
