ZFS on Ubuntu
Goals#
- Build a NAS on low-power TinyPC (Thinkcentre M910q, i5 6500T, 8GB RAM)
- Ubuntu 24.04.4 LTS
- ZFS pools and filesystem with auto-snapshots
- Serve via NFS and SMB
- ZFS snapshots / Previous Versions working on Windows clients
- Enable smartd monitoring
- Send Telegram alerts for smartd and ZED events
- Sensible tuning
Intro#
This PC is also my living-room media player, so this is an always-on, low budget, and low power draw (< 15W idle) build. Being a TinyPC, I chose to add storage via USB-attached SCSI caddies (UASP).
If you go the same route, pay attention to the chipset and supported features; many people have issues with the JMicron JMS578/583 chipsets.
Parts:
- 2 x UGREEN 2.5" Hard Drive Enclosure USB 3.0
- Realtek 9201 chipset supports: TRIM, UASP, SMART passthrough
- 2 x SAMSUNG 870 EVO 2.5" SSD 1TB
- TLC NAND, 1GB DRAM, MKX controller; boring and reliable
Note on ARC:
- This guide does not cover ARC; for this build and use case:
- Under load, ARC is stable around 2.2GB RAM (of 8GB system RAM)
- L2ARC: not needed; besides, the USB disks are the bottleneck
- SLOG: undesirable; complexity without benefit, potentially harmful
Label the caddies!#
Once you have the SSDs installed in caddies, the first task is to slap labels on the caddies so that when one fails you know which to unplug and replace. You can enable the blinkenlight but I don’t want to rely on that alone.
Connect them to USB 3.0 ports and remember the attachment order; we are going to check dmesg output for their device names.
Predictably enough they came up as sdb and sdc for me:
Attachment order#
dmesg | tail
# disk 1 "front left USB / left caddy"
<snip>
[218621.916401] usb 2-5: Manufacturer: Ugreen
[218621.916407] usb 2-5: SerialNumber: 13330***A**1
<snip>
[218622.172910] scsi 4:0:0:0: Direct-Access Samsung SSD 870 EVO 1TB 1.02 PQ: 0 ANSI: 6
[218622.207110] sd 4:0:0:0: Attached scsi generic sg1 type 0
[218622.213299] sd 4:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
<snip>
# disk 2 "front right USB / right caddy"
[218631.852792] usb 2-6: Manufacturer: Ugreen
[218631.852797] usb 2-6: SerialNumber: 13330***B**2
<snip>
[218632.048772] scsi 5:0:0:0: Direct-Access Samsung SSD 870 EVO 1TB 1.02 PQ: 0 ANSI: 6
[218632.083564] sd 5:0:0:0: Attached scsi generic sg2 type 0
[218632.089750] sd 5:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
<snip>
disk-id#
The correct, stable mapping is not by USB port or device name (sdb/sdc), but by disk-id
get ids#
# ls -l /dev/disk/by-id/ | egrep "sdb|sdc"
lrwxrwxrwx 1 root root 9 Mar 28 14:28 ata-Samsung_SSD_870_EVO_1TB_S8NBNS0*****BBB -> ../../sdc
lrwxrwxrwx 1 root root 9 Mar 28 14:27 ata-Samsung_SSD_870_EVO_1TB_S8NBNS0*****AAA -> ../../sdb
lrwxrwxrwx 1 root root 9 Mar 28 14:27 usb-Samsung_SSD_870_EVO_1TB_13330***A**1-0:0 -> ../../sdb
lrwxrwxrwx 1 root root 9 Mar 28 14:28 usb-Samsung_SSD_870_EVO_1TB_13330***B**2-0:0 -> ../../sdc
lrwxrwxrwx 1 root root 9 Mar 28 14:28 wwn-0x5001112223a3344z -> ../../sdc
lrwxrwxrwx 1 root root 9 Mar 28 14:27 wwn-0x5001112223a1122a -> ../../sdb
decipher them#
We have several ids:
- usb- is tied to the serial number of the USB caddy; don’t use that.
- ata- is tied to the serial number of the SSD; better, but ata IDs are protocol dependent and the string is not guaranteed stable across systems.
- wwn-0x is tied to the SSD itself; best practice, designed to be globally unique and stable.
So the only id we really care about is:
- wwn-0x world-wide name
Anyway here is what we found:
Disk 1 / LEFT CADDY
USB serial: 13330***A**1
SSD serial: S8NBNS0*****AAA
WWN: 0x5001112223a1122a
Disk 2 / RIGHT CADDY
USB serial: 13330***B**2
SSD serial: S8NBNS0*****BBB
WWN: 0x5001112223a3344z
If we later move the disks between systems, the following commands help us correctly identify them again:
# WWN; this is all we need
ls -l /dev/disk/by-id/
# can also view the SSD serial of a given dev like:
udevadm info --query=all --name=/dev/sdX | grep ID_SERIAL
# smartctl may fail with USB caddies:
smartctl -a /dev/sdX | grep Serial
smartctl -i /dev/sdX | grep Serial
# but you can do '-d <type>' to tell it how
# for my enclosures, '-d sat' is correct for my Realtek 9201 (UAS) USB<>SATA bridge
smartctl -a /dev/sdX -d sat [| grep Serial]
smartctl -i /dev/sdX -d sat [| grep Serial]
# lsblk
# lsblk -o NAME,SIZE,MODEL,SERIAL | grep -v loop
NAME SIZE MODEL SERIAL
sda 465.8G WDC WD5000BEVT-00A0RT0 WD-WX91A2099621
└─sda1 465.8G
sdb 931.5G Samsung SSD 870 EVO 1TB S8NBNS0*****AAA
sdc 931.5G Samsung SSD 870 EVO 1TB S8NBNS0*****BBB
nvme0n1 119.2G SAMSUNG MZVLW128HEGR-000L1 S341NX0K610955
├─nvme0n1p1 1G
└─nvme0n1p2 118.2G
To import the pool again on the new system:
zpool import
<will show discovered pools>
zpool import <whatever you called yours>
SMART monitoring#
We will perform some basic checks and enable SMART monitoring before building the pool. If re-using aged disks you might spend more time verifying their health.
Checks#
# install smartmontools if not already present
apt install smartmontools
# check stats
smartctl -d sat -a /dev/sdb
smartctl -d sat -a /dev/sdc
# run a short test; wait 2 mins
smartctl -d sat -t short /dev/sdb
smartctl -d sat -t short /dev/sdc
sleep 125
# check results
smartctl -d sat -a /dev/sdb
smartctl -d sat -a /dev/sdc
# should show something like:
Short self-test routine
# 1 Short offline Completed without error 00% 1 -
Short self-test routine
# 1 Short offline Completed without error 00% 1 -
edit /etc/smartd.conf#
If you’ve never edited smartd.conf before, please first read the comments therein; there is a line beginning DEVICESCAN -d removable.. that you may wish to comment out.
Here we add our new drives:
# These are my WWNs, and '-d sat' for my USB enclosures
# The entries in brackets - S = short test every day 2/3AM, L = long test every Saturday or Sunday 3AM
# In this way the tests are staggered
# Later when we setup ZFS scrub, we need to stagger that not to collide either
#
/dev/disk/by-id/wwn-0x5001112223a3344z -d sat -a -o on -S on -s (S/../.././02|L/../../6/03) -m root
/dev/disk/by-id/wwn-0x5001112223a1122a -d sat -a -o on -S on -s (S/../.././03|L/../../7/03) -m root
custom logging#
At this point you are running tests, but what happens on failure: something or nothing?
If you want to run custom scripts, add -M exec /usr/share/smartmontools/smartd-runner to the end of each line.
The smartd-runner will call the scripts inside /etc/smartmontools/run.d/.
So what will your custom script do? It could send an email, ping you on Slack, hit an API of your choice.
By default smartmontools has a simple mailer script 10mail and nothing else.
In this section I will set up a 20logging script; later in the guide we cover Telegram.
If you add your own script, ensure the owner and permissions are correct (root:root, 755). If your script writes to a dedicated logfile, don’t forget to configure logrotate for it.
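For the logrotate point above, a minimal sketch (assuming the dedicated logfile is /var/log/smart-events.log, as in the script below) could live in /etc/logrotate.d/smart-events:

```
/var/log/smart-events.log {
    monthly
    rotate 6
    compress
    missingok
    notifempty
}
```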
Example outputs from my logging:
# tail /var/log/syslog
2026-03-28T16:05:44.410122+00:00 tank smartd-hook: 2026-03-28 16:05:44 +0000 host=tank device=/dev/disk/by-id/wwn-0x5001112223a1122a devtype=sat failtype=test address=root msg="manual smartd hook test"
# tail /var/log/smart-events.log
2026-03-28 16:05:44 +0000 host=tank device=/dev/disk/by-id/wwn-0x5001112223a1122a devtype=sat failtype=test address=root msg="manual smartd hook test"
If you want to use this logging script, put this into /etc/smartmontools/run.d/20logging:
#!/bin/bash
set -u
LOGFILE="/var/log/smart-events.log"
TAG="smartd-hook"
TS="$(date '+%Y-%m-%d %H:%M:%S %z')"
HOST="$(hostname -f 2>/dev/null || hostname)"
DEVICE="${SMARTD_DEVICE:-unknown-device}"
DEVTYPE="${SMARTD_DEVICETYPE:-unknown-type}"
FAILTYPE="${SMARTD_FAILTYPE:-unknown-failtype}"
MESSAGE="${SMARTD_MESSAGE:-no-message}"
ADDRESS="${SMARTD_ADDRESS:-no-address}"
LINE="$TS host=$HOST device=$DEVICE devtype=$DEVTYPE failtype=$FAILTYPE address=$ADDRESS msg=\"$MESSAGE\""
# Write to syslog/journal
logger -t "$TAG" -- "$LINE"
# Write to dedicated flat log
printf '%s\n' "$LINE" >> "$LOGFILE"
enable and start smartd#
Enable the daemon and start it:
systemctl enable smartd
systemctl start smartd
ZFS#
Install tools#
Install the ZFS tools if not already present. NB: ensure they are the correct and latest versions for your distro release! Eg: a fresh Ubuntu 24.04.3 LTS install identified the correct versions via apt policy, but I still had to purge and reinstall to end up on the right versions; this is an Ubuntu issue, not an OpenZFS issue.
apt update
apt install zfsutils-linux -y
Wipe disks correctly#
Ensure the disks have been wiped of magic strings, i.e.: partition tables, filesystem signatures, GPT/MBR headers:
wipefs -a /dev/disk/by-id/wwn-0x5001112223a1122a
wipefs -a /dev/disk/by-id/wwn-0x5001112223a3344z
sgdisk -Z /dev/disk/by-id/wwn-0x5001112223a1122a
sgdisk -Z /dev/disk/by-id/wwn-0x5001112223a3344z
Determine ZFS settings#
- Force 4K alignment (SSDs): ashift=12
- Compression is wanted: compression=lz4
- Avoid unnecessary writes: atime=off
- Better metadata performance: xattr=sa
- Keep power-loss-safe(r) sync: sync=standard <--- the default; sync=disabled is faster but risky
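A quick sanity check on why ashift=12: these SSDs report 512B sectors over the USB bridge, but their NAND pages are 4K or larger, so we force 4K alignment anyway. ashift is simply log2 of the block size; a throwaway sketch (the lsblk line in the comment is how you'd inspect a real drive):

```shell
# Inspect what a drive reports (sdb/sdc here):
#   lsblk -o NAME,PHY-SEC,LOG-SEC /dev/sdb
# Regardless of the reported 512B, we target 4K blocks.
# ashift is log2(block size):
BLOCK_SIZE=4096
ASHIFT=0
n=$BLOCK_SIZE
while [ "$n" -gt 1 ]; do n=$((n / 2)); ASHIFT=$((ASHIFT + 1)); done
echo "ashift=$ASHIFT"   # ashift=12 for 4K blocks
```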
Create ZFS pool using our WWN IDs#
For -o (pool properties): ashift is crucial because we can’t change it later. For -O (dataset properties): we can modify these later.
Create the pool:
zpool create \
-o ashift=12 \
-o autotrim=on \
-O compression=lz4 \
-O atime=off \
-O xattr=sa \
tank mirror \
/dev/disk/by-id/wwn-0x5001112223a1122a \
/dev/disk/by-id/wwn-0x5001112223a3344z
Verify pool#
zpool status
zpool list
zpool get ashift,autotrim tank
zfs list
zfs get compression,atime,xattr tank
Initial scrub#
A scrub reads data, verifies checksums, and repairs data if needed.
zpool scrub tank
# verify
zpool status -v | grep scrub
scan: scrub repaired 0B in 00:00:01 with 0 errors on Sat Mar 28 17:35:02 2026
Check TRIM works#
Do initial check that TRIM will work:
zpool trim tank
Verify trim running:
zpool status -v | grep trim
wwn-0x5001112223a1122a ONLINE 0 0 0 (trimming)
wwn-0x5001112223a3344z ONLINE 0 0 0 (trimming)
Confirm TRIM is actually happening on the disks themselves; check the DISC-GRAN column in lsblk output:
- shows 512B <--- TRIM supported and working
- shows 0B <--- oops, this USB enclosure does not support it!
# lsblk -D | egrep "sdb|sdc"
sdb 0 512B 4G 0
├─sdb1 0 512B 4G 0
└─sdb9 0 512B 4G 0
sdc 0 512B 4G 0
├─sdc1 0 512B 4G 0
└─sdc9 0 512B 4G 0
Setup monthly scrubbing#
Crontab style:
crontab -e
# NB: cron ORs day-of-month and day-of-week when both are restricted,
# so "0 4 1-7 * 2" would fire on days 1-7 AND every Tuesday.
# Guard with a date test to get "first Tuesday of the month":
0 4 1-7 * * [ "$(date +\%u)" = "2" ] && /sbin/zpool scrub tank
systemd style:
# nano /etc/systemd/system/zfs-scrub@.service
[Unit]
Description=ZFS scrub on pool %i
[Service]
Type=oneshot
ExecStart=/sbin/zpool scrub %i
# nano /etc/systemd/system/zfs-scrub@tank.timer
[Unit]
Description=Monthly ZFS scrub for tank
[Timer]
OnCalendar=Tue *-*-01..07 04:00
Persistent=true
[Install]
WantedBy=timers.target
Enable now:
systemctl daemon-reload
systemctl enable --now zfs-scrub@tank.timer
Setup better logging#
# nano /usr/local/sbin/zfs-scrub-runner
#!/bin/bash
set -e
POOL="$1"
TAG="zfs-scrub"
LOGFILE="/var/log/zfs-events.log"
TS="$(date '+%Y-%m-%d %H:%M:%S %z')"
HOST="$(hostname -f 2>/dev/null || hostname)"
log() {
LINE="$TS host=$HOST pool=$POOL msg=\"$1\""
logger -t "$TAG" -- "$LINE"
echo "$LINE" >> "$LOGFILE"
}
log "scrub started"
if /sbin/zpool scrub "$POOL"; then
log "scrub triggered successfully"
else
log "scrub failed to start"
exit 1
fi
Make executable:
chmod +x /usr/local/sbin/zfs-scrub-runner
Create log and set owner and permissions:
touch /var/log/zfs-events.log
chown root:adm /var/log/zfs-events.log
chmod 0644 /var/log/zfs-events.log
Update the cron entry or systemd unit as required; if using systemd, run daemon-reload afterwards:
# 0 4 1-7 * * [ "$(date +\%u)" = "2" ] && /usr/local/sbin/zfs-scrub-runner tank
# or
# ExecStart=/usr/local/sbin/zfs-scrub-runner %i
Daemonise logging of ZFS events#
Create /etc/systemd/system/zfs-events.service:
[Unit]
Description=ZFS event logger
After=zfs.target
[Service]
Type=simple
ExecStart=/bin/bash -c 'exec zpool events -f | while read -r line; do logger -t zfs-events -- "$line"; done'
Restart=always
RestartSec=2
# Optional hardening
Nice=10
IOSchedulingClass=best-effort
[Install]
WantedBy=multi-user.target
Enable and start:
systemctl daemon-reload
systemctl enable --now zfs-events.service
Create ZFS datasets#
zpool automatically creates a mount point, eg: if your pool is called “tank” it will automatically have been mounted at /tank.
We won’t use the /tank root directly; we want to create specific datasets under it:
zfs create tank/data
zfs create tank/backups
I also want specific mounts under /srv:
zfs set mountpoint=/srv/data tank/data
zfs set mountpoint=/srv/backups tank/backups
I want to be able to hit these shares over both Samba and NFS. Create a group ‘storage’, and add my local user to that group:
groupadd storage
usermod -aG storage <youruser>
Set permissions on the /srv mountpoints:
chown -R root:storage /srv/data /srv/backups
chmod -R 2775 /srv/data /srv/backups
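The leading 2 in 2775 is the setgid bit: new files and subdirectories created inside inherit the directory's group (storage), which is what keeps NFS and SMB writes consistent. A throwaway local demo of the inheritance (the temp dir is just for illustration):

```shell
d=$(mktemp -d)
chmod 2775 "$d"
mkdir "$d/sub"
# With setgid on the parent, the subdirectory inherits the parent's
# group, and the setgid bit itself propagates to new directories:
stat -c '%G %A' "$d" "$d/sub"
rm -rf "$d"
```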
Set ZFS POSIX ACLs to ensure NFS/SMB permissions are consistent:
zfs set acltype=posixacl tank/data
zfs set acltype=posixacl tank/backups
zfs set aclinherit=passthrough tank/data
zfs set aclmode=passthrough tank/data
zfs set aclinherit=passthrough tank/backups
zfs set aclmode=passthrough tank/backups
NFS server#
Install if not already present:
apt update
apt install nfs-kernel-server -y
Export the shares, first modify the config:
# nano /etc/exports
/srv/data 192.168.1.0/24(rw,sync,no_subtree_check)
/srv/backups 192.168.1.0/24(rw,sync,no_subtree_check)
Export them and restart the nfs server:
exportfs -ra
systemctl restart nfs-kernel-server
Permissions / accessing shares from client side#
- Samba shares take user+password; straightforward and intuitive.
- NFS operates differently:
The NFS v3 shares we created above don’t use user+password (they rely on AUTH_SYS); instead:
- if client UID or GID matches, rw access is granted
- UID may vary if your machines have multiple users
- GID matching is easier to make consistent
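To check the matching in practice, compare the numeric IDs on the server and each client; a sketch (the SERVER_GID value 1001 is just my example, taken from the server):

```shell
# On both server and client, inspect:
#   getent group storage | cut -d: -f3    # numeric GID of the group
#   id -nG youruser                       # is the user in 'storage'?
# Sketch of the comparison (SERVER_GID=1001 is an assumed example value):
SERVER_GID=1001
CLIENT_GID="$(getent group storage | cut -d: -f3)"
if [ "$CLIENT_GID" = "$SERVER_GID" ]; then
  echo "GIDs match; NFS group permissions will line up"
else
  echo "GID mismatch; align the group IDs before relying on NFS permissions"
fi
```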
If you want to create a “storage” group on server and all clients:
# do (carefully!):
sudo groupadd -g 1001 storage <--- where 1001 matches the GID on the NFS server
If you have GID misalignment and want to set the GID, you can do:
sudo groupmod -g 1001 storage <--- if 'storage' already exists on clients but you need to change GID
If you change the GID on the server, then you also need to update file group ownership :-)
# update group ownership if you have changed the GID, note the placeholder <OLD_GID>:
sudo find /srv -group <OLD_GID> -exec chgrp -h storage {} \;
Finally, ensure your client user is in the storage group:
sudo usermod -aG storage youruser
NFS client#
Test you can mount the shares:
sudo apt install nfs-common
sudo mkdir -p /mnt/data
sudo mkdir -p /mnt/backups
sudo mount -t nfs <SERVER IP or hostname>:/srv/data /mnt/data
sudo mount -t nfs <SERVER IP or hostname>:/srv/backups /mnt/backups
mount | grep nfs
ls -l /mnt/data
touch /mnt/data/testfile
ls -l /mnt/data/testfile <--- should be <youruser> storage for user and group
Make persistent across reboots:
(_netdev means wait for the network before trying to mount.)
# sudo nano /etc/fstab
<SERVER IP or hostname>:/srv/data /mnt/data nfs defaults,_netdev 0 0
<SERVER IP or hostname>:/srv/backups /mnt/backups nfs defaults,_netdev 0 0
To mount all in /etc/fstab:
sudo mount -a
SMB/Samba server#
Install if not already present:
apt update
apt install samba -y
Export the shares, first modify the config:
# nano /etc/samba/smb.conf
[data]
path = /srv/data
browseable = yes
read only = no
guest ok = no
valid users = @storage
force group = storage
create mask = 0664
directory mask = 2775
[backups]
path = /srv/backups
browseable = yes
read only = no
guest ok = no
valid users = @storage
force group = storage
create mask = 0664
directory mask = 2775
# NB: you also need to edit the [global] section
# particularly to bind the daemon to your LAN interface
# uncomment the line and add LAN subnet / LAN interface, eg: ens3
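As an illustration of that [global] change (the interface name ens3 and the 192.168.1.0/24 subnet are assumptions for my LAN; adjust to yours):

```
[global]
   interfaces = 127.0.0.0/8 192.168.1.0/24 ens3
   bind interfaces only = yes
```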
Create smb user:
smbpasswd -a youruser <enter>
New SMB password: <type one, then enter>
Retype new SMB password: <retype, then enter>
Restart smbd:
systemctl restart smbd
SMB client#
GUI:
- My Computer > right click > Map network drive
\\<SERVER IP or hostname>\data
\\<SERVER IP or hostname>\backups
- Use your SMB username and password
- Tick boxes for save credentials and reconnect at logon
CLI:
net use Z: \\server\data /user:USERNAME /persistent:yes
net use Y: \\server\backups /user:USERNAME /persistent:yes
ZFS snapshots#
A main feature of ZFS is the ability to create snapshots (“Previous Versions”/shadow copies on Windows). There aren’t n full copies of every file; snapshotting only consumes space for data changed between snapshots.
Install tool:
apt install zfs-auto-snapshot
Edit config to match the policy you want:
# note // hits all pools; if you want granularity you can tinker in here
# eg: if you just want this 'tank' and not other pools, replace '//' with 'tank'
#
nano /etc/cron.d/zfs-auto-snapshot
# I am going with this policy:
0 * * * * root /usr/sbin/zfs-auto-snapshot --quiet --syslog --label=hourly --keep=24 //
0 0 * * * root /usr/sbin/zfs-auto-snapshot --quiet --syslog --label=daily --keep=7 //
0 0 * * 0 root /usr/sbin/zfs-auto-snapshot --quiet --syslog --label=weekly --keep=4 //
0 0 1 * * root /usr/sbin/zfs-auto-snapshot --quiet --syslog --label=monthly --keep=12 //
Verify:
# zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
tank@zfs-auto-snap_frequent-2026-03-28-1845 0B - 96K -
tank/backups@zfs-auto-snap_frequent-2026-03-28-1845 0B - 96K -
tank/data@zfs-auto-snap_frequent-2026-03-28-1845 0B - 96K -
Not much time has passed nor any data changed, so create a text file from a client, eg:
/mnt/data$ echo "this is the first version" > test.txt
Fire “daily” and “monthly” manually on server rather than wait on timers:
/usr/sbin/zfs-auto-snapshot --label=daily --keep=7 //
/usr/sbin/zfs-auto-snapshot --label=monthly --keep=12 //
Verify again:
# zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
tank@zfs-auto-snap_frequent-2026-03-28-1845 0B - 96K -
tank@zfs-auto-snap_daily-2026-03-28-1850 0B - 96K -
tank@zfs-auto-snap_monthly-2026-03-28-1850 0B - 96K -
tank/backups@zfs-auto-snap_frequent-2026-03-28-1845 0B - 96K -
tank/backups@zfs-auto-snap_daily-2026-03-28-1850 0B - 96K -
tank/backups@zfs-auto-snap_monthly-2026-03-28-1850 0B - 96K -
tank/data@zfs-auto-snap_frequent-2026-03-28-1845 56K - 96K - <<<<<<<< test.txt file created
tank/data@zfs-auto-snap_daily-2026-03-28-1850 0B - 104K -
tank/data@zfs-auto-snap_monthly-2026-03-28-1850 0B - 104K -
Now introduce a delta in the text file:
# edit test text file so next 'frequent' snapshot captures delta
/mnt/data$ echo "THIS IS A DELTA" >> test.txt
Make snapshots / previous versions visible for clients:
zfs set snapdir=visible tank/data
zfs set snapdir=visible tank/backups
For SMB clients we need to edit /etc/samba/smb.conf:
# add this under each share; NB shadow:basedir differs for each of 'data' and 'backups'
vfs objects = shadow_copy2 acl_xattr
map acl inherit = yes
store dos attributes = yes
shadow:basedir = /srv/data
shadow:snapdir = .zfs/snapshot
shadow:localtime = yes
shadow:sort = desc
shadow:fixinodes = yes
# use .* if you want to allow snapshots other than zfs-auto-snap generated and named
# shadow:snapprefix = .*
shadow:snapprefix = ^zfs-auto-snap_\(frequent\)\{0,1\}\(hourly\)\{0,1\}\(daily\)\{0,1\}\(monthly\)\{0,1\}
shadow:delimiter = -20
shadow:format = -%Y-%m-%d-%H%M
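The shadow:delimiter/shadow:format pair must reproduce the timestamp suffix of the snapshot names, eg: zfs-auto-snap_hourly-2026-03-28-1845. A quick check that the format string produces that shape:

```shell
# zfs-auto-snapshot appends "-%Y-%m-%d-%H%M" after the label;
# shadow:delimiter = -20 anchors on the century, shadow:format parses the rest:
suffix="$(date '+-%Y-%m-%d-%H%M')"
echo "$suffix"
```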
Restart smbd:
systemctl restart smbd
Verify locally with smbclient first:
# smbclient //localhost/data -U niall -c "allinfo test.txt"
Password for [WORKGROUP\user]:
Try "help" to get a list of possible commands.
smb: \> allinfo test.txt
altname: test.txt
create_time: Sat Mar 28 06:48:39 PM 2026 GMT
access_time: Sat Mar 28 06:48:39 PM 2026 GMT
write_time: Sat Mar 28 07:05:24 PM 2026 GMT
change_time: Sat Mar 28 07:05:24 PM 2026 GMT
<snapshots here>
Verify on disk:
# for f in /srv/data/.zfs/snapshot/*/pvtest.txt; do echo "== $f =="; cat "$f"; done
== /srv/data/.zfs/snapshot/zfs-auto-snap_daily-2026-03-28-19-42/pvtest.txt ==
v2
== /srv/data/.zfs/snapshot/zfs-auto-snap_frequent-2026-03-28-20-50/pvtest.txt ==
v5
== /srv/data/.zfs/snapshot/zfs-auto-snap_hourly-2026-03-28-19-42/pvtest.txt ==
v2
== /srv/data/.zfs/snapshot/zfs-auto-snap_hourly-2026-03-28-19-45/pvtest.txt ==
v3
== /srv/data/.zfs/snapshot/zfs-auto-snap_hourly-2026-03-28-20-13/pvtest.txt ==
v4
Windows SMB clients should see snapshots under “Previous Versions” tab of file and folder properties.
Telegram alerts#
Create the Bot#
First we create a bot by sending a Telegram message to user BotFather:
/newbot
Save the bot name and token somewhere safe (your new ZFS?)
bot name: <itsname>
bot token: <secrettoken>
Use the t.me/ link to start a chat; send a message to initiate and generate chat ID:
foo
Get the chat ID:
curl -s https://api.telegram.org/bot<YOUR_TOKEN>/getUpdates
Look for:
"chat":{"id":123456789}
Create alert script:
# nano /usr/local/bin/telegram-alert
#!/bin/bash
TOKEN="YOUR_BOT_TOKEN"
CHAT_ID="YOUR_CHAT_ID"
HOST=$(hostname)
MSG="$*"
timeout 20 \
curl \
--connect-timeout 5 \
--max-time 15 \
-s -X POST "https://api.telegram.org/bot${TOKEN}/sendMessage" \
-d chat_id="${CHAT_ID}" \
-d text="[$HOST] $MSG" \
-d disable_web_page_preview=true \
> /dev/null
Make executable:
chmod +x /usr/local/bin/telegram-alert
Test it:
/usr/local/bin/telegram-alert "test message"
You should get a message on Telegram from your bot.
Hook into ZFS-ZED#
- If you look in /etc/zfs/zed.d/zed.rc, note that it has variables for email and for Slack webhooks; you can use those too.
- I will create a separate script for Telegram:
# nano /etc/zfs/zed.d/all-telegram.sh <--- you must use the ZEDLET naming convention
# the 'all-' prefix catches all event classes; inside the script you can then filter
# calling it just "telegram.sh" will break; NB the zed manpage :)
#!/bin/bash
set -eu
# Only alert on the classes you care about; remove this case block to alert on everything.
case "${ZEVENT_CLASS:-}" in
*statechange*|*config_sync*|*io*|*fault*|*vdev*|*resilver*|*scrub*)
;;
*)
exit 0
;;
esac
/usr/local/bin/telegram-alert \
"ZFS event on $(hostname)
class=${ZEVENT_CLASS:-unknown}
subclass=${ZEVENT_SUBCLASS:-unknown}
pool=${ZEVENT_POOL:-unknown}
vdev_path=${ZEVENT_VDEV_PATH:-unknown}
vdev_guid=${ZEVENT_VDEV_GUID:-unknown}
time=${ZEVENT_TIME_STRING:-unknown}"
Fix permissions:
chown root:root /etc/zfs/zed.d/all-telegram.sh
chmod 755 /etc/zfs/zed.d/all-telegram.sh
Restart zed:
systemctl restart zed
Test the script by offlining a disk.
# get the WWN of one disk:
zpool status
# offline it
zpool offline tank <wwn>
# afterwards, don't forget to bring it back online:
zpool online tank <wwn>
You should have a Telegram message like:
[hostname] ZFS event on ${hostname}
<snip>
Now adjust the script if you wish to dial down verbosity:
# currently: *statechange*|*config_sync*|*io*|*fault*|*vdev*|*resilver*|*scrub*)
# suggested: remove *config_sync*, which is relatively noisy housekeeping
#
# if you want to ignore scrub starts and finishes with 0 errors
# add this after the case block so script exits early;
# else it proceeds to send alerts
if [[ "${ZEVENT_CLASS:-}" == *scrub_start* ]]; then
exit 0
fi
if [[ "${ZEVENT_CLASS:-}" == *scrub_finish* ]]; then
if zpool status | grep -q "0 errors"; then
exit 0
fi
fi
Hook into ZFS health check (zfs-health-check)#
We will create a monitoring script for capacity utilisation and fragmentation state:
# nano /usr/local/bin/zfs-health-check
#!/bin/bash
POOL="tank"
HOST=$(hostname)
CAP=$(zpool list -H -o capacity $POOL | tr -d '%')
FRAG=$(zpool list -H -o fragmentation $POOL | tr -d '%')
STATE_FILE="/tmp/zfs-health-last"
CURRENT="$CAP-$FRAG"
# if this run = last run, exit early
if [ -f "$STATE_FILE" ] && grep -q "$CURRENT" "$STATE_FILE"; then
exit 0
fi
# else update state and proceed to do checks
echo "$CURRENT" > "$STATE_FILE"
MSG=""
# Capacity checks
if (( CAP >= 85 )); then
MSG+="🔴 CRIT: Pool $POOL usage ${CAP}%
"
elif (( CAP >= 75 )); then
MSG+="🟡 WARN: Pool $POOL usage ${CAP}%
"
fi
# Fragmentation checks
if (( FRAG >= 45 )); then
MSG+="🔴 CRIT: Fragmentation ${FRAG}%
"
elif (( FRAG >= 30 )); then
MSG+="🟡 WARN: Fragmentation ${FRAG}%
"
fi
# Send alert if needed
if [ -n "$MSG" ]; then
/usr/local/bin/telegram-alert \
"$HOST \
$MSG"
fi
Make executable:
chmod +x /usr/local/bin/zfs-health-check
Daemonise it with systemd:
# service; nano /etc/systemd/system/zfs-health-check.service
[Unit]
Description=ZFS Health Check
[Service]
Type=oneshot
ExecStart=/usr/local/bin/zfs-health-check
# timer; nano /etc/systemd/system/zfs-health-check.timer
[Unit]
Description=Run ZFS health check every hour
[Timer]
OnCalendar=hourly
Persistent=true
[Install]
WantedBy=timers.target
Enable and start:
systemctl daemon-reload
systemctl enable --now zfs-health-check.timer
Testing the script is trickier; as a workaround you can edit the thresholds in the script:
# to test it - you can set CRIT = capacity >= 0 and fragmentation >= 0
# then run it, but change it back to desired thresholds after successful test:
/usr/local/bin/zfs-health-check
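An alternative to editing the thresholds by hand: make them environment-overridable in the script (a sketch; the variable names are my own invention, not part of the script above):

```shell
# In zfs-health-check, replace the hardcoded numbers with defaults:
CAP_CRIT="${CAP_CRIT:-85}"
CAP_WARN="${CAP_WARN:-75}"
FRAG_CRIT="${FRAG_CRIT:-45}"
FRAG_WARN="${FRAG_WARN:-30}"
# ...and compare against the variables, eg: if (( CAP >= CAP_CRIT )); then
# A one-off test then needs no edits:
#   CAP_CRIT=0 CAP_WARN=0 /usr/local/bin/zfs-health-check
echo "thresholds: cap ${CAP_WARN}/${CAP_CRIT} frag ${FRAG_WARN}/${FRAG_CRIT}"
```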
Hook smartd into Telegram#
This one is easy; if smartd.conf has -M exec /usr/share/smartmontools/smartd-runner then it runs all scripts in /etc/smartmontools/run.d.
So if you have already:
-rwxr-xr-x 1 root root 231 Oct 10 2019 10mail
-rwxr-xr-x 1 root root 604 Mar 28 16:02 20logging
We just add a new file, 30telegram:
# nano /etc/smartmontools/run.d/30telegram
#!/bin/bash
# smartd-runner passes a temp file path as $1
if [ -f "$1" ]; then
MSG=$(cat "$1")
else
MSG="$*"
fi
exec /usr/local/bin/telegram-alert "[SMART] $MSG"
Fix permissions:
chmod +x /etc/smartmontools/run.d/30telegram
chown root:root /etc/smartmontools/run.d/30telegram
Restart smartd:
systemctl restart smartd
Test that smartd-runner is calling it, noting that the runner calls all scripts in the directory; in our case mail, logging, and finally telegram.
If any script fails you should see its output, eg: if you haven’t set up mail, 10mail will exit 1.
To prevent that error: if you won’t use 10mail, remove it or chmod -x it.
If we test like:
echo "smartd test message" | /usr/share/smartmontools/smartd-runner
Then on telegram we should receive a message like:
[hostname] [SMART] smartd test message
Outro#
Hopefully you managed to get ZFS, snapshots, NFS, SMB, smartd, ZED, zfs-health-check, and Telegram all singing and dancing.
Well, from here I would say well done and enjoy; here is what this post did not cover (yet):
- Alert scripts can all do with:
- More robust error handling
- Cleaner formatting
- Verbosity tweaks / extra outputs / extra filtering
- Move to asynchronous non-blocking
- Handle retries in case of WAN failures
- Post can do with:
- Images / screenshots from eg: Windows client