Goals#

  • Build a NAS on a low-power TinyPC (ThinkCentre M910q, i5-6500T, 8GB RAM)
  • Ubuntu 24.04.4 LTS
  • ZFS pools and filesystem with auto-snapshots
  • Serve via NFS and SMB
  • ZFS snapshots / Previous Versions working on Windows clients
  • Enable smartd monitoring
  • Send Telegram alerts for smartd and ZED events
  • Sensible tuning

Intro#

This PC is also my living-room media player, so this is an always-on, low-budget, low-power-draw (< 15W idle) build. Since it's a TinyPC, I chose to add storage via USB-attached SCSI (UASP) caddies.

If you go the same route, pay attention to the chipset and its supported features; many people have issues with the JMicron JMS578/583 chipsets.
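
Once an enclosure is plugged in, you can quickly confirm it is actually running UAS (UASP) rather than plain usb-storage; a minimal check:

# UASP-capable bridges show up with Driver=uas rather than Driver=usb-storage
lsusb
lsusb -t | grep -i "mass storage"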

Parts:

  • 2 x UGREEN 2.5" Hard Drive Enclosure USB 3.0
    • RealTek 9201 chipset supports: TRIM, UASP, SMART passthrough
  • 2 x SAMSUNG 870 EVO 2.5" SSD 1TB
    • TLC NAND, 1GB DRAM, MKX controller; boring and reliable

Note on ARC:

  • This guide does not cover ARC tuning; for this build and use case:
    • Under load, ARC settles around 2.2GB of the 8GB system RAM (a quick check is shown below)
    • L2ARC: not needed, and the USB disks would be the bottleneck anyway
    • SLOG: undesirable; complexity without benefit, and potentially harmful here
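
If you want to verify ARC usage on your own system once the pool is built and in use, a minimal check (assuming the stock OpenZFS tooling that Ubuntu ships):

# current ARC size in bytes, straight from the kernel stats
awk '$1 == "size" {print $3}' /proc/spl/kstat/zfs/arcstats
# or the friendlier summary
arc_summary | grep -i "arc size"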

Label the caddies!#

Once you have the SSDs installed in the caddies, the first task is to slap labels on the caddies so that when one fails you know which to unplug and replace. You can rely on the activity blinkenlight, but I don't want to rely on that alone; a low-tech identification trick is shown below.
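
If you need to double-check which caddy is which, a simple trick (assuming the enclosure has an activity LED) is to keep one disk busy with reads and watch which light flickers:

# hammer one disk with reads until you spot the blinking caddy, then Ctrl-C
dd if=/dev/sdX of=/dev/null bs=1M status=progress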

Connect them to the USB 3.0 ports and remember the attachment order; we are going to check the dmesg output for their device names.

Predictably enough they came up as sdb and sdc for me:

Attachment order#

dmesg | tail
# disk 1 "front left USB / left caddy"
<snip>
[218621.916401] usb 2-5: Manufacturer: Ugreen
[218621.916407] usb 2-5: SerialNumber: 13330***A**1
<snip>
[218622.172910] scsi 4:0:0:0: Direct-Access     Samsung  SSD 870 EVO 1TB  1.02 PQ: 0 ANSI: 6
[218622.207110] sd 4:0:0:0: Attached scsi generic sg1 type 0
[218622.213299] sd 4:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
<snip>

# disk 2 "front right USB / right caddy"
[218631.852792] usb 2-6: Manufacturer: Ugreen
[218631.852797] usb 2-6: SerialNumber: 13330***B**2
<snip>
[218632.048772] scsi 5:0:0:0: Direct-Access     Samsung  SSD 870 EVO 1TB  1.02 PQ: 0 ANSI: 6
[218632.083564] sd 5:0:0:0: Attached scsi generic sg2 type 0
[218632.089750] sd 5:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
<snip>

disk-id#

The correct, durable mapping is not the USB port or the device name (sdb/sdc), but the disk ID.

get ids#

# ls -l /dev/disk/by-id/ | egrep "sdb|sdc"
lrwxrwxrwx 1 root root  9 Mar 28 14:28 ata-Samsung_SSD_870_EVO_1TB_S8NBNS0*****BBB -> ../../sdc
lrwxrwxrwx 1 root root  9 Mar 28 14:27 ata-Samsung_SSD_870_EVO_1TB_S8NBNS0*****AAA -> ../../sdb
lrwxrwxrwx 1 root root  9 Mar 28 14:27 usb-Samsung_SSD_870_EVO_1TB_13330***A**1-0:0 -> ../../sdb
lrwxrwxrwx 1 root root  9 Mar 28 14:28 usb-Samsung_SSD_870_EVO_1TB_13330***B**2-0:0 -> ../../sdc
lrwxrwxrwx 1 root root  9 Mar 28 14:28 wwn-0x5001112223a3344z -> ../../sdc
lrwxrwxrwx 1 root root  9 Mar 28 14:27 wwn-0x5001112223a1122a -> ../../sdb

decipher them#

We have several kinds of ID:

  • usb- is tied to the serial number of the USB caddy; don't use that.
  • ata- is tied to the serial number of the SSD; better, but ata IDs are protocol-dependent and the string is not guaranteed to be stable across systems.
  • wwn-0x is the SSD's World Wide Name; best practice, designed to be globally unique and stable.

So the only id we really care about is:

  • wwn-0x world-wide name

Anyway here is what we found:

Disk 1 / LEFT CADDY
  USB serial: 13330***A**1
  SSD serial: S8NBNS0*****AAA
  WWN:        0x5001112223a1122a

Disk 2 / RIGHT CADDY
  USB serial: 13330***B**2
  SSD serial: S8NBNS0*****BBB
  WWN:        0x5001112223a3344z
 

If we later move the disks between systems, the following commands help us correctly identify them again:

# WWN; this is all we need
ls -l /dev/disk/by-id/

# can also view the SSD serial of a given dev like:
udevadm info --query=all --name=/dev/sdX | grep ID_SERIAL

# smartctl may fail with USB caddies:
smartctl -a /dev/sdX | grep Serial
smartctl -i /dev/sdX | grep Serial

# but you can pass '-d <type>' to tell it how to talk to the bridge
# for my enclosures, '-d sat' is correct for the Realtek 9201 (UAS) USB<->SATA bridge
smartctl -a /dev/sdX -d sat [| grep Serial]
smartctl -i /dev/sdX -d sat [| grep Serial]

# lsblk
# lsblk -o NAME,SIZE,MODEL,SERIAL | grep -v loop
NAME          SIZE MODEL                      SERIAL
sda         465.8G WDC WD5000BEVT-00A0RT0     WD-WX91A2099621
└─sda1      465.8G
sdb         931.5G Samsung SSD 870 EVO 1TB    S8NBNS0*****AAA
sdc         931.5G Samsung SSD 870 EVO 1TB    S8NBNS0*****BBB
nvme0n1     119.2G SAMSUNG MZVLW128HEGR-000L1 S341NX0K610955
├─nvme0n1p1     1G
└─nvme0n1p2 118.2G

If the pool is later moved to another system, import it there with:

zpool import
<will show discovered pools>
zpool import <whatever you called yours>

SMART monitoring#

We will perform some basic checks and enable SMART monitoring before building the pool. If you are re-using aged disks, you might want to spend more time verifying their health.

Checks#

# install smartmontools if not already present
apt install smartmontools

# check stats
smartctl -d sat -a /dev/sdb
smartctl -d sat -a /dev/sdc

# run a short test; wait 2 mins
smartctl -d sat -t short /dev/sdb
smartctl -d sat -t short /dev/sdc
sleep 125

# check results
smartctl -d sat -a /dev/sdb
smartctl -d sat -a /dev/sdc
# should show something like:

Short self-test routine
# 1  Short offline       Completed without error       00%         1         -
Short self-test routine
# 1  Short offline       Completed without error       00%         1         -

edit /etc/smartd.conf#

If you've never edited smartd.conf before, first read the comments in the file; there is a line beginning DEVICESCAN -d removable .. that you will likely want to comment out once you list your drives explicitly.
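
For reference, the stock line on my Ubuntu install looks something like this once commented out (yours may differ slightly):

#DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner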

Here we add our new drives:

# These are my WWNs, with '-d sat' for my USB enclosures
# The entries in brackets: S = short test daily at 02:00/03:00, L = long test weekly at 03:00 on Saturday (first disk) or Sunday (second disk)
# In this way the tests are staggered
# Later when we setup ZFS scrub, we need to stagger that not to collide either
#
/dev/disk/by-id/wwn-0x5001112223a3344z -d sat -a -o on -S on -s (S/../.././02|L/../../6/03) -m root
/dev/disk/by-id/wwn-0x5001112223a1122a -d sat -a -o on -S on -s (S/../.././03|L/../../7/03) -m root

custom logging#

At this point you are running tests, but what happens on a failure: something or nothing? If you want to run custom scripts, add -M exec /usr/share/smartmontools/smartd-runner to the end of each line. smartd-runner will then call the scripts inside /etc/smartmontools/run.d/.

So what will your custom script do? It could send an email, ping you on Slack, or hit an API of your choice.

By default smartmontools ships with a simple mailer script, 10mail, and nothing else. In this section I will set up a 20logging script; later in the guide we cover Telegram.

If you add your own script, ensure the owner and permissions are correct (root:root, 755). If your script writes to a dedicated logfile, don't forget to configure logrotate for it.
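
A minimal logrotate sketch for that dedicated logfile (the path matches the 20logging script below; rotation frequency and retention are just my picks):

# nano /etc/logrotate.d/smart-events
/var/log/smart-events.log {
    weekly
    rotate 8
    compress
    missingok
    notifempty
}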

Example outputs from my logging:

# tail /var/log/syslog
2026-03-28T16:05:44.410122+00:00 tank smartd-hook: 2026-03-28 16:05:44 +0000 host=tank device=/dev/disk/by-id/wwn-0x5001112223a1122a devtype=sat failtype=test address=root msg="manual smartd hook test"
# tail /var/log/smart-events.log
2026-03-28 16:05:44 +0000 host=tank device=/dev/disk/by-id/wwn-0x5001112223a1122a devtype=sat failtype=test address=root msg="manual smartd hook test"

If you want to use this logging script, put this into /etc/smartmontools/run.d/20logging:

#!/bin/bash
set -u

LOGFILE="/var/log/smart-events.log"
TAG="smartd-hook"

TS="$(date '+%Y-%m-%d %H:%M:%S %z')"
HOST="$(hostname -f 2>/dev/null || hostname)"

DEVICE="${SMARTD_DEVICE:-unknown-device}"
DEVTYPE="${SMARTD_DEVICETYPE:-unknown-type}"
FAILTYPE="${SMARTD_FAILTYPE:-unknown-failtype}"
MESSAGE="${SMARTD_MESSAGE:-no-message}"
ADDRESS="${SMARTD_ADDRESS:-no-address}"

LINE="$TS host=$HOST device=$DEVICE devtype=$DEVTYPE failtype=$FAILTYPE address=$ADDRESS msg=\"$MESSAGE\""

# Write to syslog/journal
logger -t "$TAG" -- "$LINE"

# Write to dedicated flat log
printf '%s\n' "$LINE" >> "$LOGFILE"
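
Install the hook with the ownership and permissions mentioned above, and pre-create its flat log:

chown root:root /etc/smartmontools/run.d/20logging
chmod 755 /etc/smartmontools/run.d/20logging
touch /var/log/smart-events.log
chown root:adm /var/log/smart-events.log
chmod 0644 /var/log/smart-events.log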

enable and start smartd#

Enable the daemon and start it:

systemctl enable smartd
systemctl start smartd

ZFS#

Install tools#

Install the ZFS tools if not already present. NB: ensure they are the correct, current versions for your distro release! E.g. on a fresh Ubuntu 24.04.3 LTS install, apt policy identified the correct versions, but I still had to purge and reinstall to end up on them; this is an Ubuntu packaging quirk, not an OpenZFS issue.

apt update
# zfs-zed provides the event daemon we use later; avoid the deprecated zfs-fuse package
apt install zfsutils-linux zfs-zed -y

Wipe disks correctly#

Ensure the disks have been wiped of any old metadata: partition tables, filesystem signatures, GPT/MBR headers.

wipefs -a /dev/disk/by-id/wwn-0x5001112223a1122a
wipefs -a /dev/disk/by-id/wwn-0x5001112223a3344z
sgdisk -Z /dev/disk/by-id/wwn-0x5001112223a1122a
sgdisk -Z /dev/disk/by-id/wwn-0x5001112223a3344z

Determine ZFS settings#

For the SSDs, I want:

  • ashift=12: force 4K alignment
  • compression=lz4: compression is wanted
  • atime=off: avoid unnecessary writes
  • xattr=sa: better metadata performance
  • sync=standard: the default, power-loss safe(r); sync=disabled is faster but risky

Create ZFS pool using our WWN IDs#

For -o (pool properties): ashift is crucial because we can't change it later. For -O (dataset properties): these can be modified later.

Create the pool:

zpool create \
  -o ashift=12 \
  -o autotrim=on \
  -O compression=lz4 \
  -O atime=off \
  -O xattr=sa \
  tank mirror \
  /dev/disk/by-id/wwn-0x5001112223a1122a \
  /dev/disk/by-id/wwn-0x5001112223a3344z

Verify pool#

zpool status
zpool list
zpool get ashift,autotrim tank
zfs list
zfs get compression,atime,xattr tank

Initial scrub#

A scrub reads data, verifies checksums, and repairs data if needed.

zpool scrub tank
# verify
zpool status -v | grep scrub
  scan: scrub repaired 0B in 00:00:01 with 0 errors on Sat Mar 28 17:35:02 2026

Check TRIM works#

Do initial check that TRIM will work:

zpool trim tank

Verify trim running:

zpool status -v | grep trim
            wwn-0x5001112223a1122a  ONLINE       0     0     0  (trimming)
            wwn-0x5001112223a3344z  ONLINE       0     0     0  (trimming)

Confirm TRIM is actually happening on the disks themselves; we check the DISC-GRAN column in lsblk output:

  • shows 512B <– TRIM supported and working
  • shows 0B <– oops, this USB enclosure does not support it!
# lsblk -D | egrep "sdb|sdc"
sdb                0      512B       4G         0
├─sdb1             0      512B       4G         0
└─sdb9             0      512B       4G         0
sdc                0      512B       4G         0
├─sdc1             0      512B       4G         0
└─sdc9             0      512B       4G         0

Setup monthly scrubbing#

Crontab style:

crontab -e
# 04:00 on the first Tuesday of the month; cron ORs day-of-month and day-of-week,
# so guard with a date check rather than putting '2' in the day-of-week field
0 4 1-7 * * [ "$(date +\%u)" -eq 2 ] && /sbin/zpool scrub tank

systemd style:

# nano /etc/systemd/system/zfs-scrub@.service
[Unit]
Description=ZFS scrub on pool %i

[Service]
Type=oneshot
ExecStart=/sbin/zpool scrub %i

# nano /etc/systemd/system/zfs-scrub@tank.timer
[Unit]
Description=Monthly ZFS scrub for tank

[Timer]
OnCalendar=Tue *-*-01..07 04:00
Persistent=true

[Install]
WantedBy=timers.target

Enable now:

systemctl daemon-reexec
systemctl daemon-reload
systemctl enable --now zfs-scrub@tank.timer

Setup better logging#

# nano /usr/local/sbin/zfs-scrub-runner
#!/bin/bash
set -e

POOL="$1"
TAG="zfs-scrub"
LOGFILE="/var/log/zfs-events.log"

TS="$(date '+%Y-%m-%d %H:%M:%S %z')"
HOST="$(hostname -f 2>/dev/null || hostname)"

log() {
    LINE="$TS host=$HOST pool=$POOL msg=\"$1\""
    logger -t "$TAG" -- "$LINE"
    echo "$LINE" >> "$LOGFILE"
}

log "scrub started"

if /sbin/zpool scrub "$POOL"; then
    log "scrub triggered successfully"
else
    log "scrub failed to start"
    exit 1
fi

Make executable:

chmod +x /usr/local/sbin/zfs-scrub-runner

Create log and set owner and permissions:

touch /var/log/zfs-events.log
chown root:adm /var/log/zfs-events.log
chmod 0644 /var/log/zfs-events.log

Update the cron entry or the systemd unit as required; if using systemd, run systemctl daemon-reload after editing the unit:

# 0 4 1-7 * * [ "$(date +\%u)" -eq 2 ] && /usr/local/sbin/zfs-scrub-runner tank
# or
# ExecStart=/usr/local/sbin/zfs-scrub-runner %i

Daemonise logging of ZFS events#

Create /etc/systemd/system/zfs-events.service:

[Unit]
Description=ZFS event logger
After=zfs.target

[Service]
Type=simple
ExecStart=/bin/bash -c 'exec zpool events -f | while read -r line; do logger -t zfs-events -- "$line"; done'
Restart=always
RestartSec=2
# Optional hardening
Nice=10
IOSchedulingClass=best-effort

[Install]
WantedBy=multi-user.target

Enable and start:

systemctl daemon-reload
systemctl enable --now zfs-events.service

Create ZFS datasets#

zpool automatically creates and mounts the pool root, e.g. a pool called "tank" will have been mounted at /tank.

We won't use the /tank root directly; we'll create specific datasets under it:

zfs create tank/data
zfs create tank/backups

I also want specific mounts under /srv:

zfs set mountpoint=/srv/data tank/data
zfs set mountpoint=/srv/backups tank/backups

I want to be able to hit these shares over both Samba and NFS. Create a group 'storage' and add my local user to it:

groupadd storage
usermod -aG storage <youruser>

Set ownership and permissions on the /srv mount points:

chown -R root:storage /srv/data /srv/backups
chmod -R 2775 /srv/data /srv/backups

Set ZFS POSIX ACLs to ensure NFS/SMB permissions are consistent:

zfs set acltype=posixacl tank/data
zfs set acltype=posixacl tank/backups
zfs set aclinherit=passthrough tank/data
zfs set aclmode=passthrough tank/data
zfs set aclinherit=passthrough tank/backups
zfs set aclmode=passthrough tank/backups
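
Optionally, if you also want newly created files to be group read/write regardless of each client's umask, a default POSIX ACL is one way to do it; a minimal sketch (requires the acl package for setfacl):

apt install acl
# grant the group rwX on existing files, and set a default ACL so new files inherit it
setfacl -R -m g:storage:rwX /srv/data /srv/backups
setfacl -R -d -m g:storage:rwX /srv/data /srv/backups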

NFS server#

Install if not already present:

apt update
apt install nfs-kernel-server -y

Export the shares, first modify the config:

# nano /etc/exports
/srv/data     192.168.1.0/24(rw,sync,no_subtree_check)
/srv/backups  192.168.1.0/24(rw,sync,no_subtree_check)

Export them and restart the nfs server:

exportfs -ra
systemctl restart nfs-kernel-server

Permissions / accessing shares from client side#

  • Samba shares take user+password; straightforward and intuitive.
  • NFS operates differently:

The NFS shares we created above (plain sec=sys, the default) don't use a username and password; instead:

  • if the client's numeric UID or GID matches, rw access is granted
  • UIDs may vary if your machines have multiple users
  • GIDs are easier to keep consistent; you can sanity-check the numeric IDs with the commands shown below
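
A quick way to compare, run on both the server and each client:

id <youruser>          # shows UID, primary GID, and supplementary groups
getent group storage   # shows the numeric GID of the 'storage' group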

If you want to create a “storage” group on server and all clients:

# do (carefully!):
sudo groupadd -g 1001 storage   <--- where 1001 matches the GID on the NFS server

If you have GID misalignment and want to set the GID, you can do:

sudo groupmod -g 1001 storage   <--- if 'storage' already exists on clients but you need to change GID

If you change the GID on the server, you then need to update the group ownership of existing files :-)

# update permissions if you have changed GID, note the variable <OLD_GID>:
sudo find / -group <OLD_GID> -exec chgrp -h storage {} \;

Finally, ensure your client user is in the storage group:

sudo usermod -aG storage youruser

NFS client#

Test that you can mount the shares:

sudo apt install nfs-common
sudo mkdir -p /mnt/data
sudo mkdir -p /mnt/backups
sudo mount -t nfs <SERVER IP or hostname>:/srv/data /mnt/data
sudo mount -t nfs <SERVER IP or hostname>:/srv/backups /mnt/backups
mount | grep nfs
ls -l /mnt/data
touch /mnt/data/testfile
ls -l /mnt/data/testfile <--- should be <youruser> storage for user and group

Make persistent across reboots:

_netdev means to wait for network before trying to mount#

# sudo nano /etc/fstab
<SERVER IP or hostname>:/srv/data     /mnt/data     nfs     defaults,_netdev  0  0
<SERVER IP or hostname>:/srv/backups  /mnt/backups  nfs     defaults,_netdev  0  0

To mount all in /etc/fstab:

sudo mount -a

SMB/Samba server#

Install if not already present:

apt update
apt install samba -y

Export the shares, first modify the config:

# nano /etc/samba/smb.conf
[data]
   path = /srv/data
   browseable = yes
   read only = no
   guest ok = no
   valid users = @storage
   force group = storage
   create mask = 0664
   directory mask = 2775

[backups]
   path = /srv/backups
   browseable = yes
   read only = no
   guest ok = no
   valid users = @storage
   force group = storage
   create mask = 0664
   directory mask = 2775
   
# NB: you also need to edit the [global] section,
# in particular the 'interfaces' and 'bind interfaces only' lines,
# to bind the daemon to your LAN subnet / interface (eg: ens3); see the example below
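
Something like this in the [global] section will do it (the subnet and interface name here are examples from my LAN; substitute your own):

   interfaces = 127.0.0.0/8 192.168.1.0/24 ens3
   bind interfaces only = yes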

Create smb user:

smbpasswd -a youruser <enter>
New SMB password: <type one, then enter>
Retype new SMB password: <retype, then enter>

Restart smbd:

systemctl restart smbd

SMB client#

GUI:

  • My Computer > right click > Map network drive
\\<SERVER IP or hostname>\data
\\<SERVER IP or hostname>\backups
  • Use your SMB username and password (the one set with smbpasswd above)
  • Tick boxes for save credentials and reconnect at logon

CLI:

net use Z: \\server\data /user:USERNAME /persistent:yes
net use Y: \\server\backups /user:USERNAME /persistent:yes

ZFS snapshots#

A headline feature of ZFS is the ability to take snapshots ("Previous Versions" / shadow copies on Windows). There aren't n full copies of every file; a snapshot only consumes space for the blocks that changed since the previous snapshot.

Install tool:

apt install zfs-auto-snapshot

Edit config to match the policy you want:

# note // hits all pools; if you want granularity you can tinker in here 
# eg: if you just want this 'tank' and not other pools, replace '//' with 'tank'
#
nano /etc/cron.d/zfs-auto-snapshot

# I am going with this policy:
0 * * * * root /usr/sbin/zfs-auto-snapshot --quiet --syslog --label=hourly --keep=24 //
0 0 * * * root /usr/sbin/zfs-auto-snapshot --quiet --syslog --label=daily --keep=7 //
0 0 * * 0 root /usr/sbin/zfs-auto-snapshot --quiet --syslog --label=weekly --keep=4 //
0 0 1 * * root /usr/sbin/zfs-auto-snapshot --quiet --syslog --label=monthly --keep=12 //

Verify:

# zfs list -t snapshot
NAME                                                  USED  AVAIL  REFER  MOUNTPOINT
tank@zfs-auto-snap_frequent-2026-03-28-1845             0B      -    96K  -
tank/backups@zfs-auto-snap_frequent-2026-03-28-1845     0B      -    96K  -
tank/data@zfs-auto-snap_frequent-2026-03-28-1845        0B      -    96K  -

Not much time has passed and no data has changed yet, so create a text file from a client, e.g.:

/mnt/data$ echo "this is the first version" > test.txt

Fire “daily” and “monthly” manually on server rather than wait on timers:

/usr/sbin/zfs-auto-snapshot --label=daily --keep=7 //
/usr/sbin/zfs-auto-snapshot --label=monthly --keep=12 //

Verify again:

# zfs list -t snapshot
NAME                                                  USED  AVAIL  REFER  MOUNTPOINT
tank@zfs-auto-snap_frequent-2026-03-28-1845             0B      -    96K  -
tank@zfs-auto-snap_daily-2026-03-28-1850                0B      -    96K  -
tank@zfs-auto-snap_monthly-2026-03-28-1850              0B      -    96K  -
tank/backups@zfs-auto-snap_frequent-2026-03-28-1845     0B      -    96K  -
tank/backups@zfs-auto-snap_daily-2026-03-28-1850        0B      -    96K  -
tank/backups@zfs-auto-snap_monthly-2026-03-28-1850      0B      -    96K  -
tank/data@zfs-auto-snap_frequent-2026-03-28-1845       56K      -    96K  -  <<<<<<<< test.txt file created
tank/data@zfs-auto-snap_daily-2026-03-28-1850           0B      -   104K  -
tank/data@zfs-auto-snap_monthly-2026-03-28-1850         0B      -   104K  -

Now introduce a delta in the text file:

# edit test text file so next 'frequent' snapshot captures delta
/mnt/data$ echo "THIS IS A DELTA" >> test.txt

Make snapshots / previous versions visible to clients:

zfs set snapdir=visible tank/data
zfs set snapdir=visible tank/backups

For SMB clients we need to edit /etc/samba/smb.conf:

# add this under each share; NB shadow:basedir differs for each of 'data' and 'backups'
   vfs objects = shadow_copy2 acl_xattr
   map acl inherit = yes
   store dos attributes = yes

   shadow:basedir = /srv/data
   shadow:snapdir = .zfs/snapshot
   shadow:localtime = yes
   shadow:sort = desc
   shadow:fixinodes = yes
# use .* if you want to allow snapshots other than zfs-auto-snap generated and named
#   shadow:snapprefix = .*
   shadow:snapprefix = ^zfs-auto-snap_\(frequent\)\{0,1\}\(hourly\)\{0,1\}\(daily\)\{0,1\}\(weekly\)\{0,1\}\(monthly\)\{0,1\}
   shadow:delimiter = -20
   shadow:format = -%Y-%m-%d-%H%M
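
Before restarting, it is worth validating the config; testparm will flag any syntax errors:

testparm -s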

Restart smbd:

systemctl restart smbd

First, verify with smbclient running locally:

# smbclient //localhost/data -U <youruser>
Password for [WORKGROUP\user]:
Try "help" to get a list of possible commands.
smb: \> allinfo test.txt
altname: test.txt
create_time:    Sat Mar 28 06:48:39 PM 2026 GMT
access_time:    Sat Mar 28 06:48:39 PM 2026 GMT
write_time:     Sat Mar 28 07:05:24 PM 2026 GMT
change_time:    Sat Mar 28 07:05:24 PM 2026 GMT
<snapshots here>

Verify on disk:

# for f in /srv/data/.zfs/snapshot/*/pvtest.txt; do echo "== $f =="; cat "$f"; done
== /srv/data/.zfs/snapshot/zfs-auto-snap_daily-2026-03-28-19-42/pvtest.txt ==
v2
== /srv/data/.zfs/snapshot/zfs-auto-snap_frequent-2026-03-28-20-50/pvtest.txt ==
v5
== /srv/data/.zfs/snapshot/zfs-auto-snap_hourly-2026-03-28-19-42/pvtest.txt ==
v2
== /srv/data/.zfs/snapshot/zfs-auto-snap_hourly-2026-03-28-19-45/pvtest.txt ==
v3
== /srv/data/.zfs/snapshot/zfs-auto-snap_hourly-2026-03-28-20-13/pvtest.txt ==
v4

Windows SMB clients should see snapshots under “Previous Versions” tab of file and folder properties.

Telegram alerts#

Create the Bot#

First we create a bot by sending a Telegram message to user BotFather:

/newbot

Save the bot name and token somewhere safe (your new ZFS?)

bot name: <itsname>
bot token: <secrettoken>

Use the t.me/ link to start a chat with your bot; send any message to initiate the chat and generate a chat ID:

foo

Get the chat ID:

curl -s https://api.telegram.org/bot<YOUR_TOKEN>/getUpdates

Look for:

"chat":{"id":123456789}

Create alert script:

# nano /usr/local/bin/telegram-alert
#!/bin/bash

TOKEN="YOUR_BOT_TOKEN"
CHAT_ID="YOUR_CHAT_ID"

HOST=$(hostname)
MSG="$*"

timeout 20 \
        curl \
        --connect-timeout 5 \
        --max-time 15 \
        -s -X POST "https://api.telegram.org/bot${TOKEN}/sendMessage" \
        -d chat_id="${CHAT_ID}" \
        -d text="[$HOST] $MSG" \
        -d disable_web_page_preview=true \
        > /dev/null

Make executable:

chmod +x /usr/local/bin/telegram-alert

Test it:

/usr/local/bin/telegram-alert "test message"

You should get a message on Telegram from your bot.

Hook into ZFS-ZED#

  • If you look in /etc/zfs/zed.d/zed.rc, note that it has variables for email and for Slack webhooks; you can use those too.
  • I will create a separate script for Telegram:
# nano /etc/zfs/zed.d/all-telegram.sh  <--- you must follow the ZEDLET naming convention:
#     the 'all-' prefix catches every event class; you can then filter inside the script
#     a script named just "telegram.sh" would never be triggered; see zed(8)

#!/bin/bash
set -eu

# Only alert on the classes you care about; remove this case block to alert on everything.
case "${ZEVENT_CLASS:-}" in
  *statechange*|*config_sync*|*io*|*fault*|*vdev*|*resilver*|*scrub*)
    ;;
  *)
    exit 0
    ;;
esac

/usr/local/bin/telegram-alert \
  "ZFS event on $(hostname)
class=${ZEVENT_CLASS:-unknown}
subclass=${ZEVENT_SUBCLASS:-unknown}
pool=${ZEVENT_POOL:-unknown}
vdev_path=${ZEVENT_VDEV_PATH:-unknown}
vdev_guid=${ZEVENT_VDEV_GUID:-unknown}
time=${ZEVENT_TIME_STRING:-unknown}"

Fix permissions:

chown root:root /etc/zfs/zed.d/all-telegram.sh
chmod 755 /etc/zfs/zed.d/all-telegram.sh

Restart zed:

systemctl restart zed

Test the script by offlining a disk.

# get the WWN of one disk:
zpool status

# offline it
zpool offline tank <wwn>

You should have a Telegram message like:

[hostname] ZFS event on ${hostname}
<snip>
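
Once the alert arrives, don't forget to bring the disk back into the pool and confirm it resilvers cleanly:

zpool online tank <wwn>
zpool status tank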

Now adjust the script if you wish to dial down verbosity:

# currently:   *statechange*|*config_sync*|*io*|*fault*|*vdev*|*resilver*|*scrub*)
# suggested - remove *config_sync* which will be relatively noisy housekeeping
#
# if you want to ignore scrub starts and finishes with 0 errors
# add this after the case block so script exits early;
# else it proceeds to send alerts
if [[ "${ZEVENT_CLASS:-}" == *scrub_start* ]]; then
        exit 0
fi

if [[ "${ZEVENT_CLASS:-}" == *scrub_finish* ]]; then
    if zpool status | grep -q "0 errors"; then
        exit 0
    fi
fi

Hook into ZFS health check (zfs-health-check)#

We will create a monitoring script for capacity utilisation and fragmentation state:

# nano /usr/local/bin/zfs-health-check
#!/bin/bash

POOL="tank"
HOST=$(hostname)

CAP=$(zpool list -H -o capacity $POOL | tr -d '%')
FRAG=$(zpool list -H -o fragmentation $POOL | tr -d '%')

STATE_FILE="/tmp/zfs-health-last"
CURRENT="$CAP-$FRAG"
# if this run = last run, exit early
if [ -f "$STATE_FILE" ] && grep -q "$CURRENT" "$STATE_FILE"; then
    exit 0
fi
# else update state and proceed to do checks
echo "$CURRENT" > "$STATE_FILE"

MSG=""

# Capacity checks
if (( CAP >= 85 )); then
    MSG+="🔴 CRIT: Pool $POOL usage ${CAP}%
"
elif (( CAP >= 75 )); then
    MSG+="🟡 WARN: Pool $POOL usage ${CAP}%
"
fi

# Fragmentation checks
if (( FRAG >= 45 )); then
    MSG+="🔴 CRIT: Fragmentation ${FRAG}%
"
elif (( FRAG >= 30 )); then
    MSG+="🟡 WARN: Fragmentation ${FRAG}%
"
fi

# Send alert if needed
if [ -n "$MSG" ]; then
    /usr/local/bin/telegram-alert \
"$HOST \
$MSG"
fi

Make executable:

chmod +x /usr/local/bin/zfs-health-check

Daemonise it with systemd:

# service; nano /etc/systemd/system/zfs-health-check.service
[Unit]
Description=ZFS Health Check

[Service]
Type=oneshot
ExecStart=/usr/local/bin/zfs-health-check

# timer; nano /etc/systemd/system/zfs-health-check.timer
[Unit]
Description=Run ZFS health check every hour

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target

Enable and start:

systemctl daemon-reload
systemctl enable --now zfs-health-check.timer

Testing the script is trickier; as a workaround you can temporarily edit the script:

# to test it, temporarily set the CRIT thresholds to capacity >= 0 and fragmentation >= 0,
# remove /tmp/zfs-health-last so the dedupe check doesn't suppress the alert, then run it;
# change the thresholds back after a successful test:
/usr/local/bin/zfs-health-check

Hook smartd into Telegram#

This one is easy; if smartd.conf has -M exec /usr/share/smartmontools/smartd-runner then it runs all scripts in /etc/smartmontools/run.d.

So if you already have:

-rwxr-xr-x 1 root root 231 Oct 10  2019 10mail
-rwxr-xr-x 1 root root 604 Mar 28 16:02 20logging

We just add a new file, 30telegram:

# nano /etc/smartmontools/run.d/30telegram
#!/bin/bash
# smartd-runner passes a temp file path as $1
if [ -f "$1" ]; then
    MSG=$(cat "$1")
else
    MSG="$*"
fi

exec /usr/local/bin/telegram-alert "[SMART] $MSG"

Fix permissions:

chmod 755 /etc/smartmontools/run.d/30telegram
chown root:root /etc/smartmontools/run.d/30telegram

Restart smartd:

systemctl restart smartd

Test that smartd-runner is calling it, noting that the runner calls all scripts in the directory; in our case mail, logging, and finally telegram. If any script fails you should see its output, e.g. if you haven't set up mail, 10mail will exit 1. To avoid that, if you won't be using 10mail, remove it or chmod -x it.

If we test like:

echo "smartd test message" | /usr/share/smartmontools/smartd-runner

Then on telegram we should receive a message like:

[hostname] [SMART] smartd test message

Outro#

Hopefully you managed to get ZFS, snapshots, NFS, SMB, smartd, ZED, zfs-health-check, and Telegram all singing and dancing.

Well, from here I would say well done and enjoy; here is what this post did not cover (yet):

  • The alert scripts could all do with:
    • More robust error handling
    • Cleaner formatting
    • Verbosity tweaks / extra outputs / extra filtering
    • Moving to asynchronous, non-blocking sends
    • Retry handling for WAN/API failures
  • The post could do with:
    • Images / screenshots, e.g. from a Windows client