Getting PCI Passthrough working on G6 HP Proliant Servers with Proxmox


This is a quick stub post to show how I finally got PCI passthrough working on my HP DL360 G6 server. Before the fix, the kernel would log an error such as:

DMAR-IR: This system BIOS has enabled interrupt remapping interrupt remapping is being disabled.

If you actually tried to launch a machine with a PCI device attached, it would say:

Device is ineligible for IOMMU domain attach due to platform RMRR requirement.  Contact your platform vendor.

From what I can tell this is because the system uses RMRR to control fan speeds, and having these bits set causes the driver to disable IOMMU. There is a simple change you can make to the Proxmox kernel source code to get past this error and use PCI devices with VMs without issue.

sed -i 's/device_is_rmrr_locked(dev)/false/' ./$UBUNTU_NAME/drivers/iommu/intel-iommu.c

After making that change and recompiling the kernel, passthrough works perfectly fine. The system will still report that DMAR-IR interrupt remapping has been disabled, but PCI devices work regardless.
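To see what the substitution does before touching a real kernel tree, here is a small sketch that applies it to a throwaway file. The sample C line and the `patch_rmrr_check` helper name are made up for illustration; only the `sed` expression comes from this post. It assumes GNU `sed`, where `-i` edits the file in place (without `-i`, `sed` only prints the result and leaves the file untouched).

```shell
#!/bin/bash
# Sketch: apply the RMRR substitution to a file and report whether anything changed.
# patch_rmrr_check is a hypothetical helper name, not part of the quoted script.
patch_rmrr_check() {
    local file="$1"
    # Return 1 if the call is not present (already patched, or a different kernel version).
    grep -q 'device_is_rmrr_locked(dev)' "$file" || return 1
    # -i edits the file in place (GNU sed).
    sed -i 's/device_is_rmrr_locked(dev)/false/' "$file"
}

# Demo on a scratch file containing a made-up line, not the real intel-iommu.c.
demo_file="$(mktemp)"
echo 'if (device_is_rmrr_locked(dev)) { return -EPERM; }' > "$demo_file"
patch_rmrr_check "$demo_file" && patched_line="$(cat "$demo_file")"
echo "$patched_line"
rm -f "$demo_file"
```

If the function returns non-zero on the real source file, the check has probably moved or been renamed in that kernel version, and the patch would need revisiting.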

I am indebted to this Reddit post and the script its author put together from their research, which let me get my DL360 G6 working. They have a DL380, but I am pleased it works on my system too. I don’t have a Reddit account, but show the original author some love if you do!

I’ve quoted their script below just in case the original pastebin link in their Reddit post goes down.

#Proxmox 6.2 over Ubuntu Focal Fossa patch for IOMMU.
#this can be done with Proxmox vanilla, just run the script!
#based heavily on
#move to working dir
cd "${0%/*}"
#Proxmox 4.4 kernel modding guide dependencies (see the above link)
#apt-get install git screen fakeroot build-essential devscripts libncurses5 libncurses5-dev libssl-dev bc flex bison libelf-dev libaudit-dev libgtk2.0-dev libperl-dev libperl-dev asciidoc xmlto gnupg gnupg2
#Debian kernel build dependencies
#apt-get install build-essential linux-source bc kmod cpio flex cpio libncurses5-dev
#zfsonlinux has hidden dependencies
#apt-get install dh-python python3-cffi python3-setuptools python3-sphinx python3-all-dev
#even more dependencies
#apt-get install libdw-dev libiberty-dev libnuma-dev libslang2-dev lz4
#full list of dependencies, though it may contain duplicates
apt-get install git screen fakeroot build-essential devscripts libncurses5 libncurses5-dev libssl-dev bc flex bison libelf-dev libaudit-dev libgtk2.0-dev libperl-dev libperl-dev asciidoc xmlto gnupg gnupg2 build-essential linux-source bc kmod cpio flex cpio libncurses5-dev dh-python python3-cffi python3-setuptools python3-sphinx python3-all-dev libdw-dev libiberty-dev libnuma-dev libslang2-dev lz4
#rm -r ./tmp/proliant-iommu-patch/
#TODO Only if the user says so on finishing compilation - Ubuntu-focal file is weird so we have to remove
#rm -r ./pve-kernel
#mkdir ./tmp/proliant-iommu-patch
#cd ./tmp/proliant-iommu-patch
git clone git://
#mkdir ./pve-kernel
cd ./pve-kernel
#use a proxmox makefile bug to only get the submodules. THIS iS A BAD IDEA
make -j 2
cd ./submodules
#saves the hassle of Proxmox using git on its own
#git clone git://
#TODO UBUNTU_NAME= "$(ls | grep -m 1 ubuntu)" > used for compatibility
UBUNTU_NAME="$(ls | grep -m 1 ubuntu)"
#if [ ! -d ./$UBUNTU_NAME/drivers ]
#    #git clone git://
#    #git clone git://$UBUNTU_NAME-kernel
#    git clone
#    echo ""
#    #git clone $REPOSRC $LOCALREPO
#    echo $UBUNTU_NAME + " is already cloned, skipping!"
#    #cd $LOCALREPO
#    #git pull $REPOSRC
#mv -T ./mirror_$UBUNTU_NAME-kernel ./$UBUNTU_NAME
sed -i 's/device_is_rmrr_locked(dev)/false/' ./$UBUNTU_NAME/drivers/iommu/intel-iommu.c
#cat ./iommu/intel-iommu.c | grep 'device_is_rmrr_locked'
#named according to proxmox tutorial :)
#diff -u ./iommu/intel-iommu.c ./ubuntu-focal/drivers/iommu/intel-iommu.c > ./remove_mbrr_check.patch
#move back to working dir - TODO this
#cd "${0%/*}"
cd ../
echo "building"
find ./ -name '*.deb' -exec cp -prv '{}' '../' ';'
#optional TODO
##dpkg -i *.deb
cd ../
echo "
#find ./ -name '*.deb' -exec dpkg -i '{}' ';'
dpkg -i *.deb
echo \"=======================================================================\"
echo \"Please reboot your system. Tell u/oezingle if your machine doesn't boot\"
echo \"=======================================================================\"
" > ./
chmod 0777 ./
rm -R ./pve-kernel
echo ""
echo ""
echo "===================================================================="
echo "Install all .deb files and update-grub, or run"
echo "===================================================================="

New Extensions for Email Blocking


I’ve since added a number of new file extensions that I would recommend people running mail servers also block.

Originally I only blocked a few attachments.


However now I’ve added a few more based on suggestions from various sources including extensions that Microsoft recommends to block for users of their Exchange server.


It should be relatively easy to copy the above into a regular expression suitable for your mail environment. If you think there’s a way I can optimise this list, please let me know 🙂
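As a rough sketch of that conversion, the snippet below joins a list of extensions into one case-insensitive regular expression and tests filenames against it. The three extensions and both function names are placeholders for illustration, not the list from this post, and your mail software may want a slightly different regex dialect.

```shell
#!/bin/bash
# Sketch: turn a list of extensions into one regular expression.
# The extensions below are placeholders, not the list from this post.
extensions="exe scr vbs"

build_extension_regex() {
    # Join the arguments with "|" and anchor the match to the end of the filename.
    local joined
    joined="$(printf '%s' "$*" | tr ' ' '|')"
    printf '%s\n' "\\.(${joined})\$"
}

is_blocked() {
    # $1 = filename, remaining arguments = extensions to block
    local name="$1"
    shift
    printf '%s\n' "$name" | grep -Eiq "$(build_extension_regex "$@")"
}

# $extensions is left unquoted on purpose so each word becomes one argument.
regex="$(build_extension_regex $extensions)"
echo "$regex"
```

Anchoring on the final extension means a double extension such as `Invoice.PDF.EXE` is still caught, while `notes.txt` is not.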

Note, this list used to contain the following, but I managed to optimise the expressions and remove duplicates, so the original is as follows (for reference)


HiFiPi – A High Resolution Audio Player


This is a quick post to showcase our HiFiPi. I’ll try and update it in the future, but knowing me I’ll probably forget.

The software I used is called Volumio and it’s available here. They have some additional features available as a subscription service which you might like, but the core software is open source and free. They also sell pre-made high quality devices if you’re not into DIY 🙂

I seem to remember having a couple of issues getting it configured the way I wanted it, but I didn’t document any of it, so that will have to wait until I can figure it out. Overall nothing too taxing though.

Transgender Day Of Remembrance – TDOR 2020

Tealight focused in front with out of focus candles in the background

It seems 2020 hasn’t been a good year for many people and again we’ve lost a lot of people we care about. None more so than our friends in the transgender community. A community of people who often face disadvantage. Normally I’d write grandiose passages to speak of hope but this year I think many of us are at our operational limits.

It’s been 6 months since a friend of a friend decided that they’d had enough, there was nothing left for them in this world and they took their own life. That moment many of us were frantically trying (but ultimately fruitlessly) to save a life and for a time it showed a sense of cohesion amongst people and friendships seemed to form.

That feeling of community and cohesion faded quickly into obscurity after we made our memorial broadcast as we were told not to talk about it any more, despite having to be quiet for a week afterwards due to instructions from the police. Therefore when we could talk about the events free from any legal implications, we were restricted from doing so and we couldn’t decompress or process what had happened ourselves and it makes me feel like the words echoed on our memorial page are hollow. “You have friends and you are loved.”

I don’t feel loved by many either, so I can relate. I am truly sorry that our world is a vapid, emotionally vacant, uncaring place with a disgusting excuse for compassion.

While we may not have really known each other, we ended up embroiled in the entire situation (and very emotionally invested) as all we wanted was to try and help. We didn’t want any personal gain or kudos from helping. One of the reasons we helped (besides genuine human decency) is because it could very easily have been one of us and I like to think that people I didn’t know would want to help too.

I light candles on this day to guide wayward spirits home. I hope you all find your way.

Edited on 22/11/2020: Paragraph added, sentence order adjusted slightly, and some words added to improve readability.

Review of PrettyLinks WordPress Plugin.


Since I appear to have angered the all-knowing spam detector, they wouldn’t allow me to post my review, so I thought I’d post it on my own site.

I really want to like this plugin, but the nag screen is incredibly annoying

The features I need are well covered by the free / lite version of this plugin, however it regularly puts up a window asking me to update to the pro version, only giving me the option of “Yes” or “Maybe Later”, there isn’t an option to say “No thanks, I’m happy with the free version, I have no need of the enhanced functionality.”

It adds lots of links that will redirect you from your admin panel to the developer’s page to buy it, so it is incredibly easy to click the wrong button and be redirected to a site you didn’t intend to visit. I understand they need to show the pages off to people that might have use of the advanced features, but there should be an option to hide all the non-free stuff I don’t need.

Since the code is GPL I could go through and maintain a fork of the plugin with all the stuff removed, but honestly I don’t have the time so will probably try another plugin.

I looked at buying the pro version, but it’s not a one time payment. It would be a yearly subscription, and the lowest price is £50/year, which is just not economical for my hobby site for myself and friends. I’m starting to like pro plugins that are one time payments with a certain amount of timed support available: if you need more support after that you pay for it, and you get upgrades for that major version number. Your requirements may differ.

Grandstream GXP1610 Reboot-o-matic


I wrote a nice little script to fix a problem I’ve been having with my work VOIP phone. It would lose connection but the screen wouldn’t let you know it had. I didn’t notice that it hadn’t been connected for *two* weeks until someone left a voicemail.

The phone did have a built in SSH interface, which had a reboot command, so I tried using ssh -i key admin@phone < commands.txt to feed it a bunch of commands. I had to pipe a text command to ssh because it doesn’t use a ‘real’ shell on the phone, just Grandstream’s proprietary command interface, which doesn’t accept commands directly when called from ssh.

This was okay, it restarted the phone but it would also reset the password back to ‘admin’. Not very secure really… So I looked for a way to reboot it using the web interface. I fired up Firefox’s network debugger and started to reverse engineer how commands were processed.

I worked out what I needed to do to login as an admin, and then issue the reboot command. It worked. Nice.

sid=$(curl -k -s -c /tmp/cookies.txt -d"password=hunter2" --referer | sed -r 's|.*"sid": "([0-9a-z]+)".*|\1|' )
curl -k -s -b /tmp/cookies.txt -d"request=REBOOT&sid=${sid}" --referer
rm /tmp/cookies.txt
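The `sed` expression in the first command pulls the session ID out of the phone's reply. One thing to be aware of is that plain `sed` prints a line unchanged when the pattern does not match, so a failed login would leave the whole reply in `sid`. Here is a hedged sketch of a stricter variant that prints nothing on a miss. The `extract_field` name and the sample reply are made up for illustration; the real phone's response may be shaped differently.

```shell
#!/bin/bash
# extract_field NAME: read a response on stdin and print the value of "NAME": "value".
# -n plus the trailing p means a line is printed only when the pattern matched.
extract_field() {
    sed -n -r 's|.*"'"$1"'": "([0-9a-z]+)".*|\1|p'
}

# Made-up response for illustration only.
sample='{"response": "success", "body": {"sid": "1a2b3c"}}'
sid="$(echo "$sample" | extract_field sid)"

# A reply without the field yields an empty string instead of the raw text.
missing="$(echo 'login failed' | extract_field sid)"
echo "sid=${sid} missing=${missing}"
```

An empty result is easy to test for with `[ -z "$sid" ]`, which makes it simpler to tell a failed login apart from a real session ID.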

Really though, I wanted a better solution. Sure, I could reboot the phone every day to make sure it’s working, but what would be awesome is if my script could check whether the phone was connected to my SIP account and, if it wasn’t or there was some kind of error, reboot it or at least tell me there was an error.

So I wrote version 1 of my script and got it working. When the SIP connection isn’t registered, it restarts the phone.

#!/bin/bash
# Grandstream GXP1610 Reboot-o-matic v1
# Authored on 12/10/2020
# by jcx
# Licence: GPLv3 (or at your option, any later version.)
# Usage: gsreboot [IP/Hostname]
# Log in and pull the session ID out of the reply
sid=$(curl -k -s -c /tmp/cookies.txt -d"password=hunter2" https://${1}/cgi-bin/dologin --referer https://${1} | sed -r 's|.*"sid": "([0-9a-z]+)".*|\1|' )
# Ask the phone whether SIP account 1 is registered (1 = registered)
status=$(curl -k -s -b /tmp/cookies.txt -d"request=vendor_fullname:P35:PAccountRegisteredServer1:PAccountRegistered1" https://${1}/cgi-bin/api.values.get --referer https://${1} | sed -r 's|.*"PAccountRegistered1": "([0-9a-z]+)".*|\1|' )
if [ "${status}" != "1" ]
then
    echo "Requesting reboot on ${1} ..."
    curl -k -s -b /tmp/cookies.txt -d"request=REBOOT&sid=${sid}" https://${1}/cgi-bin/api-sys_operation --referer https://${1}
fi
rm /tmp/cookies.txt

Now I just need some way to automate it, which is where cron comes in. Cron will run a command however often you like, so I just set it to every 5 minutes to do a check, and now I won’t miss any more important work phone calls.

*/5 * * * * /usr/local/bin/

Okay, so while the first version of the script works, it isn’t very elegant. It didn’t really report any error messages and wasn’t user-configurable, so I’ve rewritten it (v2!). It now supports some options and has more sensible error messages.

#!/bin/bash
# Grandstream GXP1610 Reboot-o-matic v2
# Authored on 12/10/2020
# by jcx
# Licence: GPLv3 (or at your option, any later version)
# Please edit the password below to be the admin account on your GXP1610.
# I defined it here so that you don't need to use it on the command line.
# You shouldn't need to make any other changes.
password="hunter2"
#### Begin Script
if [ -z "${1}" ]
then
    echo "----------------"
    echo "gsreboot2: GXP1610 Reboot-o-matic v2 by jcx ( licenced under GPLv3"
    echo "Usage: [IP/Hostname] [Protocol: http/https] [Ignore Certificate Errors: Y/N]"
    echo "Example: https Y"
    echo "----------------"
    echo "This will connect to the Grandstream phone on"
    echo "using https and will ignore any certificate errors."
    echo "Use with cron every 5 minutes, as it takes the phone about 3 minutes to boot."
    echo "Don't forget to change the password at the top of the script!"
    exit 1
fi
# Remove any cookie file left over from a previous run
if [ -f "/tmp/gsreboot2.txt" ]
then
    rm "/tmp/gsreboot2.txt"
fi
# Protocol: falling back to https when none is given is an assumed default
if [ -z "${2}" ]
then
    proto="https"
fi
if [ "${2}" = "https" ]
then
    proto="https"
fi
if [ "${2}" = "http" ]
then
    proto="http"
fi
# Certificate checking: verifying when no third argument is given is an assumed default
if [ -z "${3}" ]
then
    certignore=""
fi
if [ "${3}" = "Y" ]
then
    certignore="-k "
fi
if [ "${3}" = "N" ]
then
    certignore=""
fi
sid=$(curl ${certignore}-s --connect-timeout 10 -c /tmp/gsreboot2.txt -d"password=${password}" ${proto}://${1}/cgi-bin/dologin --referer ${proto}://${1} | sed -r 's|.*"sid": "([0-9a-z]+)".*|\1|' )
status=$(curl ${certignore}-s --connect-timeout 10 -b /tmp/gsreboot2.txt -d"request=vendor_fullname:P35:PAccountRegisteredServer1:PAccountRegistered1" ${proto}://${1}/cgi-bin/api.values.get --referer ${proto}://${1} | sed -r 's|.*"PAccountRegistered1": "([0-9a-z]+)".*|\1|' )
if [ "${status}" = "0" ]
then
    echo "VOIP account not registered..."
    echo "Requesting reboot on ${1} ..."
    request=$(curl ${certignore}-s --connect-timeout 10 -b /tmp/gsreboot2.txt -d"request=REBOOT&sid=${sid}" ${proto}://${1}/cgi-bin/api-sys_operation --referer ${proto}://${1} | sed -r 's|.*"body": "([0-9a-z]+)".*|\1|' )
    if [ "${request}" = "savereboot" ]
    then
        echo "Reboot request has been acknowledged."
    fi
    if [ -f "/tmp/gsreboot2.txt" ]
    then
        rm "/tmp/gsreboot2.txt"
    fi
    # Stop here so the status check below does not also report an error
    exit 0
fi
if [ "${status}" != "1" ]
then
    echo "Error: Cannot determine status of VOIP account."
fi
# Enable this code if you want output on success... disabled by default because it works with cron.
#if [ "${status}" = "1" ]
#    then
#    echo "Success! Your VOIP account is active. No reboot required."
#fi
if [ -f "/tmp/gsreboot2.txt" ]
then
    rm "/tmp/gsreboot2.txt"
fi

It’s grown from 18 lines of code to 109(!). This isn’t bad, considering that before I wrote these scripts, I’d never written anything in shell / bash script before. So I replaced the entry in my crontab to run the new script every 5 minutes.

*/5 * * * * /usr/local/bin/ https Y
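As an aside, the protocol and certificate options that v2 accepts could also be handled with `case` statements, which reject unexpected values in one place. This is only a sketch: `parse_proto` and `parse_certignore` are hypothetical helper names, and the defaults used when an argument is missing are assumptions rather than what the script above necessarily does.

```shell
#!/bin/bash
# Sketch only: hypothetical helpers for validating the second and third arguments.
parse_proto() {
    local value="${1:-https}"      # assumed default when no protocol is given
    case "$value" in
        http|https) echo "$value" ;;
        *) echo "Error: protocol must be http or https" >&2; return 1 ;;
    esac
}

parse_certignore() {
    case "${1:-N}" in              # assumed default: verify certificates
        Y) echo "-k" ;;
        N) echo "" ;;
        *) echo "Error: third argument must be Y or N" >&2; return 1 ;;
    esac
}

proto="$(parse_proto "https")"
certignore="$(parse_certignore "Y")"
echo "proto=${proto} certignore=${certignore}"
```

Because both helpers return non-zero on bad input, a caller can bail out early with something like `proto="$(parse_proto "$2")" || exit 1`.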

Below is what it looks like in my email client on my Linux box when the SIP account is not registered and it needs a reboot.

Date: Mon, 12 Oct 2020 00:10:07 +0100
From: Cron Daemon <root@localhost>
To: jcx@localhost
Subject: Cron <jcx@localhost> /usr/local/bin/ https Y
VOIP account not registered...
Requesting reboot on ...
Reboot request has been acknowledged.

If the script encounters an error, it will email an error response. This looks like the following:

Date: Mon, 12 Oct 2020 00:35:07 +0100
From: Cron Daemon <root@localhost>
To: jcx@localhost
Subject: Cron <jcx@localhost> /usr/local/bin/ https Y
Error: Cannot determine status of VOIP account.

I hope this proves useful to you, it certainly has to me. Not only because my phone will always be connected but I also learnt how to do some basic shell scripting. Have a great day!

Setting Up Auto Mounting Encrypted Raid Disks


This is a little guide (currently under construction) for how I handle encrypted disks on Linux. This won’t be the ultimate ‘tin foil hat’ guide, as the attack vector this is intended to protect from is physical theft of the hardware, so that the data can’t be accessed from elsewhere. It obviously will not handle a targeted hacking attempt or the $5 wrench method, but I believe it gives security and convenience to a level appropriate for me.

xkcd 538: describing the $5 wrench method of breaking security.

The reason this started is because my physical health is deteriorating and getting up to enter a password at the console on every reboot is tiresome. Therefore I came up with a new way of handling encrypted drives to not only increase security but also make things a bit more convenient.

Of course before following any of these instructions, you should be aware of my standard disclaimer.

Caution – You need to secure the location of where you store your key. If you fail to secure your key with an appropriate mechanism, this entire exercise is fruitless.

Examples include: IP restricting access to your key provisioning system, using a strong username and password, using an easy to revoke token based storage mechanism, verifying HTTPS transfer certificates and countless others.

Included below is a method similar to what I use to secure where I store my keys.

Create a keyfile

dd bs=256 count=1 if=/dev/random | base64 > data-keyfile
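Before uploading the keyfile it can be worth confirming that it decodes back to the full 256 bytes and that nobody else on the machine can read it. The sketch below does both in a scratch directory. It uses `/dev/urandom` rather than `/dev/random` and a temporary directory purely as choices for this example.

```shell
#!/bin/bash
# Sketch: create a keyfile with owner-only permissions and check its decoded size.
# /dev/urandom and the scratch directory are choices made for this example.
workdir="$(mktemp -d)"
keyfile="${workdir}/data-keyfile"

# Files created from here on are readable and writable by the owner only.
umask 077
dd bs=256 count=1 if=/dev/urandom 2>/dev/null | base64 > "$keyfile"

# Decode the file again and count the bytes that come out.
decoded_bytes="$(base64 -d "$keyfile" | wc -c | tr -d ' ')"
echo "Decoded key length: ${decoded_bytes} bytes"

rm -rf "$workdir"
```

If the reported length is lower than expected, the read from the random device was cut short and the keyfile should be regenerated.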

Upload the keyfile somewhere, for example an HTTPS server with a valid certificate, S3, or Azure key storage, and then make a script to download the key from where you put it. If you’re storing your key on an HTTPS server, here is an example htaccess file that restricts access to the directory to specific IPs and adds a username and password to further increase security. This works with Apache 2.4, although the order/deny/allow lines are the older 2.2-style syntax, which 2.4 only supports through mod_access_compat, so the syntax may be different for later versions.

order deny,allow
deny from all
allow from

Options -Indexes
AuthType Basic
AuthName "Restricted Access"
AuthUserFile "/secure/path/to/htpasswd"
Require valid-user

Once you have uploaded it somewhere don’t forget to delete the original source file securely from your system (for example with shred).

#!/bin/bash
# Stop on errors, including a failed download earlier in the pipeline
set -eo pipefail
# Request the file from somewhere, maybe blob storage, Azure, S3 or an HTTPS server, then pipe it through `base64 -d` to decode it from base64
curl -s --basic --user username:password "" | base64 -d

Then move the script somewhere and give it the right permissions

# Ensure the owner of this file is "root"
chown root:root /etc/luks/
# Allow only the owner (root) to read and execute the script
chmod 0500 /etc/luks/

Create the raid

# if all drives are already blank and ready to be added. Replace drives as appropriate.
mdadm --create /dev/md2 -l 1 -n 2 /dev/sdc1 /dev/sdd1
# if you need to create a 'degraded' array with a drive missing.
mdadm --create /dev/md2 -l 1 -n 2 /dev/sdc1 missing

Then encrypt the array

# Encrypt the disk
# Replace md2 with the correct array!
/etc/luks/ | cryptsetup -d - -v luksFormat /dev/md2

# Open the encrypted volume, with the name "data"
# Replace md2 with the correct array!
/etc/luks/ | cryptsetup -d - -v luksOpen /dev/md2 data

# Create a filesystem on the encrypted volume
mkfs.ext4 -F /dev/mapper/data

# Close the encrypted volume
cryptsetup -v luksClose data

Find the encrypted partition’s UUID

$ lsblk --fs
NAME    FSTYPE      LABEL           UUID
└─sdc1         linux_raid_mem server:1 a38cbabe-0f12-3643-f3232-998822c5d42
  └─md2        crypto_LUKS             a17db19d-5037-4cbb-b50b-c85e3e074864 
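If you would rather not pick the UUID out by eye, a small filter can do it. The sketch below is an assumption-laden example: `luks_uuid` is a made-up helper name, and it relies on the filesystem type being the second column and the UUID being the last one, as in the rows shown above.

```shell
#!/bin/bash
# luks_uuid: read lsblk-style output on stdin and print the UUID of each crypto_LUKS row.
# Assumes FSTYPE is the second column and the UUID is the last column on that row.
luks_uuid() {
    awk '$2 == "crypto_LUKS" { print $NF }'
}

# Demo using the two rows shown above.
sample='└─sdc1         linux_raid_mem server:1 a38cbabe-0f12-3643-f3232-998822c5d42
  └─md2        crypto_LUKS             a17db19d-5037-4cbb-b50b-c85e3e074864'
uuid="$(echo "$sample" | luks_uuid)"
echo "$uuid"
```

On a real system, something like `lsblk -rn -o NAME,FSTYPE,UUID | luks_uuid` should give the same result, since that asks `lsblk` for exactly those three columns.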

Then create a script to run on boot to automount

#!/bin/bash
if [ -b "/dev/mapper/data" ]
then
    if [[ $(findmnt -M "/disks/data") ]]; then
        # Already unlocked and mounted, nothing to do
        :
    else
        echo "Not mounted but unlocked... trying to mount..."
        mount -t ext4 -o errors=remount-ro /dev/mapper/data /disks/data
    fi
else
    # Not unlocked yet: fetch the key, open the volume, then mount it
    curl -s --basic --user username:password "" | base64 -d | /sbin/cryptsetup -d - -v luksOpen /dev/disk/by-uuid/a17db19d-5037-4cbb-b50b-c85e3e074864 data
    mount -t ext4 -o errors=remount-ro /dev/mapper/data /disks/data
fi

if [[ $(findmnt -M "/disks/data") ]]; then
    # Anything you want to run after the disks are mounted
    echo "All disks mounted, starting services..."
    echo "Starting samba..."
    systemctl start smbd
fi

Then add it to root’s crontab so that it runs on reboot.

# m h  dom mon dow   command
@reboot sleep 30 && /etc/luks/
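The fixed `sleep 30` assumes the network is up within 30 seconds of boot. A hedged alternative is to retry a command a few times with a pause in between, so a slow network start does not leave the disks locked. `retry` below is a hypothetical helper, and the demo uses a stand-in command instead of a real download.

```shell
#!/bin/bash
# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds or ATTEMPTS runs are used up.
retry() {
    local attempts="$1" delay="$2"
    shift 2
    local n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Demo with a stand-in command that fails twice, then succeeds.
tries=0
flaky() {
    tries=$((tries + 1))
    [ "$tries" -ge 3 ]
}
retry 5 0 flaky && echo "succeeded after ${tries} tries"
```

In the mount script, the key download could be wrapped the same way, for example retrying ten times with a few seconds between attempts before giving up.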

Don’t forget to disable any services you don’t want to run until the encrypted drives are mounted, for example samba

systemctl disable smbd

Create the mount point

mkdir /disks/data

And finally a script to stop encrypted drives (if required)

#!/bin/bash
echo "Stopping Samba..."
systemctl stop smbd

if [[ $(findmnt -M "/disks/data") ]]; then
    echo "/disks/data is mounted, trying to unmount..."
    umount /dev/mapper/data
    echo "Attempting to close luks on /dev/mapper/data ..."
    if [ -b /dev/mapper/data ]
    then
        # Closing a volume does not need the key, so no -d option here
        /sbin/cryptsetup -v luksClose data
    fi
else
    if [ -b /dev/mapper/data ]
    then
        echo "/disks/data is not mounted, but is unlocked, will attempt to close ...."
        /sbin/cryptsetup -v luksClose data
    else
        echo "/disks/data is not unlocked or mounted, nothing to do."
    fi
fi

This work was inspired by an article by Alessandro Segala and adapted/changed to meet my requirements.

Debugging Windows 10 at Startup


It’s almost impossible to be able to hit F8 during Windows 10’s start up. The “official” way to get into the boot menu is to let Windows 10 start and get to the login screen, hold the shift key and click “Restart”. The problem with this is, what if you can’t get to the login screen?

Many times I’ve had a simple issue that could be fixed in Safe Mode or using the basic graphics mode available from the boot menu. I’ve found a method that makes this debugging easy and gives you plenty of time to be able to press F8 if you need to on boot, without taking too much time away from the actual boot. It’s a user configurable timeout too, so you can set it to what you want.

Of course before following any of these instructions, you should be aware of my standard disclaimer.

Firstly, open an administrative command prompt and enable the legacy boot menu.

bcdedit /set "{current}" bootmenupolicy legacy

This will enable the old style operating system selector from Windows 7. Next you set it to display the menu with the following command.

bcdedit /set {bootmgr} displaybootmenu yes

Finally you control how long the timeout is. The default 30 seconds is quite a long time to wait if you don’t press any key, so I use the timeout of 5 seconds, which gives me ample time if I need to get into the advanced boot options menu, but it doesn’t slow down the boot that much if I don’t.

bcdedit /set {bootmgr} timeout 5

That’s it! If you ever need to debug a simple start up issue, you don’t have to find your rescue CD, or reset during boot to launch “startup repair”. It’s saved my skin so many times already 🙂

Audio Terminal Bell (Software Bell) in Xubuntu with xfce-terminal


I have wanted a software, audio-based terminal bell in Linux for years. In PuTTY on Windows you can choose any arbitrary WAV sound file as your terminal bell sound, and I wanted the same functionality on Linux. I have wasted lots of time over the years trying to get this working, and I haven’t had much luck… until today!

I was setting up a new Xubuntu 18.04LTS machine and was going through the preferences in xfce-terminal and noticed it had an option for “Audible Bell” in the advanced features menu. I turned it on and it didn’t work, but it prompted me to try and find a solution again.

Here are the commands I used to get it working.

sudo apt-get install gnome-session-canberra sox
xfconf-query -c xsettings -p /Net/EnableEventSounds -s true
xfconf-query -c xsettings -p /Net/EnableInputFeedbackSounds -s true
xfconf-query -c xsettings -p /Net/SoundThemeName -s "freedesktop"

Then you need to add the following to the end of your .profile file in your home directory (~/.profile)


Then add the following to /etc/pulse/

# audible bell
load-sample-lazy x11-bell /usr/share/sounds/freedesktop/stereo/bell.oga
load-module module-x11-bell sample=x11-bell

Then restart pulseaudio with

pulseaudio -k

Make sure your “System Sounds” is turned up in the Volume Control applet and finally make sure the following appears in ~/.config/xfce4/terminal/terminalrc under [Configuration]


You can also set this under “Preferences/Advanced/Audible Bell”. You will probably need to logout and logon again, but other than that everything should work. You can change the sound to a .oga file of your choice by changing the path of the sound in the load-sample-lazy command above.
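A quick way to test the setup from inside xfce-terminal is to send the bell character yourself. This tiny sketch just writes the ASCII BEL byte; `ring_bell` is a made-up name for the example.

```shell
#!/bin/bash
# ring_bell: write the ASCII BEL character (byte 0x07) to standard output.
ring_bell() {
    printf '\a'
}

ring_bell
```

If everything is configured, running this in the terminal should play the bell sample rather than staying silent.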

RAID Drive Replacement


On the 20th May, I noticed an email from mdadm (the Linux Raid Administrator) saying that a Degraded Array event was detected. It looked like two drives went down at the same time (SDC and SDD). Before I had done any diagnosis of the problem, I had ordered two replacement refurbished drives.

I went for refurbished because getting new ‘affordable’ drives that don’t use SMR technology (Shingled Magnetic Recording) is difficult. SMR fits more capacity into the same area, but SMR drives are a lot slower once you have filled the 25GB cache, so they are not ideal in Network Attached Storage systems. (Even WD Red NAS drives use SMR and don’t disclose that!)

So I went for some refurbished Seagate Barracuda 2TB drives. These were cheap and they used CMR 🙂

After a bit more diagnosing and a reboot, it looked like the SDC drive was okay but was just knocked offline because SDD corrupted the SATA bus. That made me feel a little safer, as I don’t like running systems with no margins for failure. I did a full set of diagnostics on SDC and reintroduced it into the array and it did a data check and came back online just fine.

I then had to wait a little while for my refurbished drives to arrive from Germany. They took a couple of days to arrive, which I didn’t think was too bad considering the world is kinda messed up right now.

Once the drives had arrived, I started doing my usual round of tests on new drives, to make sure they’ve survived shipping, make sure I’ve not been sold a lemon and also to make sure they’re going to give a decent level of service.

My testing involves running the SMART self test, recording the results, zeroing the drive, recording the results again, then overwriting the drive 4 times with different patterns and comparing what is read back. Once that’s done I record the results once more and compare them to make sure testing hasn’t uncovered any problems.
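The write-and-compare step can be illustrated on an ordinary file. The sketch below fills a scratch file with one byte value, reads it back and compares, then repeats for four patterns. The four values (0xaa, 0x55, 0xff, 0x00, written in octal) are the ones `badblocks -w` uses by default; whether that was the tool used here is an assumption, and `write_and_verify` is a made-up helper name.

```shell
#!/bin/bash
# write_and_verify TARGET BYTES OCTAL: fill TARGET with BYTES copies of one byte value,
# then read it back and compare. Returns non-zero on a write error or any mismatch.
write_and_verify() {
    local target="$1" bytes="$2" octal="$3"
    head -c "$bytes" /dev/zero | tr '\0' "\\${octal}" > "$target" || return 1
    head -c "$bytes" /dev/zero | tr '\0' "\\${octal}" | cmp -s - "$target"
}

# Demo on a scratch file, never on a disk that holds data.
scratch="$(mktemp)"
passes=0
for pattern in 252 125 377 000; do
    write_and_verify "$scratch" 65536 "$pattern" && passes=$((passes + 1))
done
echo "${passes} of 4 patterns verified"
rm -f "$scratch"
```

On a real drive the same idea applies to the whole device, which is destructive, so it only makes sense on a disk that is empty or about to be wiped anyway.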

Next comes partitioning the drive. I just copied the partition layout of one of the existing disks and wrote the partition table to the disk. I then asked mdadm to add the new partitions into the RAID devices (md0, md1, md2, md3), and it started to rebuild the missing drive onto the new blank. You can see in the screenshot it is about 9.2% through recovery of the largest md device, md1.
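For reference, the partition copy and re-add steps might look something like the plan printed by the sketch below. It only prints commands and runs nothing. The device names and the mapping of partitions to arrays are placeholders, so check them against your own layout before running anything by hand.

```shell
#!/bin/bash
# Sketch: print (not run) commands to clone a partition table and re-add partitions.
# Device names and the partition-to-array mapping are placeholders.
source_disk="/dev/sdc"
new_disk="/dev/sdd"

plan_rebuild() {
    # Dump the layout of the healthy disk and write it to the replacement.
    echo "sfdisk -d ${source_disk} | sfdisk ${new_disk}"
    # Add each new partition to its array; mdadm then starts the rebuild.
    local pair
    for pair in "md0:1" "md1:2" "md2:3" "md3:4"; do
        echo "mdadm --manage /dev/${pair%%:*} --add ${new_disk}${pair##*:}"
    done
    # Rebuild progress is reported in this file.
    echo "cat /proc/mdstat"
}

plan="$(plan_rebuild)"
echo "$plan"
```

Printing the plan first is a deliberate choice: getting the source and target disks the wrong way round would overwrite the good partition table.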

From discovery to fix, this entire process took about 5 days. Actual user input was only about an hour, plus checking back and forth to make sure the drive was behaving.

Of course, RAID is not backup, but it’s great if your system can take two drives failing and still run fine. I have a backup system on a separate drive and cloud backups. This is because in 2010, I typed an F instead of a G and wiped out the last 10 years.

Checking back through the logs, the problem was first reported on the 5th, but I didn’t see the email alert until the 25th. At least it’s all fixed now. I didn’t need two drives, but it’s good to have a ‘cold’ spare in stock now 🙂