Sunday, 14 December 2025

Predictable Network Names

It turns out that systemd's Predictable Network Interface Names are a bit of a misnomer.

The original change from eth0, eth1, etc. to the new v197 scheme described in the link above caught me out at one point way back during an Ubuntu Linux LTS upgrade [maybe from 14.04 to 16.04? I can't recall precisely] and left my system offline until I could get to the console and reconfigure the static IP network configuration from the old eth0 to the new interface name. I just switched the config over and continued to use the ens-prefixed name that systemd/udev picked.

More recently, a Proxmox Virtual Environment upgrade caught me out again, so I dug in further.

I understood that systemd/udev attempts to assign predictable names by using a versioned list of attributes to assemble the interface names in a consistent way. What I discovered is that, unfortunately, not all attributes used in a given naming-scheme version are available from all kernels. So even if a system upgrade sticks to the same version of systemd/udev, a new or different kernel or driver version may add or drop support for reporting some of the attributes, which can result in a different interface name after a kernel change.

Having found a good solution to (un)Predictable Network Names that involves assigning a custom interface name based on MAC address, I implemented it on the various PVE hosts I was managing so that a future upgrade that affected the network interface naming would not cause havoc.

I'm now on my 3rd pre-emptive fix, on yet another system, and had to hunt for the systemd.link docs again after not finding the recipe in this blog. So here it is:

Create an ordered collection of .link files, one per interface. Similar to other files processed in numerical order, by convention start with a two digit numeric prefix (in my case "10-") followed by the name you have chosen for the interface (in my case "en10g0" representing my ethernet 10Gb interface #0) and tack a .link on the end. Put it in the `/etc/systemd/network` directory.

e.g. `/etc/systemd/network/10-en10g0.link`

The contents of the file in my case would be something akin to:

[Match]
MACAddress=00:11:22:33:44:55

[Link]
Name=en10g0
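
To find the MAC address to match on, and to double-check which .link file systemd/udev would apply to a given interface, something like this should do it (the interface name below is just an example):

ip -br link show                                          # list interfaces with their MAC addresses
udevadm test-builtin net_setup_link /sys/class/net/eno1   # show which .link file matches this interface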

If you get all your interface names configured this way, you can then use those names everywhere else they get hard coded (such as in the `/etc/network/interfaces` file for Linux bridge devices, etc.). Once you have caught and replaced them all (and have root access to the console just in case you missed something), reboot. If all is well, the network should come back up OK, and you should not have to worry about your interface names changing unless you change hardware or MAC addresses. Personally, I feel that right after messing with networking hardware is a much better time for networking to break, because I'm alert and expecting it.
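
For example, a Linux bridge stanza in `/etc/network/interfaces` on a PVE host might then reference the stable name (addresses here are made up):

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.10/24
        gateway 192.168.1.1
        bridge-ports en10g0
        bridge-stp off
        bridge-fd 0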

Wednesday, 24 September 2025

iDRAC email not sending - tests fail

I was recently trying to configure iDRAC 9 on some servers to send alert emails, after discovering that the inlet temperature of some servers located in someone else's server room was unreliable.

Configuring the mail gateway in iDRAC involves setting its name, port, whether or not to use one of a couple of encryption standards for the connection, and any required authentication. In another section of the UI, you can enter an email address, enable it (State checkbox checked), and apply it. Then you can send a test email from that UI.

My test emails were failing with the error:

RAC0225: Sending the test mail failed

At first I thought I might have the wrong SMTP gateway settings but double-checking confirmed that what I had specified (hostname and IP with no auth and no encryption) was working elsewhere.

Then I thought it might have been a firewall issue. I dug into the pf rules on the relevant OpenBSD firewall and, after not seeing anything obvious that would block that traffic, I got to the point where I was logging on the default block rule (`block log all`) and watching the log interface with tcpdump (`tcpdump -n -e -ttt -i pflog0 src host myiDRACHost`), and still not seeing any blocked traffic. Maybe some other rule was blocking it... or maybe there was no traffic at all. While hunting for more info I came across this reply on a thread about the RAC0225 error above.

 Maybe my iDRAC didn't have a DNS server set and therefore couldn't resolve my SMTP host. After digging into my iDRAC networking settings, that turned out to be the cause of the error for me. Once I set my DNS server IP in iDRAC, testing the alert emails resulted in a "Success!" message from iDRAC; however... I did not receive the test emails despite trying a couple addresses.

 Hmm.... off to the mail gateway to have a look in its logs. That turned up this error message:

status=bounced (host ASPMX.L.GOOGLE.COM[142.250.31.27] said: 550-5.7.1 [my_mail_gateway_IP] Messages missing a valid Message-ID header are not 550-5.7.1 accepted. For more information, go to 550-5.7.1  https://support.google.com/mail/?p=RfcMessageNonCompliant and review 550 5.7.1 RFC 5322 specifications. af79blahbe357-85cblahdeb7si19blah85a.553 - gsmtp (in reply to end of DATA command)) 

 Searching for that message revealed plenty of online chatter about the change in GMail policy. iDRAC was mentioned by a few people as giving them this same problem. Interestingly, my search results on DuckDuckGo included a response generated by Duck.ai. Expanding it revealed this:

Modify Postfix Settings (if applicable)

If you are using Postfix as your SMTP server, you can add the following line to your main.cf configuration file:

  • always_add_missing_headers = yes

This setting instructs Postfix to add missing headers to incoming messages.

 After double-checking the postconf man page, I applied this setting, restarted postfix, sent another test message, and received the email!
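
For reference, applying the change was roughly this (a sketch, assuming a stock Debian-style Postfix install):

postconf -e 'always_add_missing_headers = yes'   # persist the setting in main.cf
systemctl restart postfix                        # restart Postfix to pick it up
postconf always_add_missing_headers              # confirm the active value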

Wednesday, 10 September 2025

ssh to old cisco switches and other old sshd implementations from newer ssh clients (version 8.8+)

I recently ran into a problem connecting to some cisco nexus switches that have an older sshd implementation that only works with ssh-rsa public keys and SHA-1 signatures.

The initial symptom was being unable to log in using the public key method, failing with a "too many authentication failures" error. After chasing that down and reducing the number of keys offered by my ssh agent from 4 to 2, the problem became that it would reject both keys and move on to password auth. Suddenly I was seeing password prompts instead of the previously working ssh-rsa public key auth taking me straight into a shell on the switch.

Running ssh verbosely with ssh -v showed that my ssh-rsa key (only still around for use in these switches) was not being accepted. Instead the verbose output showed:

debug1: Offering public key: id_rsa RSA SHA256:ILXl4YsDBLAHBLAHBLAHBLAmPhz/D0Et1TBsClg agent
debug1: send_pubkey_test: no mutual signature algorithm

And after it looked for more public key files on disk to try, it got to:

debug1: Next authentication method: keyboard-interactive
(amos@123.456.789.012) Password: 

Some more digging led me to:

https://superuser.com/questions/1778874/openssh-v8-client-talking-to-openssh-v6-7p1-server-no-mutual-signature-algorit

And in turn:

https://www.openssh.com/txt/release-8.8

with this lovely bit:

Potentially-incompatible changes
================================

This release disables RSA signatures using the SHA-1 hash algorithm
by default. This change has been made as the SHA-1 hash algorithm is
cryptographically broken, and it is possible to create chosen-prefix
hash collisions for <USD$50K [1]

For most users, this change should be invisible and there is
no need to replace ssh-rsa keys. OpenSSH has supported RFC8332
RSA/SHA-256/512 signatures since release 7.2 and existing ssh-rsa keys
will automatically use the stronger algorithm where possible.

Incompatibility is more likely when connecting to older SSH
implementations that have not been upgraded or have not closely tracked
improvements in the SSH protocol. For these cases, it may be necessary
to selectively re-enable RSA/SHA1 to allow connection and/or user
authentication via the HostkeyAlgorithms and PubkeyAcceptedAlgorithms
options. For example, the following stanza in ~/.ssh/config will enable
RSA/SHA1 for host and user authentication for a single destination host:

    Host old-host
        HostkeyAlgorithms +ssh-rsa
	PubkeyAcceptedAlgorithms +ssh-rsa

We recommend enabling RSA/SHA1 only as a stopgap measure until legacy
implementations can be upgraded or reconfigured with another key type
(such as ECDSA or Ed25519).

And although I already had a .ssh/config file section for these switches with the `HostkeyAlgorithms +ssh-rsa` directive and a couple others, I did not have the `PubkeyAcceptedAlgorithms +ssh-rsa` directive. Adding it allowed public key auth to my older cisco switches to work again.

In the end, my working .ssh/config section for these switches looks something like:

Host 123.456.789.*
        HostkeyAlgorithms +ssh-rsa
        PubkeyAcceptedAlgorithms +ssh-rsa
        KexAlgorithms +diffie-hellman-group1-sha1
        Ciphers +aes128-cbc
        IdentityAgent ~/.1password/agent.sock
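
For a one-off connection without touching .ssh/config, the same options can be passed on the command line (user and host below are placeholders):

ssh -o HostkeyAlgorithms=+ssh-rsa -o PubkeyAcceptedAlgorithms=+ssh-rsa admin@123.456.789.012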


Monday, 8 September 2025

Get sshd to listen on multiple ports when systemd sockets are in use (affecting at least some recent Debian and Ubuntu containers)

I was going crazy trying to have sshd listen on multiple ports in a Debian Linux container under Proxmox Virtual Environment. This had worked before on Debian Bookworm just by specifying multiple `Port` lines in /etc/ssh/sshd_config. But at some point (either through Debian Bookworm updates or the upgrade to Debian Trixie) the way sshd is launched changed, and the extra ports appeared to work sometimes and not others.

I finally found this rude but helpful ServerFault answer:

https://serverfault.com/a/1142005/997178

It explains that sshd listen addresses and ports are now configured using systemd sockets. Setting them in sshd_config does nothing.

See /usr/share/doc/openssh-server/README.Debian.gz (use zcat) and pay special attention to the section near the end on systemd sockets.

Apparently this has been the default in Ubuntu for a while and recently became the pattern for Debian too.

Also see https://manpages.debian.org/stable/systemd/systemd.socket.5.en.html for info about sockets and the ListenStream option.

The final solution for me was to create `/etc/systemd/system/ssh.socket.d/listen.conf` containing:

[Socket]
# Clear the default ListenStream:
ListenStream=
# Set new values. Multiple allowed:
ListenStream=22
ListenStream=2222
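
After creating the drop-in, something along these lines should activate it and confirm both ports are listening (a sketch, assuming socket activation is in use as described above):

systemctl daemon-reload                # pick up the new drop-in
systemctl restart ssh.socket           # restart the socket unit with the new ListenStream values
ss -tln | grep -E ':(22|2222) '        # confirm listeners on both ports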

Growing Ubuntu LVM root filesystem inside a VM under Proxmox PVE

I wanted to grow the root filesystem of an Ubuntu 18.04 VM hosted in Proxmox PVE from 500GB to 2TB.

The VM is well backed up using Proxmox Backup Server and I have been through similar efforts with other VMs in other environments so I felt confident enough to proceed. 

I started by shutting down the VM, using the PVE GUI (Hardware > Hard Disk > Disk Action > Resize), and specifying 2000 in the GB box. Then I started the VM, confirmed the change, and shut down again to take a snapshot before proceeding.

After booting the VM again I started to work through what I thought would be a relatively straightforward process of growing the partition, growing the physical volume, growing the logical volume, and then growing the filesystem.

It wasn't working as well as I expected. Searching online showed how to add a new partition of type `8e` (Linux LVM) to consume the free space, create a physical volume on it, add it to the volume group, and then extend the root volume.

I thought this was a bit messy and figured there must be a way to just grow the existing partition without having to create a new one. Running `growpart /dev/sda 5` seemed to add a bunch of space to the partition, but none of the pv, vg, or lv display, extend, or resize commands showed the free space as available or suggested that growth was possible.

In the end I perhaps jumped the gun by removing the swap LV thinking that it was blocking the growth of the root LV. Then I realized that the /dev/sda5 partition hosting the VG was an extended partition and that /dev/sda2 (the primary partition assigned to hold extended partitions) might need to be grown before /dev/sda5.

So what I ended up with was more or less:

swapoff -a
lvremove hostname-vg/swap_1
growpart /dev/sda 2
growpart /dev/sda 5
pvresize /dev/sda5
lvextend hostname-vg/root /dev/sda5
resize2fs /dev/hostname-vg/root

But I'm not sure that I actually needed to remove the swap and was maybe just missing the `growpart /dev/sda 2` before the `growpart /dev/sda 5` and the rest to make it work. The next time I need to do this... try without removing swap first.
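
So, for next time, here is the sequence I'd try first, leaving swap alone (untested, and assuming the same sda2-extended/sda5-LVM layout and volume group name as above):

growpart /dev/sda 2                      # grow the extended partition first
growpart /dev/sda 5                      # then the logical partition holding the LVM PV
pvresize /dev/sda5                       # make the new space visible to LVM
lvextend -l +100%FREE hostname-vg/root   # grow the root LV into all free extents
resize2fs /dev/hostname-vg/root          # finally grow the ext4 filesystem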

Wednesday, 26 March 2025

Modern Ubuntu web kiosk using chromium as the browser engine

 I have been working to prepare a digital atlas exhibit for the Natillik Heritage Centre in Gjoa Haven, Nunavut, Canada. Working with Indigenous communities and various communities of practice to help them build digital atlases is something we have been doing at the Geomatics and Cartographic Research Centre for more than a couple of decades now.

 There is more and more interest in local ownership and stewardship over these data systems and I have deployed atlas and knowledgebank systems in a handful of locations before, sometimes just with a passive status display (for example, a TV showing the live location and updates provided by Inuit hunters) and other times with a large format touchscreen allowing for interaction with the atlas and the various multi-media within.

 The technology has changed over the years so each project starts with a fresh review of what's possible and what's best given the needs. For a previous project I chose to use a Windows 7 client running proprietary SiteKiosk software to drive the browser via a 3M Touch Display. The server ran SmartOS (clever USB key boot with all disks in a ZFS mirrored-pair pool for everything else) with a Linux VM in a zone. The server was a Dell PowerEdge tower and was bigger, noisier, and hotter than ideal. The client was its own small form factor PC. Both were tucked under a diorama near the screen but this was not ideal.

 Fast forward to today and the small, fanless, big storage, and low wattage (but still powerful) computer situation is much better and Proxmox PVE is my hypervisor of choice. For this latest build I had hoped to get away with a single Proxmox system running the atlas server VM (Ubuntu) as well as handling the client duties somehow. Sadly, the OnLogic Karbon 700 SE system had some teething problems and was just not playing well with the Dell C6522QT touch display being used this time around. Fortunately, the display is designed to fit an optional OptiPlex Micro 7020 which would still leave us with two systems but at least it would be contained within the screen chassis.

 Wanting to go with a minimal Linux solution for the client this time, I started looking for kiosk options on Linux. The current best effort being made for this type of thing appeared to be Ubuntu Core running Ubuntu Frame with a browser kiosk snap. But the OptiPlex came with Windows 11 pre-installed and I had installed Ubuntu 24.04.01 Desktop alongside in a dual-boot configuration. I did not want to wipe the factory Windows 11 until I knew I wouldn't need to revert to it for the kiosk. My attempt to overwrite the Ubuntu Desktop partition with a pre-built Ubuntu Core image was flawed before it began and required some work to get to a better spot with Ubuntu Server instead.

 After experimenting with Ubuntu Frame and wpe-webkit-mir-kiosk on Ubuntu Server rather than Ubuntu Core, I was able to get the kiosk pieces working but the wpe-webkit-mir-kiosk browser engine didn't play well with the atlas system it had to serve. The map tiles were being shuffled and the UI events seemed to be having problems. We only really support and test atlas stuff with Firefox and Chrome so I did some more research and then asked for some more help. The solution came quickly from one of the authors of Ubuntu Frame who pointed out that the chromium snap might do the trick and it did.

 Steps:

Install Ubuntu Server 24.04.02 and then:
```
sudo snap install ubuntu-frame
sudo snap set ubuntu-frame daemon=true
sudo snap install chromium
sudo snap set chromium url=https://clyderiveratlas.ca
sudo snap set chromium daemon=true
snap connect chromium:wayland
```
To change the URL:
```
sudo snap set chromium url=https://discourse.ubuntu.com/t/current-guide-for-a-firefox-based-kiosk-using-ubuntu-frame/57638
sudo snap restart chromium
```
 I am happy to report that the configuration persists and the ubuntu-frame and chromium snaps come up again after a reboot. The system is set to automatically boot after a loss of power so I think this part of the puzzle is more or less solved.
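
To sanity-check things after a reboot, something like this should show the configured URL and that the daemons are running (assuming the snap options set above):
```
snap get chromium url
snap services
```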

Thursday, 20 March 2025

Set up dual boot with Grub2 after messing things up

Working on a project recently, I made the mistake of thinking that a pre-built Ubuntu Core image could be written over an existing Ubuntu Desktop ext4 partition to switch the OS. Well, first of all, Ubuntu Core pre-built images are for entire disks and include their own partitions, including a bootloader that can self-heal the OS partition. So yeah... after doing this, the grub2 bootloader I had installed on an EFI partition no longer found any OS to boot since its `/boot/grub/grub.cfg` file was gone.

Following the suggestion in the post linked above, I created an Ubuntu Server USB Key installer and installed that on the partition I'd designated for Ubuntu. I had to specify manual layout instead of whole disk and choose the partition to be `/` but other than that, it was a normal Ubuntu server install. Sadly though, unlike when I installed Ubuntu Desktop the first time through, it did not notice the Windows 11 partition or offer to set up dual boot.

After some RTFM I found that the update-grub command runs os-prober by default to find other operating systems and add them to the grub config when it is regenerated. Simply running update-grub in my Ubuntu Server environment seemed to do the trick, and Windows Boot Manager was found and added as a non-default option in the Grub2 boot menu. I rebooted, chose Windows, and it worked. Then I rebooted again and landed back in Ubuntu Server by default. All is well.
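
If update-grub ever fails to find the other OS, it's worth checking that os-prober is installed and not disabled in /etc/default/grub; roughly:

sudo apt install os-prober            # make sure os-prober itself is present
grep OS_PROBER /etc/default/grub      # GRUB_DISABLE_OS_PROBER=true would suppress detection
sudo update-grub                      # regenerate grub.cfg, running os-prober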

Thursday, 13 March 2025

Dell Touchscreen Display USB Touch Interface not recognized

 I was having some issues setting up a touchscreen kiosk in Ubuntu 24.04 using a Dell 65 4K Interactive Touch Monitor - C6522QT paired with a Dell OptiPlex Micro 7020 installed in the built-in OptiPlex holder tray in the display.

 First I discovered that the instructions have you connect via HDMI 1 and USB 1 ports on the display. Unfortunately, the onboard Intel UHD Graphics 770 won't do 4K over HDMI. The monitor will scale for you but the resolution is lower. The touchscreen input was working in Windows and Linux.

 To get to the full 3840 x 2160 resolution, you need to connect using DisplayPort.

After switching to DisplayPort and feeling quite chuffed with myself, I had to set the project aside for a few weeks. When I came back to it again, I did some Ubuntu updates, tinkered with trying to get some form of remote desktop working, and then happened to try using the touchscreen again. To my horror, it didn't respond at all. Research led me to learn that the Dell monitor uses an InGlass touch display made by a company called FlatFrog, which is supported by the hid-multitouch driver that has been in the mainline Linux kernel since version 3.5.

I'll skip the parts where I spend hours messing with my Ubuntu system and pick up where I start to try to enable the hid-multitouch driver and look for the device.

root# modprobe hid-multitouch
root# lsusb
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 8087:0033 Intel Corp. AX211 Bluetooth
Bus 001 Device 005: ID 413c:2514 Dell Computer Corp. Dell Universal Receiver
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub

This is when I discovered that the device wasn't showing up at all. I didn't know exactly what I was looking for, but everything listed was clearly other hardware. Unplugging everything left me with:

root# lsusb
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 8087:0033 Intel Corp. AX211 Bluetooth
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub

So now I was wondering why the USB touch device wasn't showing up at all.

I started flailing around some more, switching ports on the computer, etc., and eventually wondered if there might be something I had changed on the monitor. Then I had the brilliant idea to RTFM.

Lo and behold, on page 12 of the Dell C6522QT User's Guide is the magic "Input sources and USB Pairing" section. From this I inferred that because I had switched my display connection from HDMI 1 to DP, I needed to move the USB connection from USB 1 to USB 3.

Boom:

root# lsusb
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 8087:0033 Intel Corp. AX211 Bluetooth
Bus 001 Device 009: ID 0424:2916 Microchip Technology, Inc. (formerly SMSC) USB2916 Smart Hub
Bus 001 Device 010: ID 25b5:00e6 FlatFrog FlatFrog DA-TK65P-20P2WE-M4-00e6
Bus 001 Device 011: ID 0424:284c Microchip Technology, Inc. (formerly SMSC) Hub Feature Controller
Bus 001 Device 012: ID 413c:2514 Dell Computer Corp. Dell Universal Receiver
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 002 Device 003: ID 0424:5916 Microchip Technology, Inc. (formerly SMSC) USB5916 Smart Hub

And the touchscreen started working. Woo!
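
If you ever need to confirm the touch device is bound to the right driver, a couple of quick checks (the module and device names below are what I'd expect, not captured output):

lsmod | grep hid_multitouch                    # confirm the driver module is loaded
grep -iA5 flatfrog /proc/bus/input/devices     # look for the touch device and its input handlers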

Wednesday, 12 March 2025

Set search domain when VPN connected using Network Manager in Ubuntu 24.04

I recently assembled a Framework Laptop 13 and installed Ubuntu Desktop 24.04 on it. I have been working through numerous little things to make life easier for myself. One I found a solution to recently was setting a specific DNS resolver search domain when connected to a VPN.

nmcli c modify <vpn-settings-name> ipv4.dns-search '<domain>'

Here, <vpn-settings-name> is the name of the VPN connection as it appears in the GUI, and <domain> is the domain name you want searched via DNS while on the remote network.
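
A concrete (made-up) example, plus how to confirm and reapply it:

nmcli c modify "Work VPN" ipv4.dns-search 'corp.example.com'   # set the search domain on the VPN connection
nmcli -g ipv4.dns-search c show "Work VPN"                     # confirm the value was stored
nmcli c down "Work VPN" && nmcli c up "Work VPN"               # reconnect so the change takes effect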

Tuesday, 9 April 2024

Proxmox mount host storage in Linux Container

On the host, in a terminal, run:

pct set <container_id_number> -mp0 /<host_dir>,mp=/<container_mount_point>
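
For example (container ID and paths here are hypothetical), to expose /tank/media on the host as /mnt/media inside container 101:

pct set 101 -mp0 /tank/media,mp=/mnt/media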

Tuesday, 2 April 2024

Windows 10 update KB5034441 fails with error 0x80070643 (how to ignore update)

In January 2024, Microsoft pushed out a broken update to the Windows Recovery Environment (WinRE) that got a couple things wrong:

1) If your WinRE partition didn't have enough space, the update would fail to install.

2) If you didn't have a WinRE partition at all, it would also try to install and fail.

Within a week or two, MS linked to info about how to resize a WinRE partition to address #1 above. As for #2, they suggested you just ignore the failing update.

My system was suffering from #2, and I was tired of seeing the failing attempt, so I did a quick search for how to disable a specific update and found an article describing Microsoft's "Show or Hide Updates Tool".

 
