
Failover Scenario

Some services in my homelab have become crucial for my day-to-day life. Some of them are easy to spin up on other or new instances, but others are not. I have some important services running in FreeBSD Jails, which aren’t easy to migrate to systems other than FreeBSD. I want to migrate them step by step to a more common form like LXC, Docker or something similar, but I have 19 such FreeBSD Jails.

Crucial services

The following services run in FreeBSD Jails and are important to me:

  • I’m using Nextcloud for all my files across all my devices (mobile phone, work laptop, private laptop, and even some Docker volume backups to restore on the laptop). I also use the calendar, including one shared with my family. Last but almost most important is Nextcloud’s Passwords app (it was not well designed some time ago, but I consider it safe now).
  • This blog
  • The homelab’s wiki which is based on DokuWiki
  • Gitea server for private projects, some configurations and ansible repositories

Failover options

My main server runs Proxmox, and inside it I virtualized my TrueNAS. I passed all hard disks through, except for 2 mirrored NVMe SSDs for Proxmox. I chose this setup because I only want to run one device 24/7 to keep power consumption low. Nevertheless, I want to have a NAS in the form of TrueNAS, simply because I love it and in my opinion it does a very good job, and I also like to have a proper hypervisor like Proxmox to have a broad range of options for my homelab.

2 node proxmox cluster with quorum device

Printer as quorum device.

Advantages:

  • auto failover

Disadvantages:

  • shared storage needed
  • high power consumption because of redundant devices –> overkill?

3 node proxmox cluster

definitely overkill…

Backup TrueNAS node

My old NAS ran TrueNAS for almost three years, so there should be no problem with the hardware. The idea is to set up the backup-NAS with the same pool names as the main-NAS (especially the sysdataset), so that I can replicate it with ZFS. If a FreeBSD Jail or the whole main-NAS breaks, I can just start the jails on the backup-NAS. With this plan my data is well secured on the backup-NAS and I can simply start my most needed jails like the reverse proxy, Nextcloud, blogs, wiki and Git.

BUT… I also got comfortable with Proxmox and its possibilities. There are also some LXC containers set up with Ansible, and 22 Docker containers are running as well. This stuff is not super important to me, but it would be nice to have on a backup system. So there are some further options:

  • New device with proxmox –> all possibilities
  • Virtualize Ubuntu with Docker and a simple LXC host on the backup-NAS
  • Virtualize a proxmox node on the backup-NAS

Implementation

First I went with this idea: a 2 node Proxmox cluster with quorum device. It went well until I wanted to configure Proxmox to pass through PCI devices (to be able to virtualize a second TrueNAS). The hardware of my old NAS did not support IOMMU interrupt remapping, which is mandatory according to the Proxmox documentation:

It will not be possible to use PCI passthrough without interrupt remapping. Device assignment will fail with ‘Failed to assign device “[device name]”: Operation not permitted’ or ‘Interrupt Remapping hardware not found, passing devices to unprivileged domains is insecure.’ error.

There are some things one can try with unsafe interrupts, which I did, but it didn’t work out…
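Whether a host reports interrupt remapping can usually be read from the kernel log. The following is a minimal sketch, not the check I actually ran: has_irq_remapping is a hypothetical helper, and the matched strings are the typical Intel (DMAR-IR) and AMD (AMD-Vi) messages, which may be worded differently on other kernels.

```shell
#!/bin/sh
# Sketch: decide from kernel log text whether interrupt remapping is reported.
# has_irq_remapping is a hypothetical helper name. The patterns are the usual
# Intel and AMD messages and are an assumption about the kernel's wording.
has_irq_remapping() {
  # $1: kernel log text, e.g. the output of `dmesg`
  printf '%s\n' "$1" |
    grep -Eiq 'DMAR-IR: Enabled IRQ remapping|AMD-Vi: Interrupt remapping enabled'
}

# Usage on the Proxmox host (as root):
#   if has_irq_remapping "$(dmesg)"; then
#     echo "interrupt remapping reported"
#   else
#     echo "interrupt remapping not reported"
#   fi
```

If the check fails, the unsafe-interrupts workaround mentioned above is commonly set through the vfio_iommu_type1 module option allow_unsafe_interrupts=1, which did not help on my hardware.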

So there was only one option left: the backup TrueNAS node.

ZFS Dataset transfer

The implementation was surprisingly easy, thanks to the well-integrated ZFS features. In TrueNAS one can just create a new replication job, either on the main- or backup-NAS, and then choose the right locations. If the chosen target is remote and both systems are TrueNAS, you can also auto-generate the SSH connection. That means you have to provide the root password once during setup and TrueNAS will generate a public key and upload it to the target system. In my case it also found the needed snapshots automatically (maybe that depends on the naming scheme).
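For comparison, a manual snapshot-and-send with plain ZFS commands could look roughly like the sketch below. This is not what the TrueNAS replication job runs internally; replication_commands is a hypothetical helper, and the dataset, host and snapshot names are placeholders. It only prints the commands, so nothing is sent until the output is reviewed and executed.

```shell
#!/bin/sh
# Sketch: print the commands for an incremental, recursive ZFS replication.
# All names passed in (dataset, ssh target, snapshots) are hypothetical.
replication_commands() {
  dataset=$1   # source dataset, e.g. tank/sysdataset
  target=$2    # ssh target, e.g. root@backup-nas
  prev=$3      # last snapshot that already exists on both sides
  new=$4       # name of the snapshot to create and send
  # Recursive snapshot of the dataset and its children.
  echo "zfs snapshot -r ${dataset}@${new}"
  # Incremental replication stream from @prev to @new, received on the target.
  echo "zfs send -R -i @${prev} ${dataset}@${new} | ssh ${target} zfs receive -F ${dataset}"
}

# Usage:
#   replication_commands tank/sysdataset root@backup-nas snap-old snap-new
```

Printing instead of executing is a deliberate choice here, because zfs receive -F rolls back the target dataset and is not something to run blindly.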

Start / Stop backup-NAS for the replication

  • To start the backup-NAS I configured a cron job on the main-NAS; command: wake vtnet0 00:11:22:33:44:55, time: 45 1 * * *
  • Then the backup-NAS has 15 min to come online and warm up…
  • At 02:00 the replication from the main-NAS/sysdataset to the backup-NAS/sysdataset starts
  • At 02:15 an additional backup replication on the backup-NAS to a secondary pool starts
  • Half an hour later, a cron job that I configured on the backup-NAS shuts it down
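The schedule above relies on a fixed 15 minute window for the backup-NAS to boot. As an alternative, the wake step could be followed by an explicit wait until the host answers. A minimal sketch, assuming the backup-NAS responds to ping: wait_for_host and the PROBE override are hypothetical names, and backup-nas stands in for the real host name or IP.

```shell
#!/bin/sh
# Sketch: poll a host until it answers or the attempts run out.
# wait_for_host and PROBE are hypothetical; the default probe is one ping.
wait_for_host() {
  host=$1    # host name or IP of the backup-NAS (placeholder)
  tries=$2   # maximum number of attempts
  delay=$3   # seconds to sleep between attempts
  probe=${PROBE:-ping -c 1}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # $probe is left unquoted on purpose so "ping -c 1" splits into words.
    if $probe "$host" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage on the main-NAS (interface and MAC as in the cron job above):
#   wake vtnet0 00:11:22:33:44:55 && wait_for_host backup-nas 30 30
```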

With that setup I can pretty much turn on the backup-NAS and it starts the jails I need if the main-NAS has broken down. The only problem was that all jails started on boot, because that’s what is set on the backup-NAS. That is why I removed the start-on-boot option in all jails and added an init script on the main-NAS, which starts the jails I want on boot-up.

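#!/bin/bash
# Init script on the main-NAS: start the selected jails on boot.
jails=(
  "jail0"
  "jail1"
  # ...
  "jailn"
)

for item in "${jails[@]}"; do
  # Quote the name so it is always passed as a single argument.
  iocage start "$item"
done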
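The other half of that change, removing the start-on-boot option, can be scripted as well. A minimal sketch, assuming the jails are managed with iocage and its boot property: disable_boot and the IOCAGE override are hypothetical names, and the jail names are the placeholders from the script above.

```shell
#!/bin/sh
# Sketch: switch off start-on-boot for the given jails.
# IOCAGE is a hypothetical override for dry runs: with IOCAGE=echo the
# commands are printed instead of executed.
disable_boot() {
  for jail in "$@"; do
    ${IOCAGE:-iocage} set boot=off "$jail"
  done
}

# Usage:
#   IOCAGE=echo disable_boot jail0 jail1   # dry run, prints the commands
#   disable_boot jail0 jail1               # applies the setting
```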

Replication of the docker images and LXC-Containers

Now I have a backup-NAS with the most important services, which is enough for the moment. But if I stumble over some low-power PCs somewhere, I will definitely set up a Proxmox cluster to fill that gap.

To be continued …