Intel NUC11PAHi5
Baobei's Faceplant
2025 has been a bad year in many ways. Diamond runs a virtual machine
called Baobei, with Microsoft Windows, for financial and tax preparation
software that isn't available for Linux. On 2025-04-13 we started up Baobei
and it spun in an infinite loop on both cores, as judged by top on the host,
Diamond. The framebuffer was initialized, but there was no splash screen for
the Intel TianoCore EFI pre-booter nor anything else on the framebuffer, just
a black screen. Twenty minutes of patience yielded no joy. I destroyed the VM
and retried startup: same result, but this time I destroyed it after 2 minutes.
Messages in syslog looked normal except for one item per startup attempt:
Apr 13 12:28:25 diamond kernel: [T175479] x86/split lock detection: #AC:
CPU 1/KVM/175479 took a split_lock trap at address: 0x7efb5050
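To see how often this message turns up, the log can be filtered for it. What follows is only a minimal sketch, assuming the kernel message keeps the format shown above and that each log entry sits on one line; the name count_split_lock_traps is made up for illustration.

```python
import re
from collections import Counter

# Pattern modeled on the single log line quoted above (an assumption:
# other kernel versions may word the message differently).
SPLIT_LOCK_RE = re.compile(
    r"x86/split lock detection: #AC: (?P<task>.+?)/(?P<pid>\d+) "
    r"took a split_lock trap at address: (?P<addr>0x[0-9a-fA-F]+)"
)


def count_split_lock_traps(lines):
    """Return a Counter mapping task name -> number of split_lock traps."""
    counts = Counter()
    for line in lines:
        match = SPLIT_LOCK_RE.search(line)
        if match:
            counts[match.group("task")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Apr 13 12:28:25 diamond kernel: [T175479] x86/split lock detection: "
        "#AC: CPU 1/KVM/175479 took a split_lock trap at address: 0x7efb5050",
    ]
    print(count_split_lock_traps(sample))
```

Feeding it the lines of a saved log (for example, from an open file object) gives a per-task count, which is enough to tell "one per startup attempt" apart from a flood.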
I think the split lock is a red herring. See the Proxmox wiki article on
Split Lock Detection. A split lock occurs when an atomic instruction operates
on a datum that straddles two cache lines, often because it's misaligned.
To ensure integrity the executing core has to lock out all the other CPUs.
This is terrible for performance, and in a virtual machine situation, split
locks can be used as a DoS attack, so hypervisors have mitigation strategies
that are even worse for performance. A few split lock traps aren't a big deal
(as in my case) but software that doesn't take alignment seriously can really
slow down itself and its neighboring cores. Apparently Steam is (or was in
2022) a big culprit.
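The alignment condition can be illustrated with a little arithmetic. This is a sketch only, assuming a 64-byte cache line (typical for current x86-64 CPUs, but an assumption here); the function name and the example addresses are hypothetical and are not taken from the log message above.

```python
CACHE_LINE_BYTES = 64  # assumption: typical x86-64 cache line size


def crosses_cache_line(addr, size, line_bytes=CACHE_LINE_BYTES):
    """Return True if the operand [addr, addr + size) spans two cache lines."""
    first_line = addr // line_bytes
    last_line = (addr + size - 1) // line_bytes
    return first_line != last_line


if __name__ == "__main__":
    # An 8-byte operand aligned on an 8-byte boundary stays inside one line.
    print(crosses_cache_line(0x1038, 8))
    # The same operand starting 4 bytes later straddles the line boundary
    # at 0x1040, so an atomic instruction on it would need a split lock.
    print(crosses_cache_line(0x103C, 8))
```

A naturally aligned operand (address a multiple of its size, size a power of two no larger than the line) can never cross a line boundary, which is why software that takes alignment seriously doesn't trigger these traps.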
We need our financial records now.
Rather than running around like a chicken minus its head (a popular
strategy this year), I'm putting together some preliminary plans and scoping
out options. In summary, the possibilities are:
- Plan A: Repair Baobei's image and carry on as before.
  - A.1: Did the latest KVM update go awry? The two Linux VMs are
    running fine; one uses EFI. I reverted to two previous revisions
    (04-08 and 03-25) but it didn't help. Specific versions:
    - libvirt-*11.1.0-2.1.x86_64
    - qemu-*9.2.2-1.2.x86_64
    - libvirt-*11.0.0-2.1.x86_64
    - qemu-*9.2.1-1.1.x86_64
  - A.2: Mount the image on the host, reinstall the booter, rebuild
    the Windows equivalent of an initrd, verify all the packages and
    replace any that are damaged, and restore relevant data from backups
    if it is damaged. This is what I do on Linux, but I don't have the
    expertise to do it on Windows, if it's even possible.
  - A.3: Wipe Baobei's storage area. Do write-read tests to detect
    bad blocks. (Read-only tests so far: size 80 GB, no errors, 3 tries.)
    Reinstall Windows. Reinstall user apps (TurboTax etc.).
    Reconfigure Windows for the user. Restore data from backups. This
    is my most likely next attempt.
- Plan B: For the second time, buy a new machine and install Windows
  on bare metal.
  - How likely is it that whatever bit-rot wiped out Baobei could
    similarly affect a real machine? Are we really more reliable on
    bare metal vs. a VM? With Linux, a VM is more accessible for
    diagnosis and repairs than a real box, but I wouldn't know how to
    get access to either a real or a virtual Windows machine's disc.
  - We have been running TurboTax on a dedicated Windows machine since
    1992. At some time post 2000 we moved it onto a VirtualBox VM.
    This got icky and we moved back to bare metal on a then-new Intel
    NUC 5i5RYH. But Windows outgrew this hardware and we moved back to
    a VM, on KVM on a NUC 7i5BNH, later upgraded to a NUC 11PAHi5 (and
    upgraded to Windows 11).
  - I have the NUC 5i5RYH and NUC 7i5BNH in storage as spares. But
    neither one has a TPM 2.0, so Windows 11 won't run on them. Forget
    (and junk) the old hardware.
  - On Amazon I looked for NUCs. No generation 15 CPUs, but they do
    have gen 14s loaded up with steroidal memory, disc and price:
    NUC14RVH Core Ultra 7 155H, RAM 32 GB, NVMe 1 TB, $909, and other
    choices. None of the gen 13 and 12 ASUS NUCs looked attractive.
  - I stumbled on a good review for Beelink GTi13, CPU i9-13900HK (14
    cores, 20 threads, "up to" 5.4 GHz), RAM 32 GB, SSD 1 TB (PCIe 4.0,
    looks like M.2 form), etc. $599 SB Beelink-XXY FBA. Also
    i9-12900HK for $579. One disappointed reviewer of an ASUS NUC said
    he had 5 Beelink machines, up to 5 years old, and all still running
    well.
- Plan C: Rewrite the financial database for PostgreSQL on Linux.
  - We still need something to run TurboTax on. Ben uses, and
    encourages us to use, cloud TurboTax, but we're too paranoid for
    that solution.
  - SQL is sort of portable, and it's fairly likely that I could
    tear apart a backed-up Microsoft Access database file and extract
    usable SQL and (particularly) database content. The big labor
    item would be the reports: Microsoft Access has a flexible report
    generator and data entry form engine, and I would have to recreate
    these by hand as CGIs in Perl. I've been wanting to do this for
    years, but as a quick response to an outage, it's not going to fly.
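Going back to the read-only test mentioned under A.3: one way to do such a check is to read the whole storage area from start to end and note any I/O errors. The sketch below is not necessarily the tool that was used here; it assumes the storage area is a regular image file or a block device readable by the user, and the name read_scan and the 1 MiB chunk size are illustrative choices.

```python
import os


def read_scan(path, chunk_bytes=1 << 20):
    """Read path from start to end without writing to it.

    Returns (bytes_read, errors), where errors is a list of
    (offset, message) pairs for chunks that could not be read.
    """
    errors = []
    good = 0
    with open(path, "rb", buffering=0) as f:
        size = f.seek(0, os.SEEK_END)  # total size in bytes
        offset = 0
        while offset < size:
            want = min(chunk_bytes, size - offset)
            f.seek(offset)
            try:
                data = f.read(want)
            except OSError as exc:
                errors.append((offset, str(exc)))
                data = b""
            if data:
                good += len(data)
                offset += len(data)
            else:
                # Skip a chunk that failed (or came back empty) so the
                # scan always makes progress.
                offset += want
    return good, errors
```

Calling read_scan on the image path (whatever it is on the host) and getting back an empty error list and a byte count equal to the image size corresponds to the "no errors" result above. A read-only pass can't catch blocks that fail only on write, which is why A.3 still calls for write-read tests after the wipe.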