| Selection | Testing | Setup | Faceplant | Top | 
Check the box contents and record serial numbers. Try to get the Ethernet MAC address so it can be registered with the firewall. Recent (2017) NUCs have the MAC address and the serial number on a sticker on the bottom of the machine and also on the box.
What's in the box:
Intel Core i5.
Make a special backup of Diamond, in case of mistakes. [Done]
The new machine initially will be called Orion. Add it to hostdata.db and install relevant files everywhere (/etc/hosts, /etc/ethers, hostgroup.db, trusted-adr.fw). Reload the firewall, to accept the new MAC address. [Done.]
The machine is pre-assembled: no assembly steps. But it needs a name sticker. [Done]
Its temporary home will be atop Jacinth's cabinet. Steal Jacinth's monitor, keyboard and mouse. Connect Ethernet to the spare hub port. [Done]
Unplug the Ethernet (for the next several steps) and let Orion boot into Windows. Once it tries to start setup, shut it down. If it were dead on arrival, it would fail in this step. [It successfully did limited Windows Setup. It did a Wi-Fi scan and found the CouchNet SSID. Not dead on arrival.]
Update the BIOS using the flasher in the BIOS. [Done] Details here.
Set up options in the BIOS. [Done] Details here.
Download a recent Tumbleweed rescue disc, onto USB flash memory. Boot up Orion on this disc. [Done]
Execute /etc/kea/mkstatic.pl which replaces Orion's MAC address, so it will get the correct address when I plug in Ethernet. [Done, and Orion got its correct address.]
Partition the disc. [Done] Details here.
Copy partition contents from the old Diamond to Orion. [Done] Details here.
Boot up Orion on its own disc. Check out all services (checkout.sh). Fix any problems found. Details here.
Subsystem checkout and testing. I went through subsystem.shtml and did the listed tests; q.v. for the results. Discrepancies and fixups are listed here.
Do the power and speed measurements on the new machine. [Done] Details here.
I need to make Diamond become Orion, and Orion become Diamond, without making errors. To that end I created /s1/etc-orion and /s1/etc-diamond on both hosts, each with a directory structure starting at the root and containing that host's host-specific files. I can swap one or the other into place by:

 rsync -n -a -O /s1/etc-diamond/  /

(-n is for testing, and it will help to squeeze in --log-format="%o %f" to make it report what it's copying.)
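The preview-then-apply routine above can be wrapped in a small helper. This is a minimal sketch, not the procedure actually used: the function name overlay_cmd and its preview/apply modes are made up for illustration, and it only prints the rsync command line (built from the options quoted above) so it can be inspected before anything is copied.

```shell
#!/bin/sh
# Sketch: print the rsync command that overlays one host's identity files.
# Assumptions: the overlay trees are /s1/etc-<host> as described above;
# overlay_cmd and its preview/apply modes are hypothetical names.

overlay_cmd() {
    mode=$1; host=$2; dest=${3:-/}
    dry=""
    # In preview mode add -n so rsync only reports what it would copy.
    [ "$mode" = preview ] && dry="-n "
    # The trailing slash on the source copies the tree's contents onto dest.
    printf 'rsync %s-a -O --log-format="%%o %%f" /s1/etc-%s/ %s\n' \
        "$dry" "$host" "$dest"
}

# Print (not run) the preview command for Diamond's files.
overlay_cmd preview diamond
```

Once the preview output looks right, the same call with apply prints the live command, which could then be pasted or piped to sh.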
	
	
The files involved are:
For each directory, once it's populated, install it on the other host. Test that everything in the host's own dir is right (rsync -n finds nothing to copy) and that all files would be copied (per rsync -n) from the other host's dir.
Burn bridges: interchange Diamond's and Orion's name and host keys; see the previous paragraph for which files those are and how to make the exchange. Details:
audit-scripts, then
restarter, and
checkout.sh. Check that LDAP, Kerberos, SSH and Apache (TLS) function with the new name. Repeat with the old machine (now Orion); formerly working LDAP and Kerberos KDC should be out of action.
hostdata.db: Put Orion in the down hostgroup. Interchange the MAC addresses for orionen0 and diamonden0. Don't swap the MACs on br0, which by local policy are derived from the IPv4 address. Rebuild /etc/ethers and /usr/diklo/lib/hostgroup.db and install on all hosts. [Done]
    
Shut down the old machine (Orion). Print up a note about the old machine's status. Put the old machine in its box in the storage area. [Done]
Move the new machine to Alice's desktop. Re-pair the Bluetooth mouse. Check that Cups can print. [Done]
As delivered, the machine has BIOS version 0040 dated 2021-04-14. Not too old, but I should check for recent updates. How to obtain the latest BIOS version:
OS Independent. Download type = BIOS.
BIOS Flash Update. This page is also accessible from the regular setup on F2. It shows the current BIOS version; we have version 0040 dated 2021-04-14.
Unknown Device. Scroll down to this line and hit Enter.
Intel NUC), hold down F2 to get into BIOS setup. Verify that the new BIOS version got installed.
This should be done after the BIOS update because that resets at least some options to factory defaults.
Press F2 during booting to get into BIOS setup. It shows "Intel NUC" for about one second, and likely you have to hit F2 while that screen is showing. Alternatively, just hold down F2 when it starts the boot process.
    
Main page: System Information.
Advanced (Devices): I changed only one of these.
Cooling: By default the Fan Control Mode is Balanced, but I changed it to Quiet. It has temp sensors on the CPU, the PCH (whatever that is), the memory, and the motherboard. But its policy is to leave the fan off until the CPU temperature reaches 80 C, then turn the fan on full blast, and it leaves a bit set so that at the next boot it shows a message that a thermal emergency was responded to; check for blocked airflow; press any key to continue. I reverted to Balanced.
    
Performance: Processor: I turned off hyperthreading.
Security: Totally unlocked. I left these alone. In a public computer lab you will want to set a BIOS password. These Security Features are enabled: virtualization; virtualization with direct I/O; Platform Trust Technology.
    
Power:
Boot: Both Legacy Boot and UEFI were enabled, so it says. Under Secure Boot, it's enabled, and this is supposed to disable Legacy Boot. But my kernel had an invalid signature so I disabled Secure Boot. (With a new kernel, is the signature once again valid?) Under Boot Priority, Windows Boot Manager came first. I switched to USB flash drive first, and I disabled PXE over IPv4+6. The USB drive persists even if not plugged in, unlike in some older BIOS versions.
To save and reboot, hit F10. It booted Windows anyway (shut down). To boot the rescue system, when booting hold down F10 and it will give you a menu on which you can choose Windows or the USB drive. That worked (whew!)
At this point there are three ways to proceed:
Install OpenSuSE Tumbleweed, then restore the special backup onto Orion. This is the most work but has the fewest dicey steps. See Holly Hosed, Tested Backup for details of the procedure.
    
Partition Orion's disc by hand including creating filesystems and labels. Mount each partition in the rescue system and use rsync, or the equivalent with tar and ssh, to copy Diamond's content onto it. This is the procedure I will try first.
Partition Orion's disc by hand (no filesystems). For each partition, copy Diamond's raw device onto Orion's (across the net). This method may not work: the partitions probably won't be exactly the same size, even though close, so the filesystems will appear to be corrupt. Repairs probably will work, but this isn't assured.
    
Should I take out Diamond's disc, connect it to Orion with my SATA to USB adapter, and not have to deal with and wait for the network? It's probably better to use the network connection even if slower — the USB adapter isn't that fast anyway.
Partition table on old Diamond:
| Nbr | Size | Role | Fsys | Label | 
|---|---|---|---|---|
| 1 | 1049kB | BIOS boot | -- | |
| 2 | 38.8MB | EFI | fat16 | |
| 3 | 23.1GB | Root | ext4 | |
| 4 | 23.1GB | Home | ext4 | |
| 5 | 8587MB | Swap | linux-swap(v1) | |
| 6 | 52.0GB | Old VM #1 | -- | |
| 7 | 55.4GB | Old VM #2 | -- | |
| 8 | 338GB | Extra | ext4 | |
Orion's disc: /dev/nvme0n1 Silicon Motion SM2263EN/SM2263XT SSD Controller, 238.47GiB, 250.0Gb, can't tell the actual vendor's name but it's in Guangdong, CN.
Partition table on Orion:
| Nbr | Size | Role | Fsys | Label | 
|---|---|---|---|---|
| 1 | 1 MiB | BIOS boot | None | None | 
| 2 | 40 MiB | EFI | FAT16 | EFI-11 | 
| 3 | 20 GiB | Root | Ext4 | ROOT-11 | 
| 4 | 32 GiB | Home | Ext4 | HOME-11 | 
| 5 | 16 GiB | Swap | linux-swap(v1) | SWAP-11 | 
| 6 | 170 GiB | Extra | Ext4 | S1-11 | 
The Yast2 partitioner populates /etc/fstab (on the rescue system) with lines for the mountable partitions created. It persists across reboots; there's a copy-on-write overlay area. Mount /dev/nvme0n1p3 (use /dev/disk/by-label/ROOT-11) on /mnt. Copy /etc/fstab to /mnt/etc/fstab-orion . Unmount the partition.
The goal at this point is to copy stuff from Diamond to Orion's disc. A few preliminary steps were needed.
It got into a weird state. For reasons unknown, the live CD user linux (no password) could not log in; logs suggest that it did log in but exited immediately. I'm continuing as root on tty1.
It has a correct /etc/resolv.conf, generated by netconfig, probably from DHCP options that it was given. ssh jimc@diamond uname -a works (password required).
For easier copying I will give root@orion a key agent with jimc's secret key. It will get this key by:

 rsync -a jimc@diamond:~/.ssh/ ~/.ssh/  (which already exists).

 eval $(ssh-agent)

 ssh-add /root/.ssh/id_rsa (give password)

 ssh diamond uname -a (works, no password needed)
Command lines to suck a filesystem: /boot/efi, home and s1 are mounted on their proper mount points, but not root, of course, so put it on /mnt. But I'll first do EFI because it's very small. Guess what: Diamond, Jacinth and Iris don't have EFI booting; only Xena does. Get it from Xena.
Another gotcha: rsync will set the owner and group to the numeric values in the rescue disc's /etc/passwd and /etc/group for their names, which differ from what CouchNet is enforcing, leading to things not working. Head this off with the --numeric-ids option.
 mount /dev/disk/by-label/ROOT-11 /mnt 
 rsync -a --one-file-system --numeric-ids --exclude lost+found xena:/boot/efi/ /boot/efi/ 
 rsync -a --one-file-system --numeric-ids --exclude lost+found diamond:/ /mnt/
 rsync -a --one-file-system --numeric-ids --exclude lost+found diamond:/home/ /home/
Seems to have arrived in good order, cross fingers. Root had close to 10.0GB. If it saturates MOCA (100Mbit/sec) it would take 800sec (about 13min). Actual 19min, no error messages. /home has similar size and speed. I'm not copying /s1; it's all ancient special backups and ancient virtual machine images. But remember to re-create /s1/scr [done].
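The saturation estimate above is just the size times eight, divided by the link speed. A minimal sketch of that arithmetic, assuming the size is given in decimal megabytes and the link speed in Mbit/sec (xfer_seconds is a hypothetical name):

```shell
#!/bin/sh
# Sketch: lower bound on copy time over a fully saturated link.
# Assumptions: size in decimal megabytes, link speed in Mbit/sec;
# xfer_seconds is a hypothetical name.

xfer_seconds() {
    size_mb=$1; link_mbit=$2
    # 8 bits per byte; shell integer arithmetic rounds down.
    echo $(( size_mb * 8 / link_mbit ))
}

# A root filesystem of about 10000 MB over a 100 Mbit/sec link.
xfer_seconds 10000 100
```

The result is only a floor. Protocol overhead and lots of small files presumably account for the gap between it and the measured 19 minutes.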
After copying content I need to change some items to stop being Diamond and start being Orion. This is still on the rescue disc.
grub2-install /dev/disk/by-label/ROOT-11. The safest way is to chroot into /mnt (root). You need to mount /proc, /sys and /dev on the new root's mount points. It seems to be OK to have them mounted in multiple places. Also
mount /dev/disk/by-label/EFI-11 /boot/efi but not if it's already in /proc/mounts; check first. [Done, no errors reported.]
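The check-before-mounting advice above can be scripted. A minimal sketch, with assumptions: the function names are hypothetical, the commands are only printed rather than run, and the EFI partition is assumed to belong at boot/efi under the new root when mounting from outside the chroot.

```shell
#!/bin/sh
# Sketch: consult the mounts table before mounting anything for the chroot.
# Assumptions: is_mounted and mount_cmds are hypothetical names; the table
# is a parameter so the check can be tried on a saved copy of /proc/mounts.

# Succeed if mount point $1 is listed in mounts table $2.
is_mounted() {
    while read -r _dev mp _rest; do
        [ "$mp" = "$1" ] && return 0
    done < "${2:-/proc/mounts}"
    return 1
}

# Print (not run) the mounts still needed under new root $1.
mount_cmds() {
    root=${1:-/mnt}; table=${2:-/proc/mounts}
    for fs in proc sys dev; do
        is_mounted "$root/$fs" "$table" || echo "mount --bind /$fs $root/$fs"
    done
    is_mounted "$root/boot/efi" "$table" ||
        echo "mount /dev/disk/by-label/EFI-11 $root/boot/efi"
}

# Usage on the rescue system:  mount_cmds /mnt
```

Printing the commands instead of running them keeps the sketch harmless; whatever it prints is what remains to be mounted before the chroot.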
Initial problems with booting:
"Secure Boot Violation, invalid signature detected, check secure boot policy in setup." I didn't try to debug this; I just turned off Secure Boot. In Setup-Boot-Secure Boot-Secure Boot (change to Disabled). [Now Grub shows its menu.]
    
No such device: UUID ending in e973. Checking on Diamond, this is Diamond's root partition (which is on Diamond, not Orion). Back to the rescue system and chroot to Orion's root. View /etc/default/grub and run grub2-mkconfig -o /boot/grub2/grub.cfg . grub2-mkconfig populated grub.cfg with 3dd0, which is /dev/nvme0n1p3 (Orion's root). Rebooting again.
    
It boots. And hangs after enumerating USB devices. After about 60 secs it announces "dracut-initqueue: starting timeout scripts", still waiting for… (it gives lines from /run/systemd/generator/systemd-cryptsetup@*.service, grep for After=remote-fs-pre.target). After reporting this once per second for 30-60 secs, it recommends saving /run/initramfs/rdsosreport.txt on a USB flash drive. It drops into emergency mode. Give root password for maintenance. The report looks useful but only usbcore is loaded and I can't mount USB memory. Back to chroot jail. My hypothesis is that there are host-dependent items in the initrd which are botched on the other host. Mkinitrd should fix that [done]. Rebooting.
    
It boots much more normally. It's on the net, so it says. The only failed service seen is slapd (LDAP), shouldn't have been attempted on Orion. Looking for lightdm greeter, but it displays nothing. Isn't that cute, it has Diamond's IP. Which was hardwired in /etc/sysconfig/network/ifcfg-br0 . Fixed, now both Orion and Diamond are happy (until exchanged).
Orion is using Diamond's SSH host keys and clients complain. I made separate dirs /etc/ssh-orion and /etc/ssh-diamond, with a symlink to the first one, and I added Orion's SSHFP records to DNS. Now it works. I did the same thing on Diamond. Later I reverted this set of dirs, and created equivalent directories with a lot more host-specific files.
checkout.sh discrepancies:
Wicked: If you do ping -4 -w 3 jacinth (192.9.200.193) it answers, but $ft/wicked reports failure on this test. [Self-healed, restarter may or may not have helped.]
display-manager is still hosed. "Failed to create IPv4 VNC socket on [::]:5900 : not an IPv4 address" (paraphrased). Diamond has the same message but still starts. /var/log/lightdm contains logfiles from 2017, needs a cleaner. seat0-greeter.log: /var/lib/lightdm/.Xauthority permission denied. Owned by the wrong luser, why wasn't it copied over right? (For the answer, and a fix, search for --numeric-ids .) [Re-owned, now it starts.]
    
Wonder of wonders, Avahi passed its test. Normally, if you sneeze it will fail.
cups is not wanted on Orion, but it's running and passes its test. Apparently the printer doesn't have to be connected for it to pass. Similar for kpropd krb5kdc postgresql unbound{2,3}. [Fixed by running audit-scripts which disabled all of them.]
krb-client.J failed because it has no host key for Orion. [Installed it, now test passes.]
bluetooth-hci.J is not wanted (should be wanted), is dead, but even so the HCI is powered up (good). Reason: Orion was not in hostgroup blue. [Added, problem solved.]
    
ldap is not wanted, enabled, failed. [Disabled it on Orion.]
apache2 botched TLS to orion.cft.ca.us because it had the wrong host key. [Installed correct host key, now it works.]
alsa-restore: need to regenerate /var/lib/alsa/asound.state for the new sound card. [rm /var/lib/alsa/asound.state and retest.]
firewall.J: botched vnc-server, http, https because the daemons are hosed (skipped). Remaining ports passed.
check-net.S: /etc/ethers needs a pro-forma host orionen0 with the hardware address, and the correct fake MAC on orion (br0). [Done, now passes test.]
Daemons that are enabled but should not be on Orion:
Rebooted, re-ran the test.
The daily report had a bunch of permission and ownership issues. This was already seen for /var/lib/lightdm/.Xauthority (preventing the display-manager from starting, see above). I'm beginning to suspect this scenario: rsync was used to import Diamond's files to Orion. For owners and groups it sends the alphabetic names from Diamond (and the numeric IDs), and Orion looks up each name in /etc/passwd or /etc/group, and uses the resulting numbers in the file's inode. This is /etc/passwd or /etc/group on the rescue disc, not the CouchNet values, producing these error reports. So I'm going to re-do the transfers.

 rsync -a -O --one-file-system --numeric-ids --exclude lost+found diamond:/home/ /home/

And similarly for / (root) and /boot/efi . But I'm going to have to be cautious, to avoid overwriting files whose content is supposed to be different on Orion.
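If the suspicion above is right, the rescue disc and Diamond should disagree about the numeric IDs of some accounts. A minimal sketch to list such accounts, assuming both files are in the standard passwd (or group) format; the function name id_mismatches is hypothetical.

```shell
#!/bin/sh
# Sketch: find names whose numeric ID differs between two passwd-format
# (or group-format) files. id_mismatches is a hypothetical name.

# Prints "name id_in_first id_in_second" for each name present in both
# files with different numbers in field 3.
id_mismatches() {
    awk -F: '
        NR == FNR { id[$1] = $3; next }      # first file: remember each ID
        ($1 in id) && id[$1] != $3 {         # second file: compare by name
            print $1, id[$1], $3
        }' "$1" "$2"
}

# Usage on the rescue system, with Orion's root mounted on /mnt:
#   id_mismatches /etc/passwd /mnt/etc/passwd
#   id_mismatches /etc/group  /mnt/etc/group
```

Any name it prints is one for which a name-based ownership mapping would write the wrong number into the inode, which is what --numeric-ids avoids.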
These are discrepancies encountered when doing the tests listed in subsystem.shtml.
I tried to boot the kernel and initrd copied from Diamond. They needed these steps before I could boot them:
Secure Boot Violation, invalid signature detected. I turned off Secure Boot in BIOS. Now that I've gotten it booting, I wonder if I can turn it back on?
Speed test output in Kb/sec: SHA512 1 core 345663, 4 cores 1382652; reading 87859; overall 346384. Comparison to the old Diamond: CPU about 2.5x, disc reading 9x.
Wake on LAN does not wake Orion from S5. It should be enabled in BIOS Power-Secondary. See BIOS Setup power section. [Done, now it wakes.]
After hibernation, Orion does a full reboot, not resuming. This happened on Xena too, but other hosts resume with no special configuration. To fix, I created /etc/dracut.conf.d/99-resume.J.conf containing

 add_dracutmodules+=" resume "

(a space is required before and after each module name; #comments OK). Then execute mkinitrd or dracut -vf . [Fixed, now it resumes from hibernation.]

To see if an initrd has the resume module, do lsinitrd /boot/initrd-… | less and look for "resume" in the modules section.
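The drop-in file described above can be generated rather than typed, which avoids forgetting the spaces. A minimal sketch; the function name dracut_module_conf and the optional directory argument are hypothetical, and the file name follows the 99-<module>.J.conf pattern used above.

```shell
#!/bin/sh
# Sketch: write a dracut drop-in that adds one module to the initrd.
# Assumptions: dracut_module_conf is a hypothetical name; the directory
# argument exists only so the function can be tried outside /etc.

dracut_module_conf() {
    module=$1; dir=${2:-/etc/dracut.conf.d}
    file="$dir/99-$module.J.conf"
    # The value is appended to a space-separated list, so the module name
    # needs a space on each side.
    printf 'add_dracutmodules+=" %s "\n' "$module" > "$file"
    echo "$file"
}

# Usage (as root):  dracut_module_conf resume
```

After writing the file, rebuild the initrd with mkinitrd or dracut -vf as described above.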
    
Now when it goes into S4, 2 secs later it wakes up, i.e. starts the boot process and diverts into restoring the saved image. There was no net traffic (pinger or ssh) to Orion but broadcast packets (ARP, IPv6 neighbor discovery, etc) are likely. Per /proc/acpi/wakeup these devices could wake the machine from S4: (and many more were disabled)
There are forum discussions of Bluetooth causing this problem, supposedly fixed in kernel 5.3.x. I stopped bluetooth and hibernated: didn't help, it woke up again.
Someone mentioned that if you echo PXSX > /proc/acpi/wakeup it will toggle the enabled status of that device. I tried disabling all of them, then hibernating: it disabled all but PXSX (Ethernet NIC). Success: it stayed off for 60 sec. Wake on LAN woke it up.
	
Turning them on one at a time, most suspicious first: XHCI: wakes immediately. I put everything on except XHCI. Stayed asleep for 60sec. In setup, USB S4/S5 power was off; suppose I turn it on (and wake on USB from S5). After reboot, XHCI wakeup is enabled. Wonder of wonders, it stayed off for 60 sec. And it woke on USB. I'll want to keep an eye on power consumption, and maybe suppress waking on USB.
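The one-at-a-time toggling above gets tedious by hand. A minimal sketch that lists the enabled devices and prints the toggle commands, assuming the usual layout of /proc/acpi/wakeup (a header line, then device name, S-state and a status of *enabled or *disabled); the function names are hypothetical and nothing is written to /proc.

```shell
#!/bin/sh
# Sketch: list wake-capable devices that are enabled, and print the
# commands that would toggle them off.
# Assumptions: enabled_wakeup_devices and disable_cmds are hypothetical
# names; the table file is a parameter so a saved copy can be used.

# Print the names of devices whose status column is *enabled.
enabled_wakeup_devices() {
    awk 'NR > 1 && $3 == "*enabled" { print $1 }' "${1:-/proc/acpi/wakeup}"
}

# Print (not run) a toggle command for every enabled device in table $1
# except those named in $2 (space-separated list of devices to keep).
disable_cmds() {
    for dev in $(enabled_wakeup_devices "$1"); do
        case " $2 " in
            *" $dev "*) ;;   # leave this one enabled
            *) echo "echo $dev > /proc/acpi/wakeup" ;;
        esac
    done
}

# Usage:  disable_cmds /proc/acpi/wakeup "PXSX"
```

Since writing a device name toggles its state rather than setting it, the commands are generated only for devices currently enabled.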
Intel docs claim the i5-1135G7 chipset includes a watchdog timer, but the iTCO_wdt.ko driver was not loaded. It would appear that this happens in the initrd, if Dracut configuration includes the watchdog module. Most machines have this by default, but on Orion you have to configure it. Create /etc/dracut.conf.d/99-watchdog.J.conf containing

add_dracutmodules+=" watchdog "

(spaces required before and after the module name; #comments OK), and run mkinitrd or dracut -vf. [Done]
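After rebuilding the initrd and rebooting, it's worth confirming that the driver really got loaded. A minimal sketch, assuming the usual /proc/modules layout with the module name in the first field; module_loaded is a hypothetical name.

```shell
#!/bin/sh
# Sketch: check whether a kernel module appears in the loaded-module list.
# Assumptions: module_loaded is a hypothetical name; the list file is a
# parameter so a saved copy of /proc/modules can be checked.

module_loaded() {
    while read -r name _rest; do
        [ "$name" = "$1" ] && return 0
    done < "${2:-/proc/modules}"
    return 1
}

# Usage after reboot:
#   module_loaded iTCO_wdt && echo "watchdog driver is loaded"
```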
    
Speed test on Jimc's benchmark. Columns in the output:
The test was run 3 times and the last one is reported. Actually the test is designed to be reasonably immune to buffer cache effects and scores vary only about 3% between repetitions. Numbers are in kbytes/sec.
| SHA512 | SHA512*cores | Disc read | Composite | Machine | 
|---|---|---|---|---|
| 120848 | 241696 | 12003 | 81531 | NUC5i5RYH | 
| 200479 | 400958 | 4964 | 127781 | NUC7i5BNH | 
| 346151 | 1384604 | 84795 | 345279 | NUC11PAHi5 | 