
Holly Hosed, Tested Backup

James F. Carter <jimc@jfcarter.net>, 2021-06-08

Today SuSE has a big update for aarch64, moving lots of stuff from /bin and /lib to /usr/bin and /usr/lib. This was done for x86_64 last week and went smoothly, but not on Holly. The result is that either the dynamic loader (the program interpreter) or the shared libraries have gone missing, so if you do "ls filename" it says "/usr/bin/ls: no such file or directory" (even though it's there, per "echo /usr/bin/ls*"). Specific error message:

Installation of filesystem-15.5-40.2.aarch64 failed:
Error: Subprocess failed. Error: RPM failed: Make a copy of `/bin'.
(It seems to have successfully made a copy of /bin.)
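A "no such file or directory" error for a file that plainly exists is the classic symptom of a dynamically linked executable whose ELF program interpreter is missing: the kernel reads the interpreter path from the binary's PT_INTERP program header (on aarch64 with glibc this is typically /lib/ld-linux-aarch64.so.1) and exec fails with ENOENT if that path does not resolve. A missing shared library would normally produce a different message, "error while loading shared libraries", from the loader itself. The Python sketch below was not part of the recovery; the function names are hypothetical, and it handles only 64-bit ELF (which covers aarch64 and x86_64). It reads PT_INTERP and reports whether the interpreter is present.

```python
import os
import struct
from typing import Optional

PT_INTERP = 3  # program-header type that names the ELF interpreter


def elf_interpreter(data: bytes) -> Optional[str]:
    """Return the PT_INTERP path of a 64-bit ELF image, or None if it has none."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2:  # EI_CLASS: 2 means ELFCLASS64
        raise ValueError("this sketch only handles 64-bit ELF")
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little-endian, 2 = big-endian
    (e_phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x36)
    for i in range(e_phnum):
        # Elf64_Phdr begins: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz
        p_type, _flags, p_offset, _vaddr, _paddr, p_filesz = struct.unpack_from(
            endian + "IIQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            return data[p_offset:p_offset + p_filesz].rstrip(b"\x00").decode()
    return None


def diagnose(path: str) -> str:
    """Explain whether exec() of `path` can find its program interpreter."""
    with open(path, "rb") as f:
        data = f.read()
    interp = elf_interpreter(data)
    if interp is None:
        return f"{path}: no PT_INTERP (statically linked)"
    if not os.path.exists(interp):  # follows symlinks, so a dangling link counts as missing
        return f"{path}: interpreter {interp} is MISSING (exec fails with ENOENT)"
    return f"{path}: interpreter {interp} present"
```

Running it needs a working Python, which on a host in Holly's state would itself fail to exec for the same reason; it is meant for understanding the symptom, or for checking binaries on a machine that still works.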

This upgrade was not exactly atomic: last week the firmware migrated from /lib/firmware to /usr/lib/firmware, but the drivers, specifically the out-of-kernel, user-compiled Realtek (RTL) driver 88x1bu.ko, look for the firmware in /lib/firmware. A simple fix was to make a symlink from /lib/firmware to /usr/lib/firmware. Problem solved. Yeah, sure. I suspect, without real proof, that the posttrans script for the filesystem package was not smart enough to recognize that it should just remove the symlink rather than try to copy it to /usr/lib, where ./firmware already exists. In any case, only part of the essential infrastructure formerly in /lib got copied over.
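The suspicion about the posttrans script suggests what a more careful merge step would do. The following is a hypothetical Python sketch, not the actual logic of SuSE's filesystem package: when folding an old directory such as /lib into its /usr counterpart, a symlink that already resolves to the same-named entry in the destination (like /lib/firmware pointing at /usr/lib/firmware) is simply removed, an entry with no counterpart is moved across, and anything else is reported as a conflict rather than overwritten.

```python
import os
import shutil
from typing import List


def merge_dir(old: str, new: str) -> List[str]:
    """Hypothetical merge of directory `old` into `new` (e.g. /lib into /usr/lib).

    Returns the names of entries left in place because they conflict.
    """
    conflicts = []
    for name in sorted(os.listdir(old)):
        src = os.path.join(old, name)
        dst = os.path.join(new, name)
        redundant_link = (
            os.path.islink(src)
            and os.path.exists(src)         # not a dangling link
            and os.path.exists(dst)
            and os.path.samefile(src, dst)  # the link already resolves to dst
        )
        if redundant_link:
            os.unlink(src)                  # e.g. /lib/firmware -> /usr/lib/firmware
        elif not os.path.lexists(dst):
            shutil.move(src, dst)           # nothing in the way: move it across
        else:
            conflicts.append(name)          # needs a human decision; do not clobber
    return conflicts
```

A real merge would also have to recurse into directories that exist on both sides, and would finish by replacing /lib itself with a symlink into /usr; those steps are left out of this sketch.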

So I'm going to have to do something drastic to bring Holly back to life. The intervention will obviously involve restoring things from backup… and now is a very good opportunity to test whether my backup system is actually saving everything important, and whether access to the backup copy is actually feasible. (Other shops have had shortcomings in both these areas.) Fortunately all hosts, including Holly, were freshly backed up according to the standard procedure just before the system update that went awry.
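One way to make the first of those questions concrete is to compare live files against the backup copy. The sketch below assumes, purely for illustration, that the backup is an ordinary directory tree mirroring the host's paths; that layout is hypothetical, since the backup system itself is not described here. It reports regular files under chosen subtrees that are missing from the backup or whose contents differ.

```python
import hashlib
import os
from typing import Iterable, Iterator, Tuple


def sha256_of(path: str) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def audit(live_root: str, backup_root: str,
          subtrees: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (relative path, "missing" or "differs") for regular files under
    `subtrees` of `live_root` that are absent from, or different in, `backup_root`.
    Files that changed after the backup ran will also show up as "differs"."""
    for sub in subtrees:
        for dirpath, _dirs, files in os.walk(os.path.join(live_root, sub)):
            for name in files:
                live = os.path.join(dirpath, name)
                # skip symlinks, sockets, FIFOs: only regular files are compared
                if os.path.islink(live) or not os.path.isfile(live):
                    continue
                rel = os.path.relpath(live, live_root)
                saved = os.path.join(backup_root, rel)
                if not os.path.isfile(saved):
                    yield rel, "missing"
                elif sha256_of(saved) != sha256_of(live):
                    yield rel, "differs"
```

For example, with the backup mounted at a hypothetical /mnt/backup/holly, `audit("/", "/mnt/backup/holly", ["etc", "home"])` would list what the backup lacks. Checking the second question, whether the copy can actually be reached and read in an emergency, is exercised by running such a comparison from the recovery environment rather than from the healthy host.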

Goals

Here are some goals for the recovery campaign:

Outline of the Plan

Executing the Plan

Stuffing the SD Card

Initial Boot and Login

Running Post_jump

Restore from Backup

Overall Conclusion

This disaster recovery exercise has been surprisingly successful.

Appendix: Same Issue on Claude

In the update of 2022-03-22, all hosts got through the update with no problems except Claude, which was the last to finish and got chomped. The mirror I was using (provo-mirror.opensuse.org) had a weird event in which all the content vanished: the root (htdocs) directory was still served, but it was empty. I aborted the update, but some cache file was left in a state where libzypp complained that the baseurl of something unspecified lacked a host part, so it could not refresh from a different mirror. Removing /var/cache/zypp did not help. After much struggle I ran out of ideas and declared Claude a total loss. I'll need to reinstall the OS on Claude, then restore its configuration and web content from backups.
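The libzypp complaint does not say which baseurl was at fault. One hypothetical way to hunt for it is to scan the repository definitions, which zypper keeps as INI-style .repo files (by default in /etc/zypp/repos.d), for http, https or ftp URLs with an empty host part. This is only a sketch and was not part of the recovery attempt; the bad URL may just as well have lived somewhere it does not look, such as a service definition or a cache file.

```python
import configparser
import glob
import os
from typing import List, Tuple
from urllib.parse import urlsplit

NEEDS_HOST = {"http", "https", "ftp"}  # schemes that are meaningless without a host


def hostless_baseurls(repo_dir: str = "/etc/zypp/repos.d") -> List[Tuple[str, str, str]]:
    """Return (file, section, url) for each baseurl that has a network scheme but no host."""
    bad = []
    for path in sorted(glob.glob(os.path.join(repo_dir, "*.repo"))):
        # no interpolation (URLs may contain %), tolerate duplicate keys
        cp = configparser.ConfigParser(interpolation=None, strict=False)
        cp.read(path)
        for section in cp.sections():
            # a value could hold more than one URL; check each one
            for url in cp.get(section, "baseurl", fallback="").split():
                parts = urlsplit(url)  # urlsplit lowercases the scheme
                if parts.scheme in NEEDS_HOST and not parts.hostname:
                    bad.append((os.path.basename(path), section, url))
    return bad
```

Local schemes such as dir:// legitimately have no host, which is why only network schemes are flagged.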

I followed basically the same procedure as for Holly. But Claude is a virtual machine whose disc storage is a file on the host system (Jacinth). Unless I want to mess with the install disc, I'll be doing most of the process on the host, using the --root option to direct packages into the mounted guest filesystem.
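zypper's global --root option makes it operate on a different root directory; per the zypper manual page this changes where it looks for repos.d and its metadata cache, and also makes rpm install into that root. A minimal Python sketch of composing such a command follows. The mount point and package names are hypothetical, the exact flags used on Claude are not recorded here, and the guest filesystem is assumed to be mounted on the host already.

```python
import os
import subprocess
from typing import List, Sequence


def zypper_into_guest(guest_root: str, packages: Sequence[str],
                      run: bool = False) -> List[str]:
    """Build (and optionally run) a zypper command that reinstalls `packages`
    into the filesystem mounted at `guest_root` rather than into the host."""
    if not os.path.ismount(guest_root):
        # guard against installing into an empty host directory by mistake
        raise RuntimeError(f"{guest_root} is not a mount point; mount the guest first")
    cmd = ["zypper", "--non-interactive", "--root", guest_root,
           "install", "--force", *packages]
    if run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd
```

For example, `zypper_into_guest("/mnt/claude", ["filesystem", "glibc"], run=True)` would force-reinstall those two packages into a guest mounted at the hypothetical /mnt/claude. The mount-point check is the main design choice: without it, a forgotten mount would silently scatter packages into an ordinary directory on the host.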

Preparing to reinstall the OS on Claude.

Now trying to reinstall packages on Claude.