An interesting problem with OpenSolaris 2008.05 under QEMU that I have been fighting today; it's still beating me.
In my QEMU image I had fortunately cloned the rootfs (must write a post on how to do that one day). As I was running off the clone, I still had the original system available to boot from. If you are not running off a cloned image, this recovery trick won't help you at all though.
The problem itself is that damn boot_archive corruption issue that keeps appearing in blogs.
My problem is slightly different in that it happens on every reboot: when the boot archive is written at shutdown, it's about a tenth of the size needed to be usable.
Anyway, I had a corruption on the clone, it wouldn’t boot saying the boot_archive was not found. So I…
- booted off the original grub entry
- mounted my clone, mount -F zfs root/RPOOL/install-clone /mnt
- checked /mnt/platform/i86pc/boot_archive, gosh it was small; deleted it
- bootadm update-archive -R /mnt which said it updated both boot_archive and amd64/boot_archive
- checked, WTF???; the one in the amd64 subdirectory had the current timestamp, but no boot_archive had been created in the main i86pc directory
- copied /platform/i86pc/boot_archive to /mnt/platform/i86pc/boot_archive
- another bootadm update-archive -R /mnt, which did nothing; all up to date now?
- unmounted /mnt, ran a couple of syncs, and shut down from the desktop menu option in case my habit of using init 5/6 was causing the issue
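The steps above could be sketched as a small shell function. This is only a sketch under assumptions: the names `RUN`, `CLONE_DS`, `MNT` and `recover_clone_archive` are made up for illustration, the dataset name is the one from my mount command above, and by default it only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch of the recovery steps from the list above.
# RUN defaults to "echo", so the commands are only printed (dry run).
# Set RUN to an empty string (RUN= ) to actually execute them.
RUN=${RUN-echo}
CLONE_DS=${CLONE_DS:-root/RPOOL/install-clone}   # dataset name as used in this post
MNT=${MNT:-/mnt}                                 # where the clone gets mounted

recover_clone_archive() {
    archive="$MNT/platform/i86pc/boot_archive"

    $RUN mount -F zfs "$CLONE_DS" "$MNT"
    # Throw away the undersized archive and ask bootadm to rebuild it
    $RUN rm -f "$archive"
    $RUN bootadm update-archive -R "$MNT"
    # If no usable archive was created, fall back to copying the one
    # from the running (original) system, which is what worked for me.
    # In dry-run mode nothing was rebuilt, so this branch is usually taken.
    if [ ! -s "$archive" ]; then
        $RUN cp /platform/i86pc/boot_archive "$archive"
        $RUN bootadm update-archive -R "$MNT"
    fi
    $RUN umount "$MNT"
    $RUN sync
    $RUN sync
}
```

Calling `recover_clone_archive` with the defaults just lists what would be run, which is a cheap way to check the paths before doing it for real as root.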
The cloned image then booted perfectly.
Did nothing with it, just shut down again from the desktop menu option. At no time during the shutdown did it mention it was updating the boot_archive; that's important.
Booted off the clone entry again. Same problem, no boot_archive found.
Booted off the original grub entry again, mounted the clone on /mnt again, had a look, and the boot_archive had shrunk back down to the size it was before I fixed it. So I copied the working boot_archive back into the mounted clone, again. Shut down, booted off the clone image, and yes, it worked just perfectly again. Did a bootadm update-archive while it was running, and still nothing was updated.
Shut down the clone image again. Haven't had the heart to try to boot off it again yet. Something to continue with later, I guess. Wasted enough time for today. It's a concern though. If it fails again I will just use zfs to destroy the clone and create another one from scratch to see if it keeps occurring; later.
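Since the giveaway each time was the size of the file, a quick comparison against the working archive would save eyeballing ls output. A minimal sketch, assuming the clone is mounted on /mnt as above; the function name `archive_looks_truncated` and the "less than half the size" threshold are my own arbitrary choices, not anything bootadm provides.

```shell
#!/bin/sh
# archive_looks_truncated REFERENCE CANDIDATE
# Returns 0 (true) if CANDIDATE is missing or less than half the size of
# REFERENCE, 1 if it looks plausible, 2 if REFERENCE itself is missing.
# The 50% threshold is an assumption picked for illustration only.
archive_looks_truncated() {
    ref=$1
    cand=$2
    [ -f "$ref" ] || return 2
    [ -f "$cand" ] || return 0
    # $(( ... )) strips any padding that wc adds on some systems
    ref_size=$(($(wc -c < "$ref")))
    cand_size=$(($(wc -c < "$cand")))
    [ $((cand_size * 2)) -lt "$ref_size" ]
}
```

For example, from the original system with the clone mounted: `archive_looks_truncated /platform/i86pc/boot_archive /mnt/platform/i86pc/boot_archive && echo "clone boot_archive looks truncated"`.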
Fortunately, I like playing with this sort of thing. While it is frustrating, I am learning more about how OpenSolaris works with every new problem, which is a good thing, I suppose, although I only pull down the supposedly stable releases.