2006/525: New Boot Sparc

Jan Setje-Eilers setje at smack.eng.sun.com
Thu Aug 2 15:58:00 PDT 2007


 Thanks to everyone who commented so far.

 Here's an attempt to summarize what I've gathered so far on the archive:

 The primary concern is the editable files in the archive, not the ones
delivered and managed by the install and patching tools.

 Further, the most serious concern is the system stopping during boot
and requiring the administrator to sign off on its state. The next
concern is the effect of booting with old data.

 It sounds like it would be acceptable for whatever tool updates
these files to perform any needed sync atomically as part of the
update. Sadly, for most of these files that tool is still a text
editor, and we don't get to fix that in a patch release. :(

 However, it seems that the files of concern fall into a category of
their own, yet they are currently handled in a manner that should
really be reserved for out-of-sync kernel binaries. This suggests we
have the following three types of files in the archive:

 1) Kernel binaries (including modules and drivers)

	If any of these are out of sync (and not merely new), the concern
	is that components with mismatched interfaces may be running and
	the kernel may behave unpredictably.

	So, for these we stop hard in a panic-like fashion and refuse
	to mount root until someone actively takes responsibility for
	doing so.

	This is the classic check.

 2) Files that are caches, only grow, or can safely be re-read
    later.

	These are what's currently in filelist.safe. These files do
	not cause the system to stop during boot; instead they trigger
	an archive update later in boot. This is the refinement that
	went back into nv44 and u4.

 3) Files such as etc/system that contain information that can't
    always be usefully processed later during boot, but that, if out
    of date, _do not_ leave the system in a dangerously unstable state.

	These files are currently treated just like the kernel
	binaries. However, since the system is not dangerously unstable
	at this point, it is reasonable to drive on and mount root
	read-write.

	This means they should really get their own check.

	If this check fails, I propose the following:

		Print a warning to console.

		Leave a service (which won't block multi-user) in
		maintenance mode so the state is communicated via
		svcs -x.

		Drive on and mount root read-write.

		Update the archive.

		And potentially reboot immediately to the device we
		just booted from now that the archive is updated.

	The auto-reboot still makes us a little nervous, so it may need
	to be explicitly enabled based on site policy. But at least on
	sparc we know reliably which device we booted from, so it may
	turn out to be a reasonable default action.

	Clearly the exact service dependencies will differ depending
	on whether or not the system will automatically reboot to
	pick up the changed files.
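 To make the proposed three-way split concrete, here is a rough
sketch in Python of which actions an out-of-sync file of each type
would trigger. The class names, the action strings, and the
auto_reboot flag are all hypothetical labels for the steps described
above; they are not anything bootadm or SMF actually uses. Treat it
as a model of the policy, not an implementation.

```python
from enum import Enum, auto


class FileClass(Enum):
    """Hypothetical labels for the three kinds of archive files."""
    KERNEL_BINARY = auto()  # 1) kernel binaries, modules, drivers
    SAFE = auto()           # 2) caches, grow-only, re-readable (filelist.safe)
    BOOT_CONFIG = auto()    # 3) e.g. etc/system: read too early to re-process


def actions_for_stale(file_class, auto_reboot=False):
    """Return the ordered actions for an out-of-sync file of one class.

    Action names are illustrative strings only.
    """
    if file_class is FileClass.KERNEL_BINARY:
        # Classic check: stop hard and refuse to mount root until
        # someone actively takes responsibility.
        return ["stop_boot", "wait_for_admin"]
    if file_class is FileClass.SAFE:
        # Current filelist.safe behaviour: keep booting and update
        # the archive later in boot.
        return ["continue_boot", "update_archive"]
    # Proposed new check for configuration files such as etc/system.
    actions = [
        "warn_console",
        "mark_service_maintenance",  # visible via svcs -x, not blocking multi-user
        "mount_root_rw",
        "update_archive",
    ]
    if auto_reboot:
        # Optional, policy-controlled reboot to the device we booted from.
        actions.append("reboot_to_boot_device")
    return actions


def actions_for_stale_set(stale_classes, auto_reboot=False):
    """Pick actions when several classes are stale at once.

    Assumption (not stated in the proposal): the most severe class
    present decides the outcome.
    """
    for cls in (FileClass.KERNEL_BINARY, FileClass.BOOT_CONFIG, FileClass.SAFE):
        if cls in stale_classes:
            return actions_for_stale(cls, auto_reboot)
    return ["continue_boot"]
```

 Defaulting auto_reboot to False mirrors the nervousness about
rebooting automatically; a site could turn it on by policy.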

 Ideas, thoughts, comments?

-jan