2006/525: New Boot Sparc
Lori Alt
Lori.Alt at sun.com
Thu Aug 2 17:19:09 PDT 2007
Jan Setje-Eilers wrote:
> Thanks to everyone who commented so far.
>
> Here's an attempt to summarize what I've gathered so far on the archive:
>
> The primary concern is the editable files in the archive, not the ones
>delivered and managed by the install and patching tools.
>
> Further, the most serious concern is the system stopping during boot
>and requiring the administrator to sign off on its state. This is
>followed by the concern about the effect of using old data.
>
> It sounds like it would be acceptable for whatever tool is used to
>update these files to do whatever sync is needed atomically as part of
>the update. Sadly for most of these files this tool is still a text
>editor and we don't get to fix that in a patch release. :(
>
> However it seems like the files of concern fall into a category of
>their own that is currently being dealt with in a manner that should
>actually be reserved for out of sync kernel binaries. This suggests we
>have the following three types of files in the archive:
>
> 1) Kernel binaries (including modules and drivers)
>
> If any of these are out of sync and not just new, the concern
> is that things with mismatched interfaces may be running and
> the kernel may act unpredictably.
>
> So, for these we stop hard in a panic-like fashion and refuse
> to mount root until someone actively takes responsibility for
> doing so.
>
> This is the classic check.
>
> 2) Files that are either caches or only grow or can be safely re-read
> later.
>
> These are what's currently in filelist.safe. These files do
> not cause the system to stop during boot and trigger an
> archive update later in boot. This is the refinement that went
> back into nv44 and u4.
>
> 3) Files like etc/system that contain information that can't always
> be usefully processed later during boot, but if out of date _do
> not_ leave the system in a dangerously unstable state.
>
> These files are currently being treated just like the kernel
> binaries. However since the system is not dangerously unstable
> at this point, it is reasonable to drive on and mount root
> read-write.
>
> This means they should really get their own check.
>
> If this check fails, I propose the following:
>
> Print a warning to console.
>
> Leave a service (which won't block multi-user) in
> maintenance mode so the state is communicated via
> svcs -x.
>
> Drive on and mount root read-write.
>
If the root is zfs, it will already have been mounted read-write
(there is no need to do a read-only mount since zfs has no fsck),
so there is no need for a remount at this point. I don't think this
changes the overall logic here though.
>
> Update the archive.
>
> And potentially reboot immediately to the device we
> just booted from now that the archive is updated.
>
> The auto-reboot still makes us a little nervous, so it may be
> something that needs to be explicitly enabled based on site
> policy, but at least on sparc we have a solid idea of what
> boot device we booted from, so it may turn out to be a reasonable
> default action to take.
>
> Clearly the exact service dependencies will differ depending
> on whether or not the system will automatically reboot to
> pick up the changed files.
>
> Ideas, thoughts, comments?
>
>-jan
>
>
>
>
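For what it's worth, here is a rough sketch of the three-way check as
summarized above, written in Python purely for illustration. All of the
names (FileClass, boot_action, config_check_failed_steps) are made up for
this sketch and are not taken from the actual boot code. The step list just
restates the proposal, with the remount skipped when root is zfs and the
reboot treated as an opt-in site policy.

```python
from enum import Enum


class FileClass(Enum):
    KERNEL_BINARY = 1   # type 1: kernel binaries, modules, drivers
    SAFE = 2            # type 2: entries in filelist.safe
    CONFIG = 3          # type 3: files like etc/system


class Action(Enum):
    CONTINUE = "nothing out of sync; boot normally"
    HALT = "stop before mounting root until an administrator signs off"
    UPDATE_LATER = "keep booting; update the archive later in boot"
    WARN_AND_UPDATE = "warn, flag a service, drive on, update the archive"


def boot_action(stale_classes):
    """Pick the strictest action for the classes of out-of-sync files."""
    classes = set(stale_classes)
    if FileClass.KERNEL_BINARY in classes:
        return Action.HALT
    if FileClass.CONFIG in classes:
        return Action.WARN_AND_UPDATE
    if FileClass.SAFE in classes:
        return Action.UPDATE_LATER
    return Action.CONTINUE


def config_check_failed_steps(root_is_zfs, auto_reboot=False):
    """Ordered steps proposed for a failed type 3 (config file) check."""
    steps = [
        "print a warning to console",
        "leave a non-blocking service in maintenance (visible via svcs -x)",
    ]
    if not root_is_zfs:
        # A zfs root is already mounted read-write, so no remount is needed.
        steps.append("remount root read-write")
    steps.append("update the archive")
    if auto_reboot:
        # Assumed to be opt-in, since the auto-reboot is still a site policy call.
        steps.append("reboot to the device we just booted from")
    return steps
```

The kernel binary case is checked first so that a stale etc/system never
masks the more serious condition.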