replication of stuff in /usr/gnu
Stephen Hahn
sch at sun.com
Fri Jul 6 14:33:10 PDT 2007
* Garrett D'Amore <garrett at damore.org> [2007-07-06 13:29]:
> Stephen Hahn wrote:
> >* Garrett D'Amore <garrett at damore.org> [2007-07-06 11:54]:
> >>Stephen Hahn wrote:
> >>>* Garrett D'Amore <garrett at damore.org> [2007-07-06 10:59]:
> >>>>Stephen Hahn wrote:
> >>>>>* Garrett D'Amore <garrett at damore.org> [2007-07-03 15:12]:
> >>>>3) in the case of the GNU variants, I submit that the GNU variants of
> >>>>these trivial utilities actually have a negative impact on
> >>>>performance... the GNU utilities are clearly *larger* than the Sun
> >>>>versions, and I can only imagine that this has an increased negative
> >>>>impact on cache, etc.
> >>>>
> >>> But, to run the GNU variant, you must actually change your path to
> >>> invoke it--presumably meaning that you were willing to trade absolute
> >>> performance for a known variant (used more widely than the historic
> >>> Solaris version in most cases).
> >>>
> >>*BUT*, I may have good reasons to have GNU variants in my path first...
> >>because I prefer the GNU versions (for one reason or another... in my
> >>*particular* case I don't, but someone else might) because they offer
> >>different functionality.
> >>
> >>If this is the case, should I also pay a performance penalty for these
> >>other commands which have no difference? (In particular I'm thinking
> >>about /bin/true and similar commands which may be called from shell
> >>scripts.)
> >
> > This might be an argument for correct minimization boundaries, not for
> > exclusion from the largest set of offered components.
>
> I don't think there is *any* argument for installing the largest
> possible set of 3rd party components. There *is* an argument for
> installing the largest set of such components where they will offer an
> EOU enhancement. I think you're not getting this point... some of these
> utilities offer *no* benefit to Solaris or in portability to users
> coming from GNU systems. If there is no benefit, then why bother
> installing them in the first place?
I suppose I was thinking that, for some sites, the (not presently
existing) SUNWsunos-coreutils would be the package to eliminate. That
is, I don't believe that the surprise of not having a "true" that
matches my other command set is acceptable to these users.
> >>I agree that the impact is *small*, but why pay *any* such cost if there
> >>is no benefit in doing so?
> >>
> >
> > Because the path we take to arrive at this hypothetical "zero cost"
> > outcome was substantially more expensive than others. That appears to
> > be the crux of our disagreement: you believe that developer attention
> > should be focussed on refining this kind of integration; I don't.
> > There are larger problems to tackle.
>
> Why? Replacement of the binaries with symbolic links, or outright
> removal, would be a very, very cheap engineering effort. Probably less
> than 1 hour of engineering time. In fact, this argument has probably
> burned more time than the engineering effort involved!
No, it's worthwhile as a discussion, because it's trying to question
the "accept upstream" position that the GNU case series (and its
backing Project) holds.
As a counterpoint, would an acceptable solution be to delete the SunOS
duplicated components from ON (excepting nohup)?
> >>>>The cavalier attitude that "it's only 200k" is symptomatic of a larger
> >>>>problem, which is that certain developers have stopped caring about
> >>>>performance, size, etc... the assumption is that Moore's Law overcomes
> >>>>sloppy engineering.
> >>>>
> >>>>
> >>> Actually, I don't feel that any of my reasoning in this space is
> >>> cavalier or sloppy...
> >>>
> >>>
> >>The 200k argument certainly was, IMO, cavalier. Just because you have
> >>infinite disk space (and other system resources) to burn doesn't mean
> >>everyone else does.
> >>
> >
> > I know of no supported system on which we would consider installing
> > SUNWgnu-coreutils where 200k is a factor. I know of no hypothetical
> > target system where installing SUNWgnu-coreutils couldn't be omitted
> > to achieve footprint goals.
>
> If everyone takes this attitude, then pretty soon all those 200k's add
> up. *That* is the cavalier attitude I'm having trouble with. Sort of
> like "think globally, act locally". If everyone leaves their lights on
> because "that way I won't have to turn them on when I walk into the
> room, and besides, it's only 40W", then we have a statewide global energy
> crisis. It's not all that different a situation with developers and
> integrators putting everything *and* the kitchen sink on the media.
>
> Eventually it's also true that we're going to hit some boundary, where
> stuff doesn't fit on a DVD. That 200k might be the difference between a
> single DVD and a multiple DVD installation in the future.
>
> Again, please try to think outside of just the single instance of 200k.
>
> Of course, if everyone says 'I don't care about 200k', then we get to
> where we are... bloated systems that struggle to run on 500MHz systems,
> and where I can no longer perform a default installation on a 4GB disk.
Let's not confuse media sizes with the set of architecturally approved
components. Additionally, let's not assert the equality of a
time-based consumption problem with a static allocation problem.
(It would have been helpful to have counterexamples.)
Isn't the answer to start exploring what goals a default install
should have, then? One set of people might like a default install to
be the minimum core from which they could assemble a useful
purpose-specific distribution; others might like the default install
to have all of their developer toolsets. Which is correct?
> >>I'm saying that this is just one more straw on the camel's back.
> >>
> >>For a large number of system utilities, I agree that the GNU versions
> >>offer an EOU enhancement. But I'm also saying that there is a
> >>significant set of GNU utilities for which this is *not* true.
> >>
> >>Understand, please, that GNU coreutils is intended for use on systems
> >>that do not otherwise have these utilities. But for systems like
> >>Solaris, which have a native version of commands like /bin/true,
> >>providing a 2nd version of the same command, which offers no difference
> >>in functionality, seems largely wasteful to me.
> >
> > I understand that, but I don't think it's a balanced assessment of
> > waste.
>
> Please explain, in one or two sentences, how having two versions of the
> true utility (or logname if you prefer, or pick one of the others I've
> identified) can be described as anything other than pointless waste.
If we are constrained to ship SunOS true by standard (or to preserve
compatibility), but a set of customers expect that a complete set of
GNU utilities is present, then I can only satisfy those requirements
by offering both. Since we had that latter aspect requested during
our discussions of coreutils, I decided to take it seriously.
> By the way, have you considered that having two versions means that
> someone has to sustain both versions? This includes QA, packaging,
> etc. Putting binaries on the system is not free in terms of human cost,
> even if you ignore machine resources.
In fact, we do spend time on this very topic in SFW, and spend a lot
of time on the upfront process work to make the production as
reproducible as any consolidation. Those processes happen primarily
at the package or entire-component level, so cost comparisons of an
individual binary object are difficult.
> >>>>But then again, I've had in the past year to make Linux work on an 8MB
> >>>>system, and had to develop a thin-client application that fit within
> >>>>512K of flash. And more recently, I've been working on IP forwarding
> >>>>performance, where each extra branch in IP costs about a 0.1% to 0.2% hit
> >>>>in the number of packets per second that the system can forward.
> >>>>
> >>>>Waste is still bad. Moore's Law notwithstanding. I will still tend to
> >>>>hunt down (and destroy, as much as possible) bloat where I find it,
> >>>>particularly where that bloat serves no useful purpose.
> >>>>
> >>>>
> >>> I suspect strongly there are more rewarding veins of waste to mine.
> >>>
> >>>
> >>Probably true. But yours is also low-hanging. And more to the point, I
> >>hope to prevent *further* growth here.
> >>
> >>I would really, really like to see justification for *any* new utility
> >>added to the system. Where there is different functionality that users
> >>or layered software will notice, then I agree the EOU probably
> >>overrides. But where there is no difference, then I'd argue against
> >>wasting the system resources.
> >>
> >
> > Well, we'll be having this discussion again, I suppose, as examination
> > of this kind for "upstream integration" cases seems in itself wasteful.
>
> Maybe. But if you don't know what you're integrating, or why you are
> integrating it, then how are you testing it? How do you even know it
> works? I think you've elided a major cost of integration of software in
> your analysis.
(Huh? Are you now asserting that the components were integrated
without testing?)
My point is that our architectural choices, when dealing with
upstream components, should be constrained: there is a cost to being
different. Once we're past big architectural aims, like compliance
with filesystem(5) or having a coherent administrative model, we need
to treat the definition of completeness as something that originates
upstream.
> > Perhaps you could come up with some guidelines for easier
> > minimization, rather than causing each proposed supported component to
> > be subject to some (testable?) assertions about waste.
>
> I do believe that a guideline is very simple. If there is some tangible
> benefit to users or to Sun in having multiple versions of a utility,
> then that is justification enough. But when there isn't any tangible
> benefit, then we should avoid duplication.
So the benefit here is "reducing differences against the upstream's
default install".
> In any case, the minimization lines are wrong here anyway. I can
> readily see cases where someone will want GNU diff, or perhaps some
> other utility that is part of coreutils, but also be extremely sensitive
> to disk space considerations.
We aren't going to be able to provide packaging boundaries for all
possible combinations of binaries. Finding a useful set of boundaries
that allow us to reach some commonly held target installations would
be an achievable engineering goal.
> I've recently been through the pain of trying to get a Solaris system to
> fit within 2GB. It couldn't be easily done, without a lot of time
> figuring out what I could safely remove, and what I couldn't, without
> impacting the system's usability as a host for running the NIC driver
> test suites. (I didn't need graphics, etc.) I spent time manually
> trying to identify cases, in some cases 200k at a time! So I'm
> particularly sensitive here.
Having that list would be helpful in various discussions, I expect.
Certainly Glynn and David have been probing equivalent choices in the
Indiana discussion (as, I presume, have the Nexenta team in their
forum).
- Stephen
--
sch at sun.com http://blogs.sun.com/sch/