replication of stuff in /usr/gnu

Stephen Hahn sch at sun.com
Fri Jul 6 14:33:10 PDT 2007


* Garrett D'Amore <garrett at damore.org> [2007-07-06 13:29]:
> Stephen Hahn wrote:
> >* Garrett D'Amore <garrett at damore.org> [2007-07-06 11:54]:
> >>Stephen Hahn wrote:
> >>>* Garrett D'Amore <garrett at damore.org> [2007-07-06 10:59]:
> >>>>Stephen Hahn wrote:
> >>>>>* Garrett D'Amore <garrett at damore.org> [2007-07-03 15:12]:
> >>>>3) in the case of the GNU variants, I submit that the GNU variants of 
> >>>>these trivial utilities actually have a negative impact on 
> >>>>performance... the GNU utilities are clearly *larger* than the Sun 
> >>>>versions, and I can only imagine that this has an increased negative 
> >>>>impact on cache, etc.
> >>>>        
> >>> But, to run the GNU variant, you must actually change your path to
> >>> invoke it--presumably meaning that you were willing to trade absolute
> >>> performance for a known variant (used more widely than the historic
> >>> Solaris version in most cases).
> >>>      
> >>*BUT*, I may have good reasons to have GNU variants in my path first... 
> >>because I prefer the GNU versions (for one reason or another... in my 
> >>*particular* case I don't, but someone else might) because they offer 
> >>different functionality.
> >>
> >>If this is the case, should I also pay a performance penalty for these 
> >>other commands which have no difference?  (In particular I'm thinking 
> >>about /bin/true and similar commands which may be called from shell 
> >>scripts.)
> > 
> >  This might be an argument for correct minimization boundaries, not for
> >  exclusion from the largest set of offered components.
> 
> I don't think there is *any* argument for installing the largest 
> possible set of 3rd party components.  There *is* an argument for 
> installing the largest set of such components where they will offer an 
> EOU enhancement.  I think you're not getting this point... some of these 
> utilities offer *no* benefit to Solaris or in portability to users 
> coming from GNU systems.  If there is no benefit, then why bother 
> installing them in the first place?

  I suppose I was thinking that, for some sites, the (not presently
  existing) SUNWsunos-coreutils would be the package to eliminate.  That
  is, I don't believe that the surprise of not having a "true" that
  matches my other command set is acceptable to these users.

> >>I agree that the impact is *small*, but why pay *any* such cost if there 
> >>is no benefit in doing so?
> >>    
> > 
> >  Because the path we take to arrive at this hypothetical "zero cost"
> >  outcome is substantially more expensive than others.  That appears to
> >  be the crux of our disagreement:  you believe that developer attention
> >  should be focussed on refining this kind of integration; I don't.
> >  There are larger problems to tackle.
> 
> Why?  Replacement of the binaries with symbolic links, or outright 
> removal, would be a very, very cheap engineering effort.  Probably less 
> than 1 hour of engineering time.  In fact, this argument has probably 
> burned more time than the engineering effort involved!

  No, it's worthwhile as a discussion, because it's trying to question
  the "accept upstream" position that the GNU case series (and its
  backing Project) holds.  

  As a counterpoint, would an acceptable solution be to delete the SunOS
  duplicated components from ON (excepting nohup)?
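
  To put a number on the duplication being argued about, a minimal
  sketch in Python might look like the following.  It assumes the native
  commands sit in one directory and the GNU variants in another (the
  /usr/bin and /usr/gnu/bin paths in the usage note, and both function
  names, are assumptions for illustration).  It lists the command names
  present in both places and adds up the bytes taken by the second copy.
  Symbolic links are skipped, since a link costs essentially no space,
  and nohup is left out by default to match the exception above.

```python
import os


def _regular_file_sizes(directory):
    """Map file name -> size in bytes for regular files (symlinks skipped)."""
    sizes = {}
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    return sizes


def duplicated_commands(native_dir, gnu_dir):
    """Return {name: (native_size, gnu_size)} for names found in both dirs."""
    native = _regular_file_sizes(native_dir)
    gnu = _regular_file_sizes(gnu_dir)
    shared = sorted(native.keys() & gnu.keys())
    return {name: (native[name], gnu[name]) for name in shared}


def duplicated_bytes(native_dir, gnu_dir, keep=("nohup",)):
    """Total size of the GNU-side copies, ignoring any names listed in keep."""
    dups = duplicated_commands(native_dir, gnu_dir)
    return sum(gnu_size for name, (_, gnu_size) in dups.items()
               if name not in keep)
```

  Calling duplicated_commands("/usr/bin", "/usr/gnu/bin") on a system
  laid out that way would give the per-command sizes.  Name overlap says
  nothing about whether the two versions behave the same, so the result
  is only a starting list for the discussion.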
  
> >>>>The cavalier attitude that "it's only 200k" is symptomatic of a larger 
> >>>>problem, which is that certain developers have stopped caring about 
> >>>>performance, size, etc... the assumption is that Moore's Law overcomes 
> >>>>sloppy engineering.
> >>>>   
> >>>>        
> >>> Actually, I don't feel that any of my reasoning in this space is
> >>> cavalier or sloppy...
> >>> 
> >>>      
> >>The 200k argument certainly was, IMO, cavalier.  Just because you have 
> >>infinite disk space (and other system resources) to burn doesn't mean 
> >>everyone else does.
> >>    
> > 
> >  I know of no supported system on which we would consider installing
> >  SUNWgnu-coreutils where 200k is a factor.  I know of no hypothetical
> >  target system where installing SUNWgnu-coreutils couldn't be omitted
> >  to achieve footprint goals.  
> 
> If everyone takes this attitude, then pretty soon all those 200k's add 
> up.  *That* is the cavalier attitude I'm having trouble with.  Sort of 
> like "think globally, act locally".  If everyone leaves their lights on 
> because "that way I won't have to turn them on when I walk into the 
> room, and besides, it's only 40W", then we have a statewide global energy 
> crisis.  It's not all that different a situation with developers and 
> integrators putting everything *and* the kitchen sink on the media.
> 
> Eventually it's also true that we're going to hit some boundary, where 
> stuff doesn't fit on a DVD.  That 200k might be the difference between a 
> single DVD and a multiple DVD installation in the future.
> 
> Again, please try to think outside of just the single instance of 200k.
> 
> Of course, if everyone says 'I don't care about 200k', then we get to 
> where we are... bloated systems that struggle to run on 500MHz systems, 
> and where I can no longer perform a default installation on a 4GB disk.
 
  Let's not confuse media sizes with the set of architecturally approved
  components.  Additionally, let's not assert the equality of a
  time-based consumption problem with a static allocation problem.

  (It would have been helpful to have counterexamples.)

  Isn't the answer to start exploring what goals a default install
  should have, then?  One set of people might like a default install to
  be the minimum core from which they could assemble a useful
  purpose-specific distribution; others might like the default install
  to have all of their developer toolsets.  Which is correct?

> >>I'm saying that this is just one more straw on the camel's back.
> >>
> >>For a large number of system utilities, I agree that the GNU versions 
> >>offer an EOU enhancement.  But I'm also saying that there is a 
> >>significant set of GNU utilities for which this is *not* true.
> >>
> >>Understand, please, that GNU coreutils is intended for use on systems 
> >>that do not otherwise have these utilities.  But for systems like 
> >>Solaris, which have a native version of commands like /bin/true, 
> >>providing a 2nd version of the same command, which offers no difference 
> >>in functionality, seems largely wasteful to me.
> > 
> >  I understand that, but I don't think it's a balanced assessment of
> >  waste.
> 
> Please explain, in one or two sentences, how having two versions of the 
> true utility (or logname if you prefer, or pick one of the others I've 
> identified) can be described as anything other than pointless waste.

  If we are constrained to ship SunOS true by standard (or to preserve
  compatibility), but a set of customers expect that a complete set of
  GNU utilities is present, then I can only satisfy those requirements
  by offering both.  Since we had that latter aspect requested during
  our discussions of coreutils, I decided to take it seriously.

> By the way, have you considered that having two versions means that 
> someone has to sustain both versions?  This includes QA, packaging, 
> etc.  Putting binaries on the system is not free in terms of human cost, 
> even if you ignore machine resources.
 
  In fact, we do spend time on this very topic in SFW, and spend a lot
  of time on the upfront process work to make the production as
  reproducible as any consolidation.  Those processes happen primarily
  at the package or entire-component level, so cost comparisons of an
  individual binary object are difficult.

> >>>>But then again, I've had in the past year to make Linux work on an 8MB 
> >>>>system, and had to develop a thin-client application that fit within 
> >>>>512K of flash.  And more recently, I've been working on IP forwarding 
> >>>>performance, where each extra branch in IP costs about a 0.1% to 0.2% hit 
> >>>>in the number of packets per second that the system can forward.
> >>>>
> >>>>Waste is still bad.  Moore's Law notwithstanding.  I will still tend to 
> >>>>hunt down (and destroy, as much as possible) bloat where I find it, 
> >>>>particularly where that bloat serves no useful purpose.
> >>>>   
> >>>>        
> >>> I suspect strongly there are more rewarding veins of waste to mine.
> >>> 
> >>>      
> >>Probably true.  But yours is also low-hanging.  And more to the point, I 
> >>hope to prevent *further* growth here.
> >>
> >>I would really, really like to see justification for *any* new utility 
> >>added to the system.  Where there is different functionality that users 
> >>or layered software will notice, then I agree the EOU probably 
> >>overrides.  But where there is no difference, then I'd argue against 
> >>wasting the system resources.
> >>    
> >
> >  Well, we'll be having this discussion again, I suppose, as examination
> >  of this kind for "upstream integration" cases seems in itself wasteful.
> 
> Maybe.  But if you don't know what you're integrating, or why you are 
> integrating it, then how are you testing it?  How do you even know it 
> works?  I think you've elided a major cost of integration of software in 
> your analysis.

  (Huh?  Are you now asserting that the components were integrated
  without testing?)

  My point is that our architectural choices, when dealing with
  upstream components, should be constrained:  there is a cost to being
  different.  Once we're past big architectural aims, like compliance
  with filesystem(5) or having a coherent administrative model, we need
  to treat the definition of completeness as something that originates
  upstream.

> >  Perhaps you could come up with some guidelines for easier
> >  minimization, rather than causing each proposed supported component to
> >  be subject to some (testable?) assertions about waste.
> 
> I do believe that a guideline is very simple.  If there is some tangible 
> benefit to users or to Sun in having multiple versions of a utility, 
> then that is justification enough.  But when there isn't any tangible 
> benefit, then we should avoid duplication.
 
  So the benefit here is "reducing differences against the upstream's
  default install".  
  
> In any case, the minimization lines are wrong here anyway.  I can 
> readily see cases where someone will want GNU diff, or perhaps some 
> other utility that is part of coreutils, but also be extremely sensitive 
> to disk space considerations.

  We aren't going to be able to provide packaging boundaries for all
  possible combinations of binaries.  Finding a useful set of boundaries
  that allow us to reach some commonly held target installations would
  be an achievable engineering goal.

> I've recently been through the pain of trying to get a Solaris system to 
> fit within 2GB.  It couldn't be easily done, without a lot of time 
> figuring out what I could safely remove, and what I couldn't, without 
> impacting the system's usability as a host for running the NIC driver 
> test suites.  (I didn't need graphics, etc.)  I spent time manually 
> trying to identify cases, in some cases 200k at a time!  So I'm 
> particularly sensitive here.

  Having that list would be helpful in various discussions, I expect.
  Certainly Glynn and David have been probing equivalent choices in the
  Indiana discussion (as, I presume, have the Nexenta team in their
  forum).

  - Stephen

-- 
sch at sun.com  http://blogs.sun.com/sch/



More information about the opensolaris-arc mailing list