replication of stuff in /usr/gnu

Garrett D'Amore garrett at damore.org
Fri Jul 6 13:26:23 PDT 2007


Stephen Hahn wrote:
> * Garrett D'Amore <garrett at damore.org> [2007-07-06 11:54]:
>   
>> Stephen Hahn wrote:
>>     
>>> * Garrett D'Amore <garrett at damore.org> [2007-07-06 10:59]:
>>>       
>>>> Stephen Hahn wrote:
>>>>         
>>>>> * Garrett D'Amore <garrett at damore.org> [2007-07-03 15:12]:
>>>>> (Although you find --help and --version pointless, many *ix users now
>>>>> expect commands to process these options.)
>>>>>      
>>>>>           
>>>> I would expect that to be true for commands that actually have some 
>>>> non-trivial purpose.  E.g. for GNU grep this argument holds water.  But 
>>>> for "true" to have --version and --help?  It seems a bit of a stretch.
>>>>
>>>> If we expect *all* commands to support this, then we should just 
>>>> integrate that into our own command set.  But I think that this is not a 
>>>> realistic approach.
>>>>    
>>>>         
>>>  (It seems this week that all threads lead to CLIP.)  The CLIP spec
>>>  specifically asks for help and version options in 6.1 and 6.2.  That
>>>  is, all long-option commands should have --version and --help, and
>>>  short-option ("getopt-compliant") commands should support -V and
>>>  -?.
>>>
>>>  So, we've already elected that approach.
>>>       
>> Even for commands that take *no* options?  That's a bit of a surprise!
>>     
>  
>   The CLIP classification appears to place a command with no options in
>   the "getopt-compliant" class.  
>   

I should review that spec then, I suppose.  There may be a conflict here
with POSIX and backwards compatibility.  (Some of the commands listed take
an optional argument but do not use getopt, which would lead to
different treatment of options like -?.)
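To make the -? conflict concrete, here is a minimal C sketch, assuming a POSIX getopt().  A getopt-based parser sees "-?" as an option: since '?' is not in the option string, getopt() returns '?' just as it does for any unknown option, but it also sets optopt to '?', which lets the command tell a help request apart from something like -x.  A command that takes an optional operand and walks argv by hand would instead treat "-?" as that operand.  The enum and both function names below are hypothetical, made up for illustration; they are not taken from CLIP or from any Solaris source.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>   /* NULL */
#include <unistd.h>   /* getopt(), optind, opterr, optopt */

/* Hypothetical result codes, for illustration only. */
enum opt_result { RUN_NORMALLY, SHOW_VERSION, SHOW_HELP, BAD_OPTION };

/*
 * Getopt-based parsing: -V is listed in the option string; -? is not,
 * so getopt() reports it like any unknown option ('?') but sets
 * optopt to '?', which distinguishes it from, say, -x.
 */
static enum opt_result parse_getopt(int argc, char *argv[])
{
    int c;

    optind = 1;   /* assumption: restarts scanning on common libcs */
    opterr = 0;   /* suppress getopt()'s own diagnostic; caller decides */
    while ((c = getopt(argc, argv, "V")) != -1) {
        switch (c) {
        case 'V':
            return SHOW_VERSION;
        case '?':
            return (optopt == '?') ? SHOW_HELP : BAD_OPTION;
        default:
            return BAD_OPTION;
        }
    }
    return RUN_NORMALLY;
}

/*
 * Hand-rolled parsing for a command that takes one optional operand
 * and never calls getopt(): "-?" is simply the operand string.
 */
static const char *parse_operand_only(int argc, char *argv[])
{
    return (argc > 1) ? argv[1] : NULL;
}
```

So parse_getopt() maps "prog -V", "prog -?" and "prog -x" to SHOW_VERSION, SHOW_HELP and BAD_OPTION, while parse_operand_only() hands "-?" back as an ordinary argument.  Retrofitting -? onto the second kind of command changes how an existing, if unlikely, operand is interpreted.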


>   
>>>> 3) In the case of the GNU variants, I submit that the GNU variants of
>>>> these trivial utilities actually have a negative impact on 
>>>> performance... the GNU utilities are clearly *larger* than the Sun 
>>>> versions, and I can only imagine that this has an increased negative 
>>>> impact on cache, etc.
>>>>    
>>>>         
>>>  But, to run the GNU variant, you must actually change your path to
>>>  invoke it--presumably meaning that you were willing to trade absolute
>>>  performance for a known variant (used more widely than the historic
>>>  Solaris version in most cases).
>>>       
>> *BUT*, I may have good reasons to have GNU variants in my path first,
>> because I prefer the GNU versions for the different functionality they
>> offer.  (In my *particular* case I don't, but someone else might.)
>>
>> If this is the case, should I also pay a performance penalty for these 
>> other commands which have no difference?  (In particular I'm thinking 
>> about /bin/true and similar commands which may be called from shell 
>> scripts.)
>>     
>  
>   This might be an argument for correct minimization boundaries, not for
>   exclusion from the largest set of offered components.  (As an aside,
>   the shells have direct support for most of the commands we're
>   discussing...)
>   

Yes, I understand that most shells do have builtins for some of the 
commands.

I don't think there is *any* argument for installing the largest 
possible set of 3rd party components.  There *is* an argument for 
installing the largest set of such components where they will offer an 
EOU enhancement.  I think you're not getting this point... some of these 
utilities offer *no* benefit to Solaris or in portability to users 
coming from GNU systems.  If there is no benefit, then why bother
installing them in the first place?


>> Note that on shared systems (such as Sun Ray servers), the cache impact
>> affects *multiple* users; the performance impact is not limited to just
>> the user running the GNU program.
>>     
>
>   Again, this scenario argues against any additions to the system,
>   because any potential application changes the working set for whatever
>   cache you'd like to examine.
>   

It argues against *senseless* additions.  When there is value added, 
then by all means, we should add away.  But when there is no conceivable 
benefit, then why are we doing it?

>   
>> I agree that the impact is *small*, but why pay *any* such cost if there 
>> is no benefit in doing so?
>>     
>  
>   Because the path we take to arrive at this hypothetical "zero cost"
>   outcome is substantially more expensive than others.  That appears to
>   be the crux of our disagreement:  you believe that developer attention
>   should be focussed on refining this kind of integration; I don't.
>   There are larger problems to tackle.
>   

Why?  Replacement of the binaries with symbolic links, or outright 
removal, would be a very, very cheap engineering effort.  Probably less 
than 1 hour of engineering time.  In fact, this argument has probably 
burned more time than the engineering effort involved!

>   
>>>> The cavalier attitude that "it's only 200k" is symptomatic of a larger
>>>> problem, which is that certain developers have stopped caring about 
>>>> performance, size, etc... the assumption is that Moore's Law overcomes 
>>>> sloppy engineering.
>>>>    
>>>>         
>>>  Actually, I don't feel that any of my reasoning in this space is
>>>  cavalier or sloppy...
>>>  
>>>       
>> The 200k argument certainly was, IMO, cavalier.  Just because you have 
>> infinite disk space (and other system resources) to burn doesn't mean 
>> everyone else does.
>>     
>  
>   I know of no supported system on which we would consider installing
>   SUNWgnu-coreutils where 200k is a factor.  I know of no hypothetical
>   target system where installing SUNWgnu-coreutils couldn't be omitted
>   to achieve footprint goals.  
>   

If everyone takes this attitude, then pretty soon all those 200k's add 
up.  *That* is the cavalier attitude I'm having trouble with.  Sort of 
like "think globally, act locally".  If everyone leaves their lights on 
because "that way I won't have to turn them on when I walk into the 
room, and besides, it's only 40W", then we have a statewide (or even
global) energy crisis.  It's not all that different a situation with developers and
integrators putting everything *and* the kitchen sink on the media.

Eventually it's also true that we're going to hit some boundary where
stuff doesn't fit on a DVD.  That 200k might be the difference between a
single-DVD and a multiple-DVD installation in the future.

Again, please try to think outside of just the single instance of 200k.

Of course, if everyone says 'I don't care about 200k', then we get to
where we are: bloated systems that struggle to run on 500MHz machines,
and where I can no longer perform a default installation on a 4GB disk.

>   
>> I'm saying that this is just one more straw on the camel's back.
>>
>> For a large number of system utilities, I agree that the GNU versions 
>> offer an EOU enhancement.  But I'm also saying that there is a 
>> significant set of GNU utilities for which this is *not* true.
>>
>> Understand, please, that GNU coreutils is intended for use on systems 
>> that do not otherwise have these utilities.  But for systems like 
>> Solaris, which have a native version of commands like /bin/true, 
>> providing a second version of the same command, which offers no difference
>> in functionality, seems largely wasteful to me.
>>     
>  
>   I understand that, but I don't think it's a balanced assessment of
>   waste.
>   

Please explain, in one or two sentences, how having two versions of the 
true utility (or logname if you prefer, or pick one of the others I've 
identified) can be described as anything other than pointless waste.

By the way, have you considered that having two versions means that 
someone has to sustain both versions?  This includes QA, packaging, 
etc.  Putting binaries on the system is not free in terms of human cost, 
even if you ignore machine resources.

>   
>>>> But then again, I've had in the past year to make Linux work on an 8MB 
>>>> system, and had to develop a thin-client application that fit within 
>>>> 512K of flash.  And more recently, I've been working on IP forwarding 
>>>> performance, where each extra branch in the IP path costs about a 0.1% to
>>>> 0.2% drop in the number of packets per second that the system can forward.
>>>>
>>>> Waste is still bad.  Moore's Law notwithstanding.  I will still tend to 
>>>> hunt down (and destroy, as much as possible) bloat where I find it, 
>>>> particularly where that bloat serves no useful purpose.
>>>>    
>>>>         
>>>  I suspect strongly there are more rewarding veins of waste to mine.
>>>  
>>>       
>> Probably true.  But this one is also low-hanging fruit.  And more to the point, I
>> hope to prevent *further* growth here.
>>
>> I would really, really like to see justification for *any* new utility 
>> added to the system.  Where there is different functionality that users 
>> or layered software will notice, then I agree the EOU probably 
>> overrides.  But where there is no difference, then I'd argue against 
>> wasting the system resources.
>>     
>
>   Well, we'll be having this discussion again, I suppose, as examination
>   of this kind for "upstream integration" cases seems in itself wasteful.
>   

Maybe.  But if you don't know what you're integrating, or why you are 
integrating it, then how are you testing it?  How do you even know it 
works?  I think you've elided a major cost of integration of software in 
your analysis.

>   Perhaps you could come up with some guidelines for easier
>   minimization, rather than causing each proposed supported component to
>   be subject to some (testable?) assertions about waste.
>   

I believe a suitable guideline is very simple.  If there is some tangible
benefit to users or to Sun in having multiple versions of a utility, 
then that is justification enough.  But when there isn't any tangible 
benefit, then we should avoid duplication.

In any case, the minimization lines are drawn in the wrong place here.  I can
readily see cases where someone will want GNU diff, or perhaps some
other utility that is part of coreutils, but also be extremely sensitive 
to disk space considerations.

I've recently been through the pain of trying to get a Solaris system to 
fit within 2GB.  It couldn't be done easily; it took a lot of time
figuring out what I could safely remove, and what I couldn't, without
impacting the system's usability as a host for running the NIC driver
test suites.  (I didn't need graphics, etc.)  I spent time manually
trying to identify candidates for removal, sometimes 200k at a time!  So I'm
particularly sensitive here.

    -- Garrett




More information about the opensolaris-arc mailing list