[driver-discuss] coherent driver event logging behavior

Dana H. Myers Dana.Myers at Sun.COM
Fri May 18 16:14:39 PDT 2007


Garrett D'Amore wrote:
> Dana H. Myers wrote:
>> Garrett D'Amore wrote:
>>> Dana H. Myers wrote:
>>>> Pete Bentley wrote:
>>>>>
>>>>> Log file spam from a flapping link seems to me to be in the same 
>>>>> category as the spam you get from a disk with retriable errors... 
>>>>> It's irksome that it fills the log, but you wouldn't want to miss it.
>>>> A disk with retry-able errors, you might replace the disk.  A flapping
>>>> link... you might not be able to do anything other than ifconfig down
>>>> that interface or just unplug it.  So you end up with a log full of
>>>> messages that something isn't right in the world, and there's really
>>>> nothing you can do about it.
>>>
>>> ifconfig down or replace it is _precisely_ the action you want to 
>>> take -- further, you really want to know that you need to take the 
>>> action.
>> Replace *what* ?  It's not clear from the message what's wrong at all.
>> ifconfiging it down is just a way to make the messages stop, and doesn't
>> fix anything.
>
> Replace the cable, nic, or link partner.  Which is a matter for 
> further analysis.  Ifconfig down is the right action to take if you 
> have a link that you aren't using that is disconnected.  If it's a link 
> you care about, then you need to resolve the problem.
The message in the log doesn't necessarily indicate a problem, and there's
always a couple of messages generated on system boot even if nothing at all
is wrong.  So we have these messages being generated that usually indicate
nothing is wrong, but sometimes indicate something is wrong - it depends on
how often they happen and why.

If a user happens to look in /var/adm/messages, how can they tell the
difference?

This reminds me of when I told someone to check for an interrupt storm.
They asked me "how can I tell if I'm having an interrupt storm?".  I tried
explaining the difference between a high rate of interrupts - which is
expected sometimes - and a storm.  It wasn't as easy as it sounds.  Just
getting link messages is not abnormal, and a flapping link doesn't mean
that there's anything wrong with the Solaris box.

It would perhaps be more interesting to log a message when link status
changes "too often" - but then that requires one to decide how often
that is.
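
To make the "too often" idea concrete, here is a rough sketch of a
sliding-window check, written in Python only for brevity.  Everything in
it is an assumption for illustration: the FlapDetector name, the
threshold of 5 changes and the 60-second window are made up, and none of
it is an existing Solaris interface.  It counts link state changes inside
the window and asks for one log message when the count first exceeds the
threshold, then stays quiet until the link has settled again.

```python
from collections import deque


class FlapDetector:
    """Hypothetical helper: flag a link as flapping only when it changes
    state more than max_changes times within window_secs seconds."""

    def __init__(self, max_changes=5, window_secs=60.0):
        # Both defaults are arbitrary; choosing them is the hard part.
        self.max_changes = max_changes
        self.window_secs = window_secs
        self._times = deque()    # timestamps of recent state changes
        self._reported = False   # True while an episode has been logged

    def link_changed(self, now):
        """Record a state change at time `now` (in seconds).

        Returns True when a "link is flapping" message should be logged.
        """
        self._times.append(now)
        # Forget changes that have fallen out of the window.
        while self._times and now - self._times[0] > self.window_secs:
            self._times.popleft()
        flapping = len(self._times) > self.max_changes
        # Log once per episode rather than once per change.
        should_log = flapping and not self._reported
        self._reported = flapping
        return should_log
```

With these defaults, the couple of link changes seen at boot would never
produce a message, while a link bouncing six times in a minute would
produce exactly one.  It doesn't answer the question of what the right
numbers are, of course.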

I'm not really passionate about the link status messages because I'm adept
at figuring out when I can ignore them, but they're a fair example of
something that borders on annoying most of the time and is infrequently
useful.  Every driver could have a few things like this if we're not
careful, which is really my point.  Other than legacy status, why is this
particular message any more entitled than dozens of others?

Dana



