[driver-discuss] Am I understanding this correctly? -- potential e1000g bug

Kerry Shu Kerry.Shu at Sun.COM
Thu Sep 17 09:45:04 PDT 2009



Jason King wrote:
> On Thu, Sep 17, 2009 at 10:16 AM, Garrett D'Amore <gdamore at sun.com> wrote:
>> Look closely at the stack.  You'll notice that a PIL9 interrupt
>> *interrupted* e1000g while it was servicing an interrupt.  I don't think
>> e1000g is at fault here.  Something else is doing it.
> 
> This is probably my lack of knowledge about how Solaris handles
> interrupts, but after doing a little digging:
> 
>>  0xffffff0007c49c60::findstack -v
> stack pointer for thread ffffff0007c49c60: ffffff0007c49b30
>   ffffff0007c49bb0 rm_isr+0xaa()
>   ffffff0007c49c00 av_dispatch_autovect+0x7c(10)
>   ffffff0007c49c40 dispatch_hardint+0x33(10, 6)
>   ffffff0007c4f450 switch_sp_and_call+0x13()
>   ffffff0007c4f4a0 do_interrupt+0x9e(ffffff0007c4f4b0, b)
>   ffffff0007c4f4b0 _interrupt+0xba()
> 
> I'm assuming this portion of the stack dump is what you're talking
> about... looking at the function signature for dispatch_hardint -- the
> new vector is 10, and the old ipl is 6.
> 
>> ::interrupts -d
> IRQ  Vect IPL Bus    Trg Type   CPU Share APIC/INT# Driver Name(s)
> 3    0xb1 12  ISA    Edg Fixed  0   1     0x0/0x3   asy#1
> 4    0xb0 12  ISA    Edg Fixed  0   1     0x0/0x4   asy#0
> 6    0x41 5   ISA    Edg Fixed  0   1     0x0/0x6   fdc#0
> 7    0x42 5   ISA    Edg Fixed  1   1     0x0/0x7   ecpp#0
> 9    0x81 9   PCI    Lvl Fixed  1   1     0x0/0x9   acpi_wrapper_isr
> 15   0x43 5   ISA    Edg Fixed  0   1     0x0/0xf   ata#1
> 16   0x83 9   PCI    Lvl Fixed  1   4     0x0/0x10  hci1394#0, uhci#3, uhci#0, nvidia#0
> 17   0x87 8   PCI    Lvl Fixed  0   1     0x0/0x11  audio810#0
> 18   0x86 9   PCI    Lvl Fixed  1   1     0x0/0x12  pci-ide#1
> 19   0x85 9   PCI    Lvl Fixed  0   1     0x0/0x13  uhci#1
> 23   0x84 9   PCI    Lvl Fixed  1   1     0x0/0x17  ehci#0
> 26   0x40 5   PCI    Lvl Fixed  1   1     0x1/0x2   aac#0
> 48   0x60 6   PCI    Lvl Fixed  1   1     0x2/0x0   e1000g#0
> 72   0x82 7   PCI    Edg MSI    0   1     -         pcie_pci#0
> 73   0x30 4   PCI    Edg MSI    0   1     -         pcie_pci#2
> 74   0x44 5   PCI    Edg MSI    0   1     -         adpu320#0
> 160  0xa0 0          Edg IPI    all 0     -         poke_cpu
> 192  0xc0 13         Edg IPI    all 1     -         xc_serv
> 208  0xd0 14         Edg IPI    all 1     -         kcpc_hw_overflow_intr
> 209  0xd1 14         Edg IPI    all 1     -         cbe_fire
> 210  0xd3 14         Edg IPI    all 1     -         cbe_fire
> 240  0xe0 15         Edg IPI    all 1     -         xc_serv
> 241  0xe1 15         Edg IPI    all 1     -         apic_error_intr
> 
> That makes sense -- e1000g#0 is IPL 6, however shouldn't there then be
> an entry somewhere in there with a VECT value of 0x0a and an IPL of 9?
>  Or do I still have more learning to do?
> 

What you are looking for is 0x10, not 0x0a: mdb prints stack arguments
in hex, so the 10 passed to dispatch_hardint is 16 decimal. It looks to
me like an IRQ 16 interrupt (from hci1394#0, uhci#3, uhci#0, or
nvidia#0) is preempting the e1000g#0 interrupt. I guess this happens
frequently, since you noticed the system freezing. Are you running
something that keeps both e1000g0 and the four driver instances sharing
IRQ 16 busy? For example, are you putting heavy load on both network and
graphics?
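
As a rough illustration of that lookup (a sketch only, not anything mdb
provides), the Python below treats the first dispatch_hardint argument
as a hex IRQ number and matches it against a few rows of the
::interrupts output. The rows are trimmed to the IRQ, Vect, IPL, and
driver columns, and the helper names parse_rows and lookup_hex_arg are
made up for this example.

```python
# Rows trimmed from the `::interrupts -d` output quoted earlier in the thread:
# IRQ, Vect, IPL, driver name(s). Other columns are dropped for brevity.
INTERRUPTS = """\
9    0x81 9   acpi_wrapper_isr
16   0x83 9   hci1394#0, uhci#3, uhci#0, nvidia#0
48   0x60 6   e1000g#0
"""


def parse_rows(text):
    """Map decimal IRQ -> (ipl, [driver names]). Hypothetical helper."""
    table = {}
    for line in text.splitlines():
        irq, _vect, ipl, drivers = line.split(None, 3)
        table[int(irq)] = (int(ipl), [d.strip() for d in drivers.split(",")])
    return table


def lookup_hex_arg(table, arg):
    """Read a stack argument as hex and look it up as an IRQ number."""
    return table.get(int(arg, 16))


table = parse_rows(INTERRUPTS)

# dispatch_hardint+0x33(10, 6): "10" is hex, i.e. IRQ 16, not IRQ 10.
print(lookup_hex_arg(table, "10"))
# Reading the same value as 0x0a (decimal 10) finds no matching IRQ.
print(lookup_hex_arg(table, "0a"))
```

With these rows, the first lookup lands on the shared IPL 9 line for
IRQ 16, and the second finds nothing, which is why searching for 0x0a
came up empty.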

Regards,
Kerry

