[driver-discuss] GLD3 NIC driver performance tuning

Tom Chen chentom60 at hotmail.com
Wed Jun 3 11:39:54 PDT 2009


Paul,

The receive side is limiting the performance. Even with one netperf session, 
on the receiving side, one CPU is at 0% idle and the other is at 50% idle. 
Looking at the profiler output, most of the time is spent elsewhere, not in 
the driver. See the output:



Profiling interrupt: 8758 events in 45.139 seconds (194 events/sec)

Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
-------------------------------------------------------------------------------
 1436  16%  16% 0.00     2372 cpu[1]                 disp_getwork
  968  11%  27% 0.00     5989 cpu[0]                 bcopy
  953  11%  38% 0.00     7363 cpu[0]                 verify_and_copy_pattern
  671   8%  46% 0.00     2824 cpu[1]                 copy_pattern
  609   7%  53% 0.00     2219 cpu[1]                 idle
  467   5%  58% 0.00     2832 cpu[1]                 copyout_more
  421   5%  63% 0.00     6747 cpu[0]                 bcopy_more
  374   4%  67% 0.00     6208 cpu[0]                 getpcstack
  259   3%  70% 0.00     4768 cpu[1]                 kmem_cache_free_debug
  203   2%  73% 0.00     7278 cpu[0]                 kmem_cache_alloc_debug
  191   2%  75% 0.00     5359 cpu[0]                 mutex_enter
  115   1%  76% 0.00     6726 cpu[0]                 tcp_rput_data
  114   1%  77% 0.00     6686 cpu[0]+6               ql_build_rx_mp
  105   1%  79% 0.00     6807 cpu[0]                 atomic_add_int_nv
   97   1%  80% 0.00     6716 cpu[0]+6               ql_ring_rx



I don't know what all these "copy" routines are doing. They seem to take 
56% of the time. We ran a similar test with an Intel card, and it showed 
that the Intel card spends only 14% of the time copying. We are wondering 
why.

Is this because our driver has alignment issues in the receive path? Due to 
hardware limitations, we do not place the IP header on a 4-byte boundary. 
What will the overhead on SPARC be if the driver sends packets with the IP 
header on a 2-byte boundary to the upper layers?
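
To make the alignment question concrete, here is a minimal sketch of the 
arithmetic, assuming a plain 14-byte Ethernet header with no VLAN tag. The 
function names are made up for illustration and are not part of GLD3 or of 
our driver. It only computes offsets; it does not measure any overhead.

```c
#include <assert.h>
#include <stdint.h>

/* Ethernet header: dst MAC (6) + src MAC (6) + EtherType (2) = 14 bytes. */
#define ETHER_HDR_LEN 14u

/* Distance of addr past the previous align-byte boundary (0 = aligned). */
unsigned
misalignment(uintptr_t addr, unsigned align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    return ((unsigned)(addr & (uintptr_t)(align - 1)));
}

/* Misalignment of the IP header for a frame that starts at frame_addr. */
unsigned
ip_hdr_misalignment(uintptr_t frame_addr)
{
    return (misalignment(frame_addr + ETHER_HDR_LEN, 4));
}

/*
 * Hypothetical helper: pad bytes to skip at the start of a receive buffer
 * so that the IP header lands on a 4-byte boundary. This only helps when
 * the hardware can DMA to such an offset.
 */
unsigned
rx_pad_for_aligned_ip(uintptr_t buf_addr)
{
    return ((4u - ip_hdr_misalignment(buf_addr)) & 3u);
}
```

For a frame that starts at a 4-byte-aligned address such as 0x1000, 
ip_hdr_misalignment() returns 2, which is the 2-byte case described above, 
and rx_pad_for_aligned_ip() returns 2.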

Tom


----- Original Message ----- 
From: "Paul Durrant" <pdurrant at gmail.com>
To: "Tom Chen" <chentom60 at hotmail.com>
Cc: <driver-discuss at opensolaris.org>
Sent: Tuesday, June 02, 2009 9:57 AM
Subject: Re: [driver-discuss] GLD3 NIC driver performance tuning


> Tom Chen wrote:
>>
>> Is there a way in Solaris to figure out how many CPUs the system has and 
>> which interrupt is assigned to which CPU? I am wondering why most of the 
>> receive interrupts are assigned to CPU 1 in our test sometimes. I wish I 
>> could assign different interrupts to different CPUs.
>>
>
> From within a driver? You can look at ncpus to tell you how many CPUs you 
> have. As for the interrupt -> CPU mapping, that can change dynamically 
> (e.g. when a CPU is offlined), so relying on the mapping is risky. As for 
> making sure interrupts get spread, there is a policy tweakable somewhere 
> in the APIC code (can't remember where), and you need to make sure it's 
> set to round-robin interrupt assignment. I'm assuming you're using MSI-X 
> on a recent Solaris build.
>
>   Paul
> 



