2009/235 dladm Possible Values List

Garrett D'Amore gdamore at sun.com
Tue Apr 14 10:35:04 PDT 2009


Mike Shapiro wrote:
> On Tue, Apr 14, 2009 at 10:08:58AM -0700, Garrett D'Amore wrote:
>   
>> Michael Shapiro wrote:
>>     
>>> This case currently does not address the issue of how an administrator
>>> or layered software determines the optimal large MTU, as opposed to
>>> the maximum MTU.  The two are not always the same.  For example, on
>>> Neptune (nxge), the maximum is 9000, but the optimal large MTU is 8150,
>>> because of the size of the DMA transfers the card does in hardware.
>>> This case needs to address this issue explicitly, by either:
>>>
>>> (a) Defining an additional interface by which the optimal value
>>>    can be returned from the driver as another attribute, OR
>>>
>>> (b) Making extremely clear to driver writers that if a large MTU
>>>    size less than the maximum is more optimal than the maximum,
>>>    that the optimal size should be returned by this interface.
>>>
>>> My preference is for option (a), but others should weigh in.
>>>
>>> -Mike
>>>
>>>  
>>>       
>> This feels like a "hardware tuning" element.
>>     
>
> That's because it is a hardware tuning element :)
>
>   
>> MTU configuration generally shouldn't need to worry about page sizes and 
>> such.  An extra DMA transfer is usually in the "noise" as far as 
>> overheads of network processing are concerned.
>>     
>
> Not if you're trying to maximize performance of a heavily
> loaded server, which is precisely when this matters most.
>   

But if you wind up having to do two packet exchanges because you had to 
break a transfer up into a smaller size, that penalty (I hypothesize) 
will hurt you more than the cost of the extra DMA to transfer the packet 
all at once would.
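
To make that concrete, here is a rough sketch of the arithmetic for a single 8K (8192-byte) transfer over TCP. It assumes 40 bytes of headers per packet (20-byte IPv4 plus 20-byte TCP, no options), which is my assumption and not anything from this case; the function name is likewise just illustrative.

```python
import math

# Assumption: 20-byte IPv4 header + 20-byte TCP header, no options.
IP_TCP_HEADERS = 40

def packets_needed(payload_bytes, mtu):
    """Number of TCP segments needed to carry payload_bytes at a given MTU."""
    mss = mtu - IP_TCP_HEADERS  # payload bytes that fit in one packet
    return math.ceil(payload_bytes / mss)

# Compare the default MTU, the nxge "optimal" value, and the nxge maximum.
for mtu in (1500, 8150, 9000):
    print(mtu, packets_needed(8192, mtu))
```

Under those assumptions an MTU of 8150 leaves 8110 bytes of payload per packet, so an 8K transfer spills into a second packet, while 9000 carries it in one.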

Now, as Jim Carlson already pointed out, if you know something about the 
hardware, and it's true for *all* your L2 peers (that's critical!) on the 
network, and you can choose a slightly lower value that allows "most" 
packets to avoid an extra cost (DMA or otherwise), then that's probably 
worthwhile.

>  
>   
>> More specifically, large MTUs are intended, as I see it, to minimize the 
>> effect of per-packet overheads found in NIC hardware, switches, routers, 
>> and most especially *hosts*.  (I.e. the TCP/IP stack overheads.)  I 
>> suspect that, because of these overheads, the largest MTU you can 
>> configure is in general always "optimal", even if the hardware has to 
>> perform some extra DMA transfers to make that happen.
>>
>> Unless some specific real world tests (e.g. TCP throughput or UDP stress 
>> tests) show otherwise, I'm disinclined to believe that there is any 
>> reason an administrator would need to know about the underlying hardware 
>> DMA limitations, or that the "optimum" value is anything other than the 
>> largest supported value.
>>
>>    -- Garrett
>>     
>
> It makes a rather large difference in the absolute performance
> numbers we achieve on the 7000 series.  This is because you have
> a system which is executing 16 cores at 100% cpu bound when it's
> fully loaded and therefore cutting down on transfers and extraneous
> packet processing makes a rather huge difference.
>   

But the larger MTU should reduce the total number of packets.
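
As a back-of-the-envelope sketch for a bulk stream (again assuming 40 bytes of IPv4 plus TCP headers per packet and no options; the 1 GiB transfer size and the names here are purely illustrative):

```python
# Assumption: 20-byte IPv4 header + 20-byte TCP header, no options.
HEADER_BYTES = 40

def bulk_packet_count(total_bytes, mtu):
    """Packets needed to stream total_bytes when every packet is full-sized."""
    mss = mtu - HEADER_BYTES
    return -(-total_bytes // mss)  # integer ceiling division

ONE_GIB = 2 ** 30

# Packet counts for the same stream at three candidate MTUs.
counts = {mtu: bulk_packet_count(ONE_GIB, mtu) for mtu in (1500, 8150, 9000)}
for mtu, count in counts.items():
    print(f"MTU {mtu}: {count} packets")
```

This only counts packets. It says nothing about the per-packet DMA cost that Mike is describing, which is the part that would need real measurements.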

Of course, if you never exchange a packet that is over 8K, but have to 
pay extra overheads to *support* the *possibility* of a 9K packet... 
then that's a different problem.

> Here's the actual point: the knowledge of this optimal size is a
> function of the hardware and the driver.  Our customers spend a lot
> of time complaining about the fact that the out-of-the-box tuning
> of networking on Solaris is sub-optimal and hard to understand how to do
> better.  Jumbo MTU isn't an area we can easily enable by default,
> but we can make it a lot easier to achieve the fastest setting.
> That knowledge belongs in the driver source, not on some out-of-date
> wiki page where everyone has to waste more time googling around
> and experimenting to figure it out.
>   

Jumbo MTU is almost *always* better than default.  Tuning the MTU to 
suit a hardware DMA engine means tuning not just one driver, but the 
whole network.  And what's true for one driver/hardware combination 
might not be true for another.

I believe that in general, the highest supported MTU (and 9000 is often 
used as the default because it's big enough to transfer an 8K page) will 
win... unless you wind up in that situation where you pay for the 
overhead and don't actually *use* the extra transport capacity.

Anyway, since this involves (as Jim pointed out) an L2 tunable that 
affects the entire network configuration, I'm not sure this kind of 
optimization really belongs here.

    -- Garrett




More information about the opensolaris-arc mailing list