Issues for 2008/055

James Carlson james.d.carlson at sun.com
Wed Dec 17 13:00:05 PST 2008


Nicolas Droux writes:
> Here are my issues for Solaris Bridging (PSARC 2008/055).  
> Unfortunately I have a conflict and won't be able to attend the  
> inception review, but I'll be happy to follow-up by email.

I'll integrate these (and my replies) into the existing issues file.

> ngd-01 bridging-spec.txt states "The links assigned to a bridge must  
> not themselves be VLANs, VNICs, or tunnels. Only links that would be  
> acceptable as part of an aggregation or links that are aggregations  
> themselves may be assigned to a bridge." It should be also possible to  
> bridge etherstubs (introduced by Crossbow [PSARC 2006/357]), since  
> they can be used to create virtual switches.

We had an extended talk about this one.  The spec intentionally
doesn't mention etherstubs (except in passing) because they're not
prohibited.

You *should* in principle be able to create a bridge between two
etherstub instances.  I've attempted to do this, and I've found that
there appear to be numerous bugs related to etherstubs in ON today --
for instance, dladm_linkid2legacyname() thinks they're invalid and
dlpi_bind() won't allow me to bind to SAP zero so that I can send and
receive STP into the bit-bucket.

I'm sure I can fix and/or work around those bugs, and thus make it
possible to bridge these objects.  I'll include doing that as part of
the project.  From my prototype:

# dladm show-bridge -l bar
LINK         STATE        UPTIME   DESROOT
stub1        forwarding   22       32768/0:0:0:0:0:0
stub2        forwarding   22       32768/0:0:0:0:0:0

I'm not sure, though, that it's an interesting case.  If you're
planning to bridge etherstubs together, you'll get better performance
by just putting all of the VNICs that must talk with each other on a
single etherstub.  If you're planning to bridge an etherstub with a
regular NIC, then just move the VNICs over to the regular NIC.

> ngd-02 in bridging-spec.txt, 2.2 a), the proposed link/up behavior in  
> the presence of bridges needs to be refined. With Crossbow VNICs, the  
> link status advertised to MAC clients depends also on the presence of  
> other MAC clients on top of the underlying data-link, in order to  
> maintain connectivity between these MAC clients when the physical link  
> of the underlying data-link goes down. This needs to be factored-in in  
> the logic used to reflect the link status when bridging is configured  
> on the underlying data-link.

This appears to be a misunderstanding.  I'm not modifying the existing
link up/down handling that Crossbow VNICs have in any way.

The existing behavior is that the VNIC stays up if there are other
VNICs configured on the same NIC.  The same is true when bridging is
present in the picture: if all of the physical NICs go down, then
VNICs will still do the same thing they did before, and will still
advertise "up" status to clients when there are multiple VNICs present
on the same NIC.
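
The rule described above can be written down as a minimal sketch.  This
is an illustration only, not code from Crossbow or from the bridging
project; the function name and its parameters are hypothetical, and the
"physical link is up" branch is an assumption added to make the
function complete.

```python
def vnic_link_up(phys_link_up, vnics_on_nic):
    """Return True if a VNIC should advertise "up" to its clients.

    phys_link_up -- state of the underlying physical NIC (assumed input)
    vnics_on_nic -- number of VNICs configured on that NIC, including
                    the one being asked about (assumed input)
    """
    # Assumption: a VNIC reports "up" whenever the physical link is up.
    if phys_link_up:
        return True
    # From the text: with the physical link down, the VNIC still
    # reports "up" when there are multiple VNICs on the same NIC, so
    # that they can keep talking to each other.
    return vnics_on_nic > 1
```

Note that bridge membership is not an input here, which matches the
statement that bridging leaves this handling unchanged.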

There's no issue here.

> ngd-03 in bridging-spec.txt 4.1, "However, in the event that bridging  
> integrates without Crossbow" since you're not asking for patch binding  
> how is this possible?

The spec was written long ago, before bridging was made dependent on
Crossbow.  I just hadn't removed all the references.

> ngd-04 bridging-spec.txt 4.1 is missing the VLAN VNICs (dladm create- 
> vnic -v <vid> ...) in the description.

I can add that, but the point is the same.  All intentionally
(administratively) created VLANs are the same from the point of view
of this bridging design: they provide (through libdladm) a set of
"allowed VLANs" for the purpose of bridging behavior per 802.1q.

All "casual" VLANs (PPA hack) are different; the bridging code would
use these instances with forwarding disabled for that VLAN.  The user
would have to configure the VLAN explicitly in order to get forwarding
among the other interfaces.  (That is, PPA hack VLANs do not enter the
"allowed VLAN" set.)

Obviously, since the PPA hack is now gone, the point is moot.  There's
only one kind of VLAN -- the intentionally created kind -- and it
always enters the "allowed VLAN" set.

> ngd-05 bridging-design.pdf The design document is still referring to  
> old Crossbow architectural details which do not match what was  
> integrated in Nevada. For instance, some of the arguments used against  
> using the Crossbow classifier don't hold true with the latest Crossbow  
> implementation, and the discussion at pages 16-17 should be updated,  
> and the design possibly revisited based on the latest Crossbow flow  
> table architecture.

The design document, as I've tried to make clear, is a very early
draft, has not been updated, and is informative for the architectural
review, not normative.  It's explicitly not under review here.

However, the classifier issues still remain, and we discussed those at
length.  The analogy from before still stands: for the same basic
reasons that the Fireengine conn_t classifier can't really be used
effectively as a substitute for the Patricia-tree based IP forwarding
look-up (and vice-versa), the local delivery related classification in
Crossbow doesn't appear suitable for the bridge forwarding case.

With Crossbow, the classification is tied to the administrative bits,
which rely on explicit configuration of the VNICs and flows involved
using a user-space component.  With bridging, forwarding entries are
created and updated on the fly based on source MAC addresses seen in
the data path, and then aged away over time; there's no administrative
involvement normally expected for these entries.
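
To make the contrast concrete, here is a minimal sketch of the kind of
learned-and-aged forwarding table described above.  It is a toy model
under stated assumptions, not the project's implementation: the class
name, method names, and the 300-second default age are all
hypothetical choices made for illustration.

```python
import time


class ForwardingTable:
    """Toy learning-bridge table: MAC address -> (link, last-seen time)."""

    def __init__(self, max_age=300.0, clock=time.monotonic):
        self.max_age = max_age  # seconds before an idle entry expires (assumed)
        self.clock = clock      # injectable clock, so aging can be tested
        self.entries = {}

    def learn(self, src_mac, link):
        # Called from the data path for each received frame: create the
        # entry, or refresh it if the source address was seen before.
        self.entries[src_mac] = (link, self.clock())

    def lookup(self, dst_mac):
        # Return the link to forward on, or None if the destination is
        # unknown or its entry has aged out (a real bridge would flood).
        entry = self.entries.get(dst_mac)
        if entry is None:
            return None
        link, seen = entry
        if self.clock() - seen > self.max_age:
            del self.entries[dst_mac]
            return None
        return link
```

Nothing in this sketch is driven by administrative configuration:
entries appear, move between links, and disappear purely as a result
of traffic and elapsed time.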

The two are different in many respects.  In theory, though, it might
be possible to modify Crossbow so that it can create and destroy
classification entries on the fly (this does not look trivial in the
least; the locking scheme makes this an unobvious approach), and it
may be possible to make use of some aspects of flow administration
when tied to more easily identifiable objects, such as VLANs, though
it's unclear how this should work with the existing Crossbow resource
management structure.

I regard all of that as a research project.  It may well be an
interesting one, but it's not this project by any stretch.  I have no
plans or engineering resources available to redesign the internals of
Crossbow to handle things it wasn't originally designed to do, and I
think that insisting on such an extension of the project I've proposed
is not reasonable.  I will not be doing that.

For what it's worth, it may also be possible to modify Crossbow so
that it eliminates the Fireengine classifier entirely.  After all, the
two are much more aligned than are Crossbow and bridging: both involve
identifying specific receiving client(s) on input and handling output
from multiple clients, and both involve classification structures that
are created strictly on the action of user space components.  It seems
like a performance loss to have Crossbow inspect and classify the
packet once -- potentially looking high up the stack for flow
information -- only to have Fireengine do the same thing again.

I can see that this path wasn't taken, so I can't help but wonder how
reuse of Crossbow's classifier could be considered a requirement for
bridging.

One important issue did come up here: we need to define the relative
ordering between L2 filtering and bridging, and I believe it makes
sense to put L2 filtering closer to the physical I/O.  In other words,
L2 filtering should do its work underneath the bridge.

-- 
James Carlson, Solaris Networking              <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677


