Generic HCA Support for uDAPL [PSARC/2007/575 FastTrack timeout 10/10/2007]

Ted Kim tedk at sac.sfbay.sun.com
Wed Oct 3 14:23:54 PDT 2007


Template Version: @(#)sac_nextcase 1.64 07/13/07 SMI
This information is Copyright 2007 Sun Microsystems
1. Introduction
    1.1. Project/Component Working Name:
	 Generic HCA Support for uDAPL
    1.2. Name of Document Author/Supplier:
	 Author:  Bill Taylor
    1.3. Date of This Document:
	03 October, 2007
4. Technical Description

Generic HCA Support for uDAPL
=============================


Background
----------

One important usage mode of InfiniBand is the "OS Bypass" model.
In this model, InfiniBand adapter (HCA) objects, such as queue pairs
and completion queues, are mapped into the address space of userland
application processes.  The application can then talk directly to the
InfiniBand hardware through the mapped objects.  The mappings
themselves must still be set up in the kernel.

The current framework in Solaris used to accomplish OS Bypass is
"uDAPL" (PSARC/2003/145), which is used by the Sun Clustertool MPI
product.  (Another OS Bypass framework called "Open Fabrics User
Verbs" is now under development as an OpenSolaris project.)  uDAPL is
structured to have a part that is generic and a part that is specific
to each type of HCA (called a "plug-in" library).  Currently, Mellanox
HCAs which run the Tavor driver (PSARC/2002/539) are supported.
However, support will soon be needed for Mellanox Arbel
(PSARC/2006/400) and Hermon (1-pager not filed yet) HCAs.

Initially, it was expected that the driver for Mellanox Arbel would
use a similar structure to Tavor.  However, since then it has been
discovered that the Arbel hardware uses a different design.  Hermon
has even further changes.  Also, it's possible that HCAs from other
vendors may eventually be supported.

Rather than have uDAPL understand data formats for so many
different adapters, we are changing uDAPL to keep the details of each
adapter private to the HCA driver and device-specific libraries.  The
uDAPL kernel module now has a much smaller *GENERIC* interface which
is the same for all HCAs.  Other device-specific information is still
transferred, but it is treated as an opaque block of bytes to copy.
Only the HCA-specific code understands its contents
(i.e. the in-kernel HCA driver and the userland plug-in library).
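
The opaque-block idea can be illustrated with a minimal sketch in C.
This is not the uDAPL or IBTF code: every type, field and function
name below (generic_msg_t, hca_private_t, hca_fill, generic_forward,
hca_parse) is hypothetical and chosen only for illustration.  The
generic layer copies the device-specific bytes without interpreting
them, and only the two HCA-specific ends know the layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generic envelope: the only part the generic layer parses. */
typedef struct {
	uint32_t obj_type;	/* hypothetical object kind (QP, CQ, ...) */
	uint32_t blob_len;	/* number of valid bytes in blob[] */
	uint8_t  blob[64];	/* device-specific bytes, never interpreted here */
} generic_msg_t;

/* Hypothetical device-private layout, known only to HCA-specific code. */
typedef struct {
	uint64_t map_offset;
	uint32_t queue_depth;
	uint32_t entry_size;
} hca_private_t;

/* HCA-specific producer (stands in for the in-kernel HCA driver). */
static int
hca_fill(generic_msg_t *m, uint32_t obj_type, const hca_private_t *p)
{
	assert(m != NULL && p != NULL);
	if (sizeof (*p) > sizeof (m->blob))
		return (-1);
	m->obj_type = obj_type;
	m->blob_len = (uint32_t)sizeof (*p);
	memcpy(m->blob, p, sizeof (*p));
	return (0);
}

/* Generic layer: forwards the blob as bytes and never looks inside it. */
static int
generic_forward(const generic_msg_t *src, generic_msg_t *dst)
{
	assert(src != NULL && dst != NULL);
	if (src->blob_len > sizeof (dst->blob))
		return (-1);
	dst->obj_type = src->obj_type;
	dst->blob_len = src->blob_len;
	memcpy(dst->blob, src->blob, src->blob_len);
	return (0);
}

/* HCA-specific consumer (stands in for the userland plug-in library). */
static int
hca_parse(const generic_msg_t *m, hca_private_t *out)
{
	assert(m != NULL && out != NULL);
	if (m->blob_len != sizeof (*out))
		return (-1);
	memcpy(out, m->blob, sizeof (*out));
	return (0);
}
```

In this sketch a new adapter type would only need its own private
structure and its own fill/parse pair; generic_forward() would not
change.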

As with other kernel usage of InfiniBand, the uDAPL kernel module
calls through the InfiniBand Transport Framework (IBTF)
(PSARC/2002/132) "Transport interface" to the HCA driver entry points
in the "Channel interface".  The particular functions used by the
uDAPL kernel module were introduced in "Userland HCA Object Mapping
calls" of PSARC/2003/358.  The specific calls are the IBTF
ibt_ci_data_in() and ibt_ci_data_out() functions, which in turn call
the ibc_ci_data_in() and ibc_ci_data_out() entry points of the HCA
drivers.  This case codifies, within IBTF itself, the semantics that
uDAPL expects of these calls, so that they apply to all OS-bypass
users (instead of being a private agreement between uDAPL and the HCA
drivers).
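
The layering described above, where a transport-level call is routed
to a per-driver entry point, can be sketched roughly as follows.  This
is an illustration only and does not reproduce the real IBTF
prototypes: the demo_* and drv_* names, the status codes and the
argument lists are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes, not the IBTF ones. */
typedef enum { DEMO_OK = 0, DEMO_BAD_SIZE = 1, DEMO_NO_ENTRY = 2 } demo_status_t;

/* Hypothetical per-driver entry points ("channel" side). */
typedef struct demo_hca_ops {
	demo_status_t (*data_in)(void *obj, const void *data, size_t len);
	demo_status_t (*data_out)(void *obj, void *data, size_t len);
} demo_hca_ops_t;

/* Hypothetical HCA handle: just a pointer to the driver's entry points. */
typedef struct { const demo_hca_ops_t *ops; } demo_hca_t;

/* "Transport" side: callers use these and never see driver internals. */
static demo_status_t
demo_data_in(demo_hca_t *hca, void *obj, const void *data, size_t len)
{
	assert(hca != NULL);
	if (hca->ops == NULL || hca->ops->data_in == NULL)
		return (DEMO_NO_ENTRY);
	return (hca->ops->data_in(obj, data, len));
}

static demo_status_t
demo_data_out(demo_hca_t *hca, void *obj, void *data, size_t len)
{
	assert(hca != NULL);
	if (hca->ops == NULL || hca->ops->data_out == NULL)
		return (DEMO_NO_ENTRY);
	return (hca->ops->data_out(obj, data, len));
}

/* One hypothetical driver with its own private object format. */
typedef struct { unsigned char priv[8]; } demo_qp_t;

static demo_status_t
drv_data_in(void *obj, const void *data, size_t len)
{
	demo_qp_t *qp = obj;

	if (len != sizeof (qp->priv))
		return (DEMO_BAD_SIZE);
	memcpy(qp->priv, data, len);
	return (DEMO_OK);
}

static demo_status_t
drv_data_out(void *obj, void *data, size_t len)
{
	demo_qp_t *qp = obj;

	if (len < sizeof (qp->priv))
		return (DEMO_BAD_SIZE);
	memcpy(data, qp->priv, sizeof (qp->priv));
	return (DEMO_OK);
}

static const demo_hca_ops_t drv_ops = { drv_data_in, drv_data_out };
```

Because the caller depends only on the wrapper functions, a second
driver with a different private format could be added in this sketch
by supplying another ops table, with no change to the caller.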


Proposal
--------

The interface binding for this entire proposal is micro/patch.

There are three parts to this proposal:

A. Add new semantics to IBTF calls to support multiple HCA types
for uDAPL and other InfiniBand OS-bypass users. All calls
continue to be consolidation private as before. The modified functions
are:

Transport interface - 
  ibt_ci_data_in(9F) and ibt_ci_data_out(9F):
  both described in the ibt_ci_data_in.9f man page.

Channel interface -
  ibc_ci_data_in(9E) and ibc_ci_data_out(9E):
  both described in the ibc_ci_data_in.9e man page.

See materials directory for change-bar versions of these man pages.


B. Replace the existing Tavor/uDAPL interface from PSARC/2003/557
(main interface) and PSARC/2004/737 (adding SRQ)

Not only is the content of these cases replaced by the new interface,
but contracts from those older cases are obsoleted by *this* case. The
contracts being terminated are:

 (a). "contract-01" in the case directory for PSARC/2003/557
 (b). "draft-contract-01" in the case directory for PSARC/2004/737


C. Replace the existing Arbel/uDAPL interface from PSARC/2006/400

The information in "arbel-udapl-interface.txt" in the PSARC/2006/400
case materials directory is replaced by the new interface, and
"contract-02.txt" from PSARC/2006/400 is obsoleted by *this* case.
Further, for compatibility reasons, the plug-in library path will be
the same as previously used by Tavor: /usr/lib/{64/}udapl_tavor.so.1.

Note that the rest of the PSARC/2006/400 case is not altered by this
proposal.



6. Resources and Schedule
    6.4. Steering Committee requested information
   	6.4.1. Consolidation C-team Name:
		ON
    6.5. ARC review type: FastTrack
    6.6. ARC Exposure: open



