Interrupt affinity interfaces and PCITool enhancements [PSARC/2009/340 FastTrack timeout 06/17/2009]
Garrett D'Amore - sun microsystems
gd78059 at sac.sfbay.sun.com
Wed Jun 3 22:02:01 PDT 2009
I'm sponsoring this fast track for Govinda Tatti and the PCI team.
This project introduces new DDI interfaces, and changes PCITool's command line
syntax in an incompatible way. However, the change is intended to correct
an incompatibility with respect to CLIP, and the original code has only been
integrated in the last couple of builds of Nevada, so we believe it is
an opportune time to fix this.
The project is seeking Minor Commitment, since the interfaces are primarily
intended for consumption by Crossbow, which is not available in Solaris 10.
Man pages, headers, and supporting materials are also located in the
case directory under "materials/".
- Garrett
Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name:
Interrupt affinity interfaces and PCITool enhancements
1.2. Name of Document Author/Supplier:
Author: Govinda Tatti
1.3 Date of This Document:
03 June, 2009
4. Technical Description
Template Version: @(#)sac_nextcase 1.9 06/02/09 SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
1.1 Project/Component Working Name:
Interrupt Affinity Interfaces and PCITool Enhancements
1.2 Name of Document Author/Supplier:
Author: Govinda Tatti
1.3 Date of This Document:
02 June, 2009
4. Technical Description
4.1 Project Summary
This project provides a mechanism for device drivers, IO frameworks such
as Crossbow, and users to determine the current CPU binding of their
interrupts and to fine-tune those bindings to achieve maximum IO
performance.
The first phase of this project delivers simple DDI interrupt affinity
interfaces that allow a device driver to retrieve the current interrupt
target CPU and to express its interrupt target preference. In addition,
it delivers PCITool enhancements to retarget MSI/X interrupts.
In the next phase, these simple DDI interrupt affinity interfaces will be
replaced with hint- or preference-based interfaces. In addition, the DDI
interrupt framework and the platform-specific implementations will be
modified to query the NUMA-IO framework for the optimal interrupt target
CPU before configuring the platform's interrupt targeting hardware logic.
4.2 Problem and Requirements
Modern IO bus technologies support large numbers of interrupts. A single
PCI or PCIe device could use up to 32 MSI interrupts, or 2048 MSI-X
interrupts. The IRM project (PSARC/2008/628) fixed the MSI-X allocation
limit issue and solved part of an IO performance problem. The other part
of this problem is how to fine-tune the CPU bindings for these multiple
MSI-X interrupts to achieve the expected IO performance.
Solaris device drivers such as 10G NIC and HBA (Emulex) drivers, and IO
frameworks such as Crossbow, currently need to retrieve and reroute the
target CPU for their interrupts. For example, Crossbow provides a
framework by which NIC resources such as Rx and Tx rings are exposed to
the MAC layer. The MAC layer doles out these resources to VNICs when they
are created, while reserving a fixed amount for the primary NIC. The CPUs
on which packet processing takes place can be specified at VNIC creation
time or later. If they are specified, the interrupts associated with the
Rx/Tx rings need to be retargeted to the specified CPUs. A mechanism to
retarget a specific MSI-X interrupt to a different CPU is therefore
needed. This is for the virtualization part of Crossbow.
For optimal performance of regular NICs (as well as VNICs), the poll thread
associated with an Rx ring should be bound to the same CPU that takes the
ring's interrupt. So, given an interrupt handle and a CPU, we need a
mechanism to retarget the interrupt to the specified CPU. This has become
a major performance issue (on Maramba) when multiple 10 Gig NICs are
present: the poll threads belonging to one NIC can end up running on CPUs
that are taking interrupts from another NIC.
Presently, Crossbow uses the PCITool ioctls (sys/pci_tools.h) to retarget
fixed interrupts from inside the kernel. That interface is not well suited
to in-kernel use, so a better interface is needed. In addition, this
mechanism currently does not work for MSI-Xs on SPARC platforms, which
should be addressed.
To achieve the above objectives, the following interfaces are required:
1. Given an interrupt handle (ddi_intr_handle_t) that is associated with an
Rx/Tx ring, return the CPU (processorid_t) that the interrupt targets.
2. Given an interrupt handle (ddi_intr_handle_t) that is associated with an
Rx/Tx ring and a CPU, bind the interrupt to the specified CPU.
4.3 Changes From the Previous Case
This project is an extension to the approved cases PSARC/2004/253
"Advanced DDI Interrupt Interfaces" and PSARC/2008/628 "Interrupt Resource
Management". The existing DDI interrupt interfaces are not changed; new
DDI interrupt interfaces are added that extend the capabilities of the
existing interfaces.
The changes include:
- A new function (ddi_intr_get_affinity(9f)) to return the interrupt
target CPU for a given DDI interrupt handle h.
- A new function (ddi_intr_set_affinity(9f)) to set the interrupt target
CPU for a given DDI interrupt handle h.
- A change to the ddi_intr_get_cap(9f) function so that it returns the new
capability flag DDI_INTR_FLAG_RETARGETABLE, which indicates that all
interrupts of the interrupt type currently in use are retargetable.
- A new PCITool option, -m, to retarget MSI/X interrupts.
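As a rough illustration of how the new capability flag and the set
interface might be combined, the following userland sketch checks
DDI_INTR_FLAG_RETARGETABLE before attempting a retarget. Everything besides
the interface names is an assumption made for the example: the handle is a
mock structure, the flag's bit value is a placeholder (this case does not
state it), and the stub bodies only imitate what a real implementation
might do.

```c
#include <assert.h>
#include <stddef.h>

/* Userland stand-ins for kernel DDI types (assumptions for this sketch). */
typedef int processorid_t;
typedef processorid_t ddi_intr_target_t;

#define	DDI_SUCCESS	0
#define	DDI_FAILURE	(-1)
/* Placeholder bit: the case names the flag but does not give its value. */
#define	DDI_INTR_FLAG_RETARGETABLE	0x1000

/* Hypothetical mock handle holding only the state this sketch needs. */
struct mock_intr {
	int cap_flags;		/* flags reported by ddi_intr_get_cap() */
	ddi_intr_target_t cpu;	/* current target CPU */
};
typedef struct mock_intr *ddi_intr_handle_t;

/* Mock of the existing capability query. */
int
ddi_intr_get_cap(ddi_intr_handle_t h, int *flagsp)
{
	assert(h != NULL && flagsp != NULL);
	*flagsp = h->cap_flags;
	return (DDI_SUCCESS);
}

/* Mock of the proposed set interface. */
int
ddi_intr_set_affinity(ddi_intr_handle_t h, ddi_intr_target_t tgt)
{
	assert(h != NULL);
	if (!(h->cap_flags & DDI_INTR_FLAG_RETARGETABLE))
		return (DDI_FAILURE);
	h->cpu = tgt;
	return (DDI_SUCCESS);
}

/* Hypothetical helper: retarget only if the interrupt type allows it. */
int
retarget_if_capable(ddi_intr_handle_t h, ddi_intr_target_t tgt)
{
	int flags = 0;

	if (ddi_intr_get_cap(h, &flags) != DDI_SUCCESS)
		return (DDI_FAILURE);
	if (!(flags & DDI_INTR_FLAG_RETARGETABLE))
		return (DDI_FAILURE);
	return (ddi_intr_set_affinity(h, tgt));
}
```

Checking the flag first lets a caller skip the retarget attempt entirely
for interrupt types that cannot be moved.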
4.4 Competitive Analysis
Linux and Microsoft operating systems already provide interrupt
retargeting interfaces of some form to their device drivers. It is
therefore important to provide similar features to Solaris device drivers,
both to achieve good individual device performance and overall IO
performance on all of Sun's platforms, and to remain competitive in the
marketplace.
4.5 Project Description
4.5.1 Interrupt Affinity Interfaces
The basic strategy is to give device drivers an opportunity to provide
input (such as a CPU number or a preference) into the selection of the
interrupt target CPU for their interrupts. Device drivers or IO frameworks
will call the proposed affinity interfaces either during initialization
or at run time to optimize their IO performance based on the available
resources, such as DMA channels, rings, allocated interrupts, and current
CPU bindings.
typedef processorid_t ddi_intr_target_t;
int ddi_intr_get_affinity(ddi_intr_handle_t h, ddi_intr_target_t *tgt_p);
int ddi_intr_set_affinity(ddi_intr_handle_t h, ddi_intr_target_t tgt);
These interfaces are optional for device drivers, so drivers that don't
use them still work even if the system has implemented this feature.
Conversely, drivers that do use them also work if the system does not
implement the support.
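The following userland sketch shows one way a driver might use the two
interfaces to co-locate a ring's interrupt with its poll thread while
treating a missing implementation as non-fatal. It is an illustration under
stated assumptions, not the project's code: the handle is a mock structure,
the stub bodies and the "supported" switch are invented for the example,
and ring_colocate_intr() is a hypothetical helper name. The case does not
say which error code an unsupporting platform returns, so the sketch uses a
generic failure.

```c
#include <assert.h>
#include <stddef.h>

/* Userland stand-ins for kernel DDI types (assumptions for this sketch). */
typedef int processorid_t;
typedef processorid_t ddi_intr_target_t;

#define	DDI_SUCCESS	0
#define	DDI_FAILURE	(-1)

/* Hypothetical mock handle. */
struct mock_intr {
	int supported;		/* 0 models a platform without support */
	ddi_intr_target_t cpu;	/* current target CPU */
};
typedef struct mock_intr *ddi_intr_handle_t;

int
ddi_intr_get_affinity(ddi_intr_handle_t h, ddi_intr_target_t *tgt_p)
{
	assert(h != NULL && tgt_p != NULL);
	if (!h->supported)
		return (DDI_FAILURE);
	*tgt_p = h->cpu;
	return (DDI_SUCCESS);
}

int
ddi_intr_set_affinity(ddi_intr_handle_t h, ddi_intr_target_t tgt)
{
	assert(h != NULL);
	if (!h->supported)
		return (DDI_FAILURE);
	h->cpu = tgt;
	return (DDI_SUCCESS);
}

/*
 * Hypothetical driver helper: steer a ring's interrupt to the CPU that
 * runs its poll thread. Returns 1 if the two end up co-located and 0
 * otherwise. A failure is never fatal; the driver keeps running with
 * whatever binding the system chose.
 */
int
ring_colocate_intr(ddi_intr_handle_t h, processorid_t poll_cpu)
{
	ddi_intr_target_t cur;

	if (ddi_intr_get_affinity(h, &cur) != DDI_SUCCESS)
		return (0);	/* no support: keep the default binding */
	if (cur == poll_cpu)
		return (1);	/* already co-located */
	return (ddi_intr_set_affinity(h, poll_cpu) == DDI_SUCCESS);
}
```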
This case also includes the contract for the Crossbow framework to use these
interrupt affinity interfaces in place of the existing PCITool ioctl
interfaces.
Constraints:
a) Set affinity limitations for certain interrupt types
Fixed or INTx interrupts can be either exclusive or shared, depending
on the hardware. Because there is no good way to detect which, the current
implementation will refuse any set affinity request for INTx interrupts.
On x86 platforms, multiple MSI interrupts of a single PCI function need
to be rerouted together, since all of them share the same MSI address,
which in turn encodes the same CPU number. Hence the current x86
implementation will refuse any set affinity request for MSI interrupts.
A future phase of this project may support MSI group retargeting, similar
to the PCITool method.
b) CPU offline considerations
CPUs may be onlined or offlined through administrative interfaces. When
a CPU is offlined, all of the interrupts targeting it are retargeted.
The OS will pick any set of the surviving CPUs for retargeting. The
OS is under no obligation to maintain drivers' interrupt affinity
preferences.
The first phase of this project will not provide any callback of its own
on CPU online/offline events. Such callback events need to be defined in
the future. If a driver or framework is interested in maintaining optimal
CPU targeting, it should either monitor its interrupt CPU bindings on a
regular basis using ddi_intr_get_affinity(9f) or register a callback to
receive CPU-specific events using register_cpu_setup_func(). Userland
entities, in contrast, should subscribe to CPU DR specific sysevents.
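The two constraints above can be modelled in a small userland simulation.
This is only a sketch under stated assumptions: the handle, the sim_*
names, the four-CPU table and affinity_check() are all hypothetical, the
simulated platform is x86, and refusing an offline target CPU is a guess
rather than documented behaviour. It shows the set interface refusing INTx
and x86 MSI requests, the OS moving an interrupt off an offlined CPU, and a
periodic check a driver might run to restore its preference.

```c
#include <assert.h>
#include <stddef.h>

/* Userland stand-ins for kernel DDI types (assumptions for this sketch). */
typedef int processorid_t;
typedef processorid_t ddi_intr_target_t;

#define	DDI_SUCCESS	0
#define	DDI_FAILURE	(-1)
#define	SIM_NCPU	4	/* hypothetical CPU count */

/* Hypothetical tags; the real DDI uses DDI_INTR_TYPE_* values. */
enum sim_type { SIM_FIXED, SIM_MSI, SIM_MSIX };
enum sim_plat { SIM_SPARC, SIM_X86 };

static int sim_online[SIM_NCPU] = { 1, 1, 1, 1 };
static enum sim_plat sim_platform = SIM_X86;

struct sim_intr {
	enum sim_type type;
	ddi_intr_target_t cpu;
};
typedef struct sim_intr *ddi_intr_handle_t;

int
ddi_intr_get_affinity(ddi_intr_handle_t h, ddi_intr_target_t *tgt_p)
{
	assert(h != NULL && tgt_p != NULL);
	*tgt_p = h->cpu;
	return (DDI_SUCCESS);
}

int
ddi_intr_set_affinity(ddi_intr_handle_t h, ddi_intr_target_t tgt)
{
	assert(h != NULL);
	/* Constraint (a): INTx is always refused, MSI is refused on x86. */
	if (h->type == SIM_FIXED)
		return (DDI_FAILURE);
	if (h->type == SIM_MSI && sim_platform == SIM_X86)
		return (DDI_FAILURE);
	/* Assumption: an offline or out-of-range target is refused. */
	if (tgt < 0 || tgt >= SIM_NCPU || !sim_online[tgt])
		return (DDI_FAILURE);
	h->cpu = tgt;
	return (DDI_SUCCESS);
}

/* Constraint (b): on offline, the OS picks any surviving CPU. */
void
sim_cpu_offline(processorid_t id, ddi_intr_handle_t h)
{
	processorid_t c;

	assert(id >= 0 && id < SIM_NCPU && h != NULL);
	sim_online[id] = 0;
	if (h->cpu != id)
		return;
	for (c = 0; c < SIM_NCPU; c++) {
		if (sim_online[c]) {
			h->cpu = c;
			return;
		}
	}
}

void
sim_cpu_online(processorid_t id)
{
	assert(id >= 0 && id < SIM_NCPU);
	sim_online[id] = 1;
}

/* Periodic driver check: 1 if the interrupt is on `preferred` afterwards. */
int
affinity_check(ddi_intr_handle_t h, processorid_t preferred)
{
	ddi_intr_target_t cur;

	if (ddi_intr_get_affinity(h, &cur) != DDI_SUCCESS)
		return (0);
	if (cur == preferred)
		return (1);
	return (ddi_intr_set_affinity(h, preferred) == DDI_SUCCESS);
}
```

In a kernel driver the same check could instead be triggered from a
callback registered with register_cpu_setup_func(), which avoids polling.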
4.5.2 PCITool Enhancements
Current syntax:
pcitool pci@<unit-address> -i ino=ino
[ -r [ -c ] | -w cpu=CPU [ -g ] ] [ -v ] [ -q ]
Proposed syntax:
pcitool pci@<unit-address> -i <ino#> | all
[ -r [ -c ] | -w <cpu#> [ -g ] ] [ -v ] [ -q ]
pcitool pci@<unit-address> -m <msi#> | all
[ -r [ -c ] | -w <cpu#> [ -g ] ] [ -v ] [ -q ]
PCITool is a low-level tool that provides a facility for getting and
setting interrupt routing information. This project makes some minor
syntax changes to PCITool, since the current syntax is not compliant with
existing userland guidelines.
In addition, this project adds a new "-m" option to retrieve and
reroute the interrupt target CPU for MSI/Xs on SPARC platforms.
On SPARC platforms, an INO is mapped to an interrupt mondo, whereas
one or more MSI/Xs are mapped to an INO. INOs and MSI/Xs are therefore
individually retargetable. Use the "-i" option to retrieve or reroute a
given INO, and the "-m" option for MSI/Xs.
On x86 platforms, both INOs and MSI/Xs are mapped to the same interrupt
vectors. Use the "-i" option to retrieve and reroute any interrupt vector
(both INOs and MSI/Xs). The "-m" option is therefore not required on x86
platforms and is not supported there.
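One reading of the proposed synopsis, expressed as a small option
consistency check, is sketched below. It is not PCITool's actual parser:
the structure, the platform tag and pcitool_opts_valid() are hypothetical
names, and treating "-i" and "-m" as mutually exclusive is an
interpretation of the two synopsis forms rather than something the case
states outright.

```c
#include <assert.h>
#include <stddef.h>

enum platform { PLAT_SPARC, PLAT_X86 };	/* hypothetical platform tag */

/* Which interrupt-related options were seen on the command line. */
struct pcitool_intr_opts {
	int have_i, have_m;	/* -i <ino#> | all, -m <msi#> | all */
	int have_r, have_w;	/* -r (read), -w <cpu#> (retarget) */
	int have_c, have_g;	/* -c (only with -r), -g (only with -w) */
};

/* Returns 1 if the combination matches the proposed synopsis, else 0. */
int
pcitool_opts_valid(const struct pcitool_intr_opts *o, enum platform plat)
{
	assert(o != NULL);
	if (!o->have_i == !o->have_m)
		return (0);	/* exactly one of -i and -m */
	if (o->have_m && plat == PLAT_X86)
		return (0);	/* -m is not supported on x86 */
	if (o->have_r && o->have_w)
		return (0);	/* -r and -w are alternatives */
	if (o->have_c && !o->have_r)
		return (0);	/* -c modifies -r */
	if (o->have_g && !o->have_w)
		return (0);	/* -g modifies -w */
	return (1);
}
```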
4.6 Interfaces
4.6.1 Exported Interfaces
Interface                    Stability       Comments
----------------------------+---------------+--------------------------
ddi_intr_target_t            Project Private Interrupt target CPU
ddi_intr_get_affinity        Project Private Get interrupt target CPU
ddi_intr_set_affinity        Project Private Set interrupt target CPU
-----------------------------------------------------------------------
4.6.2 Imported Interfaces
Interface                    Stability       Comments
----------------------------+---------------+--------------------------
DDI_INTR_FLAG_RETARGETABLE   Project Private Return this new flag (RO) to
                                             ddi_intr_get_cap() callers if
                                             current interrupt type in use
                                             is retargetable
pcitool                      Project Private Minor syntax changes. Added
                                             new -m option for MSI/Xs.
-----------------------------------------------------------------------
5. References
[1] Solaris Interrupt Project Webpage
http://pciexpress.sfbay/intr
[2] Advanced DDI Interrupt Functions - PSARC/2004/253
http://sac.sfbay.sun.com/PSARC/2004/253
[3] Interrupt Resource Management - PSARC/2008/628
http://sac.sfbay.sun.com/PSARC/2008/628
[4] PCITool and its nexus ioctl support - PSARC/2005/232
http://sac.sfbay.sun.com/PSARC/2005/232
[5] PCITool Public Interrupts - PSARC/2009/215
http://sac.sfbay.sun.com/PSARC/2009/215
6. Resources and Schedule
6.4 Steering Committee requested information
6.4.1 Consolidation C-team Name:
ON
6.5 ARC review type: FastTrack
6.6 ARC Exposure: open