PSARC/2008/628 Interrupt Resource Management

Garrett D'Amore gdamore at sun.com
Tue Oct 7 21:12:31 PDT 2008


(Preface: I've not read the material in 2004/253.)

The interfaces are marked Committed.  I have some concerns with this, as 
I read the project details.  One of the areas that really concerns me is 
the fact that drivers have to deal with a situation where the total 
number of interrupts is reduced.   (*Increasing* interrupts seems free 
from the issues I am concerned with.)

My read is that the interrupts are taken from them, without any way for 
drivers to decline (or even get a warning) in advance.

I'm interested to know how this is used with real hardware.

For example, one potential use of this facility is to support multiple 
receive rings.  However, now the problem is that there is a window of 
time where an extra ring may be "orphaned".  What happens to those 
packets that are received there?  (What happens to the interrupt that 
the hardware issues, for that matter?)

What if a device has to have a minimum number of interrupts once it 
configures them?  Should a driver be able to specify that once 
allocated, certain interrupts cannot ever be returned to the system?

How will this feature interact with Crossbow resource allocation?  Do we 
have any examples of the two working together yet?

I'd feel a lot better if we had a more complete description of how 
this is used by some typical device drivers, with some real experience with 
them, before raising the commitment.  If the project team has some 
sample implementations that can use this, then I might change my 
position.  But in the absence of that, I'd feel better with an 
Uncommitted binding while we get some experience with the APIs at hand.

    -- Garrett

Artem Kachitchkine wrote:
> I'm sponsoring this fasttrack for Scott Carter. It is set to time out
> on 10/15/2008. Given that this is a relatively minor amendment to a 
> larger, approved case PSARC/2004/253 "Advanced DDI Interrupt 
> Functions", I think it qualifies as a fasttrack. As the discussion 
> progresses, extending the timer or even promoting to a full case may 
> be considered.
>
> In addition to this proposal, three man page files can be found in the 
> case directory:
>
> ddi_cb_register.9f.txt
> ddi_intr_set_nreq.9f.txt
> ndi_irm_create.9f.txt
>
> -Artem
>
> Template Version: @(#)sac_nextcase %I% %G% SMI
> This information is Copyright 2008 Sun Microsystems
> 1. Introduction
>     1.1. Project/Component Working Name:
>      Interrupt Resource Management
>     1.2. Name of Document Author/Supplier:
>      Author:  Scott Carter
>     1.3  Date of This Document:
>     07 October, 2008
> 4. Technical Description
> 4.1 Project Description
>
> This project delivers the resource management feature originally
> defined in PSARC/2004/253 "Advanced DDI Interrupt Functions"
> with minor changes (see 4.1.3 for details).
>
> It provides a mechanism for device drivers to get more interrupt
> vectors (and thereby increased performance).  For drivers that
> participate in the new feature, the number of available interrupts
> becomes dynamic and can increase or decrease.  The goal is to maximize
> utilization of interrupt vectors in a fair manner, and to rebalance the
> allocations whenever devices are added to or removed from the system.
>
>
> 4.1.1 Definition
>
> The project defines new DDI interfaces for device drivers to register
> and unregister a generic callback handler.  If a driver has registered
> a handler, then it can be notified when its number of interrupts has
> been increased or decreased.
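To make the callback contract concrete, here is a minimal userland model of what a participating driver's handler might do when it is told its allocation changed. It is a sketch under stated assumptions. The names sim_cb_action_t, SIM_CB_INTR_ADD, SIM_CB_INTR_REMOVE, sim_driver_t and sim_intr_cb() are hypothetical stand-ins. The real types and signatures are the ones defined in ddi_cb_register.9f.txt in the case directory.

```c
/*
 * Hypothetical stand-ins (NOT the real DDI definitions) for the
 * callback action type described in ddi_cb_register(9F).
 */
typedef enum {
	SIM_CB_INTR_ADD,	/* more interrupt vectors became available */
	SIM_CB_INTR_REMOVE	/* some vectors must be given back */
} sim_cb_action_t;

/* Hypothetical per-driver soft state. */
typedef struct {
	int	nintrs;		/* vectors currently held by the driver */
} sim_driver_t;

/*
 * Hypothetical handler: 'count' is how many vectors are being added or
 * removed.  A real driver would allocate/enable (or disable and
 * ddi_intr_free()) the affected vectors here; this model only updates
 * its bookkeeping.
 */
static int
sim_intr_cb(sim_driver_t *drv, sim_cb_action_t action, int count)
{
	switch (action) {
	case SIM_CB_INTR_ADD:
		drv->nintrs += count;
		return (0);
	case SIM_CB_INTR_REMOVE:
		if (count > drv->nintrs)
			return (-1);	/* cannot give back more than held */
		drv->nintrs -= count;
		return (0);
	}
	return (-1);			/* unknown action */
}
```

A driver that starts with 2 vectors and is offered 6 more ends up holding 8. A later request to remove 3 leaves it with 5.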
>
> The number of interrupt vectors a device driver wants is set by the
> initial number it attempts to allocate through ddi_intr_alloc().  A
> driver can explicitly adjust this number later with a new DDI function
> introduced by this project, ddi_intr_set_nreq().
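The request-size rules above can be modeled in a few lines. In this hypothetical sketch, the first allocation attempt fixes nreq, and later calls to sim_set_nreq() adjust it. Two parts are assumptions for illustration, not taken from the proposal: the bounds check (at least 1, at most what the hardware supports) and the rejection of an adjustment before any allocation. The authoritative behavior is in ddi_intr_set_nreq.9f.txt.

```c
#include <stdbool.h>

/* Hypothetical model of one device's request record. */
typedef struct {
	bool	initialized;	/* set once the first allocation is made */
	int	nreq;		/* how many vectors the driver wants */
	int	nsupported;	/* most vectors the hardware can use */
} sim_req_t;

/* The first allocation attempt establishes the request size. */
static void
sim_first_alloc(sim_req_t *req, int count)
{
	if (!req->initialized) {
		req->nreq = count;
		req->initialized = true;
	}
}

/*
 * Later adjustment, in the spirit of ddi_intr_set_nreq().  The bounds
 * and the "must allocate first" rule are ASSUMPTIONS of this sketch.
 */
static int
sim_set_nreq(sim_req_t *req, int nreq)
{
	if (!req->initialized || nreq < 1 || nreq > req->nsupported)
		return (-1);
	req->nreq = nreq;
	return (0);
}
```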
>
> The project also defines new NDI interfaces for nexus drivers to define
> the supplies of interrupt vectors that are available in the system.  A
> supply can be associated with a specific IO bus (e.g. PCIe root complex)
> or it could be a global supply shared by all devices.
>
> Along with the new DDI and NDI interfaces, the project also delivers
> an MDB debugging module with two new debug macros.  One macro is used
> to display statistics on all of the defined interrupt supplies, and the
> other displays statistics about how a supply is divided up between the
> drivers that map to it.
>
>
> 4.1.2 Motivation, Goals, and Requirements
>
> Currently, interrupt vectors are given to device drivers in very small
> numbers.  This is to avoid exhausting a system's supply before all the
> devices have been attached, and to keep interrupt vectors in reserve for
> later hotplugs.  Interrupts are given so conservatively because there
> is no mechanism to take them back later unless a driver detaches.
>
> Fewer interrupt vectors means less IO performance.  More interrupt
> vectors means more parallelism for handling interrupt conditions.  So
> the motivation for this project is to increase IO performance.  And
> the goal of this project is to maximize the allocation of interrupts
> given to each device, but in a way which is fair and balanced across
> the full set of attached devices.
>
> The requirement of this project is:
>
> - Provide a mechanism for drivers to get more interrupts.
>
> The feature is optional, so drivers that don't use it still work even
> if the system has implemented support for the feature.  And conversely,
> drivers that do use it also work if the system does not implement the
> support.
>
> A full implementation of the platform level support will be provided
> by this project for PCIe IO bus drivers on SPARC.
>
>
> 4.1.3 Changes From the Previous Case
>
> This project is a micro change to the approved case PSARC/2004/253
> "Advanced DDI Interrupt Functions".  The existing DDI interrupt
> interfaces are not changed, but new DDI interrupt interfaces are
> added that extend the capabilities of the existing ones.
>
> The interfaces described in section 6.3 of the specifications for
> PSARC/2004/253 "Advanced DDI Interrupt Functions" have already been
> approved, but those resource management interfaces have never been
> implemented.  This project is providing the implementation of those
> interfaces, with minor changes.
>
> The changes include:
>
> - Generalization of the proposed callback interfaces, so they
>   can be shared by both IRM and any future projects.
>
> - Removed semantics originally proposed to temporarily disable
>   callbacks.
>
> - An additional function (ddi_intr_set_nreq()) for explicitly
>   setting how many interrupt vectors a device driver requests.
>
> The interface described in PSARC/2007/453 "MSI-X interrupt limit
> override" provides a temporary workaround for device drivers to request
> more interrupts, and that workaround is currently in use.  This project
> supersedes the functionality of the workaround, and drivers using the
> workaround should ultimately be converted.  But in the meantime, the
> workaround is preserved and still works in conjunction with this
> project.
>
>
> 4.1.4 Competitive Analysis
>
> This project is important for the overall IO performance on all of
> Sun's platforms in order to remain competitive in the marketplace.
>
> Modern IO bus technologies support large numbers of interrupts.  A
> single PCI or PCIe device could use up to 32 MSI interrupts, or 2048
> MSI-X interrupts.  Without this project, Solaris gives at most
> 2 interrupts to each device.  We are limiting the IO throughput of
> our systems by not having this feature, and by not giving some devices
> nearly as many interrupt vectors as they can support and utilize.
>
> Current Sun platforms have limited numbers of interrupts to give to
> the devices.  SPARC PCIe platforms have 256 interrupts per root complex,
> and x64 systems have only 256 interrupts per processor.  But we
> have future platforms in development with many thousands of interrupt
> vectors available.  We need this feature so that we can achieve the
> full potential of advanced MSI and MSI-X devices on such platforms.
> In the meantime, some existing devices could already benefit from
> getting more than the current limit of 2 interrupt vectors.
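Here is a quick worked example of the figures above. Assume, purely for illustration, 8 devices behind one 256-vector SPARC root complex. Under the 2-vector cap they use 8 * 2 = 16 vectors and leave 240 idle. The device count and the helper name sim_idle_vectors() are hypothetical.

```c
/*
 * Hypothetical helper: vectors left idle on one supply when every
 * device is capped at 'cap' vectors regardless of what it could use.
 */
static int
sim_idle_vectors(int supply, int ndevices, int cap)
{
	int used = ndevices * cap;

	return (used >= supply ? 0 : supply - used);
}
```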
>
>
> 4.2 Technical Description
>
> 4.2.1 Architecture
>
> The basic strategy is to represent each supply of interrupt vectors in
> the system together with the drivers that consume it, and to compute
> the optimal way to divide each supply among those devices.
>
> When a driver is attached or detached, the computations are performed
> to rebalance the system.  Drivers can also explicitly change how many
> interrupts they want in response to load.  The computations seek to
> maximize the use of the system's interrupts and derive fair allocations
> for each device.  Drivers are notified through a callback mechanism when
> the computations result in giving the driver more or fewer interrupts.
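The proposal does not spell out the balancing algorithm. The following is one plausible fair-share policy, max-min fairness computed by round-robin "water filling", offered only as an illustration of what "fair" could mean. It is not necessarily what IRM implements, and sim_balance() is a hypothetical name. Each pass hands one vector to every device still below its request, until the supply runs out or every request is met.

```c
/*
 * Max-min fair division of 'supply' vectors among 'ndevs' devices.
 * nreq[i] is what device i asked for; navail[i] receives its share.
 * Returns the number of vectors left unallocated.
 * (Illustrative policy only; not taken from the IRM design.)
 */
static int
sim_balance(int supply, const int *nreq, int *navail, int ndevs)
{
	for (int i = 0; i < ndevs; i++)
		navail[i] = 0;

	int progress = 1;
	while (supply > 0 && progress) {
		progress = 0;
		for (int i = 0; i < ndevs && supply > 0; i++) {
			if (navail[i] < nreq[i]) {
				navail[i]++;	/* one more vector for i */
				supply--;
				progress = 1;
			}
		}
	}
	return (supply);
}
```

With 10 vectors and requests of {2, 8, 8}, the small request is satisfied and the remainder is split evenly, giving {2, 4, 4}. Comparing old and new navail values after an attach, a detach, or a ddi_intr_set_nreq() change would yield the per-driver add/remove counts that the callbacks deliver.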
>
> The main components of this project are:
>
> - NDI interfaces for nexus drivers to create or destroy the
>   descriptions of individual supplies of interrupt vectors.
>
> - DDI interfaces for drivers to register or unregister callback
>   mechanisms.  The callbacks notify them of changes to their
>   interrupt availability.
>
> - DDI interface for drivers to explicitly set their requested
>   number of interrupts dynamically.
>
> - A mechanism to map individual devices to interrupt supplies.
>
> - Background threads which keep the allocations of interrupts
>   from each supply to each device optimized and balanced.
>
> - Routines to initialize the new feature at boot time.
>
> - An MDB debugging module to display the status of how the
>   interrupt subsystem has been balanced.
>
> Each supply of interrupt vectors in the system is described by a data
> structure (ddi_irm_pool_t), representing one pool of interrupts that
> can be shared by multiple devices.  And for each device that maps to
> an interrupt pool, a data structure (ddi_irm_req_t) represents how
> many interrupts it wants versus how many it received.
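The real layouts of ddi_irm_pool_t and ddi_irm_req_t are defined in the design document referenced below and are not reproduced here. The following hypothetical shapes (sim_irm_pool_t, sim_irm_req_t) only illustrate the relationship described above: one pool, many per-device requests, each tracking wanted versus received vectors, with the invariant that a pool never hands out more than it holds.

```c
#include <stddef.h>

/* Hypothetical shape of a per-device request (cf. ddi_irm_req_t). */
typedef struct sim_irm_req {
	struct sim_irm_req	*next;
	int			nreq;	/* vectors the device wants */
	int			navail;	/* vectors it was actually given */
} sim_irm_req_t;

/* Hypothetical shape of one shared supply (cf. ddi_irm_pool_t). */
typedef struct {
	int		totsz;	/* vectors in the supply */
	sim_irm_req_t	*reqs;	/* devices mapped to this pool */
} sim_irm_pool_t;

/* Map a device's request to the pool (prepend to the list). */
static void
sim_pool_add(sim_irm_pool_t *pool, sim_irm_req_t *req)
{
	req->next = pool->reqs;
	pool->reqs = req;
}

/* Total vectors handed out from the pool; must never exceed totsz. */
static int
sim_pool_allocated(const sim_irm_pool_t *pool)
{
	int sum = 0;

	for (const sim_irm_req_t *r = pool->reqs; r != NULL; r = r->next)
		sum += r->navail;
	return (sum);
}
```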
>
> Nexus drivers create the interrupt pools, and then they map individual
> devices to them through the existing bus nexus driver INTROP feature.
> The request data structures are created internally and associated with
> the interrupt pools when devices are attached and mapped to a pool.
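One way to picture the mapping step: a device asks its ancestor nexus nodes, nearest first, which pool it belongs to. This is roughly the role the DDI_INTROP_GETPOOL operation plays. The sketch below assumes a simple nearest-ancestor rule; that rule and the names sim_node_t, sim_pool_t and sim_get_pool() are hypothetical. A root-complex nexus would supply a per-bus pool, and a node near the root could own a global pool. A NULL result models a platform without IRM support, where the driver keeps today's behavior.

```c
#include <stddef.h>

/* Hypothetical model of an interrupt supply created by a nexus driver. */
typedef struct {
	const char	*name;
	int		nvectors;
} sim_pool_t;

/* Hypothetical device-tree node: a nexus node may own a pool. */
typedef struct sim_node {
	const struct sim_node	*parent;
	const sim_pool_t	*pool;	/* NULL if this node defines no supply */
} sim_node_t;

/*
 * Walk up from the device's parent and return the first pool found
 * (ASSUMED nearest-ancestor rule), or NULL if no ancestor defines one.
 */
static const sim_pool_t *
sim_get_pool(const sim_node_t *dev)
{
	for (const sim_node_t *n = dev->parent; n != NULL; n = n->parent) {
		if (n->pool != NULL)
			return (n->pool);
	}
	return (NULL);
}
```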
>
> Existing device drivers do not benefit, and they continue to only get
> the same small number of interrupt vectors that they currently receive.
> In order to benefit, they must be modified with optional enhancements
> so that they can participate.  To participate, they must provide a new
> callback mechanism so that the system can notify them when they have
> been given more or fewer interrupt vectors.
>
> A modified driver first registers a generic callback handler so it can
> receive notifications of interrupt availability changes.  Then it calls
> ddi_intr_alloc() to request an initial number of interrupt vectors.  If
> the system has the necessary support (from nexus drivers), then it
> associates the requesting driver with an interrupt pool.  Through this
> association, the system computes whether the driver gets more interrupts.
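The flow above reduces to a simple rule: a driver participates only if it registered its callback before allocating and its nexus provides a pool. Otherwise it gets today's conservative limit (2 vectors, per section 4.1.4). This hypothetical model, sim_intr_alloc(), encodes that rule. Granting min(request, free vectors in the pool) to a participant is an illustrative assumption, since the proposal only says a driver "may initially receive all the interrupts it requested".

```c
#include <stdbool.h>

/* Hypothetical outcome of an allocation in this model. */
typedef struct {
	bool	participates;	/* tracked by a pool, eligible for more */
	int	granted;	/* vectors returned by the allocation */
} sim_alloc_result_t;

/*
 * Hypothetical model: IRM participation requires a callback registered
 * before allocation AND a pool from the nexus; all other drivers keep
 * the legacy limit.
 */
static sim_alloc_result_t
sim_intr_alloc(bool cb_registered, bool have_pool, int count,
    int pool_free, int legacy_limit)
{
	sim_alloc_result_t res;

	res.participates = cb_registered && have_pool;
	if (res.participates)
		res.granted = count < pool_free ? count : pool_free;
	else
		res.granted = count < legacy_limit ? count : legacy_limit;
	return (res);
}
```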
>
> A driver may initially receive all the interrupts it requested, or it
> may receive callbacks at a later time (after attach(9E)) notifying it
> when more interrupt vectors are available.  More could be available if
> other devices were removed, or if workload changes cause other drivers
> to reduce their requests.
>
> But to qualify for additional interrupts, a driver must also yield and
> call ddi_intr_free() when necessary.  This may occur if another device
> is inserted, or if changes in workload cause other drivers to need more.
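Yielding vectors is where the orphaned-ring concern raised at the top of this message applies. Here is one hypothetical way a multi-ring driver might give vectors back without leaving any receive ring unserviced. Before the highest-numbered vectors are freed (ddi_intr_free() in a real driver), every ring is re-pointed at a surviving vector. The modulo spreading, the refusal to drop below one vector, and the name sim_shrink() are assumptions of this sketch, not part of the proposal.

```c
/*
 * ring_vec[i] is the vector servicing receive ring i.  Re-point all
 * rings at the vectors that remain after 'nremove' are given back.
 * Returns the number of vectors kept, or -1 if the request would leave
 * the driver with no vector at all (an ASSUMED floor of one).
 */
static int
sim_shrink(int *ring_vec, int nrings, int nvec, int nremove)
{
	int keep = nvec - nremove;

	if (nremove < 0 || keep < 1)
		return (-1);
	for (int i = 0; i < nrings; i++)
		ring_vec[i] = i % keep;	/* spread rings over survivors */
	return (keep);
}
```

Only after the rings are re-pointed (and in-flight work on the departing vectors has drained) would the driver disable and free those vectors.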
>
> This project introduces the interfaces for nexus drivers to describe
> the interrupt pools, and for device drivers to engage with those
> interrupt pools to possibly receive more resources.  It also provides
> all of the supporting implementation behind the scenes that performs
> the related computations when necessary.
>
>
> 4.2.2 Interfaces
>
> 4.2.2.1 Exported Interfaces
>
> Interface                            Stability  Comments
> -----------------------------------+----------+-------------------------
> ndi_irm_create()                     Committed  Create an IRM pool
> ndi_irm_destroy()                    Committed  Destroy an IRM pool
> DDI_INTROP_GETPOOL                   Committed  Get IRM pool INTROP
>
> ddi_cb_register()                    Committed  Install callback handler
> ddi_cb_unregister()                  Committed  Remove callback handler
> ddi_cb_action_t                      Committed  Callback action type
> ddi_cb_flags_t                       Committed  Callback flags type
> ddi_cb_func_t                        Committed  Callback function type
> ddi_cb_handle_t                      Committed  Callback handle type
>
> ddi_intr_set_nreq()                  Committed  Set IRM request size
> -----------------------------------+----------+-------------------------
>
>
> 4.2.2.2 Imported Interfaces
>
> Interface                            Stability  Comments
> -----------------------------------+----------+-------------------------
> ddi_intr_alloc()                     Committed  Added hooks into IRM
> ddi_intr_free()                      Committed  Added hooks into IRM
> -----------------------------------+----------+-------------------------
>
>
> 4.2.2.3 Removed Interfaces
>
> These interfaces were previously approved, but never implemented.
> They are now superseded by the interfaces of this project.
>
> Interface                            Stability  Comments
> -----------------------------------+----------+-------------------------
> ddi_intr_register_management_cb()    Committed  Register callback
> ddi_intr_unregister_management_cb()  Committed  Unregister callback
> ddi_intr_enable_management_cb()      Committed  Enable callback
> ddi_intr_disable_management_cb()     Committed  Disable callback
> -----------------------------------+----------+-------------------------
>
>
> 5. References
>
> This Project Implements, Extends, or Replaces these Projects:
>
> - PSARC/2004/253: Advanced DDI Interrupt Functions
> - PSARC/2007/453: MSI-X interrupt limit override
>   (A future RFE will remove the workaround once all
>   consumers have been converted to use this project.)
>
> Consumers of this Project:
>
> - PSARC/2008/181: Solaris Hotplug Framework
>   (Uses this project's generic callback mechanism)
> - IRM Enhancements for Atlas/Neptune Driver
>   (Convert from existing workaround to use this project)
> - x86 APIC Expansion and IRM Support
>   (Provide interrupt pool definitions on x86.  The scope of
>   this project is to only deliver interrupt pools on SPARC.)
>
> Design and Implementation Specification of this Project:
> - http://pciexpress.sfbay/intr/docs/irm/irm_design.txt
>
> 6. Resources and Schedule
>     6.4. Steering Committee requested information
>        6.4.1. Consolidation C-team Name:
>         ON
>     6.5. ARC review type: FastTrack
>     6.6. ARC Exposure: open
>



