[tools-discuss] Distributed SCM requirements draft

Stephen Hahn sch at eng.sun.com
Thu Dec 1 14:01:24 PST 2005


  As I mentioned a few weeks ago, we want to evaluate the various
  distributed source code management systems available to see which is
  the best fit for OpenSolaris.  The following document outlines a
  roughly ranked set of requirements to use in the evaluation.
  Thoughts, corrections, and comments are welcome.

  - Stephen

----

ident	"@(#)d-scm-requirements.txt	1.3	05/12/01 SMI"

OpenSolaris
DISTRIBUTED SOURCE CODE MANAGEMENT REQUIREMENTS [DRAFT]

1.  Summary

    This document identifies and explains the requirements for a
    distributed source code management (SCM) solution to be used with
    OpenSolaris.  The requirements are grouped into three sets of
    decreasing importance.  The document also outlines a number of
    specific evaluations that will be used to determine whether a
    candidate SCM meets the various requirements.

2.  Discussion

    The requirements described below fall into a number of distinct
    classes:  some are social, in that the requirement is believed
    necessary for successful use in the community; some are technical,
    in that the requirement is believed necessary to successfully
    produce software in a multi-project, multi-committer, multi-site
    development organization; and some are economic, in that the
    requirement describes attributes that would limit the ongoing cost
    of using the tool.

    In an attempt to use neutral terms, we use the phrase "candidate
    SCM" to describe the SCM solution we are evaluating and "current
    SCM" to refer to the distributed SCM solution in use (inside Sun) at
    present.  (Not all consolidations participating in OpenSolaris use a
    distributed SCM at present; their SCM requirements are not discussed
    in this document.)

    The requirements are ranked by necessity as "essential",
    "conditional", or "optional", using the terminology proposed in
    IEEE Std 830-1998 [1].

2.1.  "Essential" requirements

	E0.  Open source

	To be considered for use by the OpenSolaris community, the
	candidate SCM is expected to be available under an OSI-approved
	license.

	E1.  Unbiased and disconnected distribution

	Although a distributed SCM may choose to implement some form of
	dependency relationship between source trees (such as a
	"parent-child" convention), sensible SCM operation must not
	depend on that relationship being continuously available.
	Moreover, the candidate SCM must support source code updates
	between two distinct repositories that share a common ancestor
	but have had no other contact.  Sensible operation for
	disconnected use encompasses all SCM operations that act only on
	the local repository: creation, modification, or deletion of
	files or directories, of the metadata describing these objects,
	or of the changes to them.
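	As an illustration of E1 only, the sketch below models history
	as a graph of changesets, each naming its parents, and finds the
	nearest common ancestors of two heads, which is the information
	an update between two repositories with no other contact relies
	on.  The dictionary representation and the function names are
	hypothetical and do not describe any particular candidate SCM.

```python
from typing import Dict, List, Set

# Hypothetical model: each changeset ID maps to the IDs of its parents.
History = Dict[str, List[str]]


def ancestors(history: History, head: str) -> Set[str]:
    """Return head and every changeset reachable through parent links."""
    seen: Set[str] = set()
    stack = [head]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(history.get(node, []))
    return seen


def nearest_common_ancestors(history: History, a: str, b: str) -> Set[str]:
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(history, a) & ancestors(history, b)
    return {
        c
        for c in common
        if not any(c != d and c in ancestors(history, d) for d in common)
    }


# Two clones of "base" that have each committed independently.
history = {"root": [], "base": ["root"], "x1": ["base"], "y1": ["base"]}
print(nearest_common_ancestors(history, "x1", "y1"))  # {'base'}
```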

	E2.  Networked operation

	The candidate SCM must be able to operate in a sensible and
	well-performing manner between two hosts in separate
	administrative domains.  Beyond the data contained within the
	candidate SCM's representation, the only common administrative
	requirement should be a credential identifying the remote
	operator initiating the transaction to the other host.

	One mechanism that meets this requirement is to tunnel the
	candidate SCM operation through ssh(1).  Candidate SCMs whose
	implementation requires domains to change security policies to
	open unusual network ports, or ports believed to be risky, will
	be considered minimally compliant with this requirement.

	Performance measurements will be used to compare candidates, as
	outlined in section 3.1.  A candidate SCM with performance
	results in the bottom third of all candidates will be deemed to
	have failed to meet this requirement.
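	The bottom-third rule can be made concrete with a short Python
	sketch.  It assumes that lower elapsed time is better, that the
	bottom third means the slowest floor(n / 3) candidates, and that
	ties are broken by name; the draft does not specify these
	details, and the candidate names are placeholders.

```python
from typing import Dict, Set


def bottom_third(elapsed: Dict[str, float]) -> Set[str]:
    """Return the candidates whose elapsed times rank in the slowest third.

    Assumptions: lower is better, the bottom third is the slowest
    floor(n / 3) candidates, and ties are broken by candidate name.
    """
    ranked = sorted(elapsed, key=lambda name: (elapsed[name], name))
    count = len(ranked) // 3
    # With fewer than three candidates, count is 0 and the slice is empty.
    return set(ranked[len(ranked) - count:])


# Placeholder candidates and elapsed times in seconds.
print(bottom_third({"scm-a": 120.0, "scm-b": 95.0, "scm-c": 300.0}))  # {'scm-c'}
```

	A real evaluation would apply such a rule per operation and per
	scenario; how those results are combined is left open here.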

	E3.  Interface stability and completeness

	The storage representation, command line interfaces, network
	protocols, and hook interfaces should be documented and have
	some level of declared commitment.  The state of the storage
	representation and the operations that modify it should be well
	defined, so that use with advanced file system capabilities can
	be assessed for hazards (for example, consistent use with file
	system snapshots).

	E4.  Standard operations and transactions

	The candidate SCM is expected to support rename and deletion
	transactions at the file and directory levels.  A
	history-preserving copy operation, followed by a delete
	operation, may be considered equivalent to a rename; the
	equivalence, and the reasons for omitting a native rename, are
	to be assessed and documented by the evaluating engineer.
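	To illustrate the equivalence E4 allows, the following toy model
	(hypothetical classes and paths, not any candidate's storage
	format) compares a native rename with a history-preserving copy
	followed by a delete; in both cases the history seen at the new
	path is the same.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FileNode:
    """Hypothetical record of one file's revisions and, if copied, its source."""
    revisions: List[str] = field(default_factory=list)
    origin: Optional["FileNode"] = None
    origin_count: int = 0  # how many of the origin's revisions were inherited


class Tree:
    """Toy working tree that maps paths to file histories."""

    def __init__(self) -> None:
        self.files: Dict[str, FileNode] = {}

    def commit(self, path: str, note: str) -> None:
        self.files.setdefault(path, FileNode()).revisions.append(note)

    def native_rename(self, src: str, dst: str) -> None:
        # A first-class rename moves the existing history to the new path.
        self.files[dst] = self.files.pop(src)

    def copy(self, src: str, dst: str) -> None:
        # A history-preserving copy links a new node to its source's current state.
        source = self.files[src]
        self.files[dst] = FileNode(origin=source, origin_count=len(source.revisions))

    def delete(self, path: str) -> None:
        del self.files[path]

    def history(self, path: str) -> List[str]:
        """Return revisions oldest first, following copy links to the origin."""
        node: Optional[FileNode] = self.files[path]
        limit = len(node.revisions)
        segments: List[List[str]] = []
        while node is not None:
            segments.append(node.revisions[:limit])
            limit, node = node.origin_count, node.origin
        return [rev for segment in reversed(segments) for rev in segment]


renamed, copied = Tree(), Tree()
for tree in (renamed, copied):
    tree.commit("src/foo.c", "r1")
    tree.commit("src/foo.c", "r2")
renamed.native_rename("src/foo.c", "src/bar.c")
copied.copy("src/foo.c", "src/bar.c")
copied.delete("src/foo.c")
print(renamed.history("src/bar.c") == copied.history("src/bar.c"))  # True
```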

	E5.  Per-changeset metadata

	The candidate SCM must be able to associate, at a minimum, an
	unstructured text fragment with each changeset.

2.2.  "Conditional" requirements

	C6.  Ease of use

	The candidate SCM should be easy to install in a reasonably
	self-contained fashion.  In principle, shipment in an
	OpenSolaris consolidation should be possible with a finite
	investment of resources.

	The primary interfaces should be understandable to a user
	familiar with distributed SCM concepts.

	The candidate SCM should offer some assistance with conflict
	resolution during an update and with the issuance of source code
	patches, and should allow the source tree to be browsed via a
	web server.

	The candidate SCM should be able to undo the application of a
	specific changeset ("backout") atomically and easily.

	C7.  "No dedicated server" operational mode

	In the interests of machine resource conservation, the candidate
	SCM should have a mode in which it can operate without a
	continuously running server process.  This mode may have
	concurrency restrictions or performance limitations compared to
	its primary server mode.

	For instance, within a large administrative domain, it may be
	more convenient to use NFS and a shared identity infrastructure
	than to rely on the networked operating mode required by E2.  A
	candidate SCM that can sensibly operate in a pure OpenSolaris
	NFS environment without the establishment of a dedicated server
	process would meet this requirement.

	C8.  Tool community health

	The community or author of the candidate SCM needs to be active
	and engaged with its user population.  The ability of the
	candidate SCM's community to absorb, directly or through a
	liaison, the defects and feature requests of the OpenSolaris
	community should be estimated, preferably by a direct inquiry to
	the candidate SCM community.

	C9.  OpenSolaris community implementation expertise

	One or more contributors within the OpenSolaris community need
	to be able to assess potential defects in the implementation of
	the candidate SCM and potentially participate in the development
	of new features or supporting tools for the candidate SCM.

	C10.  Interface extensibility

	Beyond the requirements of E3, an extensible interface through
	which OpenSolaris-specific tools can be integrated with SCM
	operations is desired.  Such an interface might be composed of a
	documented "hooks" interface, a documented library interface, or
	some other modular approach.  An extensive hooks interface, with
	hook evaluations able to terminate operations, is a strongly
	desired attribute in a candidate SCM; a candidate SCM with such
	an interface will be considered to fully meet this requirement.
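	A hook interface able to terminate operations might look like the
	following sketch.  The hook signature, the commit function, and
	the convention that a changeset comment begins with a numeric bug
	ID are all assumptions made for illustration; none of them is the
	interface of any candidate SCM.

```python
from typing import Callable, List, Optional

# Hypothetical hook signature: a hook receives the changeset comment and
# returns an error message to terminate the operation, or None to allow it.
Hook = Callable[[str], Optional[str]]


def require_bug_id(comment: str) -> Optional[str]:
    """Illustrative check: the first word of the comment must be a numeric bug ID."""
    lines = comment.splitlines()
    first_word = lines[0].split(" ", 1)[0] if lines else ""
    if not first_word.isdigit():
        return "comment must begin with a bug ID"
    return None


def commit(comment: str, hooks: List[Hook], log: List[str]) -> Optional[str]:
    """Run each hook in order; the first objection aborts the commit."""
    for hook in hooks:
        error = hook(comment)
        if error is not None:
            return error  # nothing is recorded when a hook terminates the operation
    log.append(comment)
    return None


log: List[str] = []
print(commit("6345678 fix panic in foo", [require_bug_id], log))  # None (hypothetical bug ID)
print(commit("fix typo", [require_bug_id], log))  # comment must begin with a bug ID
print(len(log))  # 1
```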

	C11.  Transactional operations and corruption recovery

	The operations on the candidate repository should have defined
	semantics; in particular, any non-atomic transactions should be
	identified, along with the mechanisms for recovery from a
	corrupted repository.
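	One widely used technique for making a single metadata update
	atomic, shown here only as an example of what C11 asks candidates
	to document, is to write a temporary file in the same directory
	and rename it over the original; on POSIX file systems, rename(2)
	replaces the target atomically.  The function name is
	hypothetical, and the sketch does not fsync the containing
	directory, so durability of the rename across a machine failure
	is not guaranteed.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the contents of path so readers see either the old or the new data.

    Writes a temporary file in the same directory (so the rename stays on
    one file system), flushes it to disk, then renames it over path.
    Note: mkstemp creates the temporary file with mode 0600.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # atomic on POSIX file systems
    except BaseException:
        # Remove the partial temporary file so no debris is left behind.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```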

	C12.  Content generality

	The candidate SCM should be able to safely represent and track
	files with binary content, in addition to text files.

2.3.  "Optional" requirements

	O13.  Partial trees

	The structure of the ON consolidation and the current SCM
	solution allow a contributor to work on specific subsets of the
	source tree in a supported fashion.  This requirement states
	that, while such a mode with support for expressing dependencies
	between files and directories is valuable, support for partial
	tree repositories is not necessary.

	O14.  Per-file histories

	The current SCM uses SCCS as a per-file revision storage format.
	As such, each file has an individual history.  This allows
	disjoint issues to be addressed in a single commit without
	connecting the histories of the files involved.  It is believed
	that the ability to meet the other requirements stated in this
	document is more valuable than support for per-file revision
	histories.  Moreover, per-file histories can, in many cases, be
	constructed by convention in reporting and browsing tools.

	That is to say, a candidate SCM that meets E5 is sufficient.

3.  Evaluations

    We anticipate a number of qualitative and quantitative tests to
    evaluate how well a candidate satisfies those requirements for
    which a simple "meets" or "does not meet" result is not
    applicable.

3.1.  Representational and performance criteria

    These criteria focus on the ability of the candidate SCM to
    represent a large, long-running, and active source tree.  The ON
    consolidation's history comprises more than 25 000 changesets by
    over 1300 committers against approximately 40 000 files.

    The operations expected to be meaningful for performance
    evaluation are:

    - first pull/clone operation,
    - subsequent pull/update operation, and
    - push/commit operation.

    Performance results for these operations will be captured for
    three distinct scenarios:  within a campus, across SWAN between
    sites, and between two Internet sites.  SWAN measurements will be
    captured between Menlo Park, CA and each of Burlington, MA;
    Manchester, UK; and Beijing, PRC.  (Equivalent sites may be added
    or substituted.)  For comparison, results will be phrased both as
    a percentage of sustained bandwidth and as absolute elapsed time
    (for an identical pair of endpoints).  Baseline absolute time
    comparisons will be made against standard and "turbo" TeamWare
    for within-SWAN scenarios, and against an rsync copy of the same
    data for all scenarios.
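    The two reporting forms could be computed as in the sketch below.
    It assumes that "percentage of sustained bandwidth" means an
    operation's effective throughput (bytes moved divided by elapsed
    time) relative to the separately measured sustained bandwidth of
    the link, and that baseline comparisons are expressed as a ratio
    of elapsed times; the function names and the example figures are
    hypothetical.

```python
def percent_of_sustained_bandwidth(
    bytes_moved: int, elapsed_s: float, sustained_bits_per_s: float
) -> float:
    """Effective throughput of one operation as a percentage of the link's sustained rate."""
    if elapsed_s <= 0 or sustained_bits_per_s <= 0:
        raise ValueError("elapsed time and sustained bandwidth must be positive")
    effective_bits_per_s = bytes_moved * 8 / elapsed_s
    return 100.0 * effective_bits_per_s / sustained_bits_per_s


def relative_to_baseline(candidate_s: float, baseline_s: float) -> float:
    """Elapsed time relative to a baseline such as an rsync copy; 1.0 means equal."""
    if baseline_s <= 0:
        raise ValueError("baseline time must be positive")
    return candidate_s / baseline_s


# Illustrative figures only: 500 MB moved in 400 s over a 100 Mbit/s link.
print(percent_of_sustained_bandwidth(500_000_000, 400.0, 100e6))  # 10.0
print(relative_to_baseline(400.0, 250.0))  # 1.6
```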

    The candidate SCM will be evaluated for data integrity by
    interrupting each of these operations, both by signal and by
    machine failure.
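    A harness for the signal case might look like the following
    sketch.  The evaluator supplies both the SCM operation and the
    integrity check as command lines, so no candidate's commands or
    flags are assumed; the function name and the default of SIGINT
    are assumptions, and machine failure is not covered by this
    sketch.

```python
import signal
import subprocess
import time
from typing import List, Tuple


def interrupt_and_verify(
    operation: List[str],
    verify: List[str],
    delay_s: float,
    sig: int = signal.SIGINT,
) -> Tuple[int, int]:
    """Start an operation, signal it after delay_s, then run an integrity check.

    Returns the interrupted operation's exit status (negative on POSIX if it
    was killed by the signal) and the integrity check's exit status.
    """
    proc = subprocess.Popen(operation)
    time.sleep(delay_s)
    if proc.poll() is None:  # signal only if the operation is still running
        proc.send_signal(sig)
    interrupted_status = proc.wait()
    check = subprocess.run(verify)
    return interrupted_status, check.returncode
```

    In practice delay_s would be varied across runs so that
    interruptions land at different points in a pull, update, or
    push.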

    The safety of the candidate SCM with respect to file system
    capabilities will be evaluated using ZFS snapshot/clone technology
    for safe repository copies.

3.2.  Implementation criteria

    The candidate SCM implementation will be assessed by a design and
    code review by an OpenSolaris contributor with expertise in the
    implementation language of the candidate SCM.

3.3.  Tools criteria

    If available, the candidate SCM is expected to provide or identify a
    graphical merge program that can be used to resolve conflicts
    resulting from an update operation.  If no known program can be
    used, the evaluating contributor will assess the work necessary to
    use one of the standard graphical merge programs.

4.  References

[1] IEEE Std 830-1998, "IEEE Recommended Practice for Software
    Requirements Specifications", 1998.

-- 
Stephen Hahn, PhD  Solaris Kernel Development, Sun Microsystems
stephen.hahn at sun.com  http://blogs.sun.com/sch/


