[tools-discuss] Distributed SCM requirements draft
Stephen Hahn
sch at eng.sun.com
Thu Dec 1 14:01:24 PST 2005
As I mentioned a few weeks ago, we want to evaluate the various
distributed source code management systems available to see which is
the best fit for OpenSolaris. The following document outlines a roughly
ranked set of requirements to use in the evaluation. Thoughts,
corrections, and comments are welcome.
- Stephen
----
ident "@(#)d-scm-requirements.txt 1.3 05/12/01 SMI"
OpenSolaris
DISTRIBUTED SOURCE CODE MANAGEMENT REQUIREMENTS [DRAFT]
1. Summary
This document identifies and explains the requirements for a
distributed source code management (SCM) solution to be used with
OpenSolaris. The requirements are grouped into three sets of
decreasing importance. The document also outlines a number of specific
evaluations that will be used to determine whether a candidate SCM
meets the various requirements.
2. Discussion
The requirements described below fall into a number of distinct
classes: some are social, in that the requirement is believed
necessary for successful use in the community; some are technical,
in that the requirement is believed necessary to successfully
produce software in a multi-project, multi-committer, multi-site
development organization; and some are economic, in that the
requirement is attempting to describe attributes that would limit
the costs of the ongoing use of the tool.
In an attempt to use neutral terms, we use the phrase "candidate
SCM" to describe the SCM solution we are evaluating and "current
SCM" to refer to the distributed SCM solution in use (inside Sun) at
present. (Not all consolidations participating in OpenSolaris use a
distributed SCM at present; their SCM requirements are not discussed
in this document.)
The requirements are ranked by necessity, using the terminology
proposed in IEEE Std 830-1998 [1].
2.1. "Essential" requirements
E0. Open source
To be considered for use by the OpenSolaris community, the
candidate SCM is expected to be available under an OSI-approved
license.
E1. Unbiased and disconnected distribution
Although a distributed SCM may choose to implement some form of
dependency relationship between source trees (such as a
"parent-child" convention) that relationship must not need to be
continuously available for sensible SCM operation. Moreover,
the candidate SCM must support source code updates between two
distinct repositories with a common ancestor that have had no
other contact. Sensible operation for disconnected use
encompasses all SCM operations that act only on the local
repository: creation, modification, or deletion of files or
directories or the metadata describing these objects or the
changes to them.
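As an illustration only, the exchange between two repositories that
share an ancestor but have had no other contact can be sketched as
follows. The model is an assumption and is not taken from any candidate
SCM: each repository is a mapping from a changeset identifier to the
identifiers of its parents, identifiers are assumed to be globally
unique, and each repository is assumed to hold complete history. The
function name missing_changesets is hypothetical.

```python
def missing_changesets(source, destination):
    """List changesets in source that destination lacks, parents first.

    Both arguments map a changeset id to a tuple of parent ids.
    """
    ordered = []
    seen = set(destination)          # anything destination has is skipped
    for start in source:
        stack = [(start, False)]
        while stack:
            cs, parents_done = stack.pop()
            if parents_done:
                ordered.append(cs)   # emitted only after its parents
                continue
            if cs in seen:
                continue
            seen.add(cs)
            stack.append((cs, True))
            for parent in source[cs]:
                stack.append((parent, False))
    return ordered


# Two sites cloned from the same ancestor, then diverged independently.
ancestor = {"a": (), "b": ("a",)}
site1 = {**ancestor, "c": ("b",), "d": ("c",)}
site2 = {**ancestor, "x": ("b",)}

print(missing_changesets(site1, site2))
print(missing_changesets(site2, site1))
```

Neither site needs to consult a third repository to compute what to
send; the shared ancestry alone determines the difference.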
E2. Networked operation
The candidate SCM must be able to operate in a sensible and well
performing manner between two hosts in separate administrative
domains. Beyond the data contained within the candidate SCM's
representation, the only common administrative requirement
should be a credential identifying the remote operator
initiating the transaction to the other host.
One mechanism that meets this requirement is to tunnel the
candidate SCM operation through ssh(1). Candidate SCMs whose
implementation requires domains to change security policies in
order to open network ports that are unusual or believed to be
risky will be considered minimally compliant with this requirement.
Performance measurements will be used to compare candidates, as
outlined below. A candidate SCM with performance results in the
bottom third of all candidates will be deemed to have failed to
meet this requirement.
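A minimal sketch of the bottom-third rule follows. It assumes that each
candidate's measurements have already been reduced to a single elapsed
time (lower is better) and that the cut-off is rounded down when the
number of candidates is not divisible by three; the draft specifies
neither detail. The candidate names and timings are invented.

```python
def failing_candidates(elapsed_seconds):
    """Return the set of candidate names in the slowest third.

    elapsed_seconds maps a candidate name to one aggregate timing.
    """
    # Slowest candidates first.
    ranked = sorted(elapsed_seconds, key=elapsed_seconds.get, reverse=True)
    cutoff = len(ranked) // 3        # assumption: round down
    return set(ranked[:cutoff])


timings = {
    "scm-a": 120.0, "scm-b": 95.5, "scm-c": 310.2,
    "scm-d": 140.0, "scm-e": 88.1, "scm-f": 205.7,
}
print(sorted(failing_candidates(timings)))
```

With fewer than three candidates the cut-off is zero, so no candidate
fails on performance alone under this reading.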
E3. Interface stability and completeness
The storage representation, command line interfaces, network
protocols, and hooks interfaces should be documented and have
some level of declared commitment. The state of the storage
representation and the operations that modify it should be well
defined, so that use with advanced file system capabilities can
be assessed for hazards. (For example, consistent use with file
system snapshot capabilities.)
E4. Standard operations and transactions
The candidate SCM is expected to support rename and deletion
transactions at the file and directory levels. A
history-preserving copy operation, followed by a delete
operation, may be considered equivalent to a rename;
equivalency, and the reasons for rename omission, are to be
assessed and documented by the evaluating engineer.
E5. Per-changeset metadata
The candidate SCM must be able to associate, at a minimum, an
unstructured text fragment with each changeset.
2.2. "Conditional" requirements
C6. Ease of use
The candidate SCM should be easy to install in a reasonably
self-contained fashion. In principle, shipment in an
OpenSolaris consolidation should be possible with a finite
investment of resources.
The primary interfaces should be understandable to a user
familiar with distributed SCM concepts.
The candidate SCM should offer some assistance with conflict
resolution during an update and with the issuance of source code
patches, and should allow the source tree to be browsed via a web
server.
The candidate SCM should be able to undo the application of a
specific changeset ("backout") atomically and easily.
C7. "No dedicated server" operational mode
In the interests of machine resource conservation, the candidate
SCM should have a mode in which it can operate without a
continuously running server process. This mode may have
concurrency restrictions or performance limitations compared to
its primary server mode.
For instance, within a large administrative domain, it may be
more convenient to utilize NFS and a shared identity
infrastructure than to rely on the networked operating mode
required by E2. A candidate SCM which can sensibly operate in a
pure OpenSolaris NFS environment without the establishment of a
dedicated server process would meet this requirement.
C8. Tool community health
The community or author of the candidate SCM needs to be active
and engaged with their user population. The ability of the
candidate SCM's community to absorb, directly or through a
liaison, the defects and feature requests of the OpenSolaris
community should be estimated, preferably by a direct inquiry to
the candidate SCM community.
C9. OpenSolaris community implementation expertise
One or more contributors within the OpenSolaris community need
to be able to assess potential defects in the implementation of
the candidate SCM and potentially participate in the development
of new features or supporting tools for the candidate SCM.
C10. Interface extensibility
Beyond the requirements of E3, an extensible interface is desired,
so that OpenSolaris-specific tools might be integrated with SCM
operations. Such an interface might be composed of a documented
"hooks" interface, a documented library interface, or some other
modular approach. An extensive hooks interface, with hook
evaluations able to terminate operations, is a strongly desired
attribute in a candidate SCM; a candidate SCM with such an
interface will be considered to fully meet this requirement.
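To make "hook evaluations able to terminate operations" concrete, here
is a small sketch of the idea. It does not reproduce the hooks
interface of any particular SCM: HookRegistry, HookVeto, the
"pre-commit" event name, and the bug-id policy are all hypothetical.

```python
class HookVeto(Exception):
    """Raised when a hook refuses to let an operation proceed."""


class HookRegistry:
    def __init__(self):
        self._hooks = {}

    def register(self, event, func):
        self._hooks.setdefault(event, []).append(func)

    def run(self, event, **context):
        # Every hook returns (ok, reason); the first refusal aborts.
        for func in self._hooks.get(event, []):
            ok, reason = func(**context)
            if not ok:
                raise HookVeto(f"{event}: {reason}")


def require_bug_id(message, **_):
    # Hypothetical site policy: the comment starts with a numeric bug id.
    first_word = message.split(" ", 1)[0]
    return first_word.isdigit(), "commit comment must begin with a bug id"


def commit(registry, message, store):
    registry.run("pre-commit", message=message)   # may raise HookVeto
    store.append(message)                         # reached only if allowed


registry = HookRegistry()
registry.register("pre-commit", require_bug_id)
history = []
commit(registry, "6350000 fix panic in example driver", history)
try:
    commit(registry, "fix panic in example driver", history)
except HookVeto as veto:
    print("rejected:", veto)
```

The point of the sketch is the ordering: the hook runs before the
repository is modified, so a refusal leaves the repository unchanged.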
C11. Transactional operations and corruption recovery
The operations on the candidate repository should have defined
semantics, in particular identifying non-atomic transactions and
mechanisms for recovery from a corrupted repository.
C12. Content generality
The candidate SCM should be able to represent safely and track
files with binary content, in addition to text files.
2.3. "Optional" requirements
O13. Partial trees
The structure of the ON consolidation and the current SCM
solution allow a contributor to work on specific subsets of the
source tree in a supported fashion. This requirement states
that, while such a mode with support for expressing dependencies
between files and directories is valuable, support for partial
tree repositories is not necessary.
O14. Per-file histories
The current SCM uses SCCS as a per-file revision storage format.
As such, each file has an individual history. This feature
allows the combination of disjoint issues to be addressed in a
single commit without connecting the per-file history. It is
believed that the ability to meet the other requirements stated
in this document is sufficiently more valuable than the support
of per-file revision histories. Moreover, the construction of
per-file histories in reporting and browsing tools can be
accomplished by convention in many cases.
That is to say, a candidate SCM that meets E5 is sufficient.
3. Evaluations
We anticipate a number of qualitative and quantitative tests to
evaluate the satisfaction of the various requirements, where a
"meets" or "does not meet" result is not applicable.
3.1. Representational and performance criteria
These criteria focus on the ability of the candidate SCM to
represent a large, long-running, and active source tree. The ON
consolidation represents more than 25 000 changesets by over 1300
committers against approximately 40 000 files.
The expected set of meaningful operations for performance evaluation
is:
- first pull/clone operation,
- subsequent pull/update operation, and
- push/commit operation.
Performance results for the set of operations will be captured for
three distinct scenarios: within a campus, across SWAN between
sites, and between two Internet sites. SWAN measurements will be
captured between each of Menlo Park, CA and Burlington, MA, Manchester,
UK, and Beijing, PRC. (Equivalent sites may be added or
substituted.) For comparison, results will be phrased both
as a percentage of sustained bandwidth, and as absolute time
elapsed (for an identical pair of endpoints). Baseline absolute
time comparisons will be made against standard and "turbo" TeamWare
for within-SWAN scenarios, and against an rsync copy of the same
data for all scenarios.
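The two ways of phrasing a result can be sketched as below. The
function names and the sample figures are illustrative assumptions, not
measurements; the sketch also assumes that "sustained bandwidth" is
expressed in bits per second and that the number of bytes transferred
by the operation is known.

```python
def percent_of_sustained_bandwidth(bytes_moved, elapsed_seconds,
                                   sustained_bits_per_second):
    """Achieved throughput as a percentage of the link's sustained rate."""
    achieved_bits_per_second = bytes_moved * 8 / elapsed_seconds
    return 100.0 * achieved_bits_per_second / sustained_bits_per_second


def relative_to_baseline(candidate_seconds, baseline_seconds):
    """Elapsed-time ratio against a baseline such as an rsync copy.

    Values above 1.0 mean the candidate was slower than the baseline.
    """
    return candidate_seconds / baseline_seconds


# Invented example: 500 000 000 bytes moved in 100 s over a link that
# sustains 100 Mbit/s, with the rsync baseline taking 80 s.
print(percent_of_sustained_bandwidth(500_000_000, 100.0, 100_000_000))
print(relative_to_baseline(100.0, 80.0))
```

Reporting both figures for the same pair of endpoints keeps the
comparison meaningful when links differ widely in capacity.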
The candidate SCM will be evaluated for data integrity by
interruption of the set of operations by signal and by machine
failure.
The safety of the candidate SCM with respect to file system
capabilities will be evaluated using ZFS snapshot/clone technology
for safe repository copies.
3.2. Implementation criteria
The candidate SCM implementation will be assessed by a design and
code review by an OpenSolaris contributor with expertise in the
implementation language of the candidate SCM.
3.3. Tools criteria
The candidate SCM is expected to provide or identify, if one is
available, a graphical merge program that can be used to resolve
conflicts resulting from an update operation. If no known program
can be used, the evaluating contributor will assess the work
necessary to use one of the standard graphical merge programs.
4. References
[1] IEEE Std 830-1998, "IEEE Recommended Practice for Software
Requirements Specifications", 1998.
--
Stephen Hahn, PhD Solaris Kernel Development, Sun Microsystems
stephen.hahn at sun.com http://blogs.sun.com/sch/