2009/387 [Pathname Reparse Points]
Garrett D'Amore
gdamore at sun.com
Fri Jul 10 10:03:07 PDT 2009
This sounds reasonable, and is not like the case of AFS which uses
special tokens in symbolic links that can expand to other things.
I'm a bit concerned about potential effects on applications. It *seems*
like this is done in a manner that is safe, but there are a few items:
* are applications consistent in their use of pathconf/fpathconf to
get filesystem limits?
* presumably archivers and such are not expected to traverse these?
(they get handled like an ordinary symbolic link)
* what happens when the referral is archived and then reextracted?
(is the attribute lost?)
* as a nit, it's not truly file system independent, since it relies
on symbolic links (not all filesystems support
symlinks, though admittedly the ones of interest to this case all do)
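
To make the first item concrete, here is a minimal sketch of what a well-behaved application might do: size its readlink(2) buffer from pathconf(2) instead of assuming a compile-time constant. The helper name symlink_buf_size is hypothetical. _PC_SYMLINK_MAX is the standard POSIX name, and the fallback to PATH_MAX is an assumption of this sketch, not something the case specifies.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 1024   /* fallback for this sketch only */
#endif

/*
 * Return a buffer size large enough to hold any symlink target created
 * in directory `dir`, including a terminating NUL.
 */
static size_t
symlink_buf_size(const char *dir)
{
    long max;

    assert(dir != NULL);
    max = pathconf(dir, _PC_SYMLINK_MAX);
    if (max <= 0)
        max = PATH_MAX;   /* no limit reported, or the query failed */
    return ((size_t)max + 1);
}
```

The containing directory is queried rather than the link itself because pathconf(2) follows symbolic links, and a reparse point's target is not a resolvable path. An application that hard-codes PATH_MAX for its readlink(2) buffer would instead see truncated reparse data once SYMLINK_MAX grows.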
I believe that this case likely exceeds the obviousness test for a fast
track. I certainly wouldn't be comfortable having it go through with
only a single +1 from another member (your own +1 doesn't count, as I
understand the rules -- case owners don't count).
Given this, I'm going to derail the case, just to force enough members
to read it to get a meaningful vote. I'll write any resulting opinion.
I don't think we need any additional materials apart from answers to the
questions I've already raised.
Note that I don't think there is anything intrinsically wrong with the
case (though my archivers question above is I think a real potential
concern) -- the derail here should not be taken as a negative statement
about the case itself; I just want to make sure it is adequately and
properly reviewed.
Thanks.
- Garrett
Glenn Skinner wrote:
> I'm sponsoring the following fast track for Afshin Salek and the CIFS
> i-team. It times out on Friday, July 17th.
>
> A copy of the specification below appears in the case directory under
> the name "specification".
>
> I've pre-reviewed it and will give it a +1 up front.
>
> -- Glenn
>
> ----------------
>
> Template Version: @(#)onepager.txt 1.35 07/11/07 SMI
> Copyright 2007 Sun Microsystems
>
> 1. Introduction
> 1.1. Project/Component Working Name:
> Support for Reparse Points
>
> 1.2. Name of Document Author/Supplier:
> Author: Afshin Salek
>
> 1.3. Date of This Document:
> 07/08/09
>
> 1.4. Name of Major Document Customer(s)/Consumer(s):
> PSARC
> CIFS team
>
> 1.5. Email Aliases:
> 1.5.1. Responsible Manager: Barry.Greenberg at Sun.COM
> 1.5.2. Responsible Engineer: Afshin.Ardakani at Sun.COM
> 1.5.3. Marketing Manager:
> 1.5.4. Interest List: cifs-team at sun.com
>
> A patch binding is requested for this change.
>
> 4. Technical Description:
> 4.1. Details:
>
> INTRODUCTION
>
> There are situations where a mechanism is needed to reflect
> the concept that data is not present at a particular path, but
> can be found in some alternate location(s). Examples include
> "referrals" used to build unified name spaces in NFSv4.x and
> SMB, and data relocation in HSM systems. A "reparse point" is
> defined as the marker for a namespace redirection and a
> container for the metadata to specify where the target of this
> redirection is.
>
> Reparse points are intended to be a general mechanism for
> location redirection and as such the file system that contains
> them is not cognizant of the reparse point format or content.
> Services that use reparse points know how to interpret and use
> the stored data.
>
> REPARSE POINT OBJECT
>
> After a lot of discussion the consensus is that the best way
> to represent reparse points in the file system, in order to
> minimize the effect on existing applications and utilities, is
> to use symbolic links. One of the main goals in this context has
> been the ability to use existing utilities for backup/restore
> and also ZFS send/receive without having to modify them to
> know how to deal with reparse points.
>
> Some of what is envisioned here could be done with extensions
> to the Solaris automounter capability. Part of the
> motivation, though, is to create centrally administered
> namespaces served by a group of fileservers to near-zero-admin
> clients. It is expected to be easier to keep the namespaces
> uniform if only a small number of servers need to participate.
> HSM solutions would also normally be tied closely to a storage
> server by this mechanism. Also, for both NFS and SMB
> referrals, it is the client that chooses the target and not
> the server. The server only provides the targets' information
> and it is up to the client to pick the desirable target to
> access the data.
>
> To distinguish a regular symlink from a reparse point, an
> extensible system attribute will be set on the symlink. This
> system attribute is only one bit which indicates whether or
> not a symlink contains reparse data.
>
> The reparse data will be stored as the link target. The
> reparse data is not in file system path format, which is the
> typical format of a link target. In order to avoid coming up
> with a totally new format for reparse data as the link target
> we decided to adopt the format used by magic links in BSD:
> (http://www.daemon-systems.org/man/symlink.7.html)
>
> @{REPARSE@{service-type1:data} [@{service-type2:data}]...}
>
> Where some examples of service-type are:
>
> #define REPARSE_SVC_SMB "SMB"
> #define REPARSE_SVC_NFS "NFS"
> #define REPARSE_SVC_HSM "HSM"
>
> The data for each service will be in string format, which is
> expected to be typically a UUID string.
>
> The pattern above starts with "REPARSE" to distinguish it from
> other magic links, such as those supported by BSD. Note
> that this case is not a proposal to support BSD magic links;
> the intent is only to avoid precluding the future addition of
> full BSD magic link support.
>
> Multiple service entries can co-exist within the symlink
> data. It is expected that normally, all entries would resolve
> to the same logical location, e.g. NFS and CIFS clients would
> find the same files.
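
The token syntax quoted above can be picked apart with a few string calls. The following is only a sketch: the function name reparse_get_service is hypothetical, and it assumes that service data never contains a '}' character, which the specification does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REPARSE_TAG "@{REPARSE"

/*
 * Copy the data for service `svc` out of the reparse string `link`.
 * Returns 0 on success, or -1 if the string is not a reparse token,
 * an entry is malformed, the service is absent, or `out` is too small.
 * Assumption of this sketch: service data never contains '}'.
 */
static int
reparse_get_service(const char *link, const char *svc, char *out,
    size_t outlen)
{
    const char *p, *colon, *end;
    size_t svclen, datalen;

    assert(link != NULL && svc != NULL && out != NULL);
    if (strncmp(link, REPARSE_TAG, strlen(REPARSE_TAG)) != 0)
        return (-1);
    svclen = strlen(svc);
    p = link + strlen(REPARSE_TAG);
    while ((p = strstr(p, "@{")) != NULL) {
        p += 2;                       /* skip the "@{" opener */
        colon = strchr(p, ':');
        end = strchr(p, '}');
        if (colon == NULL || end == NULL || colon > end)
            return (-1);              /* malformed entry */
        if ((size_t)(colon - p) == svclen &&
            strncmp(p, svc, svclen) == 0) {
            datalen = (size_t)(end - colon - 1);
            if (datalen >= outlen)
                return (-1);
            memcpy(out, colon + 1, datalen);
            out[datalen] = '\0';
            return (0);
        }
        p = end + 1;                  /* move on to the next entry */
    }
    return (-1);
}
```

Given "@{REPARSE@{SMB:abc} @{NFS:def}}" and the service "NFS", the sketch copies "def" into the output buffer. An ordinary link target such as "/usr/bin" is rejected at the tag check.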
>
> BASIC INTERFACES
>
> There is a need for both userspace and kernel APIs to work
> with reparse points.
>
> Userspace API
>
> In userspace the symlink(2) system call will be used to set a
> reparse point. The readlink(2) system call will be used in
> turn to read the reparse data.
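
As a sketch of the userspace round trip described above, the following builds a single-service token, stores it with symlink(2), and reads it back with readlink(2). The helper names and REPARSE_BUF_SIZE are hypothetical; the buffer size is taken from the 16K SYMLINK_MAX proposed in this case. On a system without the proposed fop_symlink() change, this simply creates an ordinary symlink whose target does not resolve.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Assumed from the proposed 16K SYMLINK_MAX, plus a terminating NUL. */
#define REPARSE_BUF_SIZE (16 * 1024 + 1)

/* Create a reparse point at `linkpath` holding one service entry. */
static int
reparse_create(const char *linkpath, const char *svc, const char *data)
{
    char target[REPARSE_BUF_SIZE];
    int n;

    assert(linkpath != NULL && svc != NULL && data != NULL);
    n = snprintf(target, sizeof (target), "@{REPARSE@{%s:%s}}", svc, data);
    if (n < 0 || (size_t)n >= sizeof (target))
        return (-1);                  /* token would not fit */
    return (symlink(target, linkpath));
}

/* Read the raw reparse data back as a NUL-terminated string. */
static ssize_t
reparse_read(const char *linkpath, char *buf, size_t buflen)
{
    ssize_t len;

    assert(linkpath != NULL && buf != NULL && buflen > 0);
    len = readlink(linkpath, buf, buflen - 1);
    if (len >= 0)
        buf[len] = '\0';              /* readlink(2) does not terminate */
    return (len);
}
```

Because nothing here is specific to reparse points, the same calls are what an archiver or copy utility would already be making, which is the point of representing reparse points as symlinks.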
>
> Kernel API
>
> In the kernel, VOP_SYMLINK and VOP_READLINK will be used to
> set/get reparse data.
>
> These interfaces will support all replication, archive and
> copy operations to preserve reparse points without further
> changes.
>
> fop_symlink() needs to be modified to recognize the reparse
> @{REPARSE} tag and pass the appropriate attribute (i.e.
> reparse system attribute) to VOP_SYMLINK to be set on the
> symlink.
>
> IMPLEMENTATION OBSERVATIONS
>
> VFS feature registration can be used to determine whether or
> not a file system supports reparse points.
>
> Two things are needed to obtain the reparse point data in the
> kernel. First, the consumer needs to know that a reparse
> point has been encountered and, second, it needs the vnode
> pointer to the symlink. The proposal is to enhance VOP_LOOKUP
> to return the attributes of the looked up vnode. This way
> when the vnode is available the caller can check the
> attributes to determine if the returned vnode is a reparse
> point or a regular symlink. Here are the old and revised
> signatures of VOP_LOOKUP:
>
> int VOP_LOOKUP(vnode_t *dvp, char *nm, vnode_t **vpp,
>     pathname_t *pnp, int flags, vnode_t *rdir, cred_t *cr,
>     caller_context_t *ct, int *deflags, pathname_t *ppnp)
>
> int VOP_LOOKUP(vnode_t *dvp, char *nm, vnode_t **vpp,
>     pathname_t *pnp, int flags, vnode_t *rdir, cred_t *cr,
>     caller_context_t *ct, int *deflags, pathname_t *ppnp,
>     vattr_t *vap)
>
> A vattr_t pointer argument is added at the end to return the
> attributes if it is non-NULL. This is an optimization so that
> consumers don't have to invoke an extra VOP_GETATTR after
> lookup for obtaining the attributes.
>
> The symlink target size should be increased to 16K to
> accommodate the maximum size supported for MS-DFS referrals by
> Windows. Applications are expected to query the PATH_MAX and
> SYMLINK_MAX values on the local system using
> pathconf(2)/fpathconf(2). The value of SYMLINK_MAX would be
> changed to 16K on ZFS. The value of PATH_MAX will not be
> affected.
>
> To provide compatibility with other UNIXes (see "NFS REFERRAL
> IN OTHER UNIXES" below), sharemgr(1M) would be enhanced to
> support a "refer"
> option for NFS exports. This option would only result in
> creation of a reparse point at the specified path and does not
> actually share the path over NFS.
>
> This case is only about the underlying infrastructure and a
> future case will be presented to deal with details and
> specifics of handling referrals for the NFSv4 server.
>
> SECURITY CONSIDERATIONS
>
> Referrals are similar to regular symbolic links in that they
> are only pointers to data that could be discovered in some
> other way. The presence of such a pointer does not compromise
> the security of the target object or data; the target service
> or file system must still enforce security.
>
> OPERATION FLOW
>
> Once a kernel service encounters a reparse point, it reads the
> data using VOP_READLINK and passes the data up to a user space
> daemon (e.g. reparsed) along with its desired record type.
> Depending on the requested record type the daemon could simply
> extract the information from the passed data and return it to
> kernel or do any other processing necessary to obtain the
> actual referral information e.g. in the case of FedFS,
> contacting NSDB. Going through a common user space daemon to
> get the referral data makes this process generic and easily
> expandable for possible future use cases.
>
> Referral extraction and creation by a userspace daemon can be
> handled via a library plugin architecture for different
> service types.
>
> Operation Flow Example
>
> Here is a simplified example of operation for a CIFS client
> that tries to access a file where the path contains a DFS
> link:
>
> a) Client tries to access \\srv\root\...\link\...\file.txt
> where:
> 'root' is a share (namespace root)
> 'link' is a reparse point seen as a folder by client
>
> b) CIFS server does a VOP_LOOKUP for 'link', which is
> recognized as a reparse point by examining the attributes
> returned by VOP_LOOKUP. At this point a
> STATUS_PATH_NOT_COVERED is returned to the client.
>
> c) Client sends a "link referral" request to the server. CIFS
> server uses VOP_READLINK to get the 'link' data and sends
> the data to 'reparsed' daemon via a door call and gets back
> the DFS link targets in a format understandable by the CIFS
> client. The targets are sent back to the client in
> response to its "link referral" request.
>
> d) Client picks one of the targets and contacts the target
> server to access 'file.txt'
>
> NFS REFERRAL IN OTHER UNIXES
>
> NFS referrals have been implemented in other major UNIX
> distributions such as Linux, AIX and HP-UX but there is no
> unified approach or implementation.
>
> Linux, AIX and HP-UX specify referrals as an NFS export
> option. The option format is basically the same in all three
> operating systems (refer=path@host) but the presentation is
> somewhat different in each case:
>
> - In Linux a referral is presented as a mount point.
> - In HP-UX a referral is a file system partition or logical volume.
> - In AIX a special object is used to represent a referral.
>
> These are all mechanisms to trigger a change in namespace
> while resolving a path.
>
> This proposal is somewhat aligned with the AIX approach but
> does not require a new object type to be defined, which has
> the advantage of not impacting existing applications. As
> mentioned previously, an NFS "refer" option will be supported
> to provide option format compatibility.
>
> Additionally, the Solaris requirements include support for
> both NFS and SMB referrals whereas these other operating
> systems only support NFS referrals, and they do not provide
> native SMB support. For the Solaris operating system, this
> proposal provides a generic solution to support multiple,
> disparate referral mechanisms without placing restrictions on
> the format required by each mechanism.
>
> The following links provide a bit more detail about each OS
> discussed above:
>
> http://www.citi.umich.edu/projects/nfsv4/linux/using-referrals.html
> http://nfsv4.bullopensource.org/doc/migration-and-replication-0.2.pdf
> http://docs.hp.com/en/5900-0306/ch01s11.html?jumpid=reg_R1002_USEN
> http://docs.hp.com/en/13578/nfsv4_whitepaper.pdf
> http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/com.ibm.aix.commadmn/doc/commadmndita/nfs_referrals.htm
>
> INTERFACE TABLE
>
>                         |Proposed       |Specified |
>                         |Stability      |in what   |
> Interface Name          |Classification |Document? | Comments
> ===========================================================================
> XAT_REPARSE             |Consolidation  |This      |Reparse extensible
>                         |Private        |Document  |attribute
>                         |               |          |
> VOP_LOOKUP, fop_lookup  |Contracted     |This      |Added new argument:
>                         |Consolidation  |Document  |vattr_t *vap
>                         |Private*       |          |
>                         |               |          |
> Reparse token syntax    |Committed      |This      |
>                         |Private        |Document  |
>                         |               |          |
> SYMLINK_MAX             |Committed      |This      |Increased to 16K
>                         |               |Document  |
>
> * The project's deliverables will all go into the OS/NET
> Consolidation, so no contracts are required.
>
> 6. Resources and Schedule:
>
> 6.4. Product Approval Committee requested information:
> 6.4.1. Consolidation or Component Name:
> ON
>
> 6.5. ARC review type:
> FastTrack
>
>