2009/387 [Pathname Reparse Points]

Garrett D'Amore gdamore at sun.com
Fri Jul 10 10:03:07 PDT 2009


This sounds reasonable, and is not like the case of AFS which uses 
special tokens in symbolic links that can expand to other things.

I'm a bit concerned about potential effects on applications.  It *seems* 
like this is done in a manner that is safe, but there are a few items:

    * Are applications consistent in their use of pathconf/fpathconf to
      get filesystem limits?
    * Presumably archivers and such are not expected to traverse these?
      (They get handled like an ordinary symbolic link.)
    * What happens when the referral is archived and then re-extracted?
      (Is the attribute lost?)
    * As a nit, it's not truly file system independent, since it relies
      on symbolic links (not all filesystems support symlinks, though
      admittedly the ones of interest to this case all do).
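
To make the pathconf/fpathconf item concrete, here is a minimal sketch of
what I would hope a careful application does: size its readlink(2) buffer
from pathconf(2) with _PC_SYMLINK_MAX on the containing directory rather
than assuming PATH_MAX, and grow the buffer if the result might have been
truncated.  The helper name read_link_alloc, the fallback size and the
sanity cap are my own assumptions for illustration, not part of the case.

```c
#define _XOPEN_SOURCE 700	/* request POSIX.1-2008 declarations */
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the target of 'linkpath' as a malloc'd,
 * NUL-terminated string, or NULL on error.  'dir' is the directory that
 * contains the link; POSIX defines _PC_SYMLINK_MAX for directories.
 */
char *
read_link_alloc(const char *dir, const char *linkpath)
{
	long limit;
	size_t size;

	assert(dir != NULL && linkpath != NULL);

	/* -1 means "no limit reported"; fall back to a guess and grow. */
	limit = pathconf(dir, _PC_SYMLINK_MAX);
	size = (limit > 0 && limit < (1L << 20)) ? (size_t)limit + 1 : 256;

	for (;;) {
		char *buf = malloc(size);
		ssize_t n;

		if (buf == NULL)
			return (NULL);
		n = readlink(linkpath, buf, size);
		if (n < 0) {
			free(buf);
			return (NULL);
		}
		if ((size_t)n < size) {
			buf[n] = '\0';	/* readlink() does not terminate */
			return (buf);
		}
		/* Buffer was filled completely: possibly truncated, retry. */
		free(buf);
		size *= 2;
	}
}
```

An application that instead hard-codes a PATH_MAX-sized buffer would
silently truncate a 16K link target, which is the behavior I am worried
about.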

I believe that this case likely exceeds the obviousness test for a fast 
track.  I certainly wouldn't be comfortable having it go through with 
only a single +1 from another member (your own +1 doesn't count, as I 
understand the rules -- case owners don't count).

Given this, I'm going to derail the case, just to force enough members 
to read it to get a meaningful vote.  I'll write any resulting opinion.  
I don't think we need any additional materials apart from answers to the 
questions I've already raised.

Note that I don't think there is anything intrinsically wrong with the 
case (though my archivers question above is, I think, a real potential 
concern) -- the derail here should not be taken as a negative statement 
about the case itself; I just want to make sure it is adequately and 
properly reviewed.

Thanks.

    - Garrett

Glenn Skinner wrote:
> I'm sponsoring the following fast track for Afshin Salek and the CIFS
> i-team.  It times out on Friday, July 17th.
>
> A copy of the specification below appears in the case directory under
> the name "specification".
>
> I've pre-reviewed it and will give it a +1 up front.
>
> 		-- Glenn
>
> ----------------
>
> Template Version: @(#)onepager.txt 1.35 07/11/07 SMI
> Copyright 2007 Sun Microsystems
>
> 1. Introduction
>    1.1. Project/Component Working Name:
>         Support for Reparse Points
>
>    1.2. Name of Document Author/Supplier:
>         Author: Afshin Salek
>
>    1.3. Date of This Document:
>         07/08/09
> 	
>    1.4. Name of Major Document Customer(s)/Consumer(s):
>         PSARC
> 	CIFS team
>
>    1.5. Email Aliases:
>     	1.5.1. Responsible Manager: Barry.Greenberg at Sun.COM
>     	1.5.2. Responsible Engineer: Afshin.Ardakani at Sun.COM
>     	1.5.3. Marketing Manager:
> 	1.5.4. Interest List: cifs-team at sun.com
>
>    A patch binding is requested for this change.	
>
> 4. Technical Description:
>     4.1. Details:
>
>        INTRODUCTION
> 	  
> 	 There are situations where a mechanism is needed to reflect
> 	 the concept that data is not present at a particular path, but
> 	 can be found in some alternate location(s).  Examples include
> 	 "referrals" used to build unified name spaces in NFSv4.x and
> 	 SMB, and data relocation in HSM systems.  A "reparse point" is
> 	 defined as the marker for a namespace redirection and a
> 	 container for the metadata to specify where the target of this
> 	 redirection is.
> 	  
> 	 Reparse points are intended to be a general mechanism for
> 	 location redirection and as such the file system that contains
> 	 them is not cognizant of the reparse point format or content.
> 	 Services that use reparse points know how to interpret and use
> 	 the stored data.
> 	  
>        REPARSE POINT OBJECT
> 	  
> 	 After a lot of discussion the consensus is that the best way
> 	 to represent reparse points in the file system, in order to
> 	 minimize the effect on existing applications and utilities, is
> 	 to use symbolic links.  One of the main goals in this context has
> 	 been the ability to use existing utilities for backup/restore
> 	 and also ZFS send/receive without having to modify them to
> 	 know how to deal with reparse points.
>
> 	 Some of what is envisioned here could be done with extensions
> 	 to the Solaris automounter capability.  Part of the
> 	 motivation, though, is to create centrally administered
> 	 namespaces served by a group of fileservers to near-zero-admin
> 	 clients.  It is expected to be easier to keep the namespaces
> 	 uniform if only a small number of servers need to participate.
> 	 HSM solutions would also normally be tied closely to a storage
> 	 server by this mechanism.  Also, for both NFS and SMB
> 	 referrals, it is the client that chooses the target and not
> 	 the server.  The server only provides the targets' information
> 	 and it is up to the client to pick the desirable target to
> 	 access the data.
>
> 	 To distinguish a regular symlink from a reparse point, an
> 	 extensible system attribute will be set on the symlink.  This
> 	 system attribute is only one bit which indicates whether or
> 	 not a symlink contains reparse data.
> 	  
> 	 The reparse data will be stored as the link target.  The
> 	 reparse data is not in file system path format, which is the
> 	 typical format of a link target.  In order to avoid coming up
> 	 with a totally new format for reparse data as the link target
> 	 we decided to adopt the format used by magic links in BSD:
> 	 (http://www.daemon-systems.org/man/symlink.7.html)
> 	  
> 	 @{REPARSE@{service-type1:data} [@{service-type2:data}]...}
> 	  
> 	 Where some examples of service-type are:
>        
> 	 #define REPARSE_SVC_SMB	"SMB"
> 	 #define REPARSE_SVC_NFS	"NFS"
> 	 #define REPARSE_SVC_HSM	"HSM"
> 	  
> 	 The data for each service will be in string format, which is
> 	 expected to be typically a UUID string.
>
> 	 The pattern above starts with "REPARSE" to distinguish it from
> 	 other magic links, such as those supported by BSD.  Note that
> 	 this case is not a proposal to support BSD magic links; the
> 	 intent is only to avoid precluding the future addition of full
> 	 BSD magic link support.
> 	  
> 	 Multiple service entries can co-exist within the symlink
> 	 data.  It is expected that normally, all entries would resolve
> 	 to the same logical location, e.g.  NFS and CIFS clients would
> 	 find the same files.
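
As a reading aid for the token syntax above, here is a rough sketch of how
a consumer might pull one service's data out of the link target.  It
assumes entries look like @{service-type:data}, may be separated by
spaces, and that the data itself never contains a '}' character; the
specification does not state any of this, and the function name
reparse_lookup is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define	REPARSE_PREFIX	"@{REPARSE"

/*
 * Hypothetical helper: copy the data for service 'svc' from the reparse
 * string 'link' into 'out'.  Returns 0 on success, -1 if the string is
 * not a reparse token, is malformed, lacks 'svc', or 'out' is too small.
 */
int
reparse_lookup(const char *link, const char *svc, char *out, size_t outlen)
{
	size_t svclen;
	const char *p;

	assert(link != NULL && svc != NULL && out != NULL);
	svclen = strlen(svc);

	if (strncmp(link, REPARSE_PREFIX, strlen(REPARSE_PREFIX)) != 0)
		return (-1);		/* ordinary symlink target */
	p = link + strlen(REPARSE_PREFIX);

	for (;;) {
		const char *colon, *end;
		size_t len;

		while (*p == ' ')	/* assumed optional separator */
			p++;
		if (p[0] != '@' || p[1] != '{')
			return (-1);	/* end of entries or malformed */
		p += 2;
		end = strchr(p, '}');
		colon = strchr(p, ':');
		if (end == NULL || colon == NULL || colon > end)
			return (-1);
		if ((size_t)(colon - p) == svclen &&
		    strncmp(p, svc, svclen) == 0) {
			len = (size_t)(end - colon - 1);
			if (len >= outlen)
				return (-1);
			memcpy(out, colon + 1, len);
			out[len] = '\0';
			return (0);
		}
		p = end + 1;		/* skip to the next entry */
	}
}
```

With the link target "@{REPARSE@{SMB:1234}@{NFS:abcd}}", a lookup for
"NFS" copies "abcd" into the output buffer, and a lookup for "HSM" fails.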
> 	  
>        BASIC INTERFACES
> 	  
> 	 There is a need for both userspace and kernel APIs to work
> 	 with reparse points.
> 	  
>        Userspace API
> 	  
> 	 In userspace the symlink(2) system call will be used to set a
> 	 reparse point.  The readlink(2) system call will be used in
> 	 turn to read the reparse data.
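
For illustration, creating a reparse point from userspace under this
proposal would presumably look no different from creating any other
symlink.  The sketch below formats a single-service token and hands it to
symlink(2); the name reparse_create and the 1024-byte staging buffer are
assumptions of mine.  On a system without this case integrated, the
result is simply a dangling symbolic link with an odd-looking target.

```c
#define _XOPEN_SOURCE 700	/* request POSIX.1-2008 declarations */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a symlink at 'path' whose target is a
 * single-entry reparse token for service 'svc'.  Returns 0 on success,
 * -1 on error (bad arguments, token too long, or symlink() failure).
 */
int
reparse_create(const char *path, const char *svc, const char *data)
{
	char target[1024];
	int n;

	assert(path != NULL && svc != NULL && data != NULL);
	if (strlen(svc) == 0)
		return (-1);

	n = snprintf(target, sizeof (target), "@{REPARSE@{%s:%s}}",
	    svc, data);
	if (n < 0 || (size_t)n >= sizeof (target))
		return (-1);	/* token would not fit */

	/* Per the proposal, the kernel would tag the link on creation. */
	return (symlink(target, path));
}
```

Reading the link back with readlink(2) returns the same token, which is
what lets unmodified archivers carry it around as an ordinary symlink.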
> 	  
>        Kernel API
> 	  
> 	 In the kernel, VOP_SYMLINK and VOP_READLINK will be used to
> 	 set/get reparse data.
> 	  
> 	 These interfaces will support all replication, archive and
> 	 copy operations to preserve reparse points without further
> 	 changes.
> 	  
> 	 fop_symlink() needs to be modified to recognize the reparse
> 	 @{REPARSE} tag and pass the appropriate attribute (i.e.
> 	 reparse system attribute) to VOP_SYMLINK to be set on the
> 	 symlink.
>        
>        IMPLEMENTATION OBSERVATIONS
> 	  
> 	 VFS feature registration can be used to determine whether or
> 	 not a file system supports reparse points.
> 	  
> 	 Two things are needed to obtain the reparse point data in the
> 	 kernel.  First, the consumer needs to know that a reparse
> 	 point has been encountered and, second, it needs the vnode
> 	 pointer to the symlink.  The proposal is to enhance VOP_LOOKUP
> 	 to return the attributes of the looked up vnode.  This way
> 	 when the vnode is available the caller can check the
> 	 attributes to determine if the returned vnode is a reparse
> 	 point or a regular symlink.  Here are the old and revised
> 	 signatures of VOP_LOOKUP:
>
> 	 int VOP_LOOKUP(vnode_t *dvp, char *nm, vnode_t **vpp,
> 	      pathname_t *pnp, int flags, vnode_t *rdir, cred_t *cr,
> 	      caller_context_t *ct, int *deflags, pathname_t *ppnp)
>
> 	 int VOP_LOOKUP(vnode_t *dvp, char *nm, vnode_t **vpp,
> 	      pathname_t *pnp, int flags, vnode_t *rdir, cred_t *cr,
> 	      caller_context_t *ct, int *deflags, pathname_t *ppnp,
> 	      vattr_t *vap)
> 	  
> 	 A vattr_t pointer argument is added at the end to return the
> 	 attributes if it is non-NULL.  This is an optimization so that
> 	 consumers don't have to invoke an extra VOP_GETATTR after
> 	 lookup for obtaining the attributes.
>
> 	 The symlink target size should be increased to 16K to
> 	 accommodate the maximum size supported for MS-DFS referrals by
> 	 Windows.  Applications are expected to query the PATH_MAX and
> 	 SYMLINK_MAX values on the local system using
> 	 pathconf(2)/fpathconf(2).  The value of SYMLINK_MAX would be
> 	 changed to 16K on ZFS.  The value of PATH_MAX will not be
> 	 affected.
>             
> 	 To provide compatibility with other UNIXes (see section 6
> 	 below), sharemgr(1M) would be enhanced to support a "refer"
> 	 option for NFS exports.  This option would only result in
> 	 creation of a reparse point at the specified path and does not
> 	 actually share the path over NFS.
>             
> 	 This case is only about the underlying infrastructure and a
> 	 future case will be presented to deal with details and
> 	 specifics of handling referrals for the NFSv4 server.
>
>        SECURITY CONSIDERATIONS
>             
> 	 Referrals are similar to regular symbolic links in that they
> 	 are only pointers to data that could be discovered in some
> 	 other way.  The presence of such a pointer does not compromise
> 	 the security of the target object or data; the target service
> 	 or file system must still enforce security.
>             
>        OPERATION FLOW
>             
> 	 Once a kernel service encounters a reparse point, it reads the
> 	 data using VOP_READLINK and passes the data up to a user space
> 	 daemon (e.g.  reparsed) along with its desired record type.
> 	 Depending on the requested record type the daemon could simply
> 	 extract the information from the passed data and return it to
> 	 kernel or do any other processing necessary to obtain the
> 	 actual referral information e.g.  in the case of FedFS,
> 	 contacting NSDB.  Going through a common user space daemon to
> 	 get the referral data makes this process generic and easily
> 	 expandable for possible future use cases.
>             
> 	 Referral extraction and creation by a userspace daemon can be
> 	 handled via a library plugin architecture for different
> 	 service types.
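
One way to picture the plugin arrangement described above is a table that
maps a service type to a handler function.  The sketch below is only a
guess at the shape: the type and function names are hypothetical, the
table is static rather than loaded from libraries, and the single handler
just passes the stored data through (the "simply extract the information"
case), whereas a FedFS handler would contact an NSDB.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler type: turn stored data into referral info. */
typedef int (*reparse_handler_t)(const char *data, char *out,
    size_t outlen);

struct reparse_plugin {
	const char		*svc;		/* service type, e.g. "NFS" */
	reparse_handler_t	handler;
};

/* Trivial handler: the stored data already is the answer. */
static int
passthrough_handler(const char *data, char *out, size_t outlen)
{
	size_t len = strlen(data);

	if (len >= outlen)
		return (-1);
	memcpy(out, data, len + 1);	/* include the terminating NUL */
	return (0);
}

/* Static stand-in for plugins that a daemon would load at run time. */
static const struct reparse_plugin plugins[] = {
	{ "NFS", passthrough_handler },
	{ "SMB", passthrough_handler },
};

/* Route a request to the handler registered for 'svc', if any. */
int
reparse_dispatch(const char *svc, const char *data, char *out,
    size_t outlen)
{
	size_t i;

	assert(svc != NULL && data != NULL && out != NULL);
	for (i = 0; i < sizeof (plugins) / sizeof (plugins[0]); i++) {
		if (strcmp(plugins[i].svc, svc) == 0)
			return (plugins[i].handler(data, out, outlen));
	}
	return (-1);	/* no plugin for this service type */
}
```

Adding a new service type then means adding a table entry (or, in a real
daemon, dropping in a library) without touching the kernel consumers.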
>             
>        Operation Flow Example
>             
> 	 Here is a simplified example of operation for a CIFS client
> 	 that tries to access a file where the path contains a DFS
> 	 link:
>             
> 	 a) Client tries to access \\srv\root\...\link\...\file.txt
> 	    where:
> 	       'root' is a share (namespace root)
> 	       'link' is a reparse point seen as a folder by client
> 	  
> 	 b) CIFS server does a VOP_LOOKUP for 'link' and recognizes it
> 	    as a reparse point by examining the attributes returned by
> 	    VOP_LOOKUP.  At this point a STATUS_PATH_NOT_COVERED is
> 	    returned to the client.
> 	  
> 	 c) Client sends a "link referral" request to the server.  CIFS
> 	    server uses VOP_READLINK to get the 'link' data and sends
> 	    the data to 'reparsed' daemon via a door call and gets back
> 	    the DFS link targets in a format understandable by the CIFS
> 	    client.  The targets are sent back to the client in
> 	    response to its "link referral" request.
> 	  
> 	 d) Client picks one of the targets and contacts the target
> 	    server to access 'file.txt'
> 	  
>        NFS REFERRAL IN OTHER UNIXES
>             
> 	 NFS referrals have been implemented in other major UNIX
> 	 distributions such as Linux, AIX and HP-UX but there is no
> 	 unified approach or implementation.
>
> 	 Linux, AIX and HP-UX specify referrals as an NFS export
> 	 option.  The option format is basically the same in all three
> 	 operating systems (refer=path@host) but the presentation is
> 	 somewhat different in each case:
>
> 	 - In Linux a referral is presented as a mount point.
> 	 - In HP-UX a referral is a file system partition or logical volume.
> 	 - In AIX a special object is used to represent a referral.
>
> 	 These are all mechanisms to trigger a change in namespace
> 	 while resolving a path.
>       
> 	 This proposal is somewhat aligned with the AIX approach but
> 	 does not require a new object type to be defined, which has
> 	 the advantage of not impacting existing applications.  As
> 	 mentioned previously, an NFS "refer" option will be supported
> 	 to provide option format compatibility.
>       
> 	 Additionally, the Solaris requirements include support for
> 	 both NFS and SMB referrals whereas these other operating
> 	 systems only support NFS referrals, and they do not provide
> 	 native SMB support.  For the Solaris operating system, this
> 	 proposal provides a generic solution to support multiple,
> 	 disparate referral mechanisms without placing restrictions on
> 	 the format required by each mechanism.
>     
> 	 The following links provide a bit more detail about each OS
> 	 discussed above:
>             
>          http://www.citi.umich.edu/projects/nfsv4/linux/using-referrals.html
>          http://nfsv4.bullopensource.org/doc/migration-and-replication-0.2.pdf
>          http://docs.hp.com/en/5900-0306/ch01s11.html?jumpid=reg_R1002_USEN
>          http://docs.hp.com/en/13578/nfsv4_whitepaper.pdf 
>          http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/com.ibm.aix.commadmn/doc/commadmndita/nfs_referrals.htm 
>
>  INTERFACE TABLE
>
>                           |Proposed       |Specified   |
>                           |Stability      |in what     |
>   Interface Name          |Classification |Document?   | Comments
>   ===========================================================================
>    XAT_REPARSE            |Consolidation  |This        |Reparse extensible
>                           |Private        |Document    |attribute
>                           |               |            |
>    VOP_LOOKUP, fop_lookup |Contracted     |This        |Added new argument:
>                           |Consolidation  |Document    |vattr_t *vap 
>                           |Private*       |            |
>                           |               |            |
>    Reparse token syntax   |Committed      |This        |
>                           |Private        |Document    |
>                           |               |            |
>    SYMLINK_MAX            |Committed      |This        |Increased to 16K
>                           |               |Document    |
>
>  * The project's deliverables will all go into the OS/NET
>    Consolidation, so no contracts are required.
>
> 6. Resources and Schedule:
>
>    6.4. Product Approval Committee requested information:
>    	6.4.1. Consolidation or Component Name:
> 	       ON
>
>    6.5. ARC review type:
>         FastTrack
>
>   



