This wiki is obsolete, see the NorduGrid web pages for up to date information.

Data Staging/Protocols overview

Protocols overview

The following is a list of protocols which may or may not be supported by our implementation. They are ordered from pure data transfer to pure data indexing.

HTTP, HTTPS

  • Request/response exchange is synchronous. Communication is initiated and controlled by client.
  • Can redirect.
  • Can report temporary unavailability, but has no support for estimating or requesting availability.
  • Can transfer partially (chunks) if server allows
  • Transfer can't be put on hold
  • Transfer can be canceled at any time.
  • Transfer can start from any place if server allows
  • No integrated bandwidth control
  • No way to ensure content returned by different requests is the same (dynamic content)
  • Communication may include various metadata, both standard and non-standard - creation time, file size, content type are examples of standard and common metadata.
  • Can report metadata without doing actual transfer
  • Can compress data being transferred
  • Can transfer in both directions (upload and download)
  • Third party transfers not supported
  • It is commonly wrapped in a TLS/SSL layer
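
The partial-transfer and metadata-only points above can be sketched with Python's standard library. This is a minimal illustration, not part of any ARC API: the helper names (`range_header`, `parse_content_range`, `partial_request`, `head_request`) and the example URL are assumptions made here, and a partial transfer only works if the server honours the `Range` header.

```python
import re
import urllib.request


def range_header(start, end=None):
    """Build a Range header value for bytes start..end (inclusive).

    With end=None the range is open-ended, i.e. "resume from offset start".
    """
    if start < 0 or (end is not None and end < start):
        raise ValueError("invalid byte range")
    return f"bytes={start}-{'' if end is None else end}"


def parse_content_range(value):
    """Parse 'bytes 0-99/1234' into (start, end, total); total is None for '*'."""
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if m is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    total = None if m.group(3) == "*" else int(m.group(3))
    return int(m.group(1)), int(m.group(2)), total


def partial_request(url, start, end=None):
    """Prepare (but do not send) a GET request for part of a resource."""
    return urllib.request.Request(url, headers={"Range": range_header(start, end)})


def head_request(url):
    """Prepare (but do not send) a HEAD request: metadata only, no body."""
    return urllib.request.Request(url, method="HEAD")
```

Either request object could then be passed to `urllib.request.urlopen`. A server that supports ranges answers a partial request with status 206 and a `Content-Range` header, while one that does not typically returns the whole resource with status 200.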

FTP/GridFTP

  • Request/response exchange is synchronous, but server may send multiple responses per single request
  • Control and data channels are separated
  • Communication is initiated and controlled by client.
  • Can't redirect.
  • Can transfer partially (chunks) - if GridFTP
  • Transfer can't be put on hold
  • Transfer can be canceled at any time.
  • Transfer can start from any place - if GridFTP
  • Integrated bandwidth control - if GridFTP (need to check)
  • No way to ensure content returned by different requests is the same (dynamic content possible but not common)
  • Reports metadata without doing actual transfer
  • Allows searching/listing for available data entities
  • Can transfer in both directions (upload and download)
  • Third party transfers supported
  • GridFTP adds wrapping for control and data channels into secure communication channels

rfio

  • Access protocol to data stored on the CASTOR system developed by CERN
  • To be decided whether rfio is supported

afs

  • Networked file system based on Kerberos authentication
  • Location transparent
  • All the files stored in afs are available through an afs client connecting to an afs server
  • To be decided whether afs is supported

dcap

Chelonia

  • Allows file system-like view of logical filename space, with operations like mkdir, stat, mv
  • Each file can have multiple replicas in different locations
  • Each file has a unique GUID
  • Files have Logical Names (LNs, paths in the namespace)
  • Multiple LNs can point to the same GUID (hardlinks)
  • Each file has metadata (including size, checksum, location of replicas)
  • Communication is synchronous, and initiated by the client
  • Negotiation is separate from file transfer, is via SOAP/HTTPS
  • Can report metadata without doing actual transfer
  • File transfer has a pluggable architecture, currently HTTP(S) is supported
  • After negotiation a one-time transfer URL is generated (TURL), which currently has no lifetime:
    • if via the negotiation interface the client gets the TURL, then it can store that TURL, and use it later
    • however, the TURL can break (that replica could disappear, storage node could go offline), then a new negotiation is needed for a new TURL
    • (in the current Hopi implementation) the TURL can be used only once: no hold or resume - but chunks are supported (if requests for chunks are close in time)
  • It is possible that a file has no valid replicas at a given point of time, but the same file may have valid replicas later
  • The negotiation is between a Bartender and a client. There should be multiple Bartenders, and if a Bartender cannot be accessed, another Bartender should be tried. If none of the Bartenders can be accessed, the negotiation should be tried again later.
  • If the Bartender replies that there is no such file, then there is not much point in trying it again later (except if the user uploads the file later)
  • For SOAP/HTTPS and HTTPS file transfer, an X.509 certificate is needed - access control is based on the DN of the user or the name of the VO (from the extension of the user's X.509 certificate)
  • No space management
  • The current arc:// URL implementation has the LN in the URL (e.g. arc:///niif/zsombor/myjobs refers to the logical name '/niif/zsombor/myjobs'), but the LN in itself is not enough to access Chelonia: the client needs a Bartender URL
    • If the client knows some ISIS URLs, then it can ask for Bartender URLs
    • If the client knows some Bartender URLs, then it can use those
    • If the client knows only one Bartender URL, that could be a problem: if that Bartender is offline, the client cannot access Chelonia even if there are other Bartenders in the system
    • The arc:// URL can contain one Bartender URL in the form of: arc:///niif/zsombor/myjobs?BartenderURL=<URL> which has the same single-URL problem as the previous point, and it is ugly.
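
The URL handling and Bartender failover described above can be sketched as follows. This is an illustration under stated assumptions, not the ARC client code: `parse_arc_url`, `negotiate`, `BartenderUnavailable` and the `ask` callable are hypothetical names, and the actual SOAP/HTTPS exchange is left to whatever `ask` wraps.

```python
from urllib.parse import parse_qs, urlsplit


class BartenderUnavailable(Exception):
    """Raised by the (hypothetical) transport when a Bartender cannot be reached."""


def parse_arc_url(url):
    """Split an arc:// URL into (logical name, Bartender URL or None)."""
    parts = urlsplit(url)
    if parts.scheme != "arc":
        raise ValueError(f"not an arc:// URL: {url!r}")
    bartenders = parse_qs(parts.query).get("BartenderURL", [])
    return parts.path, (bartenders[0] if bartenders else None)


def negotiate(ln, bartender_urls, ask):
    """Ask each Bartender in turn for a TURL for the logical name `ln`.

    `ask(bartender_url, ln)` is an assumed callable that returns a TURL,
    raises BartenderUnavailable if the Bartender cannot be reached, or
    raises FileNotFoundError if the Bartender reports no such file.
    """
    for url in bartender_urls:
        try:
            return ask(url, ln)
        except BartenderUnavailable:
            continue  # this Bartender is offline, try the next one
    # Every Bartender failed: the caller should retry later.
    raise BartenderUnavailable("no Bartender could be reached")
```

`FileNotFoundError` is deliberately not caught: when a Bartender replies that the file does not exist, trying the other Bartenders or retrying later is unlikely to help.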

xrootd

SRB

SRM

  • The specification for v2.2 of the protocol is at http://sdm.lbl.gov/srm-wg/doc/SRM.v2.2.html
  • Protocol for negotiation of file transfer, not transfer itself
  • Negotiation of transfer URL for read and write is asynchronous - client must make request and poll until request is processed
  • Other operations are synchronous (but listing operations can also be asynchronous)
  • It is acceptable for requests to take an arbitrary length of time, due to physical file replication or staging from high-latency media (such as tape) behind the scenes
  • Supported physical transfer protocols can include GridFTP, HTTP(S), dcap, xrootd, rfio
  • If a request is satisfied the transfer URL is "pinned" (guaranteed to be non-exclusively available) for a set length of time
  • Pins should be released on completion of physical transfer whether successful or not
  • Requests can't be put on hold
  • Requests can be cancelled at any time (this does not cancel the physical transfer)
  • Provides space management by applying "space tokens" to files representing areas of space with limited capacity
  • Reports metadata without doing actual transfer
  • Allows user to set various metadata such as access latency to files, lifetime of files (some features are not implemented in any known implementation)
  • Allows searching/listing for available data entities
  • Can transfer in both directions (upload and download)
  • Third party transfers supported
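
The request, poll, transfer and release cycle described above can be sketched as below. The `FakeSRM` class is an in-memory stand-in written for this example, and its method names (`prepare_to_get`, `status`, `release`) are simplified, hypothetical names rather than the operation names defined in the SRM specification.

```python
import time


class FakeSRM:
    """In-memory stand-in for an SRM endpoint (an assumption of this sketch).

    A request becomes ready after `polls_needed` status calls.
    """

    def __init__(self, polls_needed=3):
        self._polls_needed = polls_needed
        self._polls = {}
        self.released = []

    def prepare_to_get(self, surl):
        token = f"req-{len(self._polls)}"
        self._polls[token] = 0
        return token

    def status(self, token):
        self._polls[token] += 1
        if self._polls[token] < self._polls_needed:
            return "QUEUED", None
        return "READY", "gsiftp://host.example.org/data/" + token

    def release(self, token):
        self.released.append(token)


def fetch_via_srm(srm, surl, transfer, poll_interval=0.0, max_polls=100):
    """Negotiate a transfer URL, run the physical transfer, release the pin."""
    token = srm.prepare_to_get(surl)
    for _ in range(max_polls):
        state, turl = srm.status(token)
        if state == "READY":
            break
        time.sleep(poll_interval)  # the request may take arbitrarily long
    else:
        raise TimeoutError(f"request {token} not ready after {max_polls} polls")
    try:
        # The physical transfer uses a separate protocol (GridFTP here).
        return transfer(turl)
    finally:
        # Release the pin whether the transfer succeeded or not.
        srm.release(token)
```

A real client would use a non-zero `poll_interval` and might cancel the request on timeout. Both are omitted here to keep the sketch short.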

LFC

  • File catalog developed by LCG
  • Allows file system-like view of logical filename space
  • Each file can have multiple replicas in different locations
  • Each file has a unique GUID
  • Metadata similar to file system with addition of checksum and flags for fileclass and migration status (these last two are hangovers from the CASTOR code on which LFC is based and do not have practical use in LFC)
  • Many file system operations supported, e.g. mkdir, ln, stat, readdir
  • Supports ACLs and unix-style permissions on files and directories
  • Communication is synchronous and initiated and controlled by client
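
The catalogue model above (a logical name resolves to a GUID, which carries metadata and a list of replicas) can be pictured as two mappings. This is a toy in-memory sketch for illustration only: the `Catalog` and `FileEntry` classes and their methods are hypothetical and do not correspond to the LFC client API.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    """Metadata held per GUID: size, checksum and replica locations."""
    guid: str
    size: int
    checksum: str
    replicas: list = field(default_factory=list)


class Catalog:
    """Toy file catalogue: logical name -> GUID -> entry with replicas."""

    def __init__(self):
        self._by_guid = {}
        self._names = {}

    def register(self, ln, size, checksum):
        """Create a new entry under logical name `ln` and return its GUID."""
        if ln in self._names:
            raise FileExistsError(ln)
        guid = str(uuid.uuid4())
        self._by_guid[guid] = FileEntry(guid, size, checksum)
        self._names[ln] = guid
        return guid

    def add_replica(self, ln, url):
        """Record one more physical location for the file named `ln`."""
        self.stat(ln).replicas.append(url)

    def stat(self, ln):
        """Return the entry for `ln`; raises KeyError if it is unknown."""
        return self._by_guid[self._names[ln]]

    def readdir(self, directory):
        """List the names directly below `directory` in the namespace."""
        prefix = directory.rstrip("/") + "/"
        children = set()
        for ln in self._names:
            if ln.startswith(prefix):
                children.add(ln[len(prefix):].split("/", 1)[0])
        return sorted(children)
```

Keeping names and entries in separate mappings is what allows one file to have several replicas without duplicating its metadata.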

RLS

  • Replica Location Service originally developed by Globus and EDG
  • Maps file names in a flat logical name space to physical replicas
  • Logical file names have attributes such as creation date, size as well as user-defined attributes
  • Coarse-grained security policy - read/write access is granted per DN to entire catalog
  • To be decided whether RLS is supported