This wiki is obsolete; see the NorduGrid web pages for up-to-date information.

Data Staging/URL Options


Solution to URL options problem

URL options are an ARC extension to standard URLs and are used to specify ARC-specific options to input and output files, for example whether the file should be cached or which type of checksum to calculate. The syntax is

<protocol>://<host>[:port][;option[;option[...]]]/<file>

where option has the syntax name=value. Furthermore, indexing service URLs can have the URL of a physical replica embedded within them:

<protocol>://[url[|url[...]]@]<host>[:port]/<lfn>
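As an illustration of the option syntax, the following Python sketch separates the ;name=value options from the rest of a URL. The function name split_url_options is hypothetical and not part of ARC. The sketch assumes that option values contain neither "/" nor ";", and it does not handle the indexing-service form with embedded replica URLs.

```python
def split_url_options(url):
    """Split '<protocol>://<host>[:port][;name=value...]/<file>'.

    Returns the URL with the options removed, plus a dict of the options.
    Hypothetical helper for illustration only; not an ARC function.
    """
    protocol, sep, rest = url.partition("://")
    if not sep:
        raise ValueError("missing '://' in %r" % url)

    # Options sit between the host[:port] part and the first '/' of the path.
    authority, slash, path = rest.partition("/")
    host_port, *raw_options = authority.split(";")

    options = {}
    for raw in raw_options:
        name, eq, value = raw.partition("=")
        if not name or not eq:
            raise ValueError("option %r is not of the form name=value" % raw)
        options[name] = value

    plain_url = protocol + "://" + host_port + slash + path
    return plain_url, options


if __name__ == "__main__":
    print(split_url_options("srm://srm.ndgf.org;checksum=md5;cache=yes/file1"))
```

Keeping the options in a separate mapping like this mirrors the idea behind the solutions listed below: the plain URL and its options are carried side by side rather than interleaved in one string.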

The URL option syntax can become complicated and confusing for users. URLs for input and output data are used in three different areas of ARC: command line tools, job descriptions (XRSL and JSDL), and the internal files used by the downloader and uploader. Each area is listed below with the solution currently implemented or planned:

  • Command line tools such as arccp
    • Use command line options, for example
arccp --location=srm://srm.ndgf.org/file1 --dest-option checksum=md5 myfile lfc://lfc.ndgf.org/grid/atlas/file1
  • Job description - XRSL
    • Add attributes to each input or output file, for example
(inputfiles = ("file1" "lfc://lfc.ndgf.org/grid/atlas/file1" "cache=yes"))
(outputfiles = ("output" "lfc://lfc.ndgf.org/grid/atlas/file2" "location=srm://srm.ndgf.org/file2"))
  • Job description - JSDL
    • Use sub-nodes of Source and Target
<DataStaging>
  <Filename>file1</Filename>
  <Source>
    <URI>lfc://lfc.ndgf.org/grid/atlas/file1</URI>
    <URIOption>cache=yes</URIOption>
  </Source>
  <Target>
    <URI>lfc://lfc.ndgf.org/grid/atlas/file1.out</URI>
    <Location>
      <URI>srm://srm.org/path/file1.out</URI>
      <URIOption>spacetoken=MYSPACETOKEN</URIOption>
    </Location>
  </Target>
...
  • Files used by the downloader and uploader (job.id.input and job.id.output)
    • Add key-value attribute pairs after the local and remote filenames, for example
file1 lfc://lfc.ndgf.org/grid/atlas/file1 cache=yes
  • As these files are internal to ARC and rarely seen by users, they can retain the old syntax for now
    • This means advanced users using dynamic output files still have to know the old syntax
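
The attribute-based forms above all carry the same information: a local file name, a plain URL, and a set of name=value options. The Python sketch below renders that triple as an XRSL inputfiles entry and as a key-value line of the kind proposed for the downloader and uploader files, and reads such a line back. All function names are hypothetical and not part of ARC, and the sketch assumes that file names, URLs and option values contain no whitespace or double quotes.

```python
def xrsl_inputfile(name, url, options):
    """Render one XRSL inputfiles entry, with options as extra quoted attributes."""
    fields = [name, url] + ["%s=%s" % (k, v) for k, v in options.items()]
    return "(inputfiles = (" + " ".join('"%s"' % f for f in fields) + "))"


def staging_line(name, url, options):
    """Render '<local name> <remote URL> name=value ...' on a single line."""
    fields = [name, url] + ["%s=%s" % (k, v) for k, v in options.items()]
    return " ".join(fields)


def parse_staging_line(line):
    """Inverse of staging_line: return (name, url, options dict)."""
    name, url, *pairs = line.split()
    options = {}
    for pair in pairs:
        key, eq, value = pair.partition("=")
        if not key or not eq:
            raise ValueError("attribute %r is not of the form name=value" % pair)
        options[key] = value
    return name, url, options


if __name__ == "__main__":
    url = "lfc://lfc.ndgf.org/grid/atlas/file1"
    print(xrsl_inputfile("file1", url, {"cache": "yes"}))
    print(staging_line("file1", url, {"cache": "yes"}))
```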