ARC Data Clients

From NorduGrid

ARC provides a set of tools to manage data on the Grid through a variety of protocols. As with job management, a set of command line tools and a library are available as part of the ARC client package. ARC's data management works at file-level granularity and can be used to upload, download, move and delete files. It does not handle file collections (apart from a basic awareness of file-system-like hierarchies) or byte-level access. Third-party copy, where data is transferred directly between two Grid storage endpoints, is supported through EMI datalib (see below), in which ARC is integrated with GFAL and can use it for third-party copy.


Supported Protocols

  • file - local file system
  • HTTP/HTTPS/HTTPG
  • FTP
  • GridFTP - FTP with GSI security
  • SRM - Storage Resource Manager
  • LDAP
  • LFC - LCG File Catalog
  • RFIO (through GFAL2 plugin)
  • DCAP - POSIX-like access to dCache (through GFAL2 plugin)
  • GSIDCAP - POSIX-like access to dCache with GSI security (through GFAL2 plugin)
  • xrootd (read-only)
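As a rough illustration of how the list above splits into natively handled, GFAL2-backed and read-only protocols, the following Python sketch classifies a URL by its scheme. The scheme names (gsiftp for GridFTP, root for xrootd, and so on) are assumptions based on common Grid usage rather than taken from this page, and the describe() function is hypothetical, not part of ARC.

```python
from urllib.parse import urlsplit

# Assumed URL schemes for the protocols listed above (illustrative only,
# not taken from ARC's source). Values: (description, via_gfal2, read_only).
PROTOCOLS = {
    "file": ("local file system", False, False),
    "http": ("HTTP", False, False),
    "https": ("HTTPS", False, False),
    "httpg": ("HTTPG", False, False),
    "ftp": ("FTP", False, False),
    "gsiftp": ("GridFTP", False, False),
    "srm": ("Storage Resource Manager", False, False),
    "ldap": ("LDAP", False, False),
    "lfc": ("LCG File Catalog", False, False),
    "rfio": ("RFIO", True, False),
    "dcap": ("dCache DCAP", True, False),
    "gsidcap": ("dCache DCAP with GSI", True, False),
    "root": ("xrootd", False, True),
}


def describe(url: str) -> str:
    """Return a one-line summary of how the URL's protocol is handled."""
    # urlsplit lowercases the scheme; a bare path has no scheme and is local.
    scheme = urlsplit(url).scheme or "file"
    if scheme not in PROTOCOLS:
        return f"{scheme}: not in the supported list"
    name, via_gfal2, read_only = PROTOCOLS[scheme]
    notes = []
    if via_gfal2:
        notes.append("needs GFAL2 plugin")
    if read_only:
        notes.append("read-only")
    suffix = f" ({', '.join(notes)})" if notes else ""
    return f"{scheme}: {name}{suffix}"
```

For example, describe("gsidcap://door.example.org/pnfs/f") returns "gsidcap: dCache DCAP with GSI (needs GFAL2 plugin)", where door.example.org is a placeholder host.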

ARC Data CLI

The following commands are available:

  • arccp - copy files on and off Grid and between Grid storage
  • arcls - list files on Grid storage
  • arcrm - delete files on Grid storage
  • arcmkdir (in 12.05 and later) - create directories on Grid storage
  • arcrename (in 13.02 and later) - rename files or directories on Grid storage

For a full description of each command, along with relevant options and examples, see its man page or the ARC Clients User Manual.
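The commands above are ordinary executables, so scripts can drive them through a subprocess. The following Python sketch builds and runs an argument vector; the helper names arc_argv() and run() are hypothetical, and the only option used in the example (-l for arcls) is one that appears in the comparison table further down. Check the man pages before relying on any other option.

```python
import subprocess
from typing import List, Sequence

# The five data commands listed above.
ARC_DATA_COMMANDS = {"arccp", "arcls", "arcrm", "arcmkdir", "arcrename"}


def arc_argv(command: str, *urls: str, options: Sequence[str] = ()) -> List[str]:
    """Build an argument vector for one of the ARC data commands."""
    if command not in ARC_DATA_COMMANDS:
        raise ValueError(f"unknown ARC data command: {command}")
    if not urls:
        raise ValueError("at least one URL is required")
    # Options go before the URLs, as in "arcls -l <url>".
    return [command, *options, *urls]


def run(argv: Sequence[str]) -> str:
    """Run a command, raise CalledProcessError on a non-zero exit, return stdout."""
    result = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return result.stdout
```

A typical call would be print(run(arc_argv("arcls", "gsiftp://se.example.org/data/", options=["-l"]))), with se.example.org standing in for a real storage element. Passing a list rather than a shell string avoids quoting problems with URLs that contain characters such as "?" or "&".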

ARC Data Library

The ARC Data Library performs data transfer and other data management operations. It is used by the CLI as well as by ARC's Computing Element (A-REX) to stage data for jobs. It uses a plugin approach, dynamically loading the code for each protocol at runtime as required, so some of the protocols listed above may or may not be available depending on which packages are installed. An API description generated with Doxygen is also available.

Java and Python bindings also exist for the data library.
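The plugin approach can be sketched in Python as a registry that maps URL schemes to plugin modules and imports a module only when a URL of that scheme is first used, reporting the protocol as unavailable when the corresponding package is not installed. This sketch does not use ARC's Python bindings, and the module names (dmc_file, dmc_gridftp, dmc_srm) are hypothetical; ARC's real plugins are part of its compiled libraries and are named differently.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Mapping, Optional
from urllib.parse import urlsplit

# Hypothetical scheme -> plugin module names, for illustration only.
DEFAULT_PLUGINS = {
    "file": "dmc_file",
    "gsiftp": "dmc_gridftp",
    "srm": "dmc_srm",
}


def load_plugin(url: str, plugins: Mapping[str, str] = DEFAULT_PLUGINS) -> Optional[ModuleType]:
    """Import the plugin for the URL's scheme on demand.

    Returns None if no plugin is registered for the scheme or if the
    plugin's package is not installed on this system.
    """
    scheme = urlsplit(url).scheme or "file"
    module_name = plugins.get(scheme)
    if module_name is None:
        return None
    # find_spec checks whether a top-level module is installed without importing it.
    if importlib.util.find_spec(module_name) is None:
        return None
    return importlib.import_module(module_name)
```

On a system without the hypothetical dmc_srm package, load_plugin("srm://se.example.org/f") returns None, which mirrors a protocol being unavailable because its plugin package was not installed.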

EMI datalib

An aim of the EMI project was to consolidate data management tools from the various middlewares into a single entity, called EMI datalib. It was decided to combine parts of the ARC and gLite tools, which provide roughly the same functionality in slightly different ways. The plan is for ARC's data library to gain support for new protocols through a single plugin that uses GFAL2, which also provides third-party transfer functionality. EMI datalib was released in EMI-3.

Using GFAL2 with ARC

  • Install nordugrid-arc-client-tools (client) or nordugrid-arc-compute-element (server) meta-packages
  • Install the nordugrid-arc-plugins-gfal package, available from the same location as the above packages
    • This installs GFAL2 common packages as dependencies
  • Install gfal2-plugin-* packages, or a subset based on the required protocols
    • This may also install extra packages for each protocol
  • ARC will then automatically use GFAL2 for protocols that ARC does not support natively
  • Third-party transfer, where the destination pulls from the source, is also supported, along with basic transfer monitoring
    • In the API: DataPoint::Transfer3rdParty(), where an optional monitoring callback can be set
    • In the CLI: arccp -3, where simple transfer progress can be shown by adding the -i option
  • For third-party transfer using SRM, unless the full SRM URL is used (e.g. srm://srm.host.org:8443/srm/managerv2?SFN=/path/to/file), GFAL2 has to look up a BDII for any missing information. The BDII host is set by the environment variable LCG_GFAL_INFOSYS and defaults to the WLCG top-BDII, so if you only use WLCG storage elements you should not need to set it.
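Whether GFAL2 has to consult a BDII therefore depends on the form of the SRM URL. The Python sketch below applies a simple heuristic, assumed here rather than taken from GFAL2 itself: a URL counts as full when it carries an explicit port, a service path and an SFN query parameter, as in the example in the last bullet above. It also reads LCG_GFAL_INFOSYS, returning None when the variable is unset so that GFAL2's own default applies. Both function names are hypothetical.

```python
import os
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def is_full_srm_url(url: str) -> bool:
    """Heuristic: an srm:// URL is 'full' if it names the port, the service
    path and the SFN query parameter (e.g. ...:8443/srm/managerv2?SFN=...)."""
    parts = urlsplit(url)
    if parts.scheme != "srm":
        raise ValueError("not an srm:// URL")
    has_port = parts.port is not None
    has_service_path = parts.path not in ("", "/")
    has_sfn = "SFN" in parse_qs(parts.query)
    return has_port and has_service_path and has_sfn


def bdii_override() -> Optional[str]:
    """BDII host from LCG_GFAL_INFOSYS, or None to fall back to GFAL2's default."""
    return os.environ.get("LCG_GFAL_INFOSYS")
```

With this heuristic, srm://srm.host.org:8443/srm/managerv2?SFN=/path/to/file counts as full, while the short form srm://srm.host.org/path/to/file does not, so a third-party transfer using the short form would need BDII information.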

Building ARC source to include GFAL

  • Install all gfal2* packages, including gfal2-devel to build ARC's GFAL plugin and gfal2-plugin-* to get all the supported protocols
    • These are available from the epel-testing repository for RHEL 5 and 6 and their derivatives, and for recent Fedora versions (16 and later)
  • GFAL support is disabled in ARC by default; to enable it, run "./configure --enable-gfal"
  • Build and install ARC as usual
  • ARC should now use GFAL2 as described above

Moving between ARC and LCG data tools

LCG-utils from the gLite project contains command line tools that provide similar functionality to the ARC data tools. The following table compares the two sets of tools (accurate for lcg_util v1.14 and ARC v3.0.0).

| Function | LCG-utils | ARC | Comment |
| Add/remove alias in LFC | lcg-aa/lcg-ra | n/a | Can also use lfc-ln/lfc-rm |
| Stage file from tape to disk | lcg-bringonline | n/a | |
| Copy files | lcg-cp | arccp | |
| Copy files and register in catalog | lcg-cr | arccp -L | With ARC, the physical location is not generated automatically, so it must be specified |
| Delete files | lcg-del | arcrm | |
| Get checksum | lcg-get-checksum | arcls -l | In ARC, availability of the checksum depends on the protocol |
| Get transport URLs | lcg-gt/lcg-getturls | n/a | |
| Resolve GUIDs, LFNs and SURLs | lcg-la/lcg-lg/lcg-lr | arcls | In ARC, SURLs cannot be resolved to GUIDs or LFNs. To resolve GUIDs to LFNs and vice versa, use arcls -m. To resolve GUIDs or LFNs to SURLs, use arcls -L. |
| List files | lcg-ls | arcls | |
| Replicate files | lcg-rep | arccp -3 | Third-party transfer is available in ARC with the -3 option in version 13.02 and above, provided the GFAL plugins are installed. Without -3, the transfer goes through the client. |
| Register existing file in catalog | lcg-rf | arccp -T | |
| Set request done | lcg-sd | n/a | |
| Get SRM space token information | lcg-stmd | n/a | |
| Unregister file in catalog | lcg-uf | n/a | arcrm lfc://... removes all physical replicas as well as the catalog entry |
| Create directory | n/a | arcmkdir | Protocol-specific tools exist, such as lfc-mkdir and srmmkdir |
| Rename file | n/a | arcrename | Protocol-specific tools exist, such as lfc-rename and srmmv |
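When porting scripts from LCG-utils, the table above can be kept as a lookup structure. This Python sketch simply transcribes it; the LCG_TO_ARC dictionary and arc_equivalent() function are hypothetical names, and the caveats in the Comment column (for example the version and GFAL requirements of arccp -3) still apply to each mapping.

```python
from typing import Optional

# lcg-utils command -> ARC equivalent, transcribed from the table above.
# None marks commands for which the table lists no ARC equivalent (n/a).
LCG_TO_ARC = {
    "lcg-aa": None,
    "lcg-ra": None,
    "lcg-bringonline": None,
    "lcg-cp": "arccp",
    "lcg-cr": "arccp -L",
    "lcg-del": "arcrm",
    "lcg-get-checksum": "arcls -l",
    "lcg-gt": None,
    "lcg-getturls": None,
    "lcg-la": "arcls",
    "lcg-lg": "arcls",
    "lcg-lr": "arcls",
    "lcg-ls": "arcls",
    "lcg-rep": "arccp -3",
    "lcg-rf": "arccp -T",
    "lcg-sd": None,
    "lcg-stmd": None,
    "lcg-uf": None,
}


def arc_equivalent(lcg_command: str) -> Optional[str]:
    """Return the ARC command for an lcg-utils command, or None if there is none."""
    if lcg_command not in LCG_TO_ARC:
        raise KeyError(f"{lcg_command} is not in the comparison table")
    return LCG_TO_ARC[lcg_command]
```

Raising KeyError for commands missing from the table, rather than returning None, keeps "no ARC equivalent" distinct from "not covered by this comparison".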

Notes

  • ARC does not use BDII for endpoint information
  • SRMv2.2 is always assumed by ARC for srm:// URLs unless a different service path is explicitly specified
  • The full LFC URL must always be specified with ARC (e.g. lfc://hostname/grid/..); there is no equivalent to setting LFC_HOST
  • In ARC, LFC GUIDs are specified as lfc://hostname/:guid=abcd...
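Because ARC needs the complete LFC URL every time, scripts that relied on LFC_HOST have to build the URLs themselves. A minimal Python sketch of the two forms described in the notes above; the function names and the lfc.example.org host used in the example are placeholders.

```python
def lfc_lfn_url(host: str, lfn: str) -> str:
    """Full LFC URL for a logical file name; ARC has no LFC_HOST default."""
    if not host:
        raise ValueError("ARC needs the LFC host in every URL")
    if not lfn.startswith("/"):
        raise ValueError("LFN must be an absolute path such as /grid/...")
    return f"lfc://{host}{lfn}"


def lfc_guid_url(host: str, guid: str) -> str:
    """LFC URL addressing a catalog entry by GUID: lfc://host/:guid=<guid>."""
    if not host:
        raise ValueError("ARC needs the LFC host in every URL")
    return f"lfc://{host}/:guid={guid}"
```

For example, lfc_guid_url("lfc.example.org", "abcd-1234") returns "lfc://lfc.example.org/:guid=abcd-1234".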

Open Data Management Bugs

Please report any bugs to Bugzilla under the "Data Management" component.

New and reopened

No Bugzilla tickets were found.


Blockers and criticals

No Bugzilla tickets were found.


General

| ID | Priority | Severity | Version | Summary (12 tasks) | Component | Assignee |
| 3393 | P3 | major | 13.11.1 | Possible data corruption bug in arccp | Data Management | David Cameron |
| 3401 | P3 | normal | 13.11.2 | arcls hanging on listing SRM directory | Data Management | David Cameron |
| 3418 | P3 | normal | 13.11.2 | Don't look up ACIX if primary replica is first in preferred pattern | Data Management | David Cameron |
| 3454 | P3 | normal | 13.11.2 | arccp with full srm path copies to wrong filename | Data Management | David Cameron |
| 3497 | P3 | normal | 15.03.3 | Skip heavily loaded delivery servers | Data Management | David Cameron |
| 3581 | P3 | normal | latest | String matching error in DTR node selection | Data Management | David Cameron |
| 3620 | P3 | normal | 15.03.10 | A-rex seems to loose track of downloads | Data Management | David Cameron |
| 3627 | P3 | normal | latest | ARC acts badly with SRM checked out TURLs | Data Management | David Cameron |
| 3637 | P3 | normal | 15.03.11 | arcget with multiple jobs crashes | Data Management | David Cameron |
| 3042 | P3 | normal | 12.05.1 | Segfault in arccp when compiling against GT 5.2.3 | Data Management | David Cameron |
| 3363 | P3 | normal | 13.11.1 | arcls hangs randomly on a typo in gsiftp URL | Data Management | David Cameron |
| 3368 | P3 | normal | 13.11.1 | Expired gridftp host certificate causes arcls to segfault | Data Management | David Cameron |


Enhancements

| ID | Priority | Version | Summary | Component | Assignee |
| 3502 | P3 | 15.03.3 | bulk arcls | Data Management | David Cameron |


Feature requests

| ID | Priority | Version | Summary (5 tasks) | Component | Assignee |
| 2505 | P3 | 11.05 | arcgacl utility (like ngacl) is needed | Data Management | Aleksandr Konstantinov |
| 2614 | P3 | SVN | automatic downtime handling for transfers | Data Management | David Cameron |
| 3248 | P3 | 13.02.3 | ARC data management clients should work via a proxy | Data Management | David Cameron |
| 3624 | P3 | 15.03.10 | Data delivery service can only listen to one network interface. | Data Management | David Cameron |
| 1925 | P5 | 0.8.2.2 | allow caching of output files | Data Management | David Cameron |