This wiki is obsolete, see the NorduGrid web pages for up to date information.

ARC Data Clients

From NorduGrid

ARC provides a set of tools to manage data on the Grid through a variety of protocols. As with job management, a set of command line tools and a library are available as part of the ARC client package. ARC's data management works at file-level granularity and can be used to upload, download, move and delete files. It does not cover file collections (apart from basic awareness of file-system-like hierarchies) or byte-level access. Third-party copy, where data is transferred directly between two Grid storage endpoints, is supported through EMI datalib (see below), in which ARC is integrated with GFAL and can use GFAL for third-party copy.


Supported Protocols

  • file - local file system
  • HTTP/HTTPS/HTTPG
  • FTP
  • GridFTP - FTP with GSI security
  • SRM - Storage Resource Manager
  • LDAP
  • LFC - LCG File Catalog
  • RFIO (through GFAL2 plugin)
  • DCAP - POSIX-like access to dCache (through GFAL2 plugin)
  • GSIDCAP - POSIX-like access to dCache with GSI security (through GFAL2 plugin)
  • xrootd (read-only)

ARC Data CLI

The following commands are available:

  • arccp - copy files on and off Grid and between Grid storage
  • arcls - list files on Grid storage
  • arcrm - delete files on Grid storage
  • arcmkdir (in 12.05 and later) - create directories on Grid storage
  • arcrename (in 13.02 and later) - rename files or directories on Grid storage

For a full description of each command along with relevant options and examples see their man pages or the ARC Clients User Manual.
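Because arcmkdir and arcrename only exist in newer releases, a script may want to check which of the commands above are installed before calling them. The following is a minimal Python sketch under that assumption: `DATA_COMMANDS` and `available_commands` are hypothetical names, and the check only looks at the PATH, not at the installed ARC version.

```python
import shutil
from typing import Dict

# Hypothetical table of the ARC data commands listed above.
DATA_COMMANDS: Dict[str, str] = {
    "arccp": "copy files on and off Grid and between Grid storage",
    "arcls": "list files on Grid storage",
    "arcrm": "delete files on Grid storage",
    "arcmkdir": "create directories (12.05 and later)",
    "arcrename": "rename files or directories (13.02 and later)",
}


def available_commands() -> Dict[str, bool]:
    """Report which ARC data commands can be found on the current PATH."""
    return {name: shutil.which(name) is not None for name in DATA_COMMANDS}


if __name__ == "__main__":
    for name, found in available_commands().items():
        status = "found" if found else "missing"
        print(f"{name:10} {status:8} {DATA_COMMANDS[name]}")
```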

ARC Data Library

The ARC Data Library performs data transfer and other data management operations. It is used both by the CLI and by ARC's Computing Element (A-REX) to stage data for jobs. It uses a plugin approach, dynamically loading the code for each protocol at runtime as required, so some of the protocols listed above may or may not be available depending on which packages are installed. API documentation generated with Doxygen is also available.
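The plugin approach can be pictured as a registry that picks a handler by URL scheme, loads it lazily on first use, and fails for schemes whose plugin is not installed. This is a hypothetical Python sketch of that pattern only; it is not ARC's implementation and does not use ARC's real bindings. `PluginRegistry`, the scheme sets and the string results are assumptions for illustration, and the GFAL2 branch only mirrors the behaviour described for the nordugrid-arc-plugins-gfal package.

```python
from typing import Callable, Dict
from urllib.parse import urlparse

# Hypothetical: a "plugin" is just a callable that handles a URL.
Handler = Callable[[str], str]

# Assumed schemes served by native plugins (a subset of the list above).
NATIVE_SCHEMES = {"file", "http", "https", "httpg", "ftp", "gsiftp", "srm", "ldap", "lfc"}

# Assumed schemes only reachable when the GFAL2 plugin is installed.
GFAL2_SCHEMES = {"rfio", "dcap", "gsidcap"}


class PluginRegistry:
    """Toy registry that selects a handler from the URL scheme at call time."""

    def __init__(self, gfal_installed: bool) -> None:
        self.gfal_installed = gfal_installed
        self._loaded: Dict[str, Handler] = {}

    def _load(self, scheme: str) -> Handler:
        # Stand-in for loading a protocol's shared library on first use.
        if scheme in NATIVE_SCHEMES:
            return lambda url: f"native:{scheme}:{url}"
        if scheme in GFAL2_SCHEMES and self.gfal_installed:
            return lambda url: f"gfal2:{scheme}:{url}"
        raise LookupError(f"no plugin available for {scheme}://")

    def handler_for(self, url: str) -> Handler:
        scheme = urlparse(url).scheme
        if scheme not in self._loaded:
            # Load lazily and cache, so each plugin is loaded at most once.
            self._loaded[scheme] = self._load(scheme)
        return self._loaded[scheme]
```

Caching the loaded handler per scheme keeps the cost of loading a plugin to the first transfer that needs it, which is the point of loading plugins only as required.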

Java and Python bindings also exist for the data library.

EMI datalib

An aim of the EMI project was to consolidate the data management tools of the various middlewares into a single entity, called EMI datalib. It was decided to combine parts of the ARC and gLite tools, which provide roughly the same functionality in slightly different ways. The plan is for ARC's data library to gain new protocol plugins through a single plugin that uses GFAL2, which also provides third-party transfer functionality. EMI datalib was released in EMI-3.

Using GFAL2 with ARC

  • Install nordugrid-arc-client-tools (client) or nordugrid-arc-compute-element (server) meta-packages
  • Install the nordugrid-arc-plugins-gfal package, available from the same location as the above packages
    • This installs GFAL2 common packages as dependencies
  • Install gfal2-plugin-* packages, or a subset based on the required protocols
    • This may also install extra packages for each protocol
  • ARC will then automatically use GFAL2 for protocols that ARC does not support
  • Third-party transfer, where the destination pulls from the source, is also supported along with basic transfer monitoring
    • In the API: DataPoint::Transfer3rdParty(), for which an optional monitoring callback can be set
    • In the CLI: arccp -3, where simple transfer progress can be shown by adding the -i option
  • For third-party transfer using SRM, unless the full SRM URL is used (e.g. srm://srm.host.org:8443/srm/managerv2?SFN=/path/to/file), GFAL2 must look up a BDII to fill in any missing information. The BDII host is set by the environment variable LCG_GFAL_INFOSYS and defaults to the WLCG top-BDII, so if you only use WLCG storage elements you should not need to change it.
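The CLI route and the BDII caveat above can be combined in a small wrapper. This is a hypothetical Python sketch: only `arccp -3`, the `-i` option and the LCG_GFAL_INFOSYS variable come from the text above, while the helper names, the dry-run default and the example hosts are assumptions. The full-URL test just checks for an explicit port and an `?SFN=` query, which matches the example URL above but may not cover every SRM URL form.

```python
import os
import subprocess
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlparse


def is_full_srm_url(url: str) -> bool:
    """True if an srm:// URL names port, service path and file (?SFN=...)."""
    parts = urlparse(url)
    return (
        parts.scheme == "srm"
        and parts.port is not None
        and parts.query.startswith("SFN=")
    )


def third_party_copy_cmd(source: str, dest: str, progress: bool = True) -> List[str]:
    """arccp -3 requests a third-party transfer; -i shows simple progress."""
    cmd = ["arccp", "-3"]
    if progress:
        cmd.append("-i")
    return cmd + [source, dest]


def copy_3rd_party(
    source: str,
    dest: str,
    bdii: Optional[str] = None,
    dry_run: bool = True,
) -> Tuple[List[str], Dict[str, str]]:
    """Build (and optionally run) the copy, pointing GFAL2 at a BDII if needed."""
    env = dict(os.environ)
    needs_bdii = any(
        u.startswith("srm://") and not is_full_srm_url(u) for u in (source, dest)
    )
    if needs_bdii and bdii is not None:
        # Only needed when the default WLCG top-BDII is not the right one.
        env["LCG_GFAL_INFOSYS"] = bdii
    cmd = third_party_copy_cmd(source, dest)
    if not dry_run:
        subprocess.run(cmd, env=env, check=True)
    return cmd, env
```

For example, `copy_3rd_party("srm://srm.host.org/path/to/file", "gsiftp://gridftp.example.org/data/file", bdii="bdii.example.org:2170")` returns the `arccp -3 -i ...` command with LCG_GFAL_INFOSYS set, without running it; the hosts are placeholders.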

Building ARC source to include GFAL

  • Install all gfal2* packages, including gfal2-devel to build ARC's GFAL plugin and gfal2-plugin-* to get all the supported protocols
    • These are available from the epel-testing repository for RHEL 5 and 6 (and derivatives) and for Fedora 16 and later
  • GFAL support is disabled in ARC by default; to enable it, run "./configure --enable-gfal"
  • Build and install ARC as usual
  • Now ARC should use GFAL2 as above
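The build steps above can be scripted. This Python sketch assumes that "as usual" means the standard configure / make / make install sequence; that assumption, the `--prefix` value in the example and the helper names are not taken from the text above. It only prints the commands unless `dry_run` is turned off.

```python
import shlex
import subprocess
from typing import List, Sequence


def gfal_build_steps(extra_configure: Sequence[str] = ()) -> List[List[str]]:
    """Assumed commands to build ARC from source with GFAL support enabled."""
    return [
        ["./configure", "--enable-gfal", *extra_configure],  # GFAL is off by default
        ["make"],
        ["make", "install"],  # may need root, depending on the install prefix
    ]


def run_steps(steps: List[List[str]], source_dir: str = ".", dry_run: bool = True) -> None:
    """Print each step and, unless dry_run is set, run it in source_dir."""
    for step in steps:
        print(shlex.join(step))
        if not dry_run:
            subprocess.run(step, cwd=source_dir, check=True)
```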

Moving between ARC and LCG data tools

LCG-utils, from the gLite project, contains command line tools that provide functionality similar to the ARC data tools. The following table compares the two sets of tools (accurate for lcg_util v1.14 and ARC v3.0.0).

Function | LCG-utils | ARC | Comment
Add/remove alias in LFC | lcg-aa/lcg-ra | n/a | Can also use lfc-ln/lfc-rm
Stage file from tape to disk | lcg-bringonline | n/a |
Copy files | lcg-cp | arccp |
Copy files and register in catalog | lcg-cr | arccp -L | With ARC, the physical location is not generated automatically, so it must be specified
Delete files | lcg-del | arcrm |
Get checksum | lcg-get-checksum | arcls -l | In ARC, availability of the checksum depends on the protocol
Get transport URLs | lcg-gt/lcg-getturls | n/a |
Resolve GUIDs, LFNs and SURLs | lcg-la/lcg-lg/lcg-lr | arcls | In ARC, SURLs cannot be resolved to GUIDs or LFNs. To resolve GUIDs to LFNs and vice versa, use arcls -m. To resolve GUIDs or LFNs to SURLs, use arcls -L.
List files | lcg-ls | arcls |
Replicate files | lcg-rep | arccp -3 | Third-party transfer is available in ARC with the -3 option in version 13.02 and above, if the GFAL plugins are installed. Without -3 the transfer goes through the client.
Register existing file in catalog | lcg-rf | arccp -T |
Set request done | lcg-sd | n/a |
Get SRM space token information | lcg-stmd | n/a |
Unregister file in catalog | lcg-uf | n/a | arcrm lfc://... removes all physical replicas as well as the catalog entry
Create directory | n/a | arcmkdir | Protocol-specific tools exist, such as lfc-mkdir and srmmkdir
Rename file | n/a | arcrename | Protocol-specific tools exist, such as lfc-rename and srmmv
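When porting scripts from LCG-utils, the table can be kept as a lookup. This Python sketch transcribes the rows above into a dictionary; `LCG_TO_ARC` and `arc_equivalent` are hypothetical names, and the caveats from the Comment column are reduced to short code comments, so the table remains the reference.

```python
from typing import Dict, Optional

# lcg_util v1.14 command -> ARC v3.0.0 equivalent, transcribed from the table above.
# None means the table lists no direct ARC equivalent.
LCG_TO_ARC: Dict[str, Optional[str]] = {
    "lcg-aa": None,
    "lcg-ra": None,
    "lcg-bringonline": None,
    "lcg-cp": "arccp",
    "lcg-cr": "arccp -L",   # physical location must be given explicitly
    "lcg-del": "arcrm",
    "lcg-get-checksum": "arcls -l",
    "lcg-gt": None,
    "lcg-getturls": None,
    "lcg-la": "arcls",      # arcls -m or arcls -L, see the table comment
    "lcg-lg": "arcls",
    "lcg-lr": "arcls",
    "lcg-ls": "arcls",
    "lcg-rep": "arccp -3",  # third-party only with GFAL plugins, 13.02 and above
    "lcg-rf": "arccp -T",
    "lcg-sd": None,
    "lcg-stmd": None,
    "lcg-uf": None,
}


def arc_equivalent(lcg_command: str) -> str:
    """Return the ARC command for an LCG-utils command, or 'n/a' if there is none."""
    if lcg_command not in LCG_TO_ARC:
        raise KeyError(f"{lcg_command} is not in the comparison table")
    arc = LCG_TO_ARC[lcg_command]
    return arc if arc is not None else "n/a (no direct ARC equivalent)"
```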

Notes

  • ARC does not use BDII for endpoint information
  • SRMv2.2 is always assumed by ARC for srm:// URLs unless a different service path is explicitly specified
  • The full URL to LFC must always be specified with ARC (e.g. lfc://hostname/grid/...); there is no equivalent to setting LFC_HOST
  • In ARC LFC guids are specified like lfc://hostname/:guid=abcd...
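Since there is no LFC_HOST fallback, every LFC URL has to carry its host. This Python sketch builds the two URL forms given in the notes above; the helper names and example host are hypothetical, and the only format rules used are `lfc://hostname/<path>` and `lfc://hostname/:guid=<guid>`.

```python
def lfc_lfn_url(host: str, lfn: str) -> str:
    """Full LFC URL for a logical file name; ARC has no LFC_HOST fallback."""
    if not host:
        raise ValueError("the LFC host must always be given explicitly")
    if not lfn.startswith("/"):
        raise ValueError("the LFN must be an absolute path, e.g. /grid/...")
    return f"lfc://{host}{lfn}"


def lfc_guid_url(host: str, guid: str) -> str:
    """LFC URL addressing a catalog entry by GUID: lfc://hostname/:guid=..."""
    if not host:
        raise ValueError("the LFC host must always be given explicitly")
    return f"lfc://{host}/:guid={guid}"
```

The resulting strings can be passed wherever the ARC tools accept an lfc:// URL, for example to arcls -m to resolve a GUID to its LFNs.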

Open Data Management Bugs

Please report any bugs to Bugzilla under the "Data Management" component.

New and reopened

ID | P | Version | Summary (3 tasks) | Component | Assignee
4165 | P3 | unspecified | /var/log/arc should be created automatically upon installation/start of service | Data Management | David Cameron
4171 | P3 | unspecified | datadelivery service - ARC 7 - SOAP message: TCP: GENERIC_ERROR | Data Management | David Cameron
4237 | P3 | 6.21.1 | arc-datadelivery-service crashes | Data Management | Aleksandr Konstantinov


Blockers and criticals

No Bugzilla tickets were found.


General

ID | P | Severity | Version | Summary (10 tasks) | Component | Assignee
3965 | P3 | major | 6.7.0 | multiple 'du' on cache | Data Management | David Cameron
4191 | P3 | major | 6.20.1 | No data staging happening even though ARC thinks it is | Data Management | Aleksandr Konstantinov
4189 | P3 | normal | unspecified | File lock and CACHE_WAIT | Data Management | Aleksandr Konstantinov
3363 | P3 | normal | 13.11.1 | arcls hangs randomly on a typo in gsiftp URL | Data Management | David Cameron
3368 | P3 | normal | 13.11.1 | Expired gridftp host certificate causes arcls to segfault | Data Management | David Cameron
4019 | P3 | normal | latest | common input files block job submission due to priority inconsistency | Data Management | David Cameron
4207 | P3 | normal | unspecified | Hardcoded limit at 200 jobs in preparing? | Data Management | Aleksandr Konstantinov
3497 | P3 | normal | 15.03.3 | Skip heavily loaded delivery servers | Data Management | David Cameron
4066 | P3 | normal | 6.15.0 | Use relative URIs by default in HTTP | Data Management | David Cameron
3837 | P3 | normal | latest | datadelivery transfers fail massively if cachedir is in failed state | Data Management | David Cameron


Enhancements

ID | P | Version | Summary (2 tasks) | Component | Assignee
4095 | P3 | 6.16.1 | Full support for multiple checksums | Data Management | David Cameron
3502 | P5 | 15.03.3 | bulk arcls | Data Management | David Cameron


Feature requests

ID | P | Version | Summary (2 tasks) | Component | Assignee
4183 | P3 | unspecified | Do not download more input data if cache is above max level as configured in arc.conf | Data Management | Aleksandr Konstantinov
1925 | P5 | 0.8.2.2 | allow caching of output files | Data Management | David Cameron