This wiki is obsolete; see the NorduGrid web pages for up-to-date information.
ARC Data Clients
ARC provides a set of tools for managing data on the Grid over a variety of protocols. As with job management, a set of command line tools and a library are available as part of the ARC client package. ARC's data management works at file-level granularity and can be used for uploading, downloading, moving and deleting files. It does not cover file collections (apart from a basic awareness of file-system-like hierarchies) or byte-level access. Third-party copy (where data is transferred directly between two Grid storage endpoints) is supported through EMI datalib (see below), in which ARC is integrated with GFAL and can use GFAL for third-party copy.
Supported Protocols
- file - local file system
- HTTP/HTTPS/HTTPG
- FTP
- GridFTP - FTP with GSI security
- SRM - Storage Resource Manager
- LDAP
- LFC - LCG File Catalog
- RFIO (through GFAL2 plugin)
- DCAP - POSIX-like access to dCache (through GFAL2 plugin)
- GSIDCAP - POSIX-like access to dCache with GSI security (through GFAL2 plugin)
- xrootd (read-only)
ARC Data CLI
The following commands are available:
- arccp - copy files on and off Grid and between Grid storage
- arcls - list files on Grid storage
- arcrm - delete files on Grid storage
- arcmkdir (in 12.05 and later) - create directories on Grid storage
- arcrename (in 13.02 and later) - rename files or directories on Grid storage
For a full description of each command along with relevant options and examples see their man pages or the ARC Clients User Manual.
ARC Data Library
The ARC Data Library performs data transfer and other data management operations. It is used by the CLI as well as by ARC's Computing Element (A-REX) for staging data for jobs. It uses a plugin approach to dynamically load the code for each protocol at runtime as required. Some of the protocols listed above may therefore be unavailable, depending on the packages installed. The API description generated using doxygen is available here.
Java and Python bindings also exist for the data library.
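The plugin approach described above can be illustrated with a small sketch. This is not ARC's actual implementation or API: the `PluginRegistry` class, the `ProtocolNotAvailable` exception and the loader callables are hypothetical names used only to show how a handler might be selected by URL scheme and loaded on first use, and why a protocol is unavailable when its package is not installed.

```python
from urllib.parse import urlsplit


class ProtocolNotAvailable(Exception):
    """Raised when no plugin is registered for a URL scheme."""


class PluginRegistry:
    """Hypothetical registry that loads one handler per URL scheme on demand."""

    def __init__(self):
        self._loaders = {}  # scheme -> zero-argument callable that loads the plugin
        self._loaded = {}   # scheme -> plugin object, filled in on first use

    def register(self, scheme, loader):
        """Record how to load the plugin for a scheme without loading it yet."""
        self._loaders[scheme.lower()] = loader

    def handler_for(self, url):
        """Return the plugin for the URL's scheme, loading it if necessary."""
        scheme = urlsplit(url).scheme.lower()
        if scheme not in self._loaders:
            # Corresponds to the case where the protocol's package is not installed.
            raise ProtocolNotAvailable(f"no plugin installed for '{scheme}://'")
        if scheme not in self._loaded:
            self._loaded[scheme] = self._loaders[scheme]()
        return self._loaded[scheme]
```

In this sketch a loader runs at most once per scheme, so code for unused protocols is never loaded.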
EMI datalib
An aim of the EMI project was to consolidate data management tools from the various middlewares into a single entity, called EMI datalib. It was decided to combine parts of the ARC and gLite tools, which provide roughly the same functionality in slightly different ways. The plan is for ARC's data library to add new protocol plugins through a single plugin using GFAL2, which also provides third-party transfer functionality. EMI datalib was released in EMI-3.
Using GFAL2 with ARC
- Install nordugrid-arc-client-tools (client) or nordugrid-arc-compute-element (server) meta-packages
- Install the nordugrid-arc-plugins-gfal package, available from the same location as the above packages
- This installs GFAL2 common packages as dependencies
- Install gfal2-plugin-* packages, or a subset based on the required protocols
- This may also install extra packages for each protocol
- ARC will then automatically use GFAL2 for protocols that ARC does not support
- Third-party transfer, where the destination pulls from the source, is also supported along with basic transfer monitoring
- In API: DataPoint::Transfer3rdParty() where an optional monitoring callback can be set
- In CLI: arccp -3 where simple transfer progress can be seen by adding -i option
- For third-party transfer using SRM, GFAL2 needs to look up any missing information in a BDII unless the full SRM URL is used (e.g. srm://srm.host.org:8443/srm/managerv2?SFN=/path/to/file). The BDII host is set by the environment variable LCG_GFAL_INFOSYS and defaults to the WLCG top-BDII, so if you only use WLCG storage elements you should not need to set it.
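The third-party transfer options above can be sketched as a small helper that assembles an `arccp -3` command line and checks whether an SRM URL is "full" in the sense described. The function names are hypothetical. The check (explicit port, service path and `SFN` parameter) is an assumption based on the example URL above, not a rule taken from GFAL2. The value format expected by `LCG_GFAL_INFOSYS` is not specified here, so the helper passes it through unchanged.

```python
import os
from urllib.parse import parse_qs, urlsplit


def is_full_srm_url(url):
    """Heuristic: True if an srm:// URL has an explicit port, service path and SFN."""
    parts = urlsplit(url)
    if parts.scheme != "srm":
        return False
    has_sfn = "SFN" in parse_qs(parts.query)
    return parts.port is not None and parts.path not in ("", "/") and has_sfn


def third_party_copy_command(source, destination, show_progress=False):
    """Build the argument list for 'arccp -3', adding '-i' for simple progress."""
    argv = ["arccp", "-3"]
    if show_progress:
        argv.append("-i")
    return argv + [source, destination]


def transfer_environment(bdii=None, base_env=None):
    """Copy the environment, setting LCG_GFAL_INFOSYS only if a BDII is given."""
    env = dict(os.environ if base_env is None else base_env)
    if bdii is not None:
        env["LCG_GFAL_INFOSYS"] = bdii  # value passed through as-is (assumption)
    return env
```

The resulting list and environment could then be passed to `subprocess.run(argv, env=env)` on a machine where the ARC client and GFAL plugins are installed.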
Building ARC source to include GFAL
- Install all gfal2* packages, including gfal2-devel to build ARC's GFAL plugin and gfal2-plugin-* to get all the supported protocols
- These are available from the epel-testing repository for RHEL5 and 6 and associated flavours and latest Fedora versions (16 and later)
- GFAL support is disabled in ARC by default; to enable it, run "./configure --enable-gfal"
- Build and install ARC as usual
- Now ARC should use GFAL2 as above
Moving between ARC and LCG data tools
LCG-utils by the gLite project contains command line tools which provide similar functionality to the ARC data tools. The following table compares the functionality of the two tool sets (accurate for lcg_util v1.14 and ARC v3.0.0).
Function | LCG-utils | ARC | Comment |
---|---|---|---|
Add/remove alias in LFC | lcg-aa/lcg-ra | n/a | Can also use lfc-ln/lfc-rm |
Stage file from tape to disk | lcg-bringonline | n/a | |
Copy files | lcg-cp | arccp | |
Copy files and register in catalog | lcg-cr | arccp -L | With ARC the physical location is not generated automatically, so it must be specified |
Delete files | lcg-del | arcrm | |
Get checksum | lcg-get-checksum | arcls -l | In ARC availability of checksum depends on the protocol |
Get transport URLs | lcg-gt/lcg-getturls | n/a | |
Resolve GUIDs, LFNs and SURLs | lcg-la/lcg-lg/lcg-lr | arcls | In ARC SURLs cannot be resolved to GUIDs or LFNs. To resolve GUIDs to LFNs and vice versa use arcls -m. To resolve GUIDs or LFNs to SURLs use arcls -L. |
List files | lcg-ls | arcls | |
Replicate files | lcg-rep | arccp -3 | Third-party transfer is available in ARC with the -3 option in version 13.02 and above, provided the GFAL plugins are installed. Without -3 the transfer goes through the client. |
Register existing file in catalog | lcg-rf | arccp -T | |
Set request done | lcg-sd | n/a | |
Get SRM space token information | lcg-stmd | n/a | |
Unregister file in catalog | lcg-uf | n/a | arcrm lfc://... removes all physical replicas as well as the catalog entry |
Create directory | n/a | arcmkdir | Protocol-specific tools exist such as lfc-mkdir, srmmkdir |
Rename file | n/a | arcrename | Protocol-specific tools exist such as lfc-rename, srmmv |
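For scripts being migrated from LCG-utils, the table above can be encoded as a simple lookup. This is only a sketch: the dictionary restates the table rows, and the names `ARC_EQUIVALENTS`, `NO_ARC_EQUIVALENT` and `arc_equivalent` are hypothetical. Options and arguments still have to be translated by hand, following the comments in the table.

```python
# LCG-utils commands with an ARC counterpart, as listed in the table above.
ARC_EQUIVALENTS = {
    "lcg-cp": "arccp",
    "lcg-cr": "arccp -L",
    "lcg-del": "arcrm",
    "lcg-get-checksum": "arcls -l",
    "lcg-la": "arcls",
    "lcg-lg": "arcls",
    "lcg-lr": "arcls",
    "lcg-ls": "arcls",
    "lcg-rep": "arccp -3",
    "lcg-rf": "arccp -T",
}

# LCG-utils commands marked "n/a" in the ARC column of the table.
NO_ARC_EQUIVALENT = {
    "lcg-aa", "lcg-ra", "lcg-bringonline", "lcg-gt", "lcg-getturls",
    "lcg-sd", "lcg-stmd", "lcg-uf",
}


def arc_equivalent(lcg_command):
    """Return the ARC command for an LCG-utils command, or None if there is none.

    Raises KeyError for commands that do not appear in the table at all.
    """
    if lcg_command in NO_ARC_EQUIVALENT:
        return None
    return ARC_EQUIVALENTS[lcg_command]
```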
Notes
- ARC does not use BDII for endpoint information
- SRMv2.2 is always assumed by ARC for srm:// URLs unless a different service path is explicitly specified
- The full URL to LFC must always be specified with ARC (e.g. lfc://hostname/grid/..); there is no equivalent to setting LFC_HOST
- In ARC, LFC GUIDs are specified like lfc://hostname/:guid=abcd...
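Because ARC has no equivalent of LFC_HOST, scripts that previously relied on that variable need to build full LFC URLs themselves. A minimal sketch of the two URL forms mentioned in the notes above follows. The helper names, the host name and the path in the usage comment are hypothetical.

```python
def lfc_url(host, lfn_path):
    """Build a full LFC URL from a catalog host and an absolute LFN path."""
    if not host:
        raise ValueError("the LFC host must be given explicitly")
    if not lfn_path.startswith("/"):
        raise ValueError("the LFN path must be absolute")
    return f"lfc://{host}{lfn_path}"


def lfc_guid_url(host, guid):
    """Build an LFC URL that addresses a file by its GUID."""
    if not host:
        raise ValueError("the LFC host must be given explicitly")
    return f"lfc://{host}/:guid={guid}"


# Example with a hypothetical host and path:
#   lfc_url("lfc.example.org", "/grid/myvo/data/file.dat")
```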
Open Data Management Bugs
Please report any bugs to bugzilla under the "Data Management" component.
New and reopened
ID | P | Version | Summary | Component | Assignee |
---|---|---|---|---|---|
4165 | P3 | unspecified | /var/log/arc should be created automatically upon installation/start of service | Data Management | David Cameron |
4171 | P3 | unspecified | datadelivery service - ARC 7 - SOAP message: TCP: GENERIC_ERROR | Data Management | David Cameron |
4237 | P3 | 6.21.1 | arc-datadelivery-service crashes | Data Management | Aleksandr Konstantinov |
Blockers and criticals
General
ID | P | Severity | Version | Summary | Component | Assignee |
---|---|---|---|---|---|---|
3965 | P3 | major | 6.7.0 | multiple 'du' on cache | Data Management | David Cameron |
4191 | P3 | major | 6.20.1 | No data staging happening even though ARC thinks it is | Data Management | Aleksandr Konstantinov |
4189 | P3 | normal | unspecified | File lock and CACHE_WAIT | Data Management | Aleksandr Konstantinov |
3363 | P3 | normal | 13.11.1 | arcls hangs randomly on a typo in gsiftp URL | Data Management | David Cameron |
3368 | P3 | normal | 13.11.1 | Expired gridftp host certificate causes arcls to segfault | Data Management | David Cameron |
4019 | P3 | normal | latest | common input files block job submission due to priority inconsistency | Data Management | David Cameron |
4207 | P3 | normal | unspecified | Hardcoded limit at 200 jobs in preparing? | Data Management | Aleksandr Konstantinov |
3497 | P3 | normal | 15.03.3 | Skip heavily loaded delivery servers | Data Management | David Cameron |
4066 | P3 | normal | 6.15.0 | Use relative URIs by default in HTTP | Data Management | David Cameron |
3837 | P3 | normal | latest | datadelivery transfers fail massively if cachedir is in failed state | Data Management | David Cameron |
Enhancements
ID | P | Version | Summary | Component | Assignee |
---|---|---|---|---|---|
4095 | P3 | 6.16.1 | Full support for multiple checksums | Data Management | David Cameron |
3502 | P5 | 15.03.3 | bulk arcls | Data Management | David Cameron |
Feature requests
ID | P | Version | Summary | Component | Assignee |
---|---|---|---|---|---|
4183 | P3 | unspecified | Do not download more input data if cache is above max level as configured in arc.conf | Data Management | Aleksandr Konstantinov |
1925 | P5 | 0.8.2.2 | allow caching of output files | Data Management | David Cameron |