Cache Service
This page is obsolete. For up to date documentation see http://www.nordugrid.org/arc/arc6/tech/data/candypond.html
This page describes technical details of the ARC Cache Service. The software is still in the development stage and may have bugs or missing functionality. Please report problems to http://bugzilla.nordugrid.org/.
Description and Purpose
The ARC caching system automatically saves job input files to local disk so that they can be reused by future jobs. The cache is entirely internal to the computing element and cannot be accessed or manipulated from outside. The ARC Cache Service exposes various cache operations to clients and is especially useful in a pilot job model, where the input data of a job is not known until the job is already running on the worker node.
Installation
The cache service is designed to run alongside any standard production (>= 0.8.x) installation of the ARC computing element and comes in the package nordugrid-arc-cache-service, available in the usual NorduGrid or EMI repositories.
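For example, on a system using yum with the NorduGrid or EMI repository already configured (the repository setup itself is not shown here), the package can be installed with:

yum install nordugrid-arc-cache-service

On apt-based systems the corresponding command is apt-get install nordugrid-arc-cache-service.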
Configuring and Running
ARC 1.x and 2.x (releases 11.05 and 12.05)
The cache service runs as a separate process in a separate HED container, listening on port 60001 on path /cacheservice. It is assumed that there is an existing arc.conf configuration file for A-REX, which is read to get information on the caches. There is no other configuration needed for the cache service. It can be started with
$ARC_LOCATION/etc/init.d/arc-cache-service start
Messages are logged to /var/log/arc/cache-service.log.
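If the service does not appear to respond, the log is the first place to check, for example:

tail -f /var/log/arc/cache-service.log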
ARC >= 3.x (releases 13.02 and above)
The cache service runs inside the same HED container as A-REX, and so is accessible at the same hostname and port as the A-REX web-service interface, at the path /cacheservice. The following option in the [grid-manager] section of arc.conf enables it:
enable_cache_service=yes
The A-REX web-service interface must also be enabled through the arex_mount_point option. No other configuration is needed. The cache service is started automatically together with A-REX, so it does not need to be started separately. Messages are logged to the A-REX log. The same instance of the DTR data staging framework is used by both A-REX and the cache service, so DTR must be enabled for the cache service to run (it is enabled by default).
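Putting this together, the relevant part of arc.conf might look like the following sketch; the hostname, port and path are placeholders to be replaced with your own values:

[grid-manager]
arex_mount_point=https://hostname:443/arex
enable_cache_service=yes

With this configuration the cache service becomes reachable at https://hostname:443/cacheservice.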
Setting up the Runtime Environment
The Runtime Environment (RTE) advertises to clients that the cache service is available and sets up the environment for the job to use it, by setting an environment variable pointing to the service. The following template can be used to create an RTE in <your rte directory>/ENV/ARC-CACHE-SERVICE
#!/bin/sh
case "$1" in
  0) export ARC_CACHE_SERVICE=https://hostname:443/cacheservice ;;
  1) return 0 ;;
  2) return 0 ;;
  *) return 1 ;;
esac
The only change needed is to substitute your real host name for hostname. The proxy certificate is also required on the worker node, so another runtime environment (e.g. ENV/PROXY) is needed to copy it there, for example:
#!/bin/bash
x509_cert_dir="/etc/grid-security/certificates"
case $1 in
  0)
    mkdir -pv $joboption_directory/arc/certificates/
    cp -rv $x509_cert_dir/ $joboption_directory/arc
    cat ${joboption_controldir}/job.${joboption_gridid}.proxy >$joboption_directory/user.proxy
    ;;
  1)
    export X509_USER_PROXY=$RUNTIME_JOB_DIR/user.proxy
    export X509_USER_CERT=$RUNTIME_JOB_DIR/user.proxy
    export X509_CERT_DIR=$RUNTIME_JOB_DIR/arc/certificates
    ;;
  2)
    :
    ;;
esac
Client Usage
A Python client for the cache service is available in the nordugrid-arc-python package and in the source tree. It requires the ElementTree module which is available by default in Python versions 2.5 and higher. For Python 2.4 it is available in the elementtree module which must be installed separately. Python versions 2.3 or lower are not supported. To use it you may need to add the system-dependent installation path to PYTHONPATH.
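For example, assuming the client was installed under /usr/lib64/python2.6/site-packages (this path is only an illustration; the real path depends on your system and Python version):

export PYTHONPATH=/usr/lib64/python2.6/site-packages:$PYTHONPATH

The client can then be imported with: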
from cache import cache
Three methods are defined:
- cache.cacheLink(): Tells the cache service to link the given URLs from the cache to the specified job directory on the worker node
- cache.cacheCheck(): Queries the cache service for the existence of the given URLs in the cache
- cache.echo(): Call the echo service - useful for testing
For the full API description see the docstrings in the code:
from cache import cache
print cache.cacheLink.__doc__
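As a rough illustration of a cache query, a check might look like the sketch below. The argument list used here is only an assumption made for illustration; consult the docstring, as shown above, for the real signature.

#!/usr/bin/env python
import os
from cache import cache

endpoint = os.environ['ARC_CACHE_SERVICE']  # set by the ENV/ARC-CACHE-SERVICE RTE
proxy = os.environ['X509_USER_PROXY']       # set by the ENV/PROXY RTE

# NOTE: the arguments of cacheCheck below are assumed for illustration only -
# check cache.cacheCheck.__doc__ for the actual signature
urls = ['srm://srm.ndgf.org/ops/jens1']
print cache.cacheCheck(endpoint, proxy, urls)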
Small example script to call the service (this would be run on the worker node at the start of the job to prepare the input files):
run.py:

#!/usr/bin/env python
import sys
import os
import pwd
from cache import cache

# Service endpoint and proxy location are set by the runtime environments
endpoint = os.environ['ARC_CACHE_SERVICE']
proxy = os.environ['X509_USER_PROXY']
username = pwd.getpwuid(os.getuid())[0]

# Job id from GRID_GLOBAL_JOBID or the current working directory
if 'GRID_GLOBAL_JOBID' in os.environ:
    gridid = os.environ['GRID_GLOBAL_JOBID']
    # Assuming GridFTP job submission
    jobid = gridid[gridid.rfind('/')+1:]
else:
    cwd = os.getcwd()
    jobid = cwd[cwd.rfind('/')+1:]

# Mapping of input URLs to file names in the job directory
urls = {'srm://srm.ndgf.org/ops/jens1': 'file1',
        'lfc://lfc1.ndgf.org/:guid=8471134f-494e-41cb-b81e-b341f6a18caf': 'file2'}
stage = False

try:
    cacheurls = cache.cacheLink(endpoint, proxy, username, jobid, urls, stage)
except cache.CacheException, e:
    print('Error calling cacheLink: ' + str(e))
    sys.exit(1)

print(cacheurls)
print(os.listdir('.'))
and a job description file to submit this script:
cache.xrsl:

&("executable" = "run.py")
 ("jobname" = "cache service test")
 ("runtimeenvironment" = "ENV/PROXY")
 ("runtimeenvironment" = "ENV/ARC-CACHE-SERVICE")
 ("inputfiles" = ("run.py" ""))
 ("walltime" = "3600")
 ("cputime" = "3600")
 ("stderr" = "stderr")
 ("stdout" = "stdout")
 ("gmlog" = "gmlog")
Note that the ENV/PROXY runtime environment is needed in order to have access to the proxy on the worker node.
If successful and the requested files are in cache, the output should list the links to those files.
Issues and Notes
- The HED service container which hosts the cache service does not accept legacy proxies. This type of proxy is created by default by grid-proxy-init and voms-proxy-init, but an RFC-compliant proxy can be generated using the -rfc option (see the example after this list).
- The cache service links files to the session dir. If a scratch directory is used for executing the job, the cache files are moved there from the session directory. This requires that the scratch dir is accessible from the cache service host, so the cache service cannot be used in situations where the scratch directory can only be accessed by the underlying LRMS.
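For reference, an RFC-compliant proxy can be created with the standard client tools, for example:

grid-proxy-init -rfc

or, if VOMS attributes are needed:

voms-proxy-init -rfc -voms <your VO>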