Cache Service

This page is obsolete. For up to date documentation see http://www.nordugrid.org/arc/arc6/tech/data/candypond.html

This page describes technical details of the ARC Cache Service. The software is still in the development stage and may have bugs or missing functionality. Please report problems to http://bugzilla.nordugrid.org/.

Description and Purpose

The ARC caching system automatically saves job input files to local disk so that future jobs can reuse them. The cache is completely internal to the computing element and cannot be accessed or manipulated from the outside. The ARC Cache Service exposes various operations of the cache and is especially useful in a pilot job model, where the input data for a job is not known until the job is running on the worker node.

Installation

The cache service is designed to run alongside any standard production (>= 0.8.x) installation of the ARC computing element and comes in the package nordugrid-arc-cache-service, available in the usual NorduGrid or EMI repositories.

Configuring and Running

ARC 1.x and 2.x (releases 11.05 and 12.05)

The cache service runs as a separate process in a separate HED container, listening on port 60001 on path /cacheservice. It is assumed that there is an existing arc.conf configuration file for A-REX, which is read to get information on the caches. There is no other configuration needed for the cache service. It can be started with

$ARC_LOCATION/etc/init.d/arc-cache-service start

Messages are logged to /var/log/arc/cache-service.log.

ARC >= 3.x (releases 13.02 and above)

The cache service runs inside the same HED container as A-REX, and so is accessible at the same hostname and port as the A-REX web-service interface, at path "/cacheservice". The following option in the [grid-manager] section of arc.conf enables it:

enable_cache_service=yes

The A-REX web-service interface must also be enabled through the arex_mount_point option. No other configuration is needed. The cache service is automatically started when A-REX is started and so does not need to be started separately. Messages are logged to the A-REX log. The same instance of the DTR data staging framework is used by both A-REX and the cache service so DTR must be enabled for the cache service to run (it is enabled by default).
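As a rough illustration of how the endpoint differs between the two setups described above, the sketch below builds the service URL from a host name and port. The function name, the example host ce.example.org and the default port 443 are assumptions made for illustration; the real port is whatever the A-REX web-service interface listens on, and HTTPS is assumed in both cases.

```python
def cache_service_endpoint(host, port=443, path="/cacheservice"):
    """Build the URL of the cache service (hypothetical helper, not part of ARC)."""
    # Make sure the path starts with exactly one leading slash.
    if not path.startswith("/"):
        path = "/" + path
    return "https://%s:%d%s" % (host, port, path)


# ARC >= 3.x: same host and port as the A-REX web-service interface
# (443 is only an assumed example value).
print(cache_service_endpoint("ce.example.org"))

# ARC 1.x and 2.x: separate HED container listening on port 60001.
print(cache_service_endpoint("ce.example.org", 60001))
```

The resulting string is the kind of value that a job would expect to find in the ARC_CACHE_SERVICE environment variable.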

Setting up the Runtime Environment

The Runtime Environment (RTE) advertises to clients that the cache service is available and sets up the environment for the job to use it, by setting an environment variable pointing to the service. The following template can be used to create an RTE in <your rte directory>/ENV/ARC-CACHE-SERVICE:

#!/bin/sh

case "$1" in

0)
  export ARC_CACHE_SERVICE=https://hostname:443/cacheservice
  ;;
1)
  return 0;;
2)
  return 0;;

*)
  return 1;;

esac

The only change needed is to substitute your real host name for hostname. The proxy certificate is also required on the worker node, so another runtime environment (e.g. ENV/PROXY) is needed to provide it, for example:

#!/bin/bash

x509_cert_dir="/etc/grid-security/certificates"

case $1 in
  0) mkdir -pv $joboption_directory/arc/certificates/
     cp -rv $x509_cert_dir/ $joboption_directory/arc
     cat ${joboption_controldir}/job.${joboption_gridid}.proxy >$joboption_directory/user.proxy
     ;;
  1) export X509_USER_PROXY=$RUNTIME_JOB_DIR/user.proxy
     export X509_USER_CERT=$RUNTIME_JOB_DIR/user.proxy
     export X509_CERT_DIR=$RUNTIME_JOB_DIR/arc/certificates
     ;;
  2) :
     ;;
esac
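Before calling the service from a job, it can help to confirm that both RTEs took effect. The following is a minimal sketch of such a check, assuming only the variable names exported by the two templates above; the function name check_cache_environment is hypothetical and not part of ARC.

```python
import os

# Variables exported by the ENV/ARC-CACHE-SERVICE and ENV/PROXY templates.
REQUIRED_VARS = ("ARC_CACHE_SERVICE", "X509_USER_PROXY", "X509_CERT_DIR")


def check_cache_environment(environ=None):
    """Return a list of problems found; an empty list means all checks passed."""
    if environ is None:
        environ = os.environ
    problems = []
    # Every variable must be present and non-empty.
    for name in REQUIRED_VARS:
        if not environ.get(name):
            problems.append("%s is not set" % name)
    # The proxy must be a regular file in the job directory.
    proxy = environ.get("X509_USER_PROXY")
    if proxy and not os.path.isfile(proxy):
        problems.append("proxy file %s does not exist" % proxy)
    # The CA certificates must have been copied to a directory.
    cert_dir = environ.get("X509_CERT_DIR")
    if cert_dir and not os.path.isdir(cert_dir):
        problems.append("certificate directory %s does not exist" % cert_dir)
    return problems


if __name__ == "__main__":
    for problem in check_cache_environment():
        print("WARNING: " + problem)
```

Passing the environment as an argument keeps the check easy to exercise outside a real job.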

Client Usage

A Python client for the cache service is available in the nordugrid-arc-python package and in the source tree. It requires the ElementTree module which is available by default in Python versions 2.5 and higher. For Python 2.4 it is available in the elementtree module which must be installed separately. Python versions 2.3 or lower are not supported. To use it you may need to add the system-dependent installation path to PYTHONPATH.

from cache import cache

Three methods are defined:

  • cache.cacheLink(): Tells the cache service to link the given URLs from the cache to the specified job directory on the worker node
  • cache.cacheCheck(): Queries the cache service for the existence of the given URLs in the cache
  • cache.echo(): Calls the echo service - useful for testing

For a full API description, see the docstrings in the code:

from cache import cache
print(cache.cacheLink.__doc__)

Small example script to call the service (this would be run on the worker node at the start of the job to prepare the input files):

run.py:

#!/usr/bin/env python

import sys
import os
import time
import pwd

from cache import cache

endpoint = os.environ['ARC_CACHE_SERVICE']
proxy = os.environ['X509_USER_PROXY']
username = pwd.getpwuid(os.getuid())[0]

# job id from GRID_GLOBAL_JOBID or cwd
if 'GRID_GLOBAL_JOBID' in os.environ:
   gridid = os.environ['GRID_GLOBAL_JOBID']
   # Assuming GridFTP job submission
   jobid = gridid[gridid.rfind('/')+1:]
else:
   cwd = os.getcwd()
   jobid = cwd[cwd.rfind('/')+1:]

urls = {'srm://srm.ndgf.org/ops/jens1': 'file1',
       'lfc://lfc1.ndgf.org/:guid=8471134f-494e-41cb-b81e-b341f6a18caf': 'file2'}

stage = False

try:
   cacheurls = cache.cacheLink(endpoint, proxy, username, jobid, urls, stage)
except cache.CacheException, e:
   print('Error calling cacheLink: ' + str(e))
   # Stop here: cacheurls is undefined if the call failed
   sys.exit(1)

print(cacheurls)
print(os.listdir('.'))

and a job description file to submit this script:

cache.xrsl:

&
("executable" = "run.py")
("jobname" = "cache service test" )
("runtimeenvironment" = "ENV/PROXY")
("runtimeenvironment" = "ENV/ARC-CACHE-SERVICE")
("inputfiles" =
   ("run.py" "")
)
("walltime" = "3600" )
("cputime" = "3600" )
("stderr" = "stderr")
("stdout" = "stdout")
("gmlog" = "gmlog")

Note that the ENV/PROXY runtime environment is needed in order to have access to the proxy on the worker node.

If the call succeeds and the requested files are in the cache, the output should list the links to those files.
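The job ID handling in run.py can be isolated into a small function, which makes it easier to test away from a worker node. This is a sketch only: the function name and the example IDs are made up for illustration, and unlike run.py it also tolerates a trailing slash.

```python
def job_id_from_environment(environ, cwd):
    """Return the local job ID (hypothetical helper, not part of the ARC client).

    Uses the last path component of GRID_GLOBAL_JOBID when it is set,
    otherwise the last path component of the working directory.
    """
    source = environ.get("GRID_GLOBAL_JOBID") or cwd
    # Drop any trailing slash, then keep everything after the last '/'.
    return source.rstrip("/").rsplit("/", 1)[-1]


# GridFTP-style global job ID (illustrative value).
print(job_id_from_environment(
    {"GRID_GLOBAL_JOBID": "gsiftp://ce.example.org:2811/jobs/abc123"},
    "/ignored"))

# Fallback: the session directory name is taken as the job ID.
print(job_id_from_environment({}, "/scratch/session/xyz789"))
```

In a real job the function would be called with os.environ and os.getcwd().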

Issues and Notes

  • The HED service container which hosts the cache service does not accept legacy proxies. This type of proxy is created by default by grid-proxy-init and voms-proxy-init, but an RFC-compliant proxy can be generated using the -rfc option.
  • The cache service links files to the session dir. If a scratch directory is used for executing the job, the cache files are moved there from the session directory. This requires that the scratch dir is accessible from the cache service host, so the cache service cannot be used in situations where the scratch directory can only be accessed by the underlying LRMS.