Cache Service

From NorduGrid


This page describes technical details of the ARC Cache Service. The software is still in the development stage and may have bugs or missing functionality. Please report problems to http://bugzilla.nordugrid.org/.


Description and Purpose

The ARC caching system automatically saves job input files to local disk so that later jobs needing the same files can reuse them. The cache is entirely internal to the computing element and cannot be accessed or manipulated from outside. The ARC Cache Service exposes some of the cache's operations as a service. This is especially useful in a pilot job model, where a job's input data is not known until the job is already running on the worker node.

Installation

The cache service is designed to run alongside any standard production (0.8.x or 1.0.x) installation of the ARC computing element and comes in the package nordugrid-arc-cache-service, available in the usual NorduGrid or EMI repositories.

Configuring

It is assumed that there is an existing arc.conf configuration file for the Grid Manager or A-REX service. This file is read to get information on the caches. Once the cache service is installed, it can be started without any further configuration:

$ARC_LOCATION/etc/init.d/arc-cache-service start
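Because the service takes its cache locations from arc.conf, it can help to check which cache directories it will see before starting it. The helper below is a hypothetical sketch, not part of ARC. It assumes the 0.8.x/1.0.x arc.conf layout, where caches are declared by cachedir="..." lines in the [grid-manager] section, and it keeps only the first path on each line (a second path, if present, is treated as a link path and ignored).

```python
def cache_dirs(conf_text):
    """Return cache paths from cachedir options in [grid-manager].

    Hypothetical helper: assumes the INI-like arc.conf layout of ARC 0.8.x/1.0.x.
    """
    dirs = []
    section = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        if line.startswith('[') and line.endswith(']'):
            section = line[1:-1].strip()
            continue
        if section != 'grid-manager' or '=' not in line:
            continue
        key, value = line.split('=', 1)
        if key.strip() != 'cachedir':
            continue
        # Value is usually quoted and may hold a second path after a space.
        parts = value.strip().strip('"').split()
        if parts:
            dirs.append(parts[0])
    return dirs
```

For example, `print(cache_dirs(open('/etc/arc.conf').read()))` lists the cache directories found in the default configuration file.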

Setting up the Runtime Environment

The Runtime Environment (RTE) advertises to clients that the cache service is available and sets up the job environment to use it by setting an environment variable that points to the service. The following template can be used to create an RTE in <your rte directory>/ENV/ARC-CACHE-SERVICE:

#!/bin/sh

case "$1" in

0)
  export ARC_CACHE_SERVICE=https://hostname:60001/cacheservice
  ;;
1)
  return 0;;
2)
  return 0;;

*)
  return 1;;

esac

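The only change needed is to replace hostname with the real host name of the machine running the cache service. The port (60001) must match the port the cache service listens on, which can be changed in the detailed configuration described below.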
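A job can pick up the endpoint from the variable exported by this RTE. The following Python 3 sketch is a hypothetical helper (not part of the ARC client): it reads ARC_CACHE_SERVICE, returns None when the variable is unset, and rejects values that are not https URLs or that still contain the template's placeholder host name.

```python
import os
from urllib.parse import urlparse


def cache_service_endpoint(environ=os.environ):
    """Return the ARC_CACHE_SERVICE URL set by the RTE, or None if unset.

    Raises ValueError if the value is not a usable https URL, including
    the case where the template's placeholder host name was never replaced.
    """
    endpoint = environ.get('ARC_CACHE_SERVICE')
    if not endpoint:
        return None  # RTE not requested by the job, or not installed
    parsed = urlparse(endpoint)
    if parsed.scheme != 'https' or not parsed.hostname:
        raise ValueError('unexpected ARC_CACHE_SERVICE value: %r' % endpoint)
    if parsed.hostname == 'hostname':
        raise ValueError('RTE still uses the placeholder host name')
    return endpoint
```

Returning None rather than raising for an unset variable lets a job fall back to another endpoint, or skip the cache service entirely, when the RTE was not requested.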

Detailed configuration

Some values in the init script are hard-coded, but detailed configuration can be done via XML. Here is a template configuration file:

<?xml version="1.0"?>
<ArcConfig
 xmlns="http://www.nordugrid.org/schemas/ArcConfig/2007"
 xmlns:tcp="http://www.nordugrid.org/schemas/ArcMCCTCP/2007"
 xmlns:arex="http://www.nordugrid.org/schemas/a-rex/Config"
 xmlns:ip="http://www.nordugrid.org/schemas/ArcConfig/2009/arex/InfoProvider"
 xmlns:lrms="http://www.nordugrid.org/schemas/ArcConfig/2009/arex/LRMS"
 xmlns:echo="urn:echo_config"
 xmlns:cacheservice="urn:cacheservice_config"
 xmlns:loader="http://www.nordugrid.org/schemas/loader/2009/08"
 xmlns:authz="http://www.nordugrid.org/schemas/arcauthz/2009/08"
 xmlns:spdp="http://www.nordugrid.org/schemas/simplelistpdp/2009/08"
 xmlns:imap="http://www.nordugrid.org/schemas/identitymap/2009/10"
>
 <Server>
   <PidFile>/var/run/arched-cache.pid</PidFile>
   <Logger>
     <Level>DEBUG</Level>
     <File>/var/log/arched.log</File>
     <Backups>10</Backups>
     <Maxsize>100000000</Maxsize>
   </Logger>
 </Server>
 <ModuleManager>
   <Path>/usr/lib/arc/</Path>
 </ModuleManager>
 <Plugins><Name>mcctcp</Name></Plugins>
 <Plugins><Name>mcctls</Name></Plugins>
 <Plugins><Name>mcchttp</Name></Plugins>
 <Plugins><Name>mccsoap</Name></Plugins>
 <Plugins><Name>arcshc</Name></Plugins>
 <Plugins><Name>identitymap</Name></Plugins>
 <Plugins><Name>cacheservice</Name></Plugins>
 <Chain>
   <Component name="tcp.service" id="tcp">
     <next id="tls"/>
     <tcp:Listen><tcp:Port>60001</tcp:Port></tcp:Listen>
   </Component>
   <Component name="tls.service" id="tls">
     <next id="http"/>
     <KeyPath>/etc/grid-security/hostkey.pem</KeyPath>
     <CertificatePath>/etc/grid-security/hostcert.pem</CertificatePath>
     <CACertificatesDir>/etc/grid-security/certificates</CACertificatesDir>
   </Component>
   <Component name="http.service" id="http">
     <next id="soap">POST</next>
     <next id="plexer">GET</next>
     <next id="plexer">PUT</next>
   </Component>
   <Component name="soap.service" id="soap">
     <next id="plexer"/>
   </Component>
   <Plexer name="plexer.service" id="plexer">
     <next id="cacheservice">^/cacheservice</next>
     <next id="echo">^/echo$</next>
   </Plexer>
   <Service name="echo" id="echo">
     <echo:prefix>[ </echo:prefix>
     <echo:suffix> ]</echo:suffix>
     <echo:serviceid>echo_service_id</echo:serviceid>
     <echo:endpoint>127.0.0.1/echo</echo:endpoint>
     <echo:expiration>P15M</echo:expiration>
   </Service>
   <Service name="cacheservice" id="cacheservice">
     <SecHandler name="identity.map" id="map" event="incoming">
       <PDP name="allow.pdp">
         <LocalList>/etc/grid-security/grid-mapfile</LocalList>
       </PDP>
     </SecHandler>
     <SecHandler name="arc.authz" id="authz" event="incoming">
       <PDP name="simplelist.pdp" location="/etc/grid-security/grid-mapfile"/>
     </SecHandler>
     <cacheservice:cache>
       <cacheservice:config>/etc/arc.conf</cacheservice:config>
       <cacheservice:maxload>5</cacheservice:maxload>
     </cacheservice:cache>
     <cacheservice:serviceid>cache_service_id</cacheservice:serviceid>
     <cacheservice:endpoint>127.0.0.1/cacheservice</cacheservice:endpoint>
     <cacheservice:expiration>P15M</cacheservice:expiration>
   </Service>
 </Chain>
</ArcConfig>

Only a few items usually need changing. Check the following and modify them where necessary:

  • Level: if DEBUG is too verbose, set this to VERBOSE or INFO.
  • ModuleManager:Path: where the ARC libraries are installed. When building from source the default is /usr/local/lib/arc. On 64-bit systems lib may be replaced by lib64.
  • tcp:Port: IMPORTANT: if A-REX runs on the same host, this must differ from the port A-REX uses (60000 by default). It must also match the port in the ARC_CACHE_SERVICE URL set by the RTE.
  • KeyPath, CertificatePath, CACertificatesDir: locations of the host key, the host certificate and the CA certificates directory.
  • LocalList and PDP:location: the location of the grid-mapfile.
  • cacheservice:config: the location of the ARC configuration file (arc.conf).
  • cacheservice:maxload: the maximum number of simultaneous downloads performed by the cache service.
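Some of these items can be checked automatically before starting arched. The sketch below is a hypothetical checker, not an ARC tool: it parses the XML configuration with ElementTree, using the namespace URIs from the template above, and warns if tcp:Port equals the A-REX default of 60000 or if cacheservice:config or cacheservice:maxload is missing. It does not validate paths or certificates.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the template configuration.
NS = {
    'tcp': 'http://www.nordugrid.org/schemas/ArcMCCTCP/2007',
    'cacheservice': 'urn:cacheservice_config',
}
AREX_DEFAULT_PORT = 60000  # A-REX default port, as noted in the list above


def check_config(xml_text):
    """Return a list of warnings about a cache service XML configuration."""
    root = ET.fromstring(xml_text)
    warnings = []
    ports = [int(p.text) for p in root.iterfind('.//tcp:Port', NS)]
    if not ports:
        warnings.append('no tcp:Port found')
    for port in ports:
        if port == AREX_DEFAULT_PORT:
            warnings.append('tcp:Port %d clashes with the A-REX default' % port)
    for name in ('config', 'maxload'):
        elem = root.find('.//cacheservice:' + name, NS)
        if elem is None or not (elem.text or '').strip():
            warnings.append('cacheservice:%s is missing' % name)
    return warnings
```

For example, `print(check_config(open('cache_service.xml').read()))` prints an empty list when none of these problems are found.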

To check that everything is OK, start arched interactively in the foreground (-f), giving the configuration file as a command-line argument:

arched -c cache_service.xml -f

Some logging messages should be printed to the screen, including a summary of the configuration in arc.conf. If the program exits, there is a problem; check the error messages.

The template XML configuration file includes an echo service, which can help with debugging. To test it, use the arcecho command, which is in the nordugrid-arc-client package (install it if it is not already present):

arcecho https://localhost:60001/echo hello

This should print "[ hello ]". If it does not, check the messages from arched, or run arcecho with the -d option to get debugging information.
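The same check can be scripted, for instance in a site test. This Python 3 sketch is a hypothetical wrapper: it runs arcecho exactly as above and assumes the reply is printed on standard output. The expected wrapping "[ " and " ]" comes from echo:prefix and echo:suffix in the template, so adjust the defaults if those were changed.

```python
import subprocess


def echo_matches(output, message, prefix='[ ', suffix=' ]'):
    """True if output contains message wrapped in the echo prefix/suffix."""
    return (prefix + message + suffix) in output


def run_arcecho(url='https://localhost:60001/echo', message='hello'):
    """Run arcecho (assumed to be on PATH) and check its reply."""
    result = subprocess.run(['arcecho', url, message],
                            capture_output=True, text=True)
    return result.returncode == 0 and echo_matches(result.stdout, message)
```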

If everything is ok then use Ctrl-C to stop arched and restart it without the -f option to make it a background daemon. Messages will be logged to /var/log/arched.log or whatever was specified in the configuration file.
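Once arched runs as a daemon, the PidFile from the configuration (/var/run/arched-cache.pid in the template) offers a quick liveness check. This is a hypothetical helper, not part of ARC. It uses signal 0, which only tests whether the process exists, and it cannot tell whether the PID now belongs to a different program after a crash.

```python
import os


def arched_running(pidfile='/var/run/arched-cache.pid'):
    """Return True if the process named in the PidFile exists."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # no pid file, or contents are not a number
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but is owned by another user (e.g. root)
    return True
```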

Client Usage

A Python client for the cache service is available in the nordugrid-arc-python package and in the source tree. It requires the ElementTree module which is available by default in Python versions 2.5 and higher. For Python 2.4 it is available in the elementtree module which must be installed separately. Python versions 2.3 or lower are not supported. To use it you may need to add the system-dependent installation path to PYTHONPATH.

from cache import cache
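The ElementTree requirement described above can be met on both old and new Python versions with an import fallback. This is a sketch of the usual idiom for your own scripts; it is not necessarily how the cache client itself imports the module.

```python
try:
    # Python 2.5 and later ship ElementTree in the standard library.
    import xml.etree.ElementTree as ElementTree
except ImportError:
    # Python 2.4: the separately installed elementtree package.
    import elementtree.ElementTree as ElementTree

# Quick self-check that the imported module can parse XML.
ElementTree.fromstring('<ok/>')
```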

Three methods are defined:

  • cache.cacheLink(): Tells the cache service to link the given URLs from the cache to the specified job directory on the worker node
  • cache.cacheCheck(): Queries the cache service for the existence of the given URLs in the cache
  • cache.echo(): Call the echo service - useful for testing

For a full API description, see the docstrings in the code:

from cache import cache
print(cache.cacheLink.__doc__)

A small example script that calls the service:

run.py:

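#!/usr/bin/env python

import os
import pwd
import sys

from cache import cache

# Endpoint exported by the ENV/ARC-CACHE-SERVICE runtime environment;
# fall back to a fixed URL if the variable is not set
endpoint = os.environ.get('ARC_CACHE_SERVICE',
                          'https://ce03.titan.uio.no:60001/cacheservice')
proxy = os.environ['X509_USER_PROXY']     # proxy made available by ENV/PROXY
username = pwd.getpwuid(os.getuid())[0]   # local account the job runs as

# get job id from the last component of the current (session) directory
jobid = os.path.basename(os.getcwd())

# map of remote URL -> file name to create in the job directory
urls = {'srm://srm.ndgf.org/ops/jens1': 'file1',
        'lfc://lfc1.ndgf.org/:guid=8471134f-494e-41cb-b81e-b341f6a18caf': 'file2'}

# see cache.cacheLink.__doc__ for the meaning of the stage flag
stage = False

try:
    cacheurls = cache.cacheLink(endpoint, proxy, username, jobid, urls, stage)
except cache.CacheException:
    # sys.exc_info() works on every Python version the client supports
    print('Error calling cacheLink: ' + str(sys.exc_info()[1]))
    sys.exit(1)

print(cacheurls)
print(os.listdir('.'))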
print(os.listdir('.'))

and a job description file to submit this script:

cache.xrsl:

&
("executable" = "run.py")
("jobname" = "cache service test" )
("runtimeenvironment" = "ENV/PROXY")
("runtimeenvironment" = "ENV/ARC-CACHE-SERVICE")
("inputfiles" =
   ("run.py" "")
)
("walltime" = "3600" )
("cputime" = "3600" )
("stderr" = "stderr")
("stdout" = "stdout")
("gmlog" = "gmlog")

Note that the ENV/PROXY runtime environment is needed in order to have access to the proxy on the worker node. There are also better ways for a job to get its own ID than looking at its current directory.

If successful and the requested files are in cache, the output should list the links to those files.
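Instead of reading the raw directory listing, a job could report the state of each requested file. The hypothetical helper below takes the same {url: filename} dictionary that run.py passes to cacheLink. It does not assume a particular link type: it reports a symbolic link with its target, any other existing file as present, and anything else as missing.

```python
import os


def report_links(urls, jobdir='.'):
    """Map each requested URL to the state of its target file in jobdir.

    urls is a {url: filename} dict like the one passed to cacheLink.
    """
    report = {}
    for url, name in urls.items():
        path = os.path.join(jobdir, name)
        if os.path.islink(path):
            report[url] = 'symlink -> ' + os.readlink(path)
        elif os.path.exists(path):
            report[url] = 'present'
        else:
            report[url] = 'missing'
    return report
```

In run.py, `print(report_links(urls))` could replace the final `print(os.listdir('.'))`.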

Issues and Notes

  • The HED service container that hosts the cache service does not accept legacy proxies. grid-proxy-init and voms-proxy-init create this type of proxy by default, but an RFC-compliant proxy can be generated with their -rfc option.
  • ARC v0.8.x (Grid Manager) and v1.0.x (A-REX) differ in how they map URLs to cache files. A-REX always adds the port number to a URL that does not specify one (unless the protocol has no standard port, as with SRM), whereas Grid Manager does not. Since cache file names are derived from the URL, the same file cached with and without a port number gets different names. The cache service uses the same data library as A-REX, so it always adds port numbers, and files cached by Grid Manager without them may not be found. It is therefore recommended to use A-REX.
  • The cache service links files into the session directory. If the job runs in a separate scratch directory, the cache files must also be linked there from the session directory. How to do this has not yet been decided; options include exposing the session directory through an environment variable set in the LRMS script or RTE, or creating a link back to the session directory in the scratch directory.
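The port-number mismatch between Grid Manager and A-REX can be illustrated in a few lines of Python 3. This is a sketch under stated assumptions: the default-port table is illustrative rather than ARC's own, and cache_key assumes a cache file name is a SHA-1 digest of the URL string. The point is only that two spellings of the same URL yield two different cache names.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

# Illustrative default ports only; SRM is absent because it has no standard port.
DEFAULT_PORTS = {'ftp': 21, 'gsiftp': 2811, 'http': 80, 'https': 443}


def add_default_port(url):
    """Insert the protocol's default port if the URL has none (A-REX style)."""
    parts = urlsplit(url)
    port = DEFAULT_PORTS.get(parts.scheme)
    if port is None or parts.port is not None:
        return url  # unknown default port, or port already given
    netloc = '%s:%d' % (parts.netloc, port)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def cache_key(url):
    """Hypothetical cache file name: SHA-1 hex digest of the URL string."""
    return hashlib.sha1(url.encode('utf-8')).hexdigest()
```

With these definitions, `cache_key('gsiftp://se.example.org/data/f')` (Grid Manager style) differs from `cache_key(add_default_port('gsiftp://se.example.org/data/f'))` (A-REX style), so a lookup using one form misses a file stored under the other.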