This wiki is obsolete; see the NorduGrid web pages for up-to-date information.

How to plug an ARC site into EGI/NDGF

The ARC Compute Element (ARC CE) is the central service of the ARC middleware. It implements all core job execution functionality, such as authorisation, interaction with the underlying batch system, status monitoring and task management. It also has a number of advantages not seen in analogous products, most notably the actual transfer and local caching of input and output data, which greatly improves the efficiency of computing facilities and dramatically reduces failure rates. Other advantages are its portability, simplicity of management, modularity and extensibility. It is fully compatible with the other services that make up a Grid infrastructure, such as authorisation, monitoring and storage services. Several NGIs provide services to WLCG using ARC CE. This page collects experience from NDGF-affiliated sites that have deployed ARC CE and integrated it fully into the EGI operational infrastructure.


What services make an ARC-CE?

ARC is composed of several services acting at different stages of job and data management. They are listed here in workflow order (the client is included for completeness).

Client

  • Create proxy-certificate (also with VOMS extensions, if needed)
  • Submit job (xRSL or JSDL document; xRSL is more versatile)
    • Does ARC-CE discovery and matchmaking through infosystem
    • Submits job to best/first/random ARC-CE
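To make the client-side submission step more concrete, here is a minimal Python sketch that assembles a small xRSL job description. The helper name `build_xrsl` and the attribute values are assumptions chosen for illustration; the sketch relies only on the basic `&(attribute="value")` form of xRSL and does not handle values that contain double quotes.

```python
def build_xrsl(attributes):
    """Render (name, value) pairs as one xRSL conjunction string."""
    parts = []
    for name, value in attributes:
        # Quoting rules for embedded double quotes are out of scope here.
        if '"' in value:
            raise ValueError("embedded double quotes are not handled in this sketch")
        parts.append('({}="{}")'.format(name, value))
    # A leading '&' means that all listed attributes apply together.
    return "&" + "".join(parts)


job = build_xrsl([
    ("executable", "/bin/echo"),
    ("arguments", "hello"),
    ("stdout", "out.txt"),
    ("jobname", "arc-test"),
])
print(job)
```

The resulting string could be saved to a file and handed to the ARC client's submission command together with a valid proxy certificate.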

Server

GridFTPd

  • receives job description document and proxy-certificate
  • Validates proxy-certificate
  • Accepts the job

Grid Manager / A-REX

  • Goes through accepted jobs, starts downloaders for requested files
  • Caches files and manages the cache
  • Executes plugins / RunTime Environment (RTE) stage 0
  • Transforms xRSL to LRMS-specific shell-scripts
  • Submits job to LRMS (PBS, SLURM, LoadLeveler, LSF, SGE, Condor)
  • Regularly polls LRMS to see if jobs have completed
  • Finishes jobs, starts uploaders for output-files
  • Writes information for URLogger that reports to SGAS

LRMS (PBS, SLURM, LoadLeveler, LSF, SGE, Condor)

  • Starts job
  • Executes RTE stage 1
  • Executes job payload (what the user wanted to do)
  • Executes RTE stage 2
  • Ends job

Information: ARIS + EGIIS, ARC and Glue schemas

  • ARC Resource Information System (ARIS) is the local information system and it contains information about cluster, queues, jobs
  • EGIIS is short for "Enhanced Grid Information Index Server". It gathers information about several clusters and storage elements, and is used by ARC clients, among others, for brokering.
  • Information about clusters, queues, jobs and users can be published in different formats, called schemas. For ARC this is the ARC schema, for gLite this is the Glue schema (version 1.2 or 1.3).

Important: both ARC and gLite use BDII, a framework built on top of an LDAP server that populates that server with information.

Accounting

ARC usage records can be submitted to SGAS and then converted to APEL, or submitted directly to APEL (the latter is not officially supported yet).

ARC features a logger for generating usage records and registering the records to one or more SGAS instances. Besides registering all records, it is also possible to register per VO or per user to specific SGAS instances.

The reporting to APEL in NDGF is done with a script which extracts the necessary information from the NDGF SGAS database. The script is run automatically by a cron job every day.

A script exists that produces an APEL usage report from the ARC CE (Grid Manager) log; it is not yet officially supported, but can be made available on request from nordugrid-support@nordugrid.org.

SAM-Nagios

Nagios scripts (probes) exist that allow monitoring of ARC-CEs. The scripts are available in the EGI repository.

There is information relating to the probes at the following:

Additional services

These services are not critical to integrating ARC in EGI, but they are important components of a well-performing ARC-CE.

EGIIS

The Enhanced Grid Information Index Service is used to aggregate information about multiple ARC-CEs

ACIX

The ARC Cache Index (ACIX) provides an index over cached files on ARC sites. This makes it possible to do cache-aware brokering.

Authorisation

The following authorisation tools and services are supported:

  • VOMS
  • LCAS/LCMAPS
  • MyProxy
  • ARGUS - prototype status

How does ARC interface with EGI?

Services within EGI

  • APEL

ARC reports either via SGAS or directly from ARC-CEs to APEL

  • GOCDB

Your NGI manager adds your resource to GOCDB. This requires, among other things, an LDAP URL to a resource publishing information in the Glue format about your ARC-CE. This can either be your ARC-CE configured to publish in the Glue format, or from a dedicated service that reads information from a GIIS and generates Glue output for multiple ARC-CEs.

  • BDII, Glue

ARC-CEs and GIISes can have their information published in the Glue format which can then be imported into top-level BDIIs

  • SAM / Nagios

We have Nagios plugins for service availability monitoring

  • Storage

ARC works well with dCache and other common storage solutions.


How to Install ARC-CE

Repositories

For ARC v0.8.x, see installation instructions here. For ARC 11.05 (included in EMI-1) instructions are similar and will be provided soon.

  • Add the GPG key to the package manager
  • Add the repositories to the package manager
    • When installing for non-SL5, use NorduGrid repositories (files /etc/apt/sources.list.d/nordugrid.list or /etc/yum.repos.d/nordugrid.repo)

Packages, meta-package

  • Install nordugrid-arc-compute-element; this should install:
    • nordugrid-arc-gridftpd
    • nordugrid-arc-arex
    • nordugrid-arc-aris
    • nordugrid-arc-gridmap-utils
    • nordugrid-arc-janitor

and their dependencies. The last of these (Janitor) is not critical for ARC-CE operation at the moment and can be safely omitted.

How to Configure ARC-CE

Detailed instructions for ARC v0.8.x can be found in NORDUGRID-TECH-2 (general description) and NORDUGRID-TECH-6 (for authorisation). The same configuration instructions are mostly valid for ARC 11.05 (EMI-1), with only minor changes.

Basic Configuration, Stand-alone mode

  • Getting certificates

ARC uses the same IGTF certificates as gLite, so you obtain them the same way.

  • gridftp, grid-manager, infosys

These are configured through /etc/arc.conf. The example below shows the main options of interest; many more are available to tailor your setup, as ARC is very versatile.

Sample arc.conf:

 [common]
 hostname="pbshead.domain.se"
 lrms=pbs
 pbs_log_path="/var/spool/torque/server_logs"
 globus_tcp_port_range="9000,9300"
 globus_udp_port_range="9000,9300"
 x509_user_key="/etc/grid-security/hostkey.pem"
 x509_user_cert="/etc/grid-security/hostcert.pem"
 x509_cert_dir="/etc/grid-security/certificates"
 # file mapping grid user to local user
 gridmap="/etc/grid-security/grid-mapfile"
 [grid-manager]
 controldir="/var/spool/nordugrid/jobstatus"
 sessiondir="/grid/sessiondir"
 # The directory which holds the runtimeenvironment scripts
 runtimedir="/software/runtime"
 # Store cached data for the jobs here
 cachedir="/grid/cache"
 # whether compute nodes can access the session directory on the frontend; defaults to 'yes'
 shared_filesystem="yes"
 [gridftpd]
 user="root"
 port="2811"
 pluginpath="/usr/lib/arc"
 [gridftpd/jobs]
 path="/jobs"
 plugin="jobplugin.so"
 allownew="yes"
 [infosys]
 # Bind slapd-process to all network cards
 slapd_hostnamebind="*"
 port="2135"

  • Publish information into a GIIS

 # Register to a GIIS
 [infosys/cluster/registration/toArctest]
 targethostname="giis.domain.se"
 targetport="2135"
 targetsuffix="mds-vo-name=arctest,o=grid"
 regperiod="30"
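As a quick sanity check of a configuration like the sample above, the following Python sketch reads the block and key structure of an arc.conf-style text and reports which of the blocks used in the sample are missing. It is a sketch under stated assumptions, not a validator shipped with ARC: the function name `parse_arc_conf`, the shortened sample text and the `required` list are all illustrative. A small hand-written parser is used because arc.conf allows repeated keys and repeated block names, which this sketch simply collects in order.

```python
def parse_arc_conf(text):
    """Return {block_name: [(key, value), ...]} for an arc.conf-style text."""
    blocks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            # Repeated block names are merged into one list in this sketch.
            blocks.setdefault(current, [])
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            # Values may be bare or wrapped in single or double quotes.
            blocks[current].append((key.strip(), value.strip().strip("\"'")))
    return blocks


# Shortened copy of the sample configuration, for illustration only.
SAMPLE = """
[common]
hostname="pbshead.domain.se"
lrms=pbs
[grid-manager]
controldir="/var/spool/nordugrid/jobstatus"
sessiondir="/grid/sessiondir"
[gridftpd]
port="2811"
[infosys]
port="2135"
"""

conf = parse_arc_conf(SAMPLE)
# Assumption: these are the blocks the sample relies on, not an official list.
required = ["common", "grid-manager", "gridftpd", "infosys"]
missing = [name for name in required if name not in conf]
print("missing blocks:", missing)
print("LRMS:", dict(conf["common"])["lrms"])
```

On a real host the text would be read from /etc/arc.conf instead of the inline sample.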

Integrate ARC-CE with SGAS Accounting

Configuration details can be found in NORDUGRID-MANUAL-16


Configuration in short:

arc.conf:

 [grid-manager]
 authplugin="FINISHED timeout=10,onfailure=pass /opt/nordugrid/libexec/arc-ur-logger %C %I %S"
 [logger]
 log_all="https://orval.grid.aau.dk:6143/sgas"


/opt/nordugrid/libexec/arc-ur-registrant should be run from cron at regular intervals (typically every hour).

Integrate ARC-CE with EGI

BDII (use Glue converter)

You can either:

  1. Register with ARC schema to an EGIIS that is being used by a customized Site-BDII to generate Glue output (e.g., done by NDGF)
  2. Configure ARC CE to publish Glue 1.2 schema, and register ARC CE to a regular gLite site BDII

To configure ARC CE with a gLite site BDII (option 2), add this to arc.conf:

[infosys]
...
infosys_glue12=enable
...
[infosys/glue12]
resource_location="Atlantis"
resource_latitude="33.045508"
resource_longitude="-64.632568"

cpu_scaling_reference_si00='2600'
processor_other_description='Cores=6, Benchmark=10.27-HEP-SPEC06'
glue_site_web="http://www.atlantis.info/"
glue_site_unique_id="AA_TOP_T2"
provide_glue_site_info='false'

The important configuration parameters here are:

  1. processor_other_description: this string needs to have exactly this format to work properly with gstat
  2. provide_glue_site_info='false': this should be set to "false" if you have a site-bdii as a separate service
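Because the processor_other_description string has to follow a fixed format, it can help to generate it rather than type it by hand. The Python sketch below builds the value from a core count and a HEP-SPEC06 figure and checks it against a pattern. Both the helper name and the regular expression are assumptions derived from the single example value shown above, not from a gstat specification.

```python
import re


def processor_other_description(cores, hepspec06):
    """Format the string in the same shape as the example in arc.conf."""
    return "Cores={}, Benchmark={}-HEP-SPEC06".format(cores, hepspec06)


# Assumed pattern, inferred from the example value only.
PATTERN = re.compile(r"^Cores=\d+(\.\d+)?, Benchmark=\d+(\.\d+)?-HEP-SPEC06$")

value = processor_other_description(6, 10.27)
print(value)
print("format ok:", bool(PATTERN.match(value)))
```

The printed value can then be pasted into the [infosys/glue12] block.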

In addition you should report your supported VOs with the authorizedvo configuration parameter. With the newest versions of ARC, multiple authorized VOs should be handled correctly.

Then start a port-forwarder (done automatically in versions of ARC 0.8.3 and above):

ncat --sh-exec "ncat localhost 2135" -l 2170

Then, include it in site BDII (/opt/glite/etc/gip/site-urls.conf):

ARCCE  ldap://yoursite.yourdomain.se:2170/mds-vo-name=glue12,o=grid
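Before adding the URL to the site BDII, it may be worth confirming that both the ARC infosys port and the forwarded port accept TCP connections. The Python sketch below does only that. It does not speak LDAP, so a successful connection shows that something is listening, not that Glue data is being served. The function name `port_is_open` and the use of localhost are assumptions for illustration.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # 2135 is the infosys port from arc.conf; 2170 is the forwarded port.
    for port in (2135, 2170):
        state = "open" if port_is_open("localhost", port) else "closed"
        print(port, state)
```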

Known issue with older ARC releases

Due to a bug in ARC 0.8.2, you need to do the following fixes as well:

You have to modify glue-generator.pl (here sitename is your site name). In newer versions (0.8.3 and above) this is configurable in arc.conf.

> #my $usemode = "nordugrid";
209c210
<       $GlueSiteUniqueID="NDGF-T1"
---
>       $GlueSiteUniqueID="sitename"
333c334,335
<         if ($waitingJobs == 0){ $waitingJobs = $DEFAULT; }
---
>       # It's OK if there's 0 waiting jobs.
>         #if ($waitingJobs == 0){ $waitingJobs = $DEFAULT; }

Registration (GOCDB)

Your NGI-manager should be able to help you enter information into the GOCDB. Typically, start with entering site BDII, and proceed to adding ARC CE.

Accounting (APEL)

See previous chapter.

Monitoring (SAM / Nagios)

Monitoring tests are submitted by your ROC/NGI. You only have to configure the authorisation blocks to allow such tests. For example, to enable WLCG OPS VO access for various roles, add this to arc.conf:

[vo]
id="vo_ops-swadmin"
vo="ops-swadmin"
source="vomss://voms.cern.ch:8443/voms/ops?/ops/Role=lcgadmin"
mapped_unixid="sgmops01"

[vo]
id="vo_ops-user"
vo="ops-user"
source="vomss://voms.cern.ch:8443/voms/ops?/ops"
mapped_unixid="ops001"

[group]
name="ops-lcgadmin"
voms="ops * lcgadmin *"

[group]
name="ops-users"
voms="ops * NULL *"

[gridftpd]
unixgroup="ops-lcgadmin simplepool /var/spool/nordugrid/ops-lcgadmin"
unixgroup="ops-users simplepool /var/spool/nordugrid/ops-users"
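The voms= rules in the [group] blocks above appear to list four fields: VO, group, role and capabilities, with * used as a wildcard and NULL for an attribute that is not set. The Python sketch below models that reading so the effect of the two rules can be tried out. It is a simplified illustration based only on the sample, not ARC's actual matching code, and the function name is hypothetical.

```python
def voms_rule_matches(rule, vo, group, role=None, capability=None):
    """Check VOMS attributes against a 'vo group role capabilities' rule."""
    want_vo, want_group, want_role, want_cap = rule.split()

    def field_ok(wanted, actual):
        if wanted == "*":
            return True            # wildcard: any value, including none
        if wanted == "NULL":
            return actual is None  # assumed meaning: attribute is absent
        return wanted == actual

    return (field_ok(want_vo, vo)
            and field_ok(want_group, group)
            and field_ok(want_role, role)
            and field_ok(want_cap, capability))


# An ops member holding the lcgadmin role, and one without any role.
admin = voms_rule_matches("ops * lcgadmin *", "ops", "/ops", role="lcgadmin")
plain = voms_rule_matches("ops * NULL *", "ops", "/ops")
print("lcgadmin rule matches admin:", admin)
print("NULL-role rule matches plain user:", plain)
```

Under this model a user with the lcgadmin role falls only into the ops-lcgadmin group, and a user without a role falls only into ops-users.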


After you have enabled the Nagios SAM-tests, you can view your resource in SAM, like NDGF here:

https://lcg-sam.cern.ch:8443/sam/sam.py?sensors=ArcCE&regions=NGI_NDGF&vo=ops&order=SiteName&funct=ShowSensorTests

It is also possible to integrate your site into the operations portal together with other EGI resources:

https://operations-portal.in2p3.fr/

Storage

ARC does not require you to set up a local storage element, but it works well together with a number of storage solutions. No configuration is needed on the ARC CE side, except for making sure the relevant ports are open in the firewall.

Supported storage-related services and protocols:

  • dCache (most common)
  • other SRM based storages (DPM, StoRM)
  • gsiftp
  • HTTP*
  • FTP
  • local files
  • LFC

Additional services

EGIIS

You usually have one EGIIS per NGI or organisation like NDGF. This service is used to aggregate information about multiple ARC-CEs. You install it with the nordugrid-arc-aris and nordugrid-arc-egiis packages.

You then configure it in arc.conf like this:

 [infosys/index/Arctest]
 name=Arctest
 allowreg="pbshead.domain.se:2135"

After this, you can check that your system shows up with LDAP like this:

 ldapsearch -H ldap://infoindex.domain.se:2135 -x -b 'mds-vo-name=arctest,o=grid' -s base giisregistrationstatus
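When several index services need checking, the same query can be assembled programmatically. The Python sketch below builds the ldapsearch argument list for a given host and index name; the helper name `giis_status_command` is an assumption, and the options are the ones used in the command above. Passing the arguments as a list avoids shell quoting problems with the search base.

```python
import shlex


def giis_status_command(host, index_name, port=2135):
    """Build the ldapsearch argument list for a registration status query."""
    base = "mds-vo-name={},o=grid".format(index_name)
    return [
        "ldapsearch",
        "-H", "ldap://{}:{}".format(host, port),
        "-x",             # simple authentication, as in the command above
        "-b", base,       # search base of the index
        "-s", "base",     # query only the base entry
        "giisregistrationstatus",
    ]


argv = giis_status_command("infoindex.domain.se", "arctest")
# Print a copy-and-paste friendly form of the command.
print(" ".join(shlex.quote(arg) for arg in argv))
```

The list can be handed to `subprocess.run(argv)` on a host where ldapsearch is installed.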

ACIX

The ARC Cache Index (ACIX) provides an index over cached files on ARC sites. This makes it possible to do cache-aware brokering.

VOMS

This page describes the different VO configuration strings relevant for setting up an NDGF-associated site.

This page describes the mapping of VOMS users based on their role.

Demonstrations