Gm to arex migration

From NorduGrid

Upgrade ARC CE to EMI or ARC 11.05 or 12.05

This page describes how to migrate your site from the Grid Manager (pre-EMI) job management framework to the A-REX (as distributed by EMI) job execution service.

Introduction

A-REX is the next generation ARC job management service (see technical documentation). It has been available in ARC as a preview since release 0.8, and replaces the Grid Manager in EMI releases. A-REX can use the same GridFTP-based job submission interface as the Grid Manager and/or a new web service interface.

Why migrate?

The job handling code of A-REX is essentially a refactored Grid Manager, with a restructured code base and some optimisations in the job handling file structure, so from a system administrator's point of view there are few visible changes. For developers, however, the new code structure makes life much easier. Since late 2010, new features have been added only to A-REX, and the Grid Manager receives only essential bug fixes. A-REX has several new features and performance improvements which are not in the Grid Manager (see below). From a user's point of view there is not (yet) much difference in the execution efficiency of jobs: jobs will not complete any faster on sites running A-REX. The most visible change is the improved logging of information concerning the job.

In summary, migrating to A-REX will lead to improved developer support, access to new features over time and easier debugging of problems for users.

Installation

A-REX, as well as some other ARC components, should normally be installed from EGI UMD repositories (follow EGI instructions on how to enable those). Early adopters and testers can use EMI repositories, starting with EMI-1 release (follow EMI instructions on how to enable those).

However, these repositories are only available for Scientific Linux 5 and 6 (12.05 only) x86_64 (Debian 6 x86_64 support is expected in 2012). Users of all other platforms must use NorduGrid repositories.

Version numbering of releases differs between EGI, EMI and ARC. Individual package version numbers and names however are consistent between all these releases. EMI- and UMD-certified ARC packages should have version number 1.0.0 and above.

If you are using the NorduGrid repositories, upgrading packages to the full ARC 11.05 or 12.05 bundle, or to EMI release 1.0 or 2.0, will automatically install A-REX and its dependencies and replace the Grid Manager installation. Any running Grid Manager will be stopped automatically, so it is safe to do this while there are running jobs. Please note that after the upgrade there is no Grid Manager anymore: it is completely replaced by A-REX. The gridftpd server and the jobplugin that provide the GridFTP interface for job submission are also upgraded and are no longer compatible with the Grid Manager. The only safe way to switch back to a Grid Manager based installation is to downgrade the software.

To install from source, it is necessary to install all the A-REX prerequisites and then build and install A-REX itself (similar procedures apply to older ARC 0.8.x releases). Because both A-REX and the Grid Manager may be installed simultaneously in this way, it is important that they are installed in different locations. This avoids conflicts between incompatible versions of gridftpd and makes it possible to switch back to the Grid Manager.

(Almost) NO configuration changes

When replacing the Grid Manager with A-REX, no additional configuration is necessary. A-REX understands exactly the same configuration as the Grid Manager and takes all its configuration parameters from the same arc.conf file.

New debug levels

The only configuration parameter you may need to change is the "debug" parameter which specifies a log level. The Grid Manager accepts values from 0 (no logging) to 3 (VERBOSE), but A-REX uses values 0 (FATAL) to 5 (DEBUG). To maintain the same log level in A-REX you can usually add 2 to the previous Grid Manager level. This parameter also controls the logging level of the downloader and uploader executables to the job.id.errors files in the control directory (in the Grid Manager this was not changeable).
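The add-2 rule of thumb above can be sketched as a small helper. This is only an illustration: the function name is hypothetical, and clamping the result to the A-REX maximum is an assumption (with the ranges given above, the rule never exceeds 5 anyway).

```python
GM_MAX_LEVEL = 3    # Grid Manager: 0 (no logging) .. 3 (VERBOSE)
AREX_MAX_LEVEL = 5  # A-REX: 0 (FATAL) .. 5 (DEBUG)


def gm_to_arex_debug(gm_level: int) -> int:
    """Translate a Grid Manager "debug" value using the add-2 rule of thumb.

    Hypothetical helper, not part of ARC.
    """
    if not 0 <= gm_level <= GM_MAX_LEVEL:
        raise ValueError(f"Grid Manager debug level must be 0..{GM_MAX_LEVEL}")
    # Clamp as a safeguard (an assumption; the rule itself stays within range).
    return min(gm_level + 2, AREX_MAX_LEVEL)


if __name__ == "__main__":
    for level in range(GM_MAX_LEVEL + 1):
        print(f'Grid Manager debug="{level}" -> A-REX debug="{gm_to_arex_debug(level)}"')
```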

New plugins location

If you install ARC in a non-default location and run the GridFTP service for job submission, make sure the "pluginpath" option in the [gridftpd] section of arc.conf points to the actual location of plugins. Note that in A-REX plugins are installed in an arc/ subdirectory of the library installation path (default /usr/local/lib). When unsure, simply remove this line - a correct path will be picked up automatically.

No NULL VOMS attribute support

In case you used "NULL" as a VOMS attribute in configuration blocks like

  [group/atlas-general]
  name="atlas-general"
  voms="atlas * NULL *"

you must replace "NULL" with "*":

  voms="atlas * * *"

This is due to a change in the underlying security libraries. It means that groups matching specific attributes must be placed above this group in the configuration file so that they are matched first.
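For sites with many such blocks, the replacement can be scripted. The following is a minimal sketch that assumes each voms value sits on a single line in the form shown above; the function name is hypothetical, and the ordering of groups still has to be reviewed by hand.

```python
import re

# Matches a line of the form: voms="<value>" (optionally indented).
VOMS_LINE = re.compile(r'^(\s*voms=")([^"]*)(".*)$')


def replace_null_voms(conf_text: str) -> str:
    """Replace the token NULL with * inside voms="..." values only.

    Hypothetical helper. Whitespace inside the value is normalised to
    single spaces, and a trailing newline of the input is not preserved.
    """
    fixed_lines = []
    for line in conf_text.splitlines():
        match = VOMS_LINE.match(line)
        if match:
            prefix, value, suffix = match.groups()
            tokens = ["*" if tok == "NULL" else tok for tok in value.split()]
            line = prefix + " ".join(tokens) + suffix
        fixed_lines.append(line)
    return "\n".join(fixed_lines)


if __name__ == "__main__":
    sample = '[group/atlas-general]\nname="atlas-general"\nvoms="atlas * NULL *"'
    print(replace_null_voms(sample))
```

Other options are left untouched, so a value such as name="NULL" would not be rewritten.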

grid-update-crls replaced by fetch-crl

If you uninstalled the old grid-manager package and the grid-update-crls script and its cron jobs went missing, A-REX will not put them back in place. Instead, install the fetch-crl utility from a standard repository such as EPEL:

  yum install fetch-crl

This installs the utility and automatically creates a corresponding cron job and a configuration file that suit most installations.

New features

Naturally, if you wish to enable new features, new configuration options have to be added. For details see the Additional A-REX Features section.

Obsolete features and options

The following service blocks are obsolete:

  [rc]
  [se]
  [httpsd]

Full Compatibility

Any computing element utilising the pre-configured A-REX is fully compatible with any other ARC service and client. This is because by default the WS-interface of A-REX is turned off and A-REX is configured to use the GridFTP-based job submission interface.

Note on the Cache

The A-REX can use files cached by the Grid Manager and vice versa. However, A-REX always uses a port number in URLs even if not specified in the job description (except for SRM URLs where there is no standard port number), and so if a URL without a specified port number was cached by the Grid Manager, it will not be used by A-REX since the URL with the port number added will be mapped to a different cache filename. For example:

Service        URL                              Cache filename
Grid Manager   http://localhost/data/file1      /pathtocache/data/a8/a8d4ca5a55f6b65a1ddca9e9e1c102cf176459
A-REX          http://localhost:80/data/file1   /pathtocache/data/82/5bdc761499a3985aab6c93d25283f8732de0b9

Therefore after migration many cache files may be downloaded again even though it appears they are in the cache. The old files will be automatically cleaned up after some time as they will no longer be used.
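The effect can be illustrated with a short sketch. Everything here is an assumption made for illustration: the table of default ports, the helper names, and in particular the cache layout (a SHA-1 hash of the URL with the first two hex digits used as a sub-directory). The sketch does not claim to reproduce the filenames in the table above; it only shows that adding the port changes the hashed string and therefore the cache filename.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

# Assumed default ports for illustration; SRM is deliberately absent because
# the text notes that it has no standard port number.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "gsiftp": 2811}


def with_explicit_port(url: str) -> str:
    """Return the URL with the scheme's default port added if none is given."""
    parts = urlsplit(url)
    default = DEFAULT_PORTS.get(parts.scheme)
    if parts.port is not None or default is None or not parts.hostname:
        return url
    return urlunsplit(parts._replace(netloc=f"{parts.netloc}:{default}"))


def cache_path(url: str, cache_root: str = "/pathtocache/data") -> str:
    """Hypothetical layout: hash of the URL, first two hex digits as sub-directory."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"{cache_root}/{digest[:2]}/{digest[2:]}"


if __name__ == "__main__":
    old_url = "http://localhost/data/file1"
    new_url = with_explicit_port(old_url)
    print(old_url, "->", cache_path(old_url))
    print(new_url, "->", cache_path(new_url))
```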

Turning on/off A-REX

The A-REX service offers an out-of-the-box full replacement for the Grid Manager component. Installing A-REX binaries from repositories replaces the Grid Manager installation, so reverting to it requires downgrading packages and should only be done in extreme cases. Also, because of the performance-related restructuring of the control directory in A-REX, it is not easy to switch back to the Grid Manager after A-REX has been started. In this case all jobs managed by A-REX must be drained before switching.

It is important that only one of them runs at any time: either A-REX or the Grid Manager.

If A-REX is installed from source, first check whether the Grid Manager is running:

/etc/init.d/grid-manager status

Stop the Grid Manager and the GridFTP interface:

/etc/init.d/grid-manager stop
/etc/init.d/gridftpd stop

Depending on the way you installed the software, the above paths may be relative to the ARC installation location.

For all types of installation: start A-REX and the GridFTP interface:

/etc/init.d/a-rex start
/etc/init.d/gridftpd start

To stop A-REX (e.g. if needed for maintenance):

/etc/init.d/a-rex stop

Additional A-REX features

The following features are only available in A-REX. For detailed description and configuration, see the A-REX technical documentation.

Web Service Interface

A-REX is a next generation job execution service. Among other things, it provides a standards-compliant Web Service (WS) interface for job submission and management. The WS interface is, however, disabled by default in ARC and EMI distributions as of 2011. If you want to experiment with the advanced features of A-REX, set the option arex_mount_point in the [grid-manager] section of arc.conf to enable the web service interface, e.g.

arex_mount_point="https://your.host:60000/arex"

Then you can submit jobs through this new WS interface with the arcsub command (available in the ARC client package) and manage jobs with other arc* commands. IMPORTANT: this web service interface does not accept legacy proxies created by default by voms-proxy-init. RFC proxies must be used, which can be created by specifying voms-proxy-init -rfc or using arcproxy.

The WS interface can run alongside the GridFTP interface. Enabling the WS interface as shown above does not disable the GridFTP interface; if desired, the gridftpd service must be stopped explicitly using the commands described in the previous section.
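To check quickly whether a configuration file enables the WS interface and on which port, something like the following sketch could be used. It assumes a simple arc.conf consisting only of [section] headers and key="value" lines, which Python's standard configparser can read; the function name is hypothetical and a real arc.conf may need more careful parsing.

```python
import configparser
from urllib.parse import urlsplit


def ws_endpoint(conf_text: str):
    """Return (host, port, path) from arex_mount_point, or None if it is not set.

    Hypothetical helper for simple key="value" style configuration text.
    """
    parser = configparser.ConfigParser(
        delimiters=("=",), interpolation=None, strict=False
    )
    parser.read_string(conf_text)
    raw = parser.get("grid-manager", "arex_mount_point", fallback=None)
    if raw is None:
        return None
    # Values keep their surrounding double quotes, so strip them before parsing.
    url = urlsplit(raw.strip().strip('"'))
    return url.hostname, url.port, url.path


if __name__ == "__main__":
    sample = '[grid-manager]\narex_mount_point="https://your.host:60000/arex"\n'
    print(ws_endpoint(sample))
```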

Restructured Control Directory

In the Grid Manager a single directory was used to store all job information. This made certain operations, such as finding all jobs in a particular state, slow when there was a large number of jobs in the system. In A-REX the control directory has sub-directories, each keeping status information on jobs in a particular state, which greatly improves efficiency when there are large numbers of jobs.
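The difference between the two layouts can be sketched as follows. The file names (job.<id>.status), the use of the state name as the sub-directory name, and both function names are assumptions made for illustration only; they are not the actual A-REX control directory layout.

```python
import os
import tempfile


def jobs_in_state_flat(control_dir, state):
    """Flat layout: every status file must be opened to learn the job's state."""
    found = []
    for name in os.listdir(control_dir):
        if name.endswith(".status"):
            with open(os.path.join(control_dir, name)) as handle:
                if handle.read().strip() == state:
                    found.append(name)
    return sorted(found)


def jobs_in_state_split(control_dir, state):
    """Split layout: a single directory listing answers the same question."""
    state_dir = os.path.join(control_dir, state)
    return sorted(os.listdir(state_dir)) if os.path.isdir(state_dir) else []


if __name__ == "__main__":
    jobs = {"job.1.status": "FINISHED", "job.2.status": "INLRMS"}
    with tempfile.TemporaryDirectory() as flat, tempfile.TemporaryDirectory() as split:
        for name, state in jobs.items():
            with open(os.path.join(flat, name), "w") as handle:
                handle.write(state)
            os.makedirs(os.path.join(split, state), exist_ok=True)
            with open(os.path.join(split, state, name), "w") as handle:
                handle.write(state)
        print(jobs_in_state_flat(flat, "FINISHED"))
        print(jobs_in_state_split(split, "FINISHED"))
```

In the flat case the cost grows with the total number of jobs, while in the split case it grows only with the number of jobs in the requested state.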

Cache cleaning logging configuration

A-REX gives the option of changing the logging level of the cache-clean tool, through the parameter cacheloglevel in the [grid-manager] section of the configuration file. Similar to the "debug" parameter, this one takes values from 0 (FATAL) to 5 (DEBUG).

Data Validation

For input files located through an indexing service (e.g. LFC or RLS), the metadata (file size, checksum) recorded in the index is compared to the metadata reported by the service hosting the physical replica of the file (e.g. SRM or GridFTP). If the metadata differ, the replica is not downloaded. This only applies to data downloaded to cache, for which stricter checks are needed.
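The comparison described above amounts to something like the following sketch. The type and function names are hypothetical, and skipping a value that one of the services does not report is an assumption; the text does not say how missing metadata is handled.

```python
from typing import NamedTuple, Optional


class FileMeta(NamedTuple):
    size: Optional[int]      # size in bytes, None if the service does not report it
    checksum: Optional[str]  # e.g. "adler32:12ab34cd", None if not reported


def replica_matches(index: FileMeta, replica: FileMeta) -> bool:
    """Accept a replica unless a value reported by both services disagrees."""
    for indexed, actual in zip(index, replica):
        # Assumption: a value missing on either side cannot be compared.
        if indexed is not None and actual is not None and indexed != actual:
            return False
    return True


if __name__ == "__main__":
    catalogue = FileMeta(size=1024, checksum="adler32:12ab34cd")
    good = FileMeta(size=1024, checksum="adler32:12ab34cd")
    truncated = FileMeta(size=512, checksum=None)
    print(replica_matches(catalogue, good), replica_matches(catalogue, truncated))
```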

Separate limits for transfer shares

It is possible to specify a separate limit for one or several transfer shares. For each such share, the share_limit option specifies the limit which is used instead of the default given in maxloadshare. See the configuration section of the A-REX technical documentation for how to enable this feature.

Extra limits on processed jobs

A limit on the number of jobs processed per DN by A-REX and a limit on the total number of jobs in the system (including FINISHED and DELETED jobs) can be specified by giving extra parameters to the maxjobs option in arc.conf, for example:

maxjobs="1000 100 500 10000"

See the configuration section of A-REX technical documentation for more information.
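As a reading aid, the value above can be split into its parts as in the following sketch. It assumes that the first two numbers are the limits that maxjobs already accepted and that the two extra parameters come in the order used in the text (per-DN limit, then total limit); check the technical documentation before relying on this order. The names are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class MaxJobs(NamedTuple):
    base_limits: Tuple[int, ...]  # the limits maxjobs accepted before the extras
    per_dn: Optional[int]         # assumed third value: jobs processed per DN
    total: Optional[int]          # assumed fourth value: total jobs in the system


def parse_maxjobs(value: str) -> MaxJobs:
    """Split a maxjobs value such as "1000 100 500 10000" into named parts."""
    numbers = [int(tok) for tok in value.strip().strip('"').split()]
    if not 2 <= len(numbers) <= 4:
        raise ValueError("expected between two and four numbers")
    per_dn = numbers[2] if len(numbers) > 2 else None
    total = numbers[3] if len(numbers) > 3 else None
    return MaxJobs(tuple(numbers[:2]), per_dn, total)


if __name__ == "__main__":
    print(parse_maxjobs('"1000 100 500 10000"'))
```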

Checksum Validation

Checksums are calculated on the fly during file transfers and are validated against checksums reported by the source or destination.
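Calculating a checksum "on the fly" means updating it as each chunk of data passes through the transfer, rather than re-reading the file afterwards. The sketch below shows the idea using Adler-32 from Python's standard zlib module; the choice of algorithm, the chunk size and the function names are assumptions for illustration, not a description of the A-REX implementation.

```python
import io
import zlib


def copy_with_checksum(source, destination, chunk_size=64 * 1024):
    """Copy a binary stream chunk by chunk, updating the checksum per chunk."""
    checksum = 1  # Adler-32 starting value
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        checksum = zlib.adler32(chunk, checksum)
        destination.write(chunk)
    return f"{checksum & 0xFFFFFFFF:08x}"


def transfer(source, destination, reported_checksum):
    """Fail the transfer if the computed checksum differs from the reported one."""
    actual = copy_with_checksum(source, destination)
    if actual != reported_checksum.lower():
        raise IOError(f"checksum mismatch: got {actual}, expected {reported_checksum}")
    return actual


if __name__ == "__main__":
    data = b"example payload" * 10000
    expected = f"{zlib.adler32(data) & 0xFFFFFFFF:08x}"
    print(transfer(io.BytesIO(data), io.BytesIO(), expected))
```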

New data staging framework

The new data staging framework, which offers more sophisticated and flexible data upload and download mechanisms, has preview status as of 2011 and is turned off by default. It can be enabled by setting newdatastaging=yes in the [grid-manager] section of the configuration file.