Advanced Resource Connector
The Advanced Resource Connector (ARC) middleware is an open-source software solution distributed under the Apache 2.0 license, enabling production-quality computational and data Grids. Since its first release in May 2002, the middleware has been deployed and used in production environments. This article briefly introduces the middleware, identifying its strengths as well as its known limitations and shortcomings. A detailed technical description of the middleware is given in the comprehensive ARC overview paper.
Services of ARC v0.8
ARC provides a reliable, lightweight, production-quality and scalable implementation of fundamental Grid services such as Grid job submission and management, resource characterization, resource aggregation and discovery, and basic data management.
A Grid-enabled computing resource runs a non-intrusive Grid layer composed of three main components: a specialized GridFTP server, a Grid Manager daemon and the Local Information Service. Grid jobs are submitted to and managed on the cluster through a custom GridFTP interface called the jobplugin; the GridFTP server is thus used as a communication channel and serves as a gateway to the computing resource. Each Grid job is associated with its own session directory, which is made available through the same GridFTP server. The Grid Manager daemon runs on the computing resource and acts as the backend of the GridFTP job submission interface. Among other things, it takes care of local job management (job states), the session directories and Grid identity mappings; handles input and output data staging (including resolution of Data Indexing URLs); manages the input data cache area; provides Runtime Environment support; interfaces to the local batch system; and collects local information for the logger component.
A Grid-enabled storage resource makes use of one of the two types of Grid storage layers provided by ARC: a conventional solution based upon the GridFTP server, or the Web service-based Smart Storage Element (SSE) implemented in the recently developed ARC HTTPSd framework. Besides being a data storage service that performs a basic set of the most important data management functions, the SSE also provides automatic and consistent registration to Data Indexing services. The SSE comes with an SRM interface. Storage resources also run Local Information Services.
The ARC middleware implements a scalable, production quality, dynamic, LDAP-based distributed information system. It consists of three main components: the Local Information Services, Index Services and the Registration Processes. The Local Information Services are responsible for resource (computing or storage) description and characterization. We note that the description of Grid jobs running on a resource is also a part of the resource characterization: job status monitoring within ARC is done via querying the Local Information Service components. Computing and storage resources are connected to the Grid via the Registration Process which links a resource to an Information Index Service, thus implementing resource aggregation. All of these services are based on various OpenLDAP extensions such as backends, schemas and processes developed by the NorduGrid and Globus Teams.
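The aggregation idea described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the LDAP-based implementation: the class `IndexService`, the function `discover`, the registration lifetime `ttl` and the example contact strings are all hypothetical names introduced for illustration. It shows resources registering to an index, indices registering to a higher-level index, and a client walking the hierarchy to find resources whose registrations are still fresh.

```python
import time
from dataclasses import dataclass, field


@dataclass
class IndexService:
    """Hypothetical stand-in for an Information Index Service."""
    name: str
    ttl: float = 60.0  # assumed registration lifetime in seconds
    # Maps a registrant's identifier to (target, expiry time).
    registrations: dict = field(default_factory=dict)

    def register(self, contact, target, now=None):
        """Record or refresh a registration (models the Registration Process)."""
        now = time.time() if now is None else now
        self.registrations[contact] = (target, now + self.ttl)

    def live_targets(self, now=None):
        """Return targets whose registration has not yet expired."""
        now = time.time() if now is None else now
        return [t for t, expiry in self.registrations.values() if expiry > now]


def discover(index, now=None, seen=None):
    """Walk the index hierarchy and return contacts of live resources."""
    seen = set() if seen is None else seen
    found = []
    for target in index.live_targets(now):
        key = target.name if isinstance(target, IndexService) else target
        if key in seen:
            continue  # already reached through another index
        seen.add(key)
        if isinstance(target, IndexService):
            found.extend(discover(target, now, seen))
        else:
            found.append(target)
    return found


# Two clusters register to a lower-level index, which registers to a top index.
top = IndexService("top-index")
country = IndexService("country-index")
top.register("country-index", country, now=0)
country.register("cluster-a", "ldap://cluster-a.example.org", now=0)
country.register("cluster-b", "ldap://cluster-b.example.org", now=0)
```

In this model a resource that stops refreshing its registration simply drops out of discovery once the lifetime passes, which is one way to read the "dynamic" property of the information system; the real expiry rules are not specified here.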
Grid-enabled resources can utilize the ARC logging framework, which consists of a Logging Service (Logger) and logging clients. The Logger is a MySQL database with a Web service interface implemented in the HTTPSd infrastructure. The Logger database collects Usage Records (UR) of Grid jobs. The URs are produced on the Grid-enabled computing resources and pushed to the Logger database in a SOAP message. Access to the Logger database is controlled via flexible common access rules which provide administrative and logging access levels.
ARC provides clients that make intelligent use of the distributed information and data available on the Grid. The middleware comes with a lightweight command line client, the User Interface (UI). The UI is a set of tools to submit, monitor and manage jobs on the Grid, to move data around, and to discover and query resource information. The UI comes with a built-in broker, which is able to select the best matching resource for a job. The ARC brokers integrated into the UI act completely independently of each other. The User Interface also provides client functionality for the other Grid services (e.g. Storage Elements, Logger Service), and yet it is lightweight and can be installed by any user in an arbitrary location in a matter of a few minutes. The User Interface can be deployed and used in as many instances as users need. Another special client is the Grid Monitor, which uses any Web browser as an agent to periodically query the distributed information system and present the results as a set of inter-linked Web pages.
The security layer of the ARC middleware is built upon the standard Grid Security Infrastructure (GSI). The ARC services that make use of authentication and authorization are those built on top of the GridFTP server and the HTTPSd framework. The information system services assume anonymous access without any access control.
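To make the notion of client-side brokering more concrete, here is a minimal sketch of how a client might pick a resource from information it has already gathered. It is not ARC's brokering algorithm: the `Resource` fields, the function `select_resource` and the ranking rule (most free CPUs, then shortest queue) are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical snapshot of a cluster as seen in the information system."""
    name: str
    free_cpus: int
    queued_jobs: int
    runtime_environments: frozenset


def select_resource(resources, required_rte=frozenset(), min_cpus=1):
    """Return the best matching resource, or None if nothing matches."""
    # Matching step: keep only resources that satisfy the job's requirements.
    candidates = [
        r for r in resources
        if required_rte <= r.runtime_environments and r.free_cpus >= min_cpus
    ]
    if not candidates:
        return None
    # Assumed ranking: most free CPUs first, shortest queue as tie-breaker.
    return max(candidates, key=lambda r: (r.free_cpus, -r.queued_jobs))


resources = [
    Resource("cluster-a", 4, 10, frozenset({"APPS/EXAMPLE"})),
    Resource("cluster-b", 8, 2, frozenset()),
    Resource("cluster-c", 4, 1, frozenset({"APPS/EXAMPLE"})),
]
best = select_resource(resources, required_rte=frozenset({"APPS/EXAMPLE"}))
```

Because each client runs this kind of selection on its own, no central broker is involved, which matches the independence of the UI brokers described above.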
In a basic and largely obsolete authorization model, ARC services support a simple mapping of X.509 certificate subjects (Distinguished Names) to UNIX users through the grid-mapfile mechanism. For advanced authorization, most of the ARC services are capable of acting purely on the Grid identity of the connecting client without relying on local identities. In addition to the personal Grid identity of a user, other identification mechanisms are supported. Among them are the Virtual Organization Membership Service (VOMS) and general VO membership (through external information gathering tools). In addition to access-level authorization, many services make use of the client Grid identities internally to provide fine-grained access control. Currently, all such services use the Grid Access Control List (GACL) language for that purpose. Furthermore, the job handling daemon, the Grid Manager, also has a pluggable authorization mechanism that can provide fine-grained authorization and accounting during job execution through third-party plugins.
In addition to these traditional services, ARC v0.8 includes the next-generation job execution service, A-REX. A-REX is the successor of the Grid Manager, and is one of the key components of the new generation of ARC, first presented through the NOX release.
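The basic grid-mapfile model can be sketched in a few lines. The example below assumes the common layout of one entry per line, with a double-quoted certificate subject followed by a local account name; the function names `parse_gridmap` and `map_user` and the sample subjects are hypothetical, and real deployments may use additional syntax that this sketch does not handle.

```python
import shlex


def parse_gridmap(text):
    """Parse grid-mapfile text into a {subject DN: local user} dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # shlex keeps the quoted DN, which contains spaces, as a single token.
        parts = shlex.split(line)
        if len(parts) != 2:
            raise ValueError(f"malformed grid-mapfile line: {line!r}")
        dn, user = parts
        mapping[dn] = user
    return mapping


def map_user(mapping, dn):
    """Return the local account for a DN, or None if the DN is not mapped."""
    return mapping.get(dn)


sample = '''# illustrative entries only
"/O=Grid/O=NorduGrid/OU=example.org/CN=Jane Doe" jdoe

"/O=Grid/O=NorduGrid/OU=example.org/CN=John Roe" griduser
'''
gridmap = parse_gridmap(sample)
```

A lookup such as `map_user(gridmap, "/O=Grid/O=NorduGrid/OU=example.org/CN=Jane Doe")` yields the local account, while an unknown subject yields `None`, which a service would treat as "not authorized" in this simple model.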
ARC v0.8 service interfaces
The current ARC middleware offers public interfaces through standard channels: the GridFTP and LDAP protocols. However, these standard communication channels have been customized in an ARC-specific manner to satisfy the needs of the ARC Grid service interfaces.
Apart from the known issue of their firewall-unfriendliness, the GridFTP v1 protocol-based ARC Storage Elements exhibit a de facto standard interface and are interoperable with all other GridFTP-based storage services. Besides serving Storage Elements, the GridFTP protocol is also used as the Grid job submission and management interface of computing resources, in an ARC-specific customized manner together with custom ARC XRSL attributes.
The information system components rely on the LDAP protocol. Grid objects such as computing resources, storage resources, users and Grid jobs are represented by LDAP entries presented according to the NorduGrid LDAP Information Schema and served through LDAP channels.
Besides the GridFTP and LDAP channels, some recently developed components such as the Smart Storage Element and the Logger service already utilize SOAP-based Web Services as their interfaces.
Job submission clients support two job description languages: XRSL, developed by NorduGrid, and JSDL, an OGF recommendation.
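As a rough illustration of what an XRSL job description looks like, the sketch below renders a dictionary of attributes into the parenthesized `(attribute="value")` form joined under a leading `&`. The helper names `quote` and `to_xrsl` are hypothetical, the attribute names in the example are illustrative, and the sketch deliberately refuses values with embedded double quotes rather than guess at escaping rules; the XRSL manual remains the authority on the supported attributes and syntax.

```python
def quote(value):
    """Wrap a value in double quotes; embedded quotes are out of scope here."""
    text = str(value)
    if '"' in text:
        raise ValueError("embedded double quotes are not handled in this sketch")
    return f'"{text}"'


def to_xrsl(attributes):
    """Render {attribute: value or list of values} as a single XRSL string."""
    relations = []
    for name, value in attributes.items():
        # A list becomes a sequence of space-separated quoted strings.
        values = value if isinstance(value, (list, tuple)) else [value]
        rendered = " ".join(quote(v) for v in values)
        relations.append(f"({name}={rendered})")
    # The leading "&" requires all listed relations to hold.
    return "&" + "".join(relations)


job = {
    "executable": "/bin/echo",
    "arguments": ["hello", "grid"],
    "stdout": "out.txt",
    "jobName": "demo",
}
description = to_xrsl(job)
```

The resulting string could then be saved to a file and handed to a job submission client; how a particular client accepts it is not covered by this sketch.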
The current ARC has been successfully serving scientific and academic communities. Some of the key benefits of the current middleware include:
- Ease of installation and configuration. A site can be set up and integrated into a Grid infrastructure in a few simple steps.
- Support for multiple operating systems and multiple LRMSs, and non-intrusive deployment. This is crucial, since a large fraction of the available computing resources at academic institutions are shared facilities, typically administered by a computing department that is unwilling to change the OS or LRMS its users already rely on.
- Stability and high performance of the services. Delivering high performance and stability has always been one of the main development requirements, and it distinguishes ARC from other middleware.
- The non-central and fault-tolerant nature of the architecture. Central services that would introduce a single point of failure are avoided: for example, Information Index services are multi-rooted, and brokering is distributed (through the independent standalone clients).
- Optimized management of data processed and produced by computational jobs, including a built-in cache for frequently used files.
- Easy-to-use powerful clients.
- "Advanced Resource Connector middleware for lightweight computational Grids". M.Ellert et al., Future Generation Computer Systems 23 (2007) 219-240