Restructuring arclib

From NorduGrid

This page collects all the information about the restructuring of arclib carried out during February 2012. The aim is to let the library handle multiple interfaces per service, understand GLUE2 information, etc.

Team

Martin, Zsombor, Florido (infosys server side), Aleksandr (jobmanagement server side), Ivan (EMIR)

Goals

The main reason for the restructuring is to make the client fit the general case in which CEs and Index Services come with different interfaces. The current client (lib) is very limited and follows a "single interface" philosophy.

The development is planned to be rapid: everything is to be done in one month.

Notes

Meetings

  • Lund f2f, 30-31 January. add here meeting notes, pictures!

Brainstorming

Find a local information endpoint from an EMIR. How?

  • We need to have some information in the EMIR. We identified these attributes as the most relevant:
The following will be added to the open enumeration; "resource" means that this is a local infoprovider endpoint.
  • EndpointCapability: information.discovery.resource
  • EndpointURL: a complete URL to contact the service
  • EndpointInterfaceName: tells which protocol the endpoint speaks. Examples: org.nordugrid.ldapglue2, org.nordugrid.wsrfglue2
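
A minimal Python sketch of how a client could pick local information endpoints using these three attributes. The record layout (plain dicts keyed by the attribute names) and the sample hosts and URLs are assumptions made for illustration; the actual EMIR response format is not described on this page.

```python
# Hypothetical EMIR records: plain dicts keyed by the attribute names above.
LOCAL_INFO_CAPABILITY = "information.discovery.resource"


def local_info_endpoints(records, interface_names=None):
    """Return the records that advertise a local information endpoint.

    interface_names optionally restricts the result to protocols the
    client can speak, e.g. {"org.nordugrid.ldapglue2"}.
    """
    selected = []
    for record in records:
        capabilities = record.get("EndpointCapability", [])
        # Tolerate a single string as well as a list of capabilities.
        if isinstance(capabilities, str):
            capabilities = [capabilities]
        if LOCAL_INFO_CAPABILITY not in capabilities:
            continue
        if interface_names and record.get("EndpointInterfaceName") not in interface_names:
            continue
        selected.append(record)
    return selected


# Invented sample data, not taken from a real EMIR.
records = [
    {"EndpointCapability": ["information.discovery.resource"],
     "EndpointURL": "ldap://ce.example.org:2135/o=glue",
     "EndpointInterfaceName": "org.nordugrid.ldapglue2"},
    {"EndpointCapability": "information.discovery.resource",
     "EndpointURL": "https://ce.example.org:443/arex",
     "EndpointInterfaceName": "org.nordugrid.wsrfglue2"},
    {"EndpointCapability": ["executionmanagement.jobexecution"],
     "EndpointURL": "https://ce.example.org:443/arex",
     "EndpointInterfaceName": "org.nordugrid.xbes"},
]

for rec in local_info_endpoints(records, {"org.nordugrid.ldapglue2"}):
    print(rec["EndpointURL"])
```

Filtering on the capability first and on the interface name second keeps the client open to new information interfaces: an unknown interface name is only skipped when the caller asks for specific ones.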

The updated list of valid types, names, capabilities including indexes and registries is provided in https://wiki.nordugrid.org/index.php/ARC1/Infosys/2011Review#Naming_Conventions

A request for adding ServiceID and EndpointID in EMIR records has been forwarded by Balazs to Shiraz. Using these IDs we can check if Endpoints belong to the same Service, or if Endpoints are the same.
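
If the requested IDs do end up in the EMIR records, the two checks could look roughly like the sketch below. The field names ServiceID and EndpointID are taken from the request above; the dict layout and the sample values are hypothetical.

```python
from collections import defaultdict


def group_endpoints_by_service(records):
    """Map each ServiceID to its distinct EndpointIDs, in first-seen order."""
    services = defaultdict(list)
    for record in records:
        service_id = record.get("ServiceID")
        endpoint_id = record.get("EndpointID")
        if not service_id or not endpoint_id:
            continue  # record does not carry the requested IDs
        if endpoint_id not in services[service_id]:
            services[service_id].append(endpoint_id)
    return dict(services)


# Invented sample data for illustration only.
records = [
    {"ServiceID": "service-A", "EndpointID": "endpoint-1"},
    {"ServiceID": "service-A", "EndpointID": "endpoint-2"},
    {"ServiceID": "service-A", "EndpointID": "endpoint-1"},  # same endpoint seen twice
    {"ServiceID": "service-B", "EndpointID": "endpoint-3"},
    {"EndpointID": "endpoint-4"},                            # no ServiceID
]

grouped = group_endpoints_by_service(records)
print(grouped)
```

Endpoints that share a key belong to the same Service; an EndpointID that shows up twice is treated as the same Endpoint and kept once.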

Useful endpoint names and servers for non-ARC services

  • Top-BDII:
GLUE2EndpointInterfaceName: bdii_top
GLUE2EndpointCapability: information.model, information.discovery, information.monitoring
Some top-bdii urls to test against:
  • ldap://lcg-bdii.cern.ch:2170/o=glue2
  • ldap://bdii206.cern.ch:2170/o=glue2
  • GOCDB:

??

Find an endpoint that can list jobs. How?

There are two ways to achieve this: looking at the interface name or looking at the capabilities.

At the moment there is no capability for that. Balazs suggests using only the interface name for this.

Hence, here is the list of attributes an endpoint must have to be capable of listing jobs:

  • InterfaceName: org.nordugrid.ldapng
  • InterfaceName: org.nordugrid.ldapglue2 (no longer applies: ComputingActivities were removed from LDAP GLUE2)
  • InterfaceName: org.ogf.emies (removed after the EMI-ES 1.15 spec; it is now org.ogf.glue.emies.activityinfo)
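
A small sketch of the interface-name check in Python. It reads the notes on the list above as "ldapglue2 no longer qualifies and org.ogf.emies was replaced by org.ogf.glue.emies.activityinfo"; that reading, and the function name, are assumptions.

```python
# Interface names that can list jobs, according to the list above
# (interpretation: struck-out entries are excluded, the rename is applied).
JOB_LISTING_INTERFACES = frozenset({
    "org.nordugrid.ldapng",
    "org.ogf.glue.emies.activityinfo",
})


def can_list_jobs(interface_name):
    """True if an endpoint with this InterfaceName can be asked for a job list."""
    return interface_name in JOB_LISTING_INTERFACES


candidates = [
    "org.nordugrid.ldapng",
    "org.nordugrid.ldapglue2",
    "org.ogf.glue.emies.activityinfo",
]
print([name for name in candidates if can_list_jobs(name)])
```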

Martin, Zsombor: please check whether simple BES or org.nordugrid.xbes is capable of listing jobs using some protocol function. If so, the endpoint must be checked for the following attributes:

If BES doesn't have a job listing feature, or we decided to drop that (I don't remember), you can speed things up by only looking at the InterfaceExtension attribute.

Associations: what exactly is there?

The following associations:

ComputingEndpoint<--->ComputingShare

ExecutionEnvironment<--->ComputingManager

ExecutionEnvironment<--->ComputingShare

are represented in different ways in LDIF and XML. The following is a reminder of how.

ComputingEndpoint<--->ComputingShare

Please keep in mind that GLUE2 Shares could also be NOT published.

Note: Associations between Endpoints and Shares are implemented in svn 24155. Will be in EMI2-2.0.0rc4

A ComputingEndpoint can serve multiple Shares, and a Share can be accessible by multiple Endpoints. However:

LDAP) The LDAP GLUE2 schema does NOT have any association between the ComputingEndpoint object and the ComputingShare object.
A client that wants to discover such an association has to retrieve each ComputingShare object and scan its ComputingShareComputingEndpointForeignKey attribute.
XML) The XML rendering DOES have associations in the <ComputingEndpoint> structure. Shares will be in the <Associations> tag:
Example:
...
  <ComputingEndpoint>
  ...
    <Associations>
       <ComputingShareID>urn:ogf:ComputingShare:piff.hep.lu.se:fork</ComputingShareID>
       ...
       <ComputingShareID></ComputingShareID>
    </Associations>
    ...
  </ComputingEndpoint>
...
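
For the XML case, reading the associations can be sketched with Python's standard ElementTree. The sample document mirrors the example above and, like it, carries no XML namespaces; the wrapping <ComputingService> element is an assumption, and a real A-REX document will most likely need namespace-qualified tag names.

```python
import xml.etree.ElementTree as ET

# Simplified sample modelled on the example above (no namespaces, assumed wrapper).
SAMPLE = """
<ComputingService>
  <ComputingEndpoint>
    <Associations>
      <ComputingShareID>urn:ogf:ComputingShare:piff.hep.lu.se:fork</ComputingShareID>
      <ComputingShareID></ComputingShareID>
    </Associations>
  </ComputingEndpoint>
</ComputingService>
"""


def shares_per_endpoint(xml_text):
    """Return one list of non-empty ComputingShareIDs per ComputingEndpoint."""
    root = ET.fromstring(xml_text)
    result = []
    for endpoint in root.iter("ComputingEndpoint"):
        ids = [
            (element.text or "").strip()
            for element in endpoint.findall("./Associations/ComputingShareID")
        ]
        # Drop empty placeholders such as <ComputingShareID></ComputingShareID>.
        result.append([share_id for share_id in ids if share_id])
    return result


print(shares_per_endpoint(SAMPLE))
```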


ExecutionEnvironment<--->ComputingManager

An LRMS (ComputingManager) can manage one or more nodes.

  • If the nodes are Homogeneous, it will only have one ExecutionEnvironment representing such nodes.
  • If the nodes are INHomogeneous, it will have more than one ExecutionEnvironment.
LDAP) The LDAP GLUE2 schema does NOT have any association between the ComputingManager and the ExecutionEnvironment objects.
A client that wants to discover such an association has to retrieve each ExecutionEnvironment object and scan its ExecutionEnvironmentComputingManagerForeignKey attribute.
The ExecutionEnvironments belonging to a ComputingManager are published in the GLUE2GroupID=ExecutionEnvironments LDAP object under the ComputingManager object.
Example:
 -- GLUE2ManagerID= urn:ogf:ComputingManager:<FQDN>:<managerName>
    |  ...
    |-- GLUE2GroupID= ExecutionEnvironments
         |-- GLUE2ResourceID= urn:ogf:ExecutionEnvironment:<FQDN>:execenv<sequential number>
         |...
         |-- GLUE2ResourceID= urn:ogf:ExecutionEnvironment:<FQDN>:execenv<sequential number>...
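
For the LDAP case, the "retrieve each object and scan its foreign key" step amounts to inverting the foreign keys. A rough Python sketch follows; entries are modelled as dicts of multi-valued attributes, which is an assumption about how the LDAP result is held in memory. The foreign key attribute name is copied from the text above; the names actually published over LDAP may differ (for example by carrying a GLUE2 prefix), so it is passed in as a parameter.

```python
from collections import defaultdict


def invert_foreign_keys(entries, id_attr, fk_attr):
    """Map every referenced ID to the IDs of the entries that point at it."""
    owners = defaultdict(list)
    for entry in entries:
        for target in entry.get(fk_attr, []):
            owners[target].extend(entry.get(id_attr, []))
    return dict(owners)


# Invented entries; hostnames and IDs are placeholders.
FK = "ExecutionEnvironmentComputingManagerForeignKey"
entries = [
    {"GLUE2ResourceID": ["urn:ogf:ExecutionEnvironment:ce.example.org:execenv0"],
     FK: ["urn:ogf:ComputingManager:ce.example.org:fork"]},
    {"GLUE2ResourceID": ["urn:ogf:ExecutionEnvironment:ce.example.org:execenv1"],
     FK: ["urn:ogf:ComputingManager:ce.example.org:fork"]},
]

by_manager = invert_foreign_keys(entries, "GLUE2ResourceID", FK)
print(by_manager)
```

The same helper works for the ComputingShare foreign keys: pass the share ID attribute and the relevant foreign key attribute instead.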


XML) The XML rendering does NOT have any association between ComputingManager and ExecutionEnvironment objects. Instead, the list of ExecutionEnvironments belonging to a certain ComputingManager is contained in the <ExecutionEnvironments> element nested in the <ComputingManager> element.
Example:
  <ComputingManager>
  ...
    <ExecutionEnvironments>
      <ExecutionEnvironment>
      ...
      </ExecutionEnvironment>
    </ExecutionEnvironments>
  </ComputingManager>


ExecutionEnvironment<--->ComputingShare

  • Shares (queues) are not usually aware of the nodes. However, certain LRMSes allow a queue to be assigned to certain nodes:
  • In case of Homogeneous nodes, each Share will have the same ExecutionEnvironment.
  • In case of INHomogeneous nodes, each Share MIGHT have different associated ExecutionEnvironments.
LDAP) The LDAP GLUE2 schema does NOT have an association between an ExecutionEnvironment and its ComputingShares in the ExecutionEnvironment object.
A client that wants to discover such associations has to retrieve each ComputingShare object and scan its ComputingShareExecutionEnvironmentForeignKey attribute.
XML) The XML rendering DOES have an association between ExecutionEnvironment and ComputingShare objects. A list of associated ComputingShares is contained in the XML <Associations> tag:
Example:
...
  <ExecutionEnvironment>
  ...
    <Associations>
       <ComputingShareID>urn:ogf:ComputingShare:piff.hep.lu.se:fork</ComputingShareID>
       ...
       <ComputingShareID></ComputingShareID>
    </Associations>
    ...
  </ExecutionEnvironment>
...



GLUE2 Service/Endpoint discovery and NorduGrid Schema

A padding string has been added to the LDAP attribute nordugrid-cluster-comment to tell the GLUE2ServiceID of the running A-REX; this makes it easier for the client to guess whether a service runs on the same machine.

The string is as follows:

If there is already a comment in the field, the comment string is suffixed this way:

nordugrid-cluster-comment: Some comment here ; GLUE2ServiceID=urn:ogf:ComputingService:$hostname:arex (notice semi-colon)

If there is no comment there, the comment string looks like this:

nordugrid-cluster-comment: GLUE2ServiceID=urn:ogf:ComputingService:$hostname:arex (no semi-colon)
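
On the client side, pulling the ID back out of the comment could be done along these lines. This is only a sketch: the function names are made up, and it assumes the marker is always the last thing in the attribute value, as in the two forms above.

```python
MARKER = "GLUE2ServiceID="


def split_cluster_comment(value):
    """Split a nordugrid-cluster-comment value into (comment, service_id).

    service_id is None when the value carries no GLUE2ServiceID marker.
    """
    head, sep, tail = value.rpartition(MARKER)
    if not sep:
        return value.strip(), None
    comment = head.rstrip()
    if comment.endswith(";"):
        # Drop the separating semi-colon added after an existing comment.
        comment = comment[:-1].rstrip()
    return comment, tail.strip()


def service_host(service_id):
    """Return the hostname part of urn:ogf:ComputingService:<hostname>:arex."""
    parts = service_id.split(":")
    if len(parts) == 5 and parts[:3] == ["urn", "ogf", "ComputingService"]:
        return parts[3]
    return None


comment, service_id = split_cluster_comment(
    "Some comment here ; GLUE2ServiceID=urn:ogf:ComputingService:piff.hep.lu.se:arex"
)
print(comment, "|", service_id, "|", service_host(service_id))
```

Comparing service_host() with the host of the LDAP endpoint being queried is one way to make the "same machine" guess.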

Progress

Useful resources

A test server has been used to test the new GLUE2 rendering. This rendering will be published with 2.0, so it will be available as soon as we have EMI packages.

For the time being you can use piff:

  • ldap server: ldap://piff.hep.lu.se:2135/
  • ldapng endpoint URL: ldap://piff.hep.lu.se:2135/Mds-vo-name=local,o=grid
  • ldapglue1 endpoint URL: ldap://piff.hep.lu.se:2135/Mds-vo-name=resource,o=grid
  • ldapglue2 endpoint URL: ldap://piff.hep.lu.se:2135/o=glue

I suggest taking a visual approach; it really makes things easier. I usually use a visual LDAP browser called Luma (http://luma.sourceforge.net/), which you can find in all standard distros. Mac OS X might have other tools.

To have a look at the XML, use arcwsrf (that's why I authorized you on the cluster).

NEW Latest CA package (to authenticate the host): https://arc-emi.grid.upjs.sk/instantCA/certs/instantCA_full-1340631880.75.tar.gz

I sketched some of the trees using a mindmap tool. You can download them here https://wiki.nordugrid.org/index.php/ARC1/Infosys/2011Review#Pictures_of_LDAP.2FXML_trees

The queries

Queries arclib uses to retrieve information from local endpoints (TargetInformationRetrieverPlugin)

  • ldapng:
baseURL: ldap://<hostname>:2135/Mds-vo-name=local,o=grid
filter: (|(objectclass=nordugrid-cluster)(objectclass=nordugrid-queue)(nordugrid-authuser-sn=<escaped_dn>))
attribute:
scope: subtree
  • ldapglue1:
baseURL: ldap://<hostname>:2170/o=grid
Note: in gLite, one only queries the bdii-site. All the resources are under the domain object. Base might be refined with the domain name
filter:
attribute:
scope: subtree
  • ldapglue2:
baseURL: ldap://<hostname>:2135/o=glue
filter: (&(!(GLUE2GroupID=ComputingActivities))(!(ObjectClass=GLUE2ComputingActivity)))
attribute:
scope: subtree
  • wsrfglue2:
baseURL: http[s]://<hostname>:443/
 ??
  • emies:
baseURL: http[s]://<hostname>:443/
 ??
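
The LDAP queries listed above can be assembled as in the following Python sketch. Ports, base DNs and filters are copied from the list; the helper names are made up, and the escaping follows the generic LDAP filter rules of RFC 4515, which is an assumption about what escaped_dn means in arclib.

```python
# Port and base DN per LDAP interface, as listed above.
LDAP_BASES = {
    "ldapng": (2135, "Mds-vo-name=local,o=grid"),
    "ldapglue1": (2170, "o=grid"),
    "ldapglue2": (2135, "o=glue"),
}

# ldapglue2 filter from the list above: everything except job records.
LDAPGLUE2_FILTER = ("(&(!(GLUE2GroupID=ComputingActivities))"
                    "(!(ObjectClass=GLUE2ComputingActivity)))")


def base_url(interface, hostname):
    """Build the baseURL for one of the LDAP interfaces."""
    port, base = LDAP_BASES[interface]
    return f"ldap://{hostname}:{port}/{base}"


def escape_filter_value(value):
    """Escape the characters RFC 4515 reserves inside LDAP filter values."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in value)


def ldapng_target_filter(user_dn):
    """Build the ldapng filter for cluster, queue and authuser records."""
    escaped_dn = escape_filter_value(user_dn)
    return ("(|(objectclass=nordugrid-cluster)"
            "(objectclass=nordugrid-queue)"
            "(nordugrid-authuser-sn=" + escaped_dn + "))")


print(base_url("ldapng", "piff.hep.lu.se"))
print(ldapng_target_filter("/O=Grid/CN=Some User (test)"))
```

Escaping matters because a DN containing parentheses or an asterisk would otherwise change the meaning of the filter.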


Queries arclib uses to retrieve information from index endpoints (ServiceEndpointRetrieverPlugin)

  • egiis:
baseURL: ldap://<hostname>:2135/Mds-vo-name=local,o=grid
filter:
attribute: giisregistrationstatus
scope: base
  • EMIR:
baseURL: http[s]://<hostname>:443/
query string: /services/query.xml?
additional fields: limit=<num>&skip=<num>
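
Putting the EMIR pieces together, a query URL could be built as below. The scheme, port, path and parameter names come from the entry above; the function names are hypothetical, and treating skip as a record offset for paging is an assumption.

```python
from urllib.parse import urlencode


def emir_query_url(hostname, limit=None, skip=None, scheme="https", port=443):
    """Build an EMIR query URL, adding limit/skip only when they are given."""
    params = {}
    if limit is not None:
        params["limit"] = limit
    if skip is not None:
        params["skip"] = skip
    return f"{scheme}://{hostname}:{port}/services/query.xml?{urlencode(params)}"


def page_urls(hostname, page_size, pages):
    """URLs for consecutive pages, assuming skip counts records (not pages)."""
    return [
        emir_query_url(hostname, limit=page_size, skip=index * page_size)
        for index in range(pages)
    ]


# emir.example.org is a placeholder host.
for url in page_urls("emir.example.org", 100, 3):
    print(url)
```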


Queries arclib uses to retrieve job lists (JobListRetrieverPlugin)

  • ldapng:
baseURL: ldap://<hostname>:2135/Mds-vo-name=local,o=grid
filter: (|(nordugrid-job-globalowner=<escaped_dn>))
attribute:
scope: subtree
  • ldapglue1: not in the code
  • ldapglue2: not in the code
  • wsrfglue2:
baseURL: http[s]://<hostname>:443/
 ??
  • emies:
baseURL: http[s]://<hostname>:443/
 ?? list of jobs


Job Retrieval

relevant records in renderings

Note: these features will only be available starting from 3.0.0. Clients should not expect such information in 2.0.X.

  • The GLUE2 ComputingActivity record contains the following relevant information:
  1. The GLUE2 IDFromEndpoint now contains a string of the form:
urn:idfe:<IDFromEndpoint> where IDFromEndpoint is the string A-REX assigns to the job.
Example: <IDFromEndpoint> urn:idfe:1mWMDm3XNWgnq2TvIpCwoSEmABFKDmABFKDmv7GKDmABFKDmXVoc9n </IDFromEndpoint>


  2. A special OtherInfo value identifies which submission interface the job was submitted to.
The format is one of:
<OtherInfo>SubmittedVia=org.nordugrid.gridftpjob</OtherInfo>
<OtherInfo>SubmittedVia=org.nordugrid.xbes</OtherInfo>
<OtherInfo>SubmittedVia=org.ogf.emies</OtherInfo>

  • nordugrid-job objectclass contains the following relevant information:
    • nordugrid-job-comment: SubmittedVia=<interfaceName>
Example: nordugrid-job-comment: SubmittedVia=org.ogf.emies
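
A short Python sketch of how a client could read these two markers. The prefixes are the ones shown above; the function names are made up for the example.

```python
IDFE_PREFIX = "urn:idfe:"
SUBMITTED_VIA = "SubmittedVia="


def local_job_id(id_from_endpoint):
    """Strip the urn:idfe: prefix and return the ID A-REX assigned to the job."""
    value = id_from_endpoint.strip()
    if not value.startswith(IDFE_PREFIX):
        return None
    return value[len(IDFE_PREFIX):]


def submitted_via(values):
    """Return the interface name from the first SubmittedVia=... value, else None.

    values can be the OtherInfo values of a ComputingActivity or the
    nordugrid-job-comment values of a nordugrid-job object.
    """
    for value in values:
        value = value.strip()
        if value.startswith(SUBMITTED_VIA):
            return value[len(SUBMITTED_VIA):]
    return None


print(local_job_id(" urn:idfe:1mWMDm3XNWgnq2TvIpCwoSEmABFKDmABFKDmv7GKDmABFKDmXVoc9n "))
print(submitted_via(["SubmittedVia=org.ogf.emies"]))
```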


JobRetrievers behaviour

  • NorduGrid JobRetriever should only take into account nordugrid-job objects that have
    • nordugrid-job-comment: SubmittedVia=org.nordugrid.gridftpjob
and should discard all others.


  • ldapglue2:
The following applies to the latest GLUE2 development:
  1. There are no jobs in the LDAPGLUE2 schema rendering.
  • EMI-ES:
    • Every EMI-ES jobretriever operation should start with a ResourceInfo endpoint.
  1. Get the ActivityInfo and ActivityManagement ports from the ResourceInfo port.
  2. Execute ListActivities() on the ActivityInfo port-type.
  3. For each activity returned above:
    • Execute GetActivityInfo().
    • Check if the activity has OtherInfo: SubmittedVia=org.ogf.emies; keep it if so, otherwise discard it.
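
The EMI-ES steps can be sketched in Python as follows. The client objects are stand-ins written for this example: their classes and method names are hypothetical wrappers around the port-type operations named above, not an existing arclib or EMI-ES API.

```python
class FakeActivityInfo:
    """Stand-in for an ActivityInfo port (hypothetical, for illustration)."""

    def __init__(self, activities):
        self._activities = activities  # activity ID -> list of OtherInfo values

    def list_activities(self):
        # Would wrap the list operation of the ActivityInfo port-type.
        return list(self._activities)

    def get_activity_info(self, activity_id):
        # Would wrap GetActivityInfo(); returns a simplified record.
        return {"OtherInfo": self._activities[activity_id]}


class FakeResourceInfo:
    """Stand-in for a ResourceInfo port (hypothetical, for illustration)."""

    def __init__(self, activity_info):
        self._activity_info = activity_info

    def get_activity_ports(self):
        # Step 1: (ActivityInfo, ActivityManagement); the latter is unused here.
        return self._activity_info, None


def retrieve_jobs(resource_info, wanted="org.ogf.emies"):
    """Follow steps 1 to 3 and keep only jobs submitted via `wanted`."""
    activity_info, _activity_management = resource_info.get_activity_ports()
    kept = []
    for activity_id in activity_info.list_activities():
        info = activity_info.get_activity_info(activity_id)
        if "SubmittedVia=" + wanted in info.get("OtherInfo", []):
            kept.append(activity_id)
    return kept


# Invented job IDs.
port = FakeActivityInfo({
    "job-1": ["SubmittedVia=org.ogf.emies"],
    "job-2": ["SubmittedVia=org.nordugrid.gridftpjob"],
    "job-3": ["SubmittedVia=org.ogf.emies"],
})
jobs = retrieve_jobs(FakeResourceInfo(port))
print(jobs)
```

The same SubmittedVia comparison, with org.nordugrid.gridftpjob as the wanted value and nordugrid-job-comment as the source, covers the NorduGrid JobRetriever rule.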

Tasks

  • [DONE] Florido fixes the local info trees. AccessPolicy and MappingPolicy are implemented as a list of VOs.
  • [NOT DONE] Martin highlights the service discovery algorithm based on those Endpoint values
  • [NOT DONE] Zsombor finds a better configuration format. Configuration must be fixed on monday 6 Feb
  • [NOT DONE] Ivan arranges for a ServiceID to be part of EMIR records.
  • [NOT DONE] propagate new open enumeration strings to the GLUE2 committee.
  • [ONGOING] Florido finds info on GOCDB. GOCDB exposes a text file via HTML that I was not able to get. Moreover, they're sketching their own RESTful interface to make queries to the db, so I'd rather wait and see what happens.

TODO

  • Commandline argument: specifying interface to use.
  • Rejecting from commandline.
  • Legacy stuff in UserConfig class, e.g. AddService, GetSelectedServices, etc.
  • Renaming EndpointerRetriever class and others maybe.
  • Improve brokers: absolute ranking, sort by ComputingEndpoint
  • ExecutionTarget with multiple ComputingEndpoint objects.
  • ExecutionTarget combiner
  • Raw submitters
  • Job Information retrievers.
  • Finish GLUE2 LDAP implementation (only multiple ComputingEndpoint objects need to be supported)
  • Cross-check interface names with Balazs and Florido
  • Adapt plugin listing in CLIs.
  • Agree on preferred interface in client configuration.
  • Write API documentation.
  • Remove TargetGenerator and TargetRetriever classes.