This wiki is obsolete, see the NorduGrid web pages for up to date information.

Nagios probes description from original readme

The readme is located at http://svn.nordugrid.org/trac/nordugrid/browser/nagios/trunk/README and its content is as follows:


Monitoring ARC CEs


For each CE to monitor, run

   check_arcce -H <HOST> submit

This should not be run too frequently, so that one job can finish before the next is submitted. The probe keeps track of submitted jobs, and will hold the next submission if necessary.

To also test data staging, add as many --stage-input and --stage-output options as required. The former must refer to an existing URL. The latter causes a generated file to be uploaded to the given URL; this file will be deleted when the job is fetched.

On a more regular basis, every 5 minutes or so, run

   check_arcce -H <NAGIOS-HOST> monitor

which checks the status of all jobs for each host and submits the result passively to the service matching the host name with the service description "ARCCE Job Termination". The passive service name can be configured.
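
For orientation, Nagios accepts passive results through its external command PROCESS_SERVICE_CHECK_RESULT. The following Python sketch only formats such a command line; the helper name format_passive_result and the output text are hypothetical, and how check_arcce itself delivers its results is not described in this readme.

```python
import time


def format_passive_result(host, service, return_code, output, timestamp=None):
    """Format a Nagios PROCESS_SERVICE_CHECK_RESULT external command line.

    This is an illustrative helper, not part of check_arcce.
    """
    if timestamp is None:
        timestamp = int(time.time())
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s" % (
        timestamp, host, service, return_code, output)


# The service description must match the passive service defined in Nagios.
line = format_passive_result(
    "arc0.example.org",           # host name as known to Nagios
    "ARCCE Job Termination",      # default passive service description
    0,                            # 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
    "Job finished successfully.", # placeholder plugin output
    timestamp=1300000000,
)
print(line)
```

The host name and service description in such a line must match a service defined in the Nagios configuration, which is why a passive service is needed for every monitored host.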

For additional options, see

   check_arcce --help
   check_arcce submit --help
   check_arcce monitor --help


Plugin Configuration


Where it makes sense, string-valued options can also be set in a configuration file. Use the "arc-ce" section of /etc/nagios/plugins.ini; the variable names are the option names with the leading "--" removed and every "-" replaced by "_", e.g. --user-key becomes "user_key".
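
The naming rule can be written down as a small Python sketch. The function name option_to_config_key is hypothetical and only illustrates the mapping; it is not taken from the plugin.

```python
def option_to_config_key(option):
    """Map a long command-line option name to its plugins.ini variable name."""
    if not option.startswith("--"):
        raise ValueError("expected a long option starting with '--': %r" % option)
    # Drop the leading "--" and replace the remaining dashes by underscores.
    return option[2:].replace("-", "_")


print(option_to_config_key("--user-key"))   # user_key
print(option_to_config_key("--user-cert"))  # user_cert
```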

Connection URLs for job submission (the --ce option) may be specified in the section "arc-ce.connection_urls":

Example:

   [arc-ce]
   user_cert = /etc/nagios/globus/robot-cert.pem
   user_key = /etc/nagios/globus/robot-key.pem
   loglevel = DEBUG
   [arc-ce.connection_urls]
   arc1.example.org = ARC1:https://arc1.example.org:443/ce-service
   arc0.example.org = ARC0:arc0.example.org:2135/nordugrid-cluster-name=arc0.example.org,Mds-Vo-name=local,o=grid
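
To show how the two sections are structured, the example can be read with the configparser module of the Python standard library. This is a sketch only; whether check_arcce reads the file this way is an assumption. Only "=" is used as delimiter here, so that the colons and later "=" signs in the URLs stay part of the value.

```python
import configparser

EXAMPLE = """\
[arc-ce]
user_cert = /etc/nagios/globus/robot-cert.pem
user_key = /etc/nagios/globus/robot-key.pem
loglevel = DEBUG
[arc-ce.connection_urls]
arc1.example.org = ARC1:https://arc1.example.org:443/ce-service
arc0.example.org = ARC0:arc0.example.org:2135/nordugrid-cluster-name=arc0.example.org,Mds-Vo-name=local,o=grid
"""

# Split each line on the first "=" only and disable "%" interpolation.
parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
parser.read_string(EXAMPLE)

settings = dict(parser["arc-ce"])
connection_urls = dict(parser["arc-ce.connection_urls"])

print(settings["user_key"])
for host, url in connection_urls.items():
    print(host, "->", url)
```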


Nagios Configuration


You will need command definitions for monitoring and submission:

   define command {
       command_name check_arcce_monitor
       command_line $USER1$/check_arcce -H $HOSTNAME$ monitor
   }
   define command {
       command_name check_arcce_submit
       command_line $USER1$/check_arcce -H $HOSTNAME$ submit \
           --stage-input srm://storage.example.org/monitoring/readable.txt \
           --stage-output srm://storage.example.org/monitoring/srm-$HOSTNAME$.txt
   }

Note that some ARC commands used will require a usable $HOME. If needed, prefix the above commands with "env HOME=/var/spool/nagios" or similar.

For monitoring, add a single service like

   define service {
       use                     monitoring-service
       host_name               localhost
       service_description     ARCCE Monitoring
       check_command           check_arcce_monitor
   }

For each host, add something like

   define service {
       use                     submission-service
       host_name               arc0.example.org
       service_description     ARCCE Job Submission
       check_command           check_arcce_submit
   }
   define service {
       use                     passive-service
       host_name               arc0.example.org
       service_description     ARCCE Job Termination
       check_command           check_passive
   }
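
When many hosts are monitored, the per-host pair of service definitions can be generated rather than written by hand. The following Python sketch is one possible way to do this; the function name render_host_services and the host list are hypothetical, while the template, service, and command names are the ones used in the examples of this readme.

```python
SERVICE_TEMPLATE = """\
define service {
    use                     %(template)s
    host_name               %(host)s
    service_description     %(description)s
    check_command           %(command)s
}
"""


def render_host_services(host):
    """Return the submission and passive termination services for one host."""
    submission = SERVICE_TEMPLATE % {
        "template": "submission-service",
        "host": host,
        "description": "ARCCE Job Submission",
        "command": "check_arcce_submit",
    }
    termination = SERVICE_TEMPLATE % {
        "template": "passive-service",
        "host": host,
        "description": "ARCCE Job Termination",
        "command": "check_passive",
    }
    return submission + termination


# Placeholder host list; replace with the CEs to monitor.
for host in ["arc0.example.org", "arc1.example.org"]:
    print(render_host_services(host))
```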


Running Multiple Job Services on the Same Host


By default, running jobs are tracked on a per-host basis. To define multiple job submission services for the same host, pass --job-tag a tag which uniquely identifies the service on this host. Remember to also add a passive service and pass the corresponding --termination-service option.

As an example, the following adds dedicated services to check LFC staging for jobs:

   define command {
       command_name check_arcce_submit_lfc
       command_line $USER1$/check_arcce -H $HOSTNAME$ submit --job-tag lfc \
           --termination-service 'ARCCE LFC Job Termination' \
           --stage-input lfc://lfc.example.org/monitoring/readable.txt \
           --stage-output lfc://srm://storage.example.org/monitoring/lfc-$HOSTNAME$.txt@lfc.example.org/monitoring/lfc-$HOSTNAME$.txt
   }
   define service {
       use                     submission-service
       host_name               arc0.example.org
       service_description     ARCCE LFC Job Submission
       check_command           check_arcce_submit_lfc
   }
   define service {
       use                     passive-service
       host_name               arc0.example.org
       service_description     ARCCE LFC Job Termination
       check_command           check_passive
   }

Custom Job Descriptions


If the generated job scripts and job descriptions are not sufficient, you can provide hand-written ones by passing the --job-description option to the submit subcommand. In this case, --stage-input options will have no effect, while URLs passed to --stage-output will be recorded for deletion when the job finishes.

Currently no substitutions are done in the job description file, other than what may be provided by ARC.