This wiki is obsolete; see the NorduGrid web pages for up-to-date information.
Nagios probes description from original readme
The README is located at http://svn.nordugrid.org/trac/nordugrid/browser/nagios/trunk/README and its content follows:
Monitoring ARC CEs
For each CE to monitor run
check_arcce -H <HOST> submit
This should not be run too frequently, so that one job can finish before the next is submitted. The probe keeps track of submitted jobs, and will hold the next submission if necessary.
To also test data staging, add as many --stage-input and --stage-output options as required. The former must refer to an existing URL. The latter will generate a file and upload it to the given URL; the uploaded file is deleted when the job is fetched.
More frequently, every 5 minutes or so, run
check_arcce -H <NAGIOS-HOST> monitor
which monitors the status of all jobs for each host and submits each result passively to the service matching the host name and the service description "ARCCE Job Termination". The passive service name can be configured.
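To illustrate what a passive submission looks like on the Nagios side: Nagios accepts passive results as PROCESS_SERVICE_CHECK_RESULT external commands, one line per result. The sketch below only formats such a line. The helper name and the sample plugin output are hypothetical, and how check_arcce itself delivers its results is not described in this README.

```python
import time

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def passive_result_line(host, service, code, output, timestamp=None):
    """Format a Nagios PROCESS_SERVICE_CHECK_RESULT external command line.

    Hypothetical helper for illustration; not part of check_arcce.
    """
    if timestamp is None:
        timestamp = int(time.time())
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s" % (
        timestamp, host, service, code, output)


# The plugin output text below is made up for this example.
line = passive_result_line(
    "arc0.example.org", "ARCCE Job Termination", OK,
    "Job finished successfully.", timestamp=1300000000)
print(line)
```

The host name and service description in the line must match a passive service defined in the Nagios configuration, otherwise Nagios discards the result.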
For additional options, see
check_arcce --help
check_arcce submit --help
check_arcce monitor --help
Plugin Configuration
String-valued options can also be set in a configuration file, if it makes sense. Use the "arc-ce" section of /etc/nagios/plugins.ini, where the variable names are the option names with "--" removed and all "-" replaced by "_", e.g. --user-key becomes "user_key".
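The naming rule above can be written out as a small function. This is only an illustration of the rule as stated; the function name is hypothetical and is not taken from the plugin.

```python
def option_to_variable(option):
    """Map a long command-line option to its configuration variable name.

    Strips the leading "--" and replaces every remaining "-" with "_".
    """
    if not option.startswith("--"):
        raise ValueError("expected a long option such as --user-key")
    return option[2:].replace("-", "_")


print(option_to_variable("--user-key"))   # user_key
print(option_to_variable("--user-cert"))  # user_cert
```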
Connection URLs for job submission (the --ce option) may be specified in the section "arc-ce.connection_urls":
Example:
[arc-ce]
user_cert = /etc/nagios/globus/robot-cert.pem
user_key = /etc/nagios/globus/robot-key.pem
loglevel = DEBUG

[arc-ce.connection_urls]
arc1.example.org = ARC1:https://arc1.example.org:443/ce-service
arc0.example.org = ARC0:arc0.example.org:2135/nordugrid-cluster-name=arc0.example.org,Mds-Vo-name=local,o=grid
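As a quick way to check that such a file has the intended structure, the example can be parsed with Python's standard configparser module. This is a sketch for inspecting the file, not a description of how check_arcce reads it. Restricting the delimiter to "=" is a choice made here because the connection URLs themselves contain ":".

```python
import configparser

# Same content as the example configuration, inlined for a self-contained run.
PLUGINS_INI = """\
[arc-ce]
user_cert = /etc/nagios/globus/robot-cert.pem
user_key = /etc/nagios/globus/robot-key.pem
loglevel = DEBUG

[arc-ce.connection_urls]
arc1.example.org = ARC1:https://arc1.example.org:443/ce-service
arc0.example.org = ARC0:arc0.example.org:2135/nordugrid-cluster-name=arc0.example.org,Mds-Vo-name=local,o=grid
"""

# Only "=" separates keys from values; interpolation is disabled so that
# values are returned exactly as written.
parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
parser.read_string(PLUGINS_INI)

options = dict(parser["arc-ce"])
connection_urls = dict(parser["arc-ce.connection_urls"])

for host, url in sorted(connection_urls.items()):
    print(host, "->", url)
```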
Nagios Configuration
You will need command definitions for monitoring and submission:
define command {
    command_name check_arcce_monitor
    command_line $USER1$/check_arcce -H $HOSTNAME$ monitor
}
define command {
    command_name check_arcce_submit
    command_line $USER1$/check_arcce -H $HOSTNAME$ submit \
        --stage-input srm://storage.example.org/monitoring/readable.txt \
        --stage-output srm://storage.example.org/monitoring/srm-$HOSTNAME$.txt
}
Note that some ARC commands used will require a usable $HOME. If needed, prefix the above commands with "env HOME=/var/spool/nagios" or similar.
For monitoring, add a single service like
define service {
    use monitoring-service
    host_name localhost
    service_description ARCCE Monitoring
    check_command check_arcce_monitor
}
For each host, add something like
define service {
    use submission-service
    host_name arc0.example.org
    service_description ARCCE Job Submission
    check_command check_arcce_submit
}
define service {
    use passive-service
    host_name arc0.example.org
    service_description ARCCE Job Termination
    check_command check_passive
}
Running Multiple Job Services on the Same Host
By default, running jobs are tracked on a per-host basis. To define multiple job submission services for the same host, pass --job-tag a tag that uniquely identifies the service on this host. Remember to also add a passive service and pass the corresponding --termination-service option.
As an example, the following adds dedicated services to check LFC staging for jobs:
define command {
    command_name check_arcce_submit_lfc
    command_line $USER1$/check_arcce -H $HOSTNAME$ submit --job-tag lfc \
        --termination-service 'ARCCE LFC Job Termination' \
        --stage-input lfc://lfc.example.org/monitoring/readable.txt \
        --stage-output lfc://srm://storage.example.org/monitoring/lfc-$HOSTNAME$.txt@lfc.example.org/monitoring/lfc-$HOSTNAME$.txt
}
define service {
    use submission-service
    host_name arc0.example.org
    service_description ARCCE LFC Job Submission
    check_command check_arcce_submit_lfc
}
define service {
    use passive-service
    host_name arc0.example.org
    service_description ARCCE LFC Job Termination
    check_command check_passive
}
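When several tagged services are defined per host, the submission and termination descriptions are easy to get out of sync. One possible approach is to generate both service definitions from a single label, as in the sketch below. The function and template names are hypothetical; the service templates (submission-service, passive-service) and check_passive are the ones used in the examples above. The description given to --termination-service in the command definition still has to match the generated termination service.

```python
SERVICE_TEMPLATE = """\
define service {{
    use {template}
    host_name {host}
    service_description {description}
    check_command {command}
}}
"""


def render_job_services(host, label, submit_command):
    """Return submission and passive termination services for one job tag.

    label is the part shared by both service descriptions, e.g. "LFC".
    """
    submission = SERVICE_TEMPLATE.format(
        template="submission-service", host=host,
        description="ARCCE %s Job Submission" % label,
        command=submit_command)
    termination = SERVICE_TEMPLATE.format(
        template="passive-service", host=host,
        description="ARCCE %s Job Termination" % label,
        command="check_passive")
    return submission + termination


rendered = render_job_services(
    "arc0.example.org", "LFC", "check_arcce_submit_lfc")
print(rendered)
```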
Custom Job Descriptions
If the generated job scripts and job descriptions are not sufficient, you can provide hand-written ones by passing the --job-description option to the submit subcommand. In this case, --stage-input options will have no effect, while URLs passed to --stage-output will be recorded for deletion when the job finishes.
Currently no substitutions are done in the job description file, other than what may be provided by ARC.