This wiki is obsolete; see the NorduGrid web pages for up-to-date information.
Nagios Tests
Note: this page is not up-to-date. Please visit: http://git.nbi.ku.dk/downloads/NorduGridARCNagiosPlugins/
NorduGrid provides a set of Nagios tests that can be used to monitor the functionality of an ARC computing element. These tests were originally developed by the NDGF in order to provide availability monitoring to WLCG. The maintenance of the tests has since been taken over by the EMI project.
The tests are available in the workarea of the nordugrid subversion server: http://svn.nordugrid.org/trac/workarea/browser/nagios
They are also available packaged as an RPM: grid-monitoring-probes-org.ndgf
The configuration of the tests is collected in one configuration file called org.ndgf.conf. Adapt it to your needs, making sure that the user configured to run the tests is authorized at the CEs you are testing and has the necessary access rights to the configured storage locations and catalogues.
Some of the tests send test jobs to the CE and report the result when the test job has finished. If the job does not complete within 12 hours, it is killed and a warning is reported in Nagios.
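The timeout rule can be sketched as follows. This is a minimal illustration, not the plugins' actual code: the helper name job_status, the state strings and the treatment of a failed job as CRITICAL are assumptions. Only the 12-hour limit and the resulting warning come from the text above. The numeric values are the standard Nagios plugin return codes.

```python
from datetime import datetime, timedelta

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Limit taken from the text above.
JOB_TIMEOUT = timedelta(hours=12)


def job_status(submitted, now, state):
    """Map a test job's state to a Nagios result (hypothetical helper).

    `state` is an illustrative string: "finished", "failed" or "running".
    Returns (code, message, kill_job), or None when there is nothing to
    report yet because the job is still within the time limit.
    """
    if state == "finished":
        return OK, "test job finished", False
    if state == "failed":
        # Assumption: a failed test job is reported as CRITICAL.
        return CRITICAL, "test job failed", False
    if now - submitted > JOB_TIMEOUT:
        # The job is overdue: kill it and report a warning.
        return WARNING, "test job did not complete within 12 hours", True
    return None
```

A caller would run this check periodically for each outstanding test job and submit the result to Nagios only when the return value is not None.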
The following tests are available:
org.arc.AUTH
This test checks that the CE is capable of doing authentication. The test performs a gridftp listing of the CE's contact URL and checks that the retrieved list contains a "new" directory.
(Note that the source repository currently labels this as ARCCE-auth.)
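The decision step of the org.arc.AUTH test might look like the sketch below. It assumes the gridftp listing has already been retrieved as a list of entry names. The retrieval itself is not shown, and the function name check_listing and the choice of CRITICAL for a missing directory are assumptions made for illustration.

```python
# Standard Nagios plugin return codes used here.
OK, CRITICAL = 0, 2


def check_listing(entries):
    """Return (code, message) for a listing of the CE's contact URL.

    `entries` is a list of names from the listing (hypothetical input).
    Trailing slashes are ignored so that "new" and "new/" both match.
    """
    names = {entry.rstrip("/") for entry in entries}
    if "new" in names:
        return OK, 'listing contains a "new" directory'
    # Assumption: a listing without "new" is reported as CRITICAL.
    return CRITICAL, 'listing does not contain a "new" directory'
```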
org.arc.CA-VERSION
This test checks the IGTF CA version supported by the cluster. The test compares the list of CAs published in the CE's information system with the list of CAs in the version of the IGTF classic policy installed on the machine running the test.
(Note that the source repository currently labels this as ARCCE-caver.)
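The comparison described for org.arc.CA-VERSION amounts to a set difference between two lists of CAs. The sketch below is only an illustration under assumptions: both lists are taken to be plain strings such as subject names, the function name compare_ca_lists is hypothetical, and reporting missing CAs as WARNING is a guess rather than the plugin's documented behaviour.

```python
# Standard Nagios plugin return codes used here.
OK, WARNING = 0, 1


def compare_ca_lists(published, installed):
    """Compare the CAs published by the CE with the locally installed IGTF set.

    Returns (code, message, missing, extra), where `missing` holds CAs that are
    installed locally but not published by the CE, and `extra` the reverse.
    """
    published_set = set(published)
    installed_set = set(installed)
    missing = sorted(installed_set - published_set)
    extra = sorted(published_set - installed_set)
    if missing:
        # Assumption: missing CAs are reported as WARNING.
        message = f"{len(missing)} IGTF CA(s) not published by the CE"
        return WARNING, message, missing, extra
    return OK, "all IGTF CAs are published by the CE", missing, extra
```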
org.arc.GRIDFTP
This test checks that the CE is capable of downloading input from and uploading output to gridftp-based storage. The test sends a test job whose job description has an input file on a gridftp server and requests upload of an output file using gridftp.
(Note that the source repository currently labels this as ARCCE-gridftp.)
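A job description of the kind described for org.arc.GRIDFTP could be generated as in the sketch below. It assumes the description is written in xRSL, which this page does not state. The function name build_xrsl, the file names, the use of /bin/cp as the executable and the example URLs are all hypothetical.

```python
def build_xrsl(input_url, output_url, job_name="gridftp-test"):
    """Build an xRSL job description with one staged input and one staged output.

    The job copies the downloaded input file to an output file, which the CE
    is then asked to upload to `output_url`.
    """
    return (
        '&(executable="/bin/cp")'
        '(arguments="input.txt" "output.txt")'
        f'(inputFiles=("input.txt" "{input_url}"))'
        f'(outputFiles=("output.txt" "{output_url}"))'
        f'(jobName="{job_name}")'
    )


# Illustrative URLs only; replace with storage the test user may access.
description = build_xrsl(
    "gsiftp://se.example.org/tests/input.txt",
    "gsiftp://se.example.org/tests/output.txt",
)
```

The same shape would fit the SRM and LFC tests by changing the URL scheme of the input and output locations.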
org.arc.Jobsubmit
This test checks that the CE is capable of running jobs. The test job that is sent performs a set of additional tests on the worker node where it runs. The results of these tests are reported as passive tests.
(Note that the source repository currently labels this as ARCCE-jobsubmit.)
The passive tests are:
org.arc.python
Checks that a Python interpreter is installed and reports its version.
(Note that the source repository currently labels this as python.)
org.arc.perl
Checks that a Perl interpreter is installed and reports its version.
(Note that the source repository currently labels this as perl.)
org.arc.gcc
Checks that a gcc compiler is installed and reports its version.
(Note that the source repository currently labels this as gcc.)
org.arc.csh
Checks that a working csh interpreter is installed.
(Note that the source repository currently labels this as csh.)
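The four passive tests above follow one pattern: look for a tool on the worker node and, where applicable, report its version. The sketch below shows that pattern under assumptions. The function name probe and the use of a --version flag are illustrative and not taken from the actual test job. For csh, where the text only requires a working interpreter, a different check would be needed, such as running a trivial script.

```python
import shutil
import subprocess
import sys

# Standard Nagios plugin return codes used here.
OK, CRITICAL = 0, 2


def probe(command, version_flag="--version"):
    """Check that `command` is installed and report the first line of its version output."""
    path = shutil.which(command)
    if path is None:
        return CRITICAL, f"{command}: not found"
    result = subprocess.run(
        [path, version_flag], capture_output=True, text=True, timeout=30
    )
    if result.returncode != 0:
        return CRITICAL, f"{command}: exited with status {result.returncode}"
    # Some tools print their version on stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    first_line = output.splitlines()[0] if output else "no version output"
    return OK, f"{command}: {first_line}"


if __name__ == "__main__":
    # Example: probe the interpreter running this script.
    print(probe(sys.executable))
```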
org.arc.LFC
This test checks that the CE is capable of downloading input that is registered in an LFC catalogue and registering uploaded output in an LFC catalogue. The test sends a test job whose job description has an input file in an LFC catalogue and requests registration of the uploaded output file in LFC.
(Note that the source repository currently labels this as ARCCE-lfc.)
org.arc.SW-VERSION
This test reports the middleware version used on the CE as reported in the CE's information system.
(Note that the source repository currently labels this as ARCCE-softver.)
org.arc.SRM
This test checks that the CE is capable of downloading input from and uploading output to SRM-based storage. The test sends a test job whose job description has an input file on an SRM server and requests upload of an output file using SRM.
(Note that the source repository currently labels this as ARCCE-srm.)
org.arc.ARC-STATUS
This test checks the status of the CE as reported in the CE's information system.
(Note that the source repository currently labels this as ARCCE-status.)