This wiki is obsolete, see the NorduGrid web pages for up to date information.

Nagios Tests



Note: this page is not up-to-date. Please visit: http://git.nbi.ku.dk/downloads/NorduGridARCNagiosPlugins/

NorduGrid provides a set of Nagios tests that can be used to monitor the functionality of an ARC computing element. These tests were originally developed by the NDGF in order to provide availability monitoring to WLCG. The maintenance of the tests has since been taken over by the EMI project.

The tests are available in the workarea of the nordugrid subversion server: http://svn.nordugrid.org/trac/workarea/browser/nagios

They are also available packaged as an RPM: grid-monitoring-probes-org.ndgf

The configuration of the tests is collected in a single configuration file called org.ndgf.conf. Adapt it to your needs, making sure that the user configured to run the tests is authorized at the CEs you are testing and has the necessary access rights to the configured storage locations and catalogues.

Some of the tests send test jobs to the CE and report the result once the test job has finished. If the job does not complete within 12 hours, it is killed and a warning is reported in Nagios.
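The 12-hour rule can be sketched as follows. This is only an illustration, not the code used by the tests: the function name check_pending_job is hypothetical, and returning None while the job is still within the limit is an assumption. The exit code 1 is the standard Nagios plugin code for WARNING.

```python
from datetime import datetime, timedelta, timezone

# Standard Nagios plugin exit code for a warning.
WARNING = 1

# The 12-hour limit is taken from the description above.
JOB_TIMEOUT = timedelta(hours=12)


def check_pending_job(submitted_at, now):
    """Return (exit_code, message) if an unfinished job has timed out.

    Returns None while the job is still within the limit, i.e. there is
    nothing to report yet (an assumption of this sketch).
    """
    age = now - submitted_at
    if age > JOB_TIMEOUT:
        hours = int(JOB_TIMEOUT.total_seconds() // 3600)
        return WARNING, "WARNING: test job did not complete within %d hours" % hours
    return None


# Example with fixed timestamps so the outcome does not depend on the clock.
submitted = datetime(2020, 1, 1, 0, 0, tzinfo=timezone.utc)
timed_out = check_pending_job(submitted, submitted + timedelta(hours=13))
still_running = check_pending_job(submitted, submitted + timedelta(hours=1))
```

A caller would kill the job whenever a result is returned and pass the exit code and message on to Nagios.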

The following tests are available:

org.arc.AUTH

This test checks that the CE is capable of doing authentication. The test performs a gridftp listing of the CE's contact URL and checks that the retrieved list contains a "new" directory.

(Note that the source repository currently labels this as ARCCE-auth.)
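The decision step of this test can be sketched as below. The gridftp listing itself is not shown here; the sketch assumes the listing has already been retrieved as a list of entry names, and both the function name auth_status and the mapping of a missing directory to CRITICAL are assumptions rather than the behaviour of the actual test.

```python
# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def auth_status(listing_entries):
    """Check that a directory listing contains an entry named "new".

    listing_entries is assumed to be an iterable of entry names, one per
    listed item, possibly with a trailing slash for directories.
    """
    names = {entry.strip().rstrip("/") for entry in listing_entries}
    if "new" in names:
        return OK, 'OK: listing contains the "new" directory'
    return CRITICAL, 'CRITICAL: "new" directory missing from listing'


# Example with hypothetical listings.
good = auth_status(["info", "new/"])
bad = auth_status(["info"])
```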

org.arc.CA-VERSION

This test checks the IGTF CA version supported by the cluster. The test compares the list of CAs published in the CE's information system with the list of CAs in the version of the IGTF classic policy installed on the machine running the test.

(Note that the source repository currently labels this as ARCCE-caver.)
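The comparison of the two CA lists amounts to a set difference, which can be sketched as follows. How the lists are obtained (from the information system and from the locally installed IGTF package) is left out, the CA names in the example are placeholders, and reporting any difference as WARNING is an assumption of this sketch.

```python
# Standard Nagios plugin exit codes.
OK, WARNING = 0, 1


def compare_ca_lists(published, installed):
    """Return (missing, extra) between the CE's CAs and the local IGTF set.

    missing: CAs in the local IGTF installation that the CE does not publish.
    extra:   CAs the CE publishes that are not in the local IGTF installation.
    """
    published, installed = set(published), set(installed)
    return sorted(installed - published), sorted(published - installed)


def ca_status(published, installed):
    """Map the comparison to a Nagios status (the mapping is an assumption)."""
    missing, extra = compare_ca_lists(published, installed)
    if not missing and not extra:
        return OK, "OK: CA lists match"
    return WARNING, "WARNING: %d missing, %d extra CAs" % (len(missing), len(extra))


# Example with placeholder CA subject names.
ce_cas = ["/C=XX/O=Example/CN=CA One", "/C=XX/O=Example/CN=CA Two"]
local_cas = ["/C=XX/O=Example/CN=CA Two", "/C=XX/O=Example/CN=CA Three"]
missing, extra = compare_ca_lists(ce_cas, local_cas)
```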

org.arc.GRIDFTP

This test checks that the CE is capable of downloading input from and uploading output to a gridftp-based storage. The test sends a test job whose job description has an input file on a gridftp server and requests upload of an output file using gridftp.

(Note that the source repository currently labels this as ARCCE-gridftp.)
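A job description of this kind could look roughly like the xRSL string built below. The job description used by the actual test is not reproduced on this page, so the helper name build_xrsl, the job name, the file names and the example.org URLs are all hypothetical; only the general shape (an input file staged from a gsiftp URL and an output file uploaded to one) follows the description above.

```python
def build_xrsl(input_url, output_url, input_name="data.in", output_name="data.out"):
    """Build a minimal xRSL job description that copies input to output.

    The job runs /bin/cp on the worker node, so a successful run implies
    that the input was staged in and that an output file exists to upload.
    """
    return (
        '&(executable="/bin/cp")'
        '(arguments="%(inp)s" "%(out)s")'
        '(jobName="gridftp-staging-test")'
        '(inputFiles=("%(inp)s" "%(inp_url)s"))'
        '(outputFiles=("%(out)s" "%(out_url)s"))'
    ) % {
        "inp": input_name,
        "out": output_name,
        "inp_url": input_url,
        "out_url": output_url,
    }


# Example with placeholder storage URLs.
xrsl = build_xrsl(
    "gsiftp://se.example.org/tests/data.in",
    "gsiftp://se.example.org/tests/data.out",
)
```

The same shape would apply to other storage protocols by changing the URL scheme, assuming the CE supports them.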

org.arc.Jobsubmit

This test checks that the CE is capable of running jobs. The test job that is sent also performs a set of additional tests on the worker node where it runs, and the results of these are reported as passive tests.

(Note that the source repository currently labels this as ARCCE-jobsubmit.)

The additional worker-node tests are:

org.arc.python

Checks that a python interpreter is installed and reports its version.

(Note that the source repository currently labels this as python.)

org.arc.perl

Checks that a perl interpreter is installed and reports its version.

(Note that the source repository currently labels this as perl.)

org.arc.gcc

Checks that a gcc compiler is installed and reports its version.

(Note that the source repository currently labels this as gcc.)

org.arc.csh

Checks that a working csh interpreter is installed.

(Note that the source repository currently labels this as csh.)
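The worker-node checks above, and their reporting as passive results, can be sketched as follows. This is an illustration under assumptions, not the code run by the test job: probe_tool and passive_result_line are hypothetical names, the host name is a placeholder, and treating a missing tool as CRITICAL is an assumption. The output line follows the Nagios external command format for PROCESS_SERVICE_CHECK_RESULT.

```python
import shutil
import subprocess
import sys
import time

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def probe_tool(command, version_args=("--version",)):
    """Check that a tool is installed and return (exit_code, first output line)."""
    path = shutil.which(command)
    if path is None:
        return CRITICAL, "%s not found" % command
    try:
        result = subprocess.run(
            [path, *version_args], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return CRITICAL, "%s failed to run: %s" % (command, exc)
    if result.returncode != 0:
        return CRITICAL, "%s exited with code %d" % (command, result.returncode)
    # Some tools print their version on stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return OK, output.splitlines()[0] if output else "no version output"


def passive_result_line(host, service, code, message, timestamp=None):
    """Format one passive service check result as a Nagios external command."""
    timestamp = int(time.time()) if timestamp is None else int(timestamp)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s" % (
        timestamp, host, service, code, message,
    )


# Example: probe the interpreter running this script and format the result.
status, text = probe_tool(sys.executable)
line = passive_result_line("ce.example.org", "org.arc.python", status, text)
print(line)
```

For a shell such as csh, which may not accept a version flag, version_args could be replaced by a trivial command, for example ("-c", "exit 0").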

org.arc.LFC

This test checks that the CE is capable of downloading input that is registered in an LFC catalogue and registering uploaded output in an LFC catalogue. The test sends a test job whose job description has an input file in an LFC catalogue and requests registration of the uploaded output file in LFC.

(Note that the source repository currently labels this as ARCCE-lfc.)

org.arc.SW-VERSION

This test reports the middleware version used on the CE as reported in the CE's information system.

(Note that the source repository currently labels this as ARCCE-softver.)
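Extracting the version from information-system output could be sketched as below. The sketch assumes the information system has already been queried and returned LDIF-style "attribute: value" lines, and it assumes the attribute is named nordugrid-cluster-middleware; both are assumptions, so the attribute name is a parameter. The sample values are made up, and base64-encoded values and folded lines are not handled.

```python
# Standard Nagios plugin exit codes.
OK, UNKNOWN = 0, 3


def middleware_versions(ldif_text, attribute="nordugrid-cluster-middleware"):
    """Collect all values of the given attribute from LDIF-style text."""
    prefix = attribute.lower() + ":"
    values = []
    for line in ldif_text.splitlines():
        if line.lower().startswith(prefix):
            values.append(line[len(prefix):].strip())
    return values


def sw_version_status(ldif_text):
    """Report the published middleware versions as a Nagios status."""
    versions = middleware_versions(ldif_text)
    if not versions:
        return UNKNOWN, "UNKNOWN: no middleware version published"
    return OK, "OK: " + ", ".join(versions)


# Example with made-up information-system output.
sample = (
    "dn: nordugrid-cluster-name=ce.example.org,Mds-Vo-name=local,o=grid\n"
    "nordugrid-cluster-name: ce.example.org\n"
    "nordugrid-cluster-middleware: example-middleware-1.2.3\n"
)
result = sw_version_status(sample)
```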

org.arc.SRM

This test checks that the CE is capable of downloading input from and uploading output to an SRM-based storage. The test sends a test job whose job description has an input file on an SRM server and requests upload of an output file using SRM.

(Note that the source repository currently labels this as ARCCE-srm.)

org.arc.ARC-STATUS

This test checks the status of the CE as reported in the CE's information system.

(Note that the source repository currently labels this as ARCCE-status.)