This wiki is obsolete; see the NorduGrid web pages for up-to-date information.



Features and fault tolerance

  • Configuration:

- robustness: no hanging on small syntax errors in the config file; default values are available and are used for config options that are not set

- order of jobreport_options should be arbitrary
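The two configuration requirements above can be illustrated with a small parser. This is only a sketch: the comma-separated `key:value` syntax, the option names (`urbatch`, `archiving`) and the default values are assumptions made for the example, not necessarily what JURA uses.

```python
# Hypothetical defaults; the real option names and values may differ.
DEFAULTS = {"urbatch": "50", "archiving": ""}


def parse_options(raw, defaults=DEFAULTS):
    """Parse 'key:value,key:value' into a dict, tolerating malformed entries."""
    options = dict(defaults)  # start from the defaults, override below
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        key, sep, value = entry.partition(":")
        key = key.strip()
        if not sep or not key:
            continue  # skip an entry without a separator or key instead of failing
        options[key] = value.strip()
    return options
```

Because the result is a dictionary, the order of the entries in the input string does not matter, and a malformed entry is skipped rather than aborting the whole parse.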

  • Job log file management:

- should ignore any extra, undocumented key=value lines

- files should be deleted if and only if they were successfully submitted to the specified LUTS and/or the expiration time has passed
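The deletion rule can be written down as a single predicate. A minimal sketch, assuming times are given in seconds and that the caller already knows whether the submission succeeded; the function and parameter names are hypothetical.

```python
import time


def should_delete(submitted_ok, created_at, expiration_time, now=None):
    """Return True only if the log file was reported or has expired.

    submitted_ok    -- True if the record reached the configured LUTS
    created_at      -- creation time of the log file (seconds since epoch)
    expiration_time -- maximum age of a log file in seconds
    """
    if now is None:
        now = time.time()
    expired = (now - created_at) > expiration_time
    return submitted_ok or expired
```

A test would then assert that a file for which this predicate is false still exists after a JURA run, and that a file for which it is true is gone.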

  • UR generation:

- invalid UR XMLs should never be generated

- invalid means e.g.:

-- missing required properties (RecordId, JobId, Status etc.)

-- incorrect format of data (wrong ISO 8601 time/interval, wrong DN string representation etc.)
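A test harness could apply checks like the following to every generated record. This is a simplified sketch: it works on a flat dictionary rather than on the UR XML, the `EndTime` and `WallDuration` keys are assumptions chosen for the example, timestamps are accepted only in the UTC `Z` form, and the week form of ISO 8601 durations is not covered.

```python
import re
from datetime import datetime

REQUIRED = ("RecordId", "JobId", "Status")  # the properties named above

# ISO 8601 duration such as PT1H30M or P1DT12H (no week form, P alone rejected).
_DURATION = re.compile(
    r"^P(?=\d|T\d)(?:\d+Y)?(?:\d+M)?(?:\d+D)?"
    r"(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+(?:\.\d+)?S)?)?$"
)


def validate_record(record):
    """Return a list of problems; an empty list means these checks passed."""
    problems = [f"missing {name}" for name in REQUIRED if not record.get(name)]
    end_time = record.get("EndTime")
    if end_time is not None:
        try:
            datetime.strptime(end_time, "%Y-%m-%dT%H:%M:%SZ")
        except ValueError:
            problems.append(f"bad timestamp: {end_time}")
    duration = record.get("WallDuration")
    if duration is not None and not _DURATION.match(duration):
        problems.append(f"bad duration: {duration}")
    return problems
```

Checking the DN string representation is left out here, since the accepted format depends on the service.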

  • Submitting:

- No hanging when service is not working

- UR batch size is not bigger than configured

- No jobs are skipped
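The batch-size and no-skipping requirements can be checked against a reference splitter such as the one below. It is a sketch of the expected behaviour, not JURA's code; `make_batches` is a hypothetical name.

```python
def make_batches(records, batch_size):
    """Split records into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


# Example: 7 records with a configured batch size of 3.
batches = make_batches(list(range(7)), 3)
# No batch exceeds the configured size ...
assert all(len(batch) <= 3 for batch in batches)
# ... and flattening the batches gives back every record, in order.
assert [r for batch in batches for r in batch] == list(range(7))
```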

  • UR archiving:

- storing: robustness, ability to create the directory if it does not exist, no hanging when writing to disk fails

- reading back if an archive file is available: the content of the UR read back must be the same as that of a newly created one (except for the time stamp)
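The read-back comparison could be automated along these lines. This is a sketch under assumptions: the time stamp is taken to be an attribute called `createTime`, and the element names in the usage example are simplified, namespace-free stand-ins for the real UR XML.

```python
import xml.etree.ElementTree as ET


def _shape(element, ignored_attr):
    """Reduce an element to a comparable tuple, dropping the ignored attribute."""
    attrs = tuple(sorted((k, v) for k, v in element.attrib.items() if k != ignored_attr))
    text = (element.text or "").strip()
    children = tuple(_shape(child, ignored_attr) for child in element)
    return (element.tag, attrs, text, children)


def same_record(archived_xml, fresh_xml, ignored_attr="createTime"):
    """Compare two records structurally, ignoring the time stamp attribute."""
    return _shape(ET.fromstring(archived_xml), ignored_attr) == _shape(
        ET.fromstring(fresh_xml), ignored_attr
    )
```

Comparing the parsed structure rather than the raw text keeps the check insensitive to attribute order and indentation, which may legitimately differ between an archived file and a freshly generated record.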

Technical documentation

If anything is unclear, please ask me or read the technical description (see svn arc1/trunk/doc/tech_doc/jura/jura-tech-doc.pdf).

Test scenario

As I see it, a test scenario would look something like this:

I. Set up a LUTS service somewhere

1. Find a suitable host


2. Compile the latest version of SGAS

The SGAS documentation is not up to date; the installation cannot be carried out from these documents.

3. Set access control so that our JURA can insert records


4. Start the server

OK. Needs the Sun JDK; GCJ throws security exceptions.

II. Set up JURA

1. Find an A-REX resource for testing


2. Compile & install latest arc1 revision (or at least JURA part)

OK. The Knowarc_final branch was used; the revision checked was 15143.

3. Configure: insert several "jobreport" destination URLs: at least one for a working service, and some invalid URLs for fault-tolerance testing


4. Enable JURA by putting a symlink called "logger" into the libexec dir

OK. The tests were executed on Debian Linux. Is this "configuration" method suitable for Windows machines?

III. Run test jobs

1. Send thousands of jobs, possibly with different usage measures

- The Service URL generation is still wrong in the source. (Already fixed in the trunk.)

- The former ini configuration contains a jobreport_credentials element that is completely missing from the XML A-REX configuration. Without it, the problem can only be "solved" by manually editing the log files (adding the missing key_path, certificate_path and ca_certificates_dir elements).

- The implemented configuration elements work properly.

- The key_path, certificate_path and ca_certificates_dir elements are missing from the technical documentation, namely from the Appendix listing the available job log entries.

- When A-REX starts, it executes JURA as well. Deletion of the files that were not sent (because the mandatory Status element is missing) is attempted twice:

[2009-10-27 01:01:29] [Arc] [ERROR] [4112/151015712] Failed to delete file /home/martoni/arex/control/logs/350612566015131150773522.t6rPOj:No such file or directory
[2009-10-27 01:01:29] [Arc] [ERROR] [4112/151015712] Failed to delete file /home/martoni/arex/control/logs/350612566015131150773522.nHHtrF:No such file or directory

- On the SGAS LUTS server side the following error message appeared. It may be caused by a failure in composing the SOAP message.

2009-10-27 05:56:25,641 WARN  handlers.FaultHandler [ServiceThread-8,getFrom:114] The WS-Addressing To request header is missing

2. Check that all jobs have been logged exactly once in the LUTS

OK. The sgas-exist-client was used to check the usage records on the server side, but it has its own bugs. That software component is outside the scope of this testing.
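The exactly-once check itself is independent of the client used to read the records back. A minimal sketch, assuming the submitted job IDs and the job IDs found in the LUTS have already been collected into two lists; `check_exactly_once` is a hypothetical helper.

```python
from collections import Counter


def check_exactly_once(submitted_ids, logged_ids):
    """Return (missing, duplicated) job IDs, both sorted."""
    counts = Counter(logged_ids)
    missing = sorted(set(submitted_ids) - set(counts))
    duplicated = sorted(job for job, n in counts.items() if n > 1)
    return missing, duplicated
```

The test passes when both returned lists are empty.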

3. Test filling/omitting RunTimeEnvironment, JobName, ProjectName elements in JSDL

OK. The changes appear in the log file.

4. Test filling RemoteLogger element in JSDL: with a valid LUTS URL as well as an invalid one

- The value of the RemoteLogger element is used in addition to the server-side configured destination: log files are generated, two for each reporting destination.

- Only one jobReport configuration element has any effect. Any further ones are ignored and not used. This is inconsistent with the former configuration and the documentation.

5. Suspend a LUTS for a longer period (n hours), check that jobs submitted during this period are logged when LUTS becomes available again

OK. The files stay in place and are sent when the LUTS comes back up.

6. Suspend LUTS for a period longer than "expiration_time", check that files are deleted

OK. The files stay in place and the valid ones are sent when the LUTS comes back up.

7. During all this, try changing JURA configuration (record set size, logging dir etc.), and see if any problem occurs

OK. The parameters (formerly options; the naming of the configuration elements is not consistent with the documentation) are handled properly.