- 1 Chelonia test cases
- 1.1 General Comments
- 1.2 Testing Chelonia with centralized metadata
- 1.3 Testing with replicated metadata
- 1.4 Conclusion
Chelonia test cases
The Chelonia Manual (PDF) provides information about how to install and use Chelonia.
Test cases are written in bold type. Results of tests carried out on 22-23 Oct 2009 are written in normal type.
All tests were carried out using code from revision 15062 (21/10/09) of the arc1 trunk. Several problems were fixed in svn as they were reported, but the fixes have not been tested.
General Comments
- Design document (NORDUGRID-TECH-17) is a very good technical overview
- Admin manual (NORDUGRID-MANUAL-10) has very good example configurations but lacks a complete configuration guide
- Suggest a separate user manual containing all the UI pieces from the admin manual
- Add command to list all shepherds and online status
- Ability to use wildcards
- mkdir -p
- Proper HED init script
Testing Chelonia with centralized metadata
- Install the Chelonia services on one machine (chapter 1 of the manual)
- Add two additional storage elements (chapter 3 of the manual)
- Use the chelonia client tool to test the following cases (chapter 2 of the manual)
Installation of server and client was done from source, all on the same laptop, following the instructions and examples in the manual. Everything worked and the services started OK. The only thing missing was that the profiles were not installed.
The 'chelonia' tool's builtin help
- Check the help 'chelonia' prints when running without arguments
- Check the help of all the methods: run chelonia with each method name without any other arguments
OK, but modify should give more information on the parameters it takes (i.e. that the values can be found from stat)
Working with files
- Upload a small file, download it, check if the downloaded one is the same as the uploaded one
stat should show timestamps in human-readable format, ls should be ordered, and it's not possible to overwrite a local file (but it is with arccp)
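The "same as the uploaded one" check can be done by comparing sizes and checksums of the two local files. A minimal sketch, assuming both the original and the downloaded copy are on the local disk; the function names are illustrative and not part of Chelonia:

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(uploaded, downloaded):
    """True when both files have the same size and the same SHA-256 digest."""
    if Path(uploaded).stat().st_size != Path(downloaded).stat().st_size:
        return False
    return sha256_of(uploaded) == sha256_of(downloaded)
```

Reading in chunks keeps memory use flat, so the same check works for the big-file cases.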
- Upload a big file, download it, check
Upload of 200MB file took 2.5 minutes, compared to 13 seconds with ngcp to gridftp server on same host
Upload of 3.2GB file took 52 minutes, compared to 9.5 minutes with ngcp to gridftp server on same host. After upload disk use was 100% and load average was 4.5 so had to delete the file to reduce load.
A big file appears in the namespace before the upload has finished (with replica state "creating")
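To put the transfer times above on a common scale, they can be converted to throughput. A small sketch, assuming 1GB = 1000MB; the function and variable names are illustrative:

```python
def throughput_mb_per_s(size_mb, seconds):
    """Average transfer rate in MB/s."""
    return size_mb / seconds


# (label, size in MB, Chelonia upload seconds, ngcp-to-gridftp seconds)
MEASUREMENTS = [
    ("200MB file", 200, 2.5 * 60, 13),
    ("3.2GB file", 3200, 52 * 60, 9.5 * 60),
]


def summarize(measurements):
    """Return one dict per measurement with both rates and the slowdown factor."""
    rows = []
    for label, size_mb, chelonia_s, gridftp_s in measurements:
        rows.append({
            "label": label,
            "chelonia_mb_s": throughput_mb_per_s(size_mb, chelonia_s),
            "gridftp_mb_s": throughput_mb_per_s(size_mb, gridftp_s),
            "slowdown": chelonia_s / gridftp_s,
        })
    return rows


for row in summarize(MEASUREMENTS):
    print("%(label)s: Chelonia %(chelonia_mb_s).1f MB/s, "
          "gridftp %(gridftp_mb_s).1f MB/s, %(slowdown).1fx slower" % row)
# 200MB file: Chelonia 1.3 MB/s, gridftp 15.4 MB/s, 11.5x slower
# 3.2GB file: Chelonia 1.0 MB/s, gridftp 5.6 MB/s, 5.5x slower
```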
- Upload several small files simultaneously, download them, test
Using 10 byte file
|No. of files||upload time||download time|
Uploading more than 20 files or downloading more than 30 files at once fails with:
[2009-10-22 11:34:27] [Arc.MCC.TLS] [ERROR] [17135/147805992] Failed to establish SSL connection
[2009-10-22 11:34:27] [Arc.MCC.TLS] [ERROR] [17135/147805992] SSL error: -1 - (empty):(empty):(empty)
[2009-10-22 11:34:27] [Arc.MCC.TLS] [ERROR] [17135/147805992] Failed to send content of buffer
ERROR: Cannot get an answer from the Bartender, reason: Failed to send SOAP message
Could be related to HED problems (bug 1579)
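The simultaneous transfer cases can be driven by a small harness that starts N transfers in parallel, times the batch and collects failures. This is only a sketch: `task` is a placeholder callable supplied by the tester (for instance one that runs the chelonia client for file number `index`); it is an assumption of this example, not something Chelonia provides.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(task, count):
    """Run task(0) .. task(count - 1) in parallel threads.

    Returns (elapsed_seconds, failures), where failures is a list of
    (index, exception) pairs for the calls that raised.
    """
    def guarded(index):
        # Catch the error here so one failed transfer does not abort the batch.
        try:
            task(index)
            return None
        except Exception as error:
            return (index, error)

    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=count) as pool:
        results = list(pool.map(guarded, range(count)))
    elapsed = time.monotonic() - start
    failures = [result for result in results if result is not None]
    return elapsed, failures
```

Counting failures per batch gives numbers in the same form as the tables in this section (total time, number of failed transfers).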
- Upload several big files simultaneously, download them, test
Using 200MB file
|No. of files||upload time||download time|
|10||10 min (1 failed, load average 23)||10 min (5 failed)|
After upload of more than 10 files, disk use was constantly 100% and the computer ground to a halt. This was due to the default value of the checksum interval, which meant all files were checksummed every 20s. With a disk speed of 100MB/s, checksumming 2GB takes the full 20s, so more than 2GB of stored files means constant checksumming. Increasing this parameter solved the disk use problem.
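The checksum load can be estimated with simple arithmetic. A sketch using the figures from this test (20s interval, 100MB/s disk); the function names are illustrative:

```python
def checksum_seconds(stored_mb, disk_mb_per_s):
    """Time needed to read (and so checksum) everything stored once."""
    return stored_mb / disk_mb_per_s


def disk_busy_fraction(stored_mb, disk_mb_per_s, interval_s):
    """Fraction of each checksum interval spent reading, capped at 1.0."""
    return min(1.0, checksum_seconds(stored_mb, disk_mb_per_s) / interval_s)


# Ten 200MB files with the 20s interval: the disk never goes idle.
print(disk_busy_fraction(10 * 200, 100, 20))      # 1.0
# The same data with a one hour interval: about 0.6% of the time.
print(disk_busy_fraction(10 * 200, 100, 3600))
```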
- Upload a small file, check the number of replicas (with 'stat'), wait until it has the correct number of replicas, then set the number to 1, and wait, then to 3 again, and wait, etc.
OK, but the check period is the same as the checksum period, so increasing this to a reasonable value (~hours) slows down replica creation.
- Do the previous with a big file
- Upload a file, set the replica number to 2, wait until it has the correct number of replicas, then kill one of the Shepherd-only services, wait and check continuously what is happening with the replicas (it should maintain 2 alive replicas)
OK, but would be nice to set the number of replicas in put (now fixed in svn)
- Remove files
OK, same time for big and small files
- Access files using their GUIDs instead of Logical Names (the GUIDs can be obtained from 'stat')
- Add arbitrary extra metadata (key-value pairs) to a file with 'chelonia modify <filename> set metadata <key> <value>', check with 'stat' if it is there
OK, but state metadata such as file size can be modified to values of arbitrary type (fixed in svn)
Working with collections
- Create several collections in the root collection, list the root collection, check if they are there
OK, need mkdir -p
- Create several sub-collections within collections in the root collection, check if they are there
- Create deep collection-chain, e.g. /coll/coll/coll/coll/coll/coll/[...], check if it's OK
OK, time taken increases with depth
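Until something like mkdir -p exists, a deep chain has to be created one level at a time. A sketch of how a wrapper could expand a Logical Name into its parent collections; `exists` and `make_collection` are hypothetical callables supplied by the caller (e.g. thin wrappers around the client), not Chelonia functions:

```python
def collection_chain(logical_name):
    """Return every collection on the path to logical_name, top-most first."""
    parts = [part for part in logical_name.split("/") if part]
    return ["/" + "/".join(parts[:depth]) for depth in range(1, len(parts) + 1)]


def make_with_parents(logical_name, exists, make_collection):
    """Create logical_name and any missing parents; return what was created."""
    created = []
    for path in collection_chain(logical_name):
        if not exists(path):
            make_collection(path)
            created.append(path)
    return created


print(collection_chain("/coll/coll/coll"))
# ['/coll', '/coll/coll', '/coll/coll/coll']
```

Each level needs at least one call, which matches the observation that the time taken grows with depth.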
- Move collections around, try to move something inside itself which should fail (e.g. move /coll/thing/ into /coll/thing/othercoll/)
- Upload files into collections, move files between collections, move collections with files in them, get the files from the new location
- Remove empty collections, remove non-empty collections (should fail)
OK but wrong error message
> chelonia rm /david/dir
/david/dir: nosuchLN
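The "move something inside itself" case above comes down to a path-prefix check on the two Logical Names. A minimal sketch of such a check, written for illustration only; it says nothing about how the Bartender actually implements it:

```python
import posixpath


def is_inside(source, target):
    """True when target is source itself or lies somewhere below it."""
    src = posixpath.normpath(source)
    dst = posixpath.normpath(target)
    # Compare with a trailing slash so /coll/thingy is not taken as inside /coll/thing.
    return dst == src or dst.startswith(src.rstrip("/") + "/")


print(is_inside("/coll/thing/", "/coll/thing/othercoll/"))  # True: move must fail
print(is_inside("/coll/thing/", "/coll/other/"))            # False: move is fine
```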
- Check a file's GUID, then unlink the file; it should still be accessible with the GUID
- Close a collection with 'chelonia modify <dirname> set states closed yes', check with 'stat' if it is closed, try to move something into or out of this collection, check the closed state again with 'stat'
Doesn't work - permission denied (fixed in svn)
Configuring the client tool
- Test different ways to specify Bartender URLs for the client (config file, environment variable, -b flag)
No documentation on how to use config file/env vars
- Test different ways to specify ISIS URLs (config file, environment variable) without specifying Bartender URLs / with specifying Bartender URLs also
- Add wrong Bartender URLs (along with good ones), check the behaviour of the client
Empty error message (-v didn't give more)
> chelonia -b https://patentze:60001/BadBartender ls /
ERROR: Cannot get an answer from the Bartender, reason:
- Add wrong ISIS URLs (along with good ones), check the behaviour of the client
OK - https://patntze:60001/ISIS - Failed to connect to ISIS.
- Misconfigure credentials, check the error message
OK for missing key/proxy files, but with an expired proxy, error message is not clear
> chelonia -v list /
- Using default user config.
- The proxy certificate file: /tmp/x509up_u1000-dc
- The CA dir: /etc/grid-security/certificates
- No Bartender URL found in the config or in the environment, try to get one from ISIS:
[2009-10-27 11:49:35] [Arc.MCC.TLS] [ERROR] [29009/169307680] Failed to establish SSL connection
[2009-10-27 11:49:35] [Arc.MCC.TLS] [ERROR] [29009/169307680] SSL error: -1 - (empty):(empty):(empty)
[2009-10-27 11:49:35] [Arc.MCC.TLS] [ERROR] [29009/169307680] Failed to send content of buffer
- https://patentze:60001/ISIS - Failed to connect to ISIS.
ERROR: No Bartender URL found.
Policy isn't displayed by stat unless it has been changed from the default. It is implicit that the owner has all rights, but this is not obvious.
New files should inherit the parent's policy; the default is that only the owner has all rights.
No "group" policy, only individuals.
- Use multiple credentials (based on chapter 2 of the manual), try to list the root collection as a different user than the one who created it
Fails, as expected
- Add read permissions for ALL to the root collection, check if all the available users can list it
OK, but a recursive option is needed: setting the policy on a collection doesn't affect its entries
- Try to create a collection in the root collection with a user who hasn't got addEntry rights, should fail
OK, but needs better error message
/davidsorange: failed to add child to parent
- Set permissions for individual users (with DNs), test them
- Try setting +read for ALL on the root collection, but set -read for one individual user, test if the others can read it but this user cannot
- Test different permission types: somebody who does not have a given right should not be able to perform the operation, and somebody who has it should be able to
- addEntry (to put something into a collection)
- removeEntry (to remove something from a collection)
removeEntry doesn't work on file or directory. delete must be specified on the file to remove it
- delete (to remove a file or collection)
- modifyPolicy (to modify the policy of a file or collection)
- modifyStates (to set the number of needed replicas, or close a collection)
- modifyMetadata (to add arbitrary key-value pairs)
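The expected outcome of the "+read for ALL, -read for one user" case can be written down as a small decision function. This is a sketch of one plausible rule (owner always allowed, an explicit deny beats an allow), chosen to match the expected results of the tests above; it is an assumption and not taken from the Chelonia code. The DNs are made up.

```python
def is_allowed(policy, owner, user_dn, action):
    """Decide whether user_dn may perform action.

    policy maps an identity ("ALL" or a DN) to a list of rules such as
    "+read" or "-read". Assumed semantics: the owner has all rights,
    a matching "-action" denies, otherwise a matching "+action" allows.
    """
    if user_dn == owner:
        return True
    rules = policy.get("ALL", []) + policy.get(user_dn, [])
    if "-" + action in rules:
        return False
    return "+" + action in rules


# Hypothetical identities, for illustration only.
OWNER = "/DC=org/DC=example/CN=owner"
BLOCKED = "/DC=org/DC=example/CN=blocked"
OTHER = "/DC=org/DC=example/CN=other"
policy = {"ALL": ["+read"], BLOCKED: ["-read"]}

print(is_allowed(policy, OWNER, OTHER, "read"))      # True
print(is_allowed(policy, OWNER, BLOCKED, "read"))    # False
print(is_allowed(policy, OWNER, OTHER, "addEntry"))  # False
```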
- Test VOMS support (this is not trivial now: we need new profiles with VOMS support, need a user with VOMS attributes, install certificates): set a policy for a file, read permission only for a member of a VO ('chelonia policy <filename> set VOMS:<VOname> +read'), check if only the members of that VO (and the owner of the file) can download it
No documentation available for how to use VOMS profile
- Try to follow chapter 6 of the manual
Works, but awkward syntax of URLs eg
It should be possible to set the bartender address in the client configuration file .arc/client.conf, using the bartender attribute. Some documentation on this attribute can be found in trunk/src/hed/libs/common/UserConfig.h and trunk/src/clients/client.conf.example. The usage of the bartender client configuration attribute should be documented in the storage documentation and it should be tested as well.
- Try to follow chapter 7 of the manual
Works well, but quite slow and it is insecure
Testing the gateway
- We need globus libraries installed, and a working GridFTP server
- Try to follow chapter 5 of the manual
Not enough documentation to do this with ini style config
Testing with replicated metadata
- Based on chapter 4 of the manual, deploy a replicated Chelonia with all services on all nodes
OK, but takes a few restarts to get services working, even then I'm not sure they were all running - need an easy way to check this. After some time, arched processes are using more than 2GB of virtual memory.
- It would be ideal to do every test of the centralized case here as well, but the main things to test are these:
- While all the nodes are running upload some files, wait until they have the correct number of replicas
- Kill any of the three servers, then list and stat and download the files, upload new ones
- Trying https://patentze:60001/Bartender...
ERROR: Cannot get an answer from the Bartender, reason:
| Fault in SOAP message (Python exception:
| Traceback (most recent call last):
|   File "/usr/local/lib/python2.5/site-packages/arcom/service.py", line 190, in process
|     outpayload = self._call_request(request_name, inmsg)
|   File "/usr/local/lib/python2.5/site-packages/arcom/service.py", line 157, in _call_request
|     return getattr(self,request_name)(inpayload)
|   File "/usr/local/lib/python2.5/site-packages/storage/bartender/bartender.py", line 979, in stat
|     response = self.bartender.stat(inpayload.auth, requests)
|   File "/usr/local/lib/python2.5/site-packages/storage/bartender/bartender.py", line 110, in stat
|     requests, traverse_response = self._traverse(requests)
|   File "/usr/local/lib/python2.5/site-packages/storage/bartender/bartender.py", line 196, in _traverse
|     raise Exception, 'Empty response from the Librarian'
| Exception: Empty response from the Librarian
| )
- Restart the killed server, then list and stat and download, and upload, wait until all the files have the correct number of replicas
Worked again after restart, multiple ISIS in config also worked.
- Kill any other server, then list/stat/download/upload
Conclusion
Most features work perfectly and the available documentation is very good. The main points of concern are:
- A dedicated user manual is missing.
- The slow speed of data transfer when compared to a standard GridFTP server. I don't know if this is chelonia itself or the back-end SE causing this.
- The tight coupling of replica availability and checksumming should be removed. Checksumming is a heavy operation and the chance of files spontaneously corrupting on a disk is extremely low. The chance of a whole storage element being offline, however, is relatively high. Therefore I'd want to have a high frequency check of SE availability, but a low frequency (or no) checksum checking.
- ACLs. On most Grid services I've seen, getting ACLs right is a major headache, both because the developers initially don't take into account what users want and because the users' demands change constantly. I think a study of real-life cases would be good to get an idea of what is needed.
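The decoupling suggested in the third point could look like two independent timers, one for SE availability and one for checksumming. A sketch only: the interval values and names are placeholders chosen for illustration and do not correspond to existing Chelonia configuration options.

```python
def due_checks(now, last_availability, last_checksum,
               availability_interval=60, checksum_interval=24 * 3600):
    """Return the checks that are due at time `now` (all values in seconds).

    Availability is checked often because it is cheap; checksumming is
    scheduled separately because it reads every stored byte.
    """
    due = []
    if now - last_availability >= availability_interval:
        due.append("availability")
    if now - last_checksum >= checksum_interval:
        due.append("checksum")
    return due


print(due_checks(now=120, last_availability=0, last_checksum=0))
# ['availability']
```

With separate intervals, raising the checksum period to hours would no longer slow down the detection of missing replicas.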