This wiki is obsolete; see the NorduGrid web pages for up-to-date information.
Data Staging/DTR
DTR description
DTR stands for Data Transfer Request. A DTR is a structure whose fields fully describe a single file transfer to be performed. The generator creates one DTR for each file transfer.
Fields of the DTR
More or less required:
- DTR ID
- source endpoint
- destination endpoint
- for source and destination, a list of metadata such as file size, checksum, creation date etc
- for source and destination (if applicable) a list of replicas
- for source and destination (if applicable) current replica
- for source and destination (if applicable) TURL or delivery-level URL used for transfer
- for source and destination (if applicable) request ID (in the case of asynchronous requests to remote storage services)
- credentials
- cache information
  - if the file is cacheable, the filename in cache
  - cache directories configuration
  - caching state (already in cache, cache currently locked etc)
- local user information (uid/gid)
- Job ID this transfer belongs to
- priority of the transfer - a number set by the generator which flattens priorities
- transfer share this DTR belongs to
- sub-share the DTR belongs to - may be set by the Generator
- tries left
- flags to handle properties and strategies when dealing with index servers
  - flag to say whether DTR is replicating inside the same logical filename
  - flag to say whether DTR should force registration to an existing logical filename, if the source is different
- mapping info - local files to which remote files may be mapped in the configuration (copyurl/linkurl)
- status of the DTR
- error status
  - type of error
  - location of error
  - text description of error detail
- number of bytes transferred/offset
- timing properties
  - timeout - time for which the DTR is allowed to remain in its current state
  - creation time
  - last modification time
  - process time - wait until this time to do further processing
- cancel (set to true if request is to be cancelled)
- bulk operation flags to combine several DTRs in a bulk request
- delivery endpoint, whether Delivery is to be carried out by a local process or remote service
- current owner - which component is in charge of this DTR right now
- logger object, so each DTR can have its own log
- lock, to avoid write collisions, since DTRs can be modified by several processes
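As an illustration of the field list above, here is a minimal Python sketch of a DTR-like structure covering a subset of the fields. It is not the ARC implementation: the class and attribute names, the default priority and retry count, and the `set_status`/`timed_out` helpers are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional
import threading
import uuid


@dataclass
class Endpoint:
    """One side of a transfer (source or destination). Hypothetical layout."""
    url: str
    replicas: List[str] = field(default_factory=list)
    current_replica: Optional[str] = None
    transfer_url: Optional[str] = None  # TURL or delivery-level URL, if applicable
    request_id: Optional[str] = None    # for asynchronous storage requests


@dataclass
class DTR:
    """Illustrative subset of the DTR fields listed above."""
    source: Endpoint
    destination: Endpoint
    job_id: str
    uid: int
    gid: int
    priority: int = 50    # assumed default, not taken from the source
    tries_left: int = 3   # assumed default, not taken from the source
    status: str = "NEW"
    cancel: bool = False
    bytes_transferred: int = 0
    dtr_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created: datetime = field(default_factory=datetime.now)
    timeout: Optional[datetime] = None       # deadline for the current state
    process_time: Optional[datetime] = None  # do not process before this time
    lock: threading.Lock = field(
        default_factory=threading.Lock, repr=False, compare=False
    )

    def set_status(self, status, allowed=timedelta(minutes=1)):
        """Change state under the lock and restart the per-state timeout."""
        with self.lock:
            self.status = status
            self.timeout = datetime.now() + allowed

    def timed_out(self, now=None):
        """True if the DTR has stayed in its current state past the timeout."""
        now = now or datetime.now()
        return self.timeout is not None and now > self.timeout
```

The per-DTR lock is taken only while the status is modified, mirroring the idea that several components may touch the same DTR.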
Possible:
- affiliation (if we use the affiliation of multiple DTRs, see right below).
- history of states
Multiple DTRs may be affiliated together. Possible reasons and uses:
- Belong to the same job
- Belong to a group of jobs that the user indicated should preferably be processed together
- Belong to the same VO, with assigned priorities to be applied within the group
- Failure of one DTR in the group may cancel processing of the other DTRs (should be implemented in the Generator)
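The last point above could look roughly like the following Python sketch, in which a generator-side group flags unfinished sibling DTRs for cancellation when one member fails. The `Affiliation` class, its `cancel_on_failure` switch and the simplified `DTR` class are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DTR:
    """Heavily simplified DTR: only the fields needed for this example."""
    dtr_id: str
    status: str = "NEW"
    cancel: bool = False


@dataclass
class Affiliation:
    """Hypothetical group of DTRs that should be treated together."""
    name: str
    members: List[DTR] = field(default_factory=list)
    cancel_on_failure: bool = True

    def add(self, dtr):
        self.members.append(dtr)

    def report_failure(self, failed):
        """Flag unfinished siblings of a failed DTR; return those flagged."""
        if not self.cancel_on_failure:
            return []
        flagged = []
        for dtr in self.members:
            finished = dtr.status in ("DONE", "ERROR", "CANCELLED")
            if dtr is not failed and not finished:
                dtr.cancel = True  # request cancellation, as the cancel field allows
                flagged.append(dtr)
        return flagged
```

Only the cancel flag is set here; acting on it is left to whichever component processes the DTR next.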
State transitions of DTR
All possible states of a DTR, with arrows indicating the normal flow of DTRs between states. Each state is explained in detail below. Error conditions are not included here but are shown in another diagram further down.
Status codes
The following table describes all non-error status codes, together with the action taken if a cancellation request is received while the DTR is in that state. In general, if all of the data transfer has been completed before the cancellation request is received, the destination file is not deleted. The main reason for this is to preserve cache files, as the user may wish to run the same job soon after cancelling it.
Status Code | Text Description | Action on cancel |
---|---|---|
Statuses set by the generator | ||
NEW | The DTR has just been built by the generator | Return to generator |
CANCEL | A request has been made to cancel the DTR | n/a |
Statuses set by the scheduler | ||
CHECK_CACHE | The DTR destination is cacheable and the cache should be checked for the file's existence | Return to generator |
RESOLVE | The DTR source is a meta-protocol and should be resolved | Set to PROCESS_CACHE to remove any cache locks |
QUERY_REPLICA | The DTR source should be queried to check existence, check file size, checksum etc. | Set to REGISTER_REPLICA to remove pre-registered destination |
PRE_CLEAN | The destination in the DTR should be deleted before writing | Set to REGISTER_REPLICA to remove pre-registered destination |
STAGE_PREPARE_SOURCE | The DTR source is a meta-protocol which must be prepared or staged | Set to REGISTER_REPLICA to remove pre-registered destination |
STAGE_PREPARE_DESTINATION | The DTR destination is a meta-protocol which must be prepared or staged | Set to REGISTER_REPLICA to remove pre-registered destination |
TRANSFER_WAIT | The DTR is ready to be sent to delivery but must wait due to transfer limits or priority settings | Set to RELEASE_REQUEST |
TRANSFER | The DTR should be transferred immediately | Set to RELEASE_REQUEST |
RELEASE_REQUEST | The DTR transfer has finished and any requests made on remote storage should be released | Abort request and delete destination, set to REGISTER_REPLICA |
REGISTER_REPLICA | The DTR destination is a meta-protocol and the new replica should be registered | Delete destination and set to PROCESS_CACHE |
PROCESS_CACHE | The DTR destination is cacheable and the cached file should be unlocked and linked/copied to the session dir | Delete cache file |
DONE | The DTR completed successfully | Do nothing |
CANCELLED | The DTR has been cancelled successfully | n/a |
ERROR | An error occurred with the DTR | Do nothing |
Statuses set by the pre-processor | ||
CHECKING_CACHE | The pre-processor is checking the cache | Wait until complete, then set to CACHE_CHECKED. The scheduler will then set to PROCESS_CACHE |
CACHE_WAIT | The cache file is locked and the scheduler should wait before trying to obtain the lock | Scheduler will return to generator |
CACHE_CHECKED | The cache check is complete | Scheduler will set to PROCESS_CACHE |
RESOLVING | The pre-processor is resolving replicas | Wait until complete, then set to RESOLVED. The scheduler will then set to REGISTER_REPLICA |
RESOLVED | The replica resolution is complete | Scheduler will set to REGISTER_REPLICA |
QUERYING_REPLICA | The pre-processor is querying a replica | Wait until complete, then set to REPLICA_QUERIED. The scheduler will then set to REGISTER_REPLICA |
REPLICA_QUERIED | The replica querying is complete | Scheduler will set to REGISTER_REPLICA |
PRE_CLEANING | The pre-processor is deleting the destination file | Wait until complete, then set to PRE_CLEANED. The scheduler will then set to REGISTER_REPLICA |
PRE_CLEANED | The destination file has been deleted | Scheduler will set to REGISTER_REPLICA |
STAGING_PREPARING | The pre-processor is making a staging or preparing request | Wait until complete, then scheduler will set to RELEASE_REQUEST so it can be aborted |
STAGING_PREPARING_WAIT | The staging or preparing request is not ready and the scheduler should wait before polling the status of the request | Scheduler will set to RELEASE_REQUEST so it can be aborted |
STAGED_PREPARED | The staging or preparing request is complete | Scheduler will set to RELEASE_REQUEST so it can be aborted |
Statuses set by the delivery | ||
TRANSFERRING | The transfer of the DTR is on-going | Stop transfer and set to RELEASE_REQUEST. Delivery will delete the incomplete file and the request will be aborted |
TRANSFERRED | The transfer completed successfully | Scheduler will abort the request |
Statuses set by the post-processor | ||
RELEASING_REQUEST | The post-processor is releasing a stage or prepare request | Wait until finished, then set to REGISTER_REPLICA to unregister the file |
REQUEST_RELEASED | The release of stage or prepare request is complete | Set to REGISTER_REPLICA to unregister the file |
REGISTERING_REPLICA | The post-processor is registering a replica in an index service | Continue as normal |
REPLICA_REGISTERED | Replica registration is complete | Continue as normal |
PROCESSING_CACHE | The post-processor is releasing locks and copying/linking the cached file to the session dir | Continue as normal |
CACHE_PROCESSED | Cache processing is complete | Continue as normal |
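The "Action on cancel" column for the scheduler-set states can be read as a lookup from the current state to the next state. The Python sketch below encodes only the state-to-state entries from the table. Following the entries as a chain, as `cancel_path` does, is an assumption about how successive clean-up steps combine, and both names are hypothetical.

```python
# Next state when a cancellation request arrives, for scheduler-set states
# whose table entry names another state. States that are absent here
# (e.g. NEW, CHECK_CACHE, PROCESS_CACHE, DONE, ERROR) are returned to the
# generator, cleaned up in place, or left alone, according to the table.
NEXT_STATE_ON_CANCEL = {
    "RESOLVE": "PROCESS_CACHE",
    "QUERY_REPLICA": "REGISTER_REPLICA",
    "PRE_CLEAN": "REGISTER_REPLICA",
    "STAGE_PREPARE_SOURCE": "REGISTER_REPLICA",
    "STAGE_PREPARE_DESTINATION": "REGISTER_REPLICA",
    "TRANSFER_WAIT": "RELEASE_REQUEST",
    "TRANSFER": "RELEASE_REQUEST",
    "RELEASE_REQUEST": "REGISTER_REPLICA",
    "REGISTER_REPLICA": "PROCESS_CACHE",
}


def cancel_path(state):
    """Return the states visited after a cancel request, starting at `state`."""
    path = [state]
    while path[-1] in NEXT_STATE_ON_CANCEL:
        path.append(NEXT_STATE_ON_CANCEL[path[-1]])
    return path
```

Under this reading, a DTR cancelled in TRANSFER would pass through RELEASE_REQUEST, REGISTER_REPLICA and PROCESS_CACHE, so that the storage request, any pre-registered replica and any cache lock are each cleaned up in turn.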
Error Conditions of DTRs
The following diagram shows possible error conditions and actions taken. For simplicity and because all error handling logic takes place within the scheduler, the pre- and post-processor and the delivery layers are not shown.
Errors are categorised into the following types:
Error | Explanation | Retryable? | Action |
---|---|---|---|
INTERNAL_LOGIC_ERROR | Internal error in data staging logic | No | Stop processing and report back to generator |
INTERNAL_PROCESS_ERROR | Internal error such as losing contact with an external process | Yes | Clean if necessary and retry |
SELF_REPLICATION_ERROR | Attempt to copy a file to itself | No | Return to generator |
CACHE_ERROR | A problem occurred in cache handling | Yes | Retry without caching |
TEMPORARY_REMOTE_ERROR | Error such as connection timeout on remote service | Yes | Retry with an increasing back-off |
PERMANENT_REMOTE_ERROR | Error such as file not existing, permission denied etc on remote service | No | Follow cancellation steps and return failed DTR to generator |
LOCAL_FILE_ERROR | Error with a local file | No | Follow cancellation steps and return to generator |
TRANSFER_SPEED_ERROR | Transfer rate was below specified limits | Yes | Retry transfer. If all retries fail, report back to the generator, which decides whether to cancel other related DTRs. (Future work: determine whether other transfers caused the slow transfer and whether cancelling them would help) |
STAGING_TIMEOUT_ERROR | The staging process took too long | No | Try a different replica - if none available, cancel and report back to generator |
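The "Retryable?" column above can be expressed as a small classification helper. The following Python sketch is illustrative only: `should_retry` and `backoff_seconds` are hypothetical names, and the doubling back-off with a cap is an assumption, since the table only says the back-off increases.

```python
from enum import Enum, auto


class ErrorType(Enum):
    INTERNAL_LOGIC_ERROR = auto()
    INTERNAL_PROCESS_ERROR = auto()
    SELF_REPLICATION_ERROR = auto()
    CACHE_ERROR = auto()
    TEMPORARY_REMOTE_ERROR = auto()
    PERMANENT_REMOTE_ERROR = auto()
    LOCAL_FILE_ERROR = auto()
    TRANSFER_SPEED_ERROR = auto()
    STAGING_TIMEOUT_ERROR = auto()


# Error types marked "Yes" in the Retryable? column of the table.
RETRYABLE = {
    ErrorType.INTERNAL_PROCESS_ERROR,
    ErrorType.CACHE_ERROR,
    ErrorType.TEMPORARY_REMOTE_ERROR,
    ErrorType.TRANSFER_SPEED_ERROR,
}


def should_retry(error, tries_left):
    """Retry only retryable errors, and only while tries remain."""
    return error in RETRYABLE and tries_left > 0


def backoff_seconds(attempt, base=10, cap=600):
    """Assumed policy: double the wait on each attempt, up to a cap."""
    return min(cap, base * 2 ** attempt)
```

The computed delay could be used to set the DTR's process time, so that the DTR is not picked up again before the back-off has elapsed.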
Methods of DTRs
DTR::push (DTR, receiver) -- pass the DTR from one process to another, e.g. DTR::push (dtr, preprocessor)
Implementation
Within the Data Staging framework there is a global list of DTRs. Pointers to the DTRs are passed between components, which can modify the DTRs directly and push them to one another.
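The push operation and the shared list can be sketched in Python as follows. This is a simplified model rather than the ARC code: the `Component` class, the `ALL_DTRS` registry and the `register` helper are assumptions, and Python object references stand in for the pointers mentioned above.

```python
import threading
from collections import deque

# Hypothetical global registry of all DTRs, keyed by DTR ID.
ALL_DTRS = {}


class Component:
    """Stand-in for generator, scheduler, pre-processor, delivery, etc."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def receive(self, dtr):
        self.queue.append(dtr)


class DTR:
    def __init__(self, dtr_id):
        self.dtr_id = dtr_id
        self.status = "NEW"
        self.owner = "generator"
        self.lock = threading.Lock()

    @staticmethod
    def push(dtr, receiver):
        """Hand the same DTR object over to another component."""
        with dtr.lock:
            dtr.owner = receiver.name  # record who is in charge now
        receiver.receive(dtr)


def register(dtr):
    """Add a DTR to the global list and return it."""
    ALL_DTRS[dtr.dtr_id] = dtr
    return dtr
```

For example, after `DTR.push(dtr, scheduler)` the object in `scheduler.queue` is the same object held in `ALL_DTRS`, so a status change made by the scheduler is immediately visible through the global list.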