This wiki is obsolete, see the NorduGrid web pages for up to date information.

Data Staging/Data Scheduling Solutions

Data Scheduling Solutions

References

1. http://www.cs.umu.se/~elmroth/papers/fsgrid.pdf

2. File transfer scheduling in Grid -- http://arxiv.org/pdf/0901.0291v1

3. Transfer speed estimation in Grid -- http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=04976545

4. Data-aware scheduling in Grid -- http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V06-4TJTX32-1&_user=674998&_coverDate=04%2F30%2F2009&_rdoc=1&_fmt=high&_orig=search&_sort=d&_docanchor=&view=c&_searchStrId=1294872975&_rerunOrigin=google&_acct=C000036598&_version=1&_urlVersion=0&_userid=674998&md5=c39358ee8aeb2e06cd338a9827d71c44

File transfer scheduling algorithm (Reference 2)

Monitoring services collect information about local network topologies, links, hosts and user requests. They pass this information to a dedicated service, the optimizer, which combines the data into an overall picture and schedules user transfers. Internally, the optimizer represents the network as a graph: there is a "path" between any two points that are connected to each other, and the bandwidth between them is known.
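The graph representation can be illustrated with a minimal Python sketch. It is not taken from Reference 2: the class name NetworkGraph, its methods and the host names are hypothetical, links are assumed to be bidirectional, and every loop-free path is enumerated by a plain depth-first search.

```python
class NetworkGraph:
    """Hypothetical in-memory view of the network kept by the optimizer."""

    def __init__(self):
        # links[a][b] = bandwidth of the direct link between a and b
        self.links = {}

    def add_link(self, a, b, bandwidth):
        # Assumption: links are bidirectional with the same bandwidth both ways.
        self.links.setdefault(a, {})[b] = bandwidth
        self.links.setdefault(b, {})[a] = bandwidth

    def find_paths(self, src, dst):
        """Return all loop-free paths from src to dst as lists of node names."""
        paths = []
        stack = [(src, [src])]
        while stack:
            node, path = stack.pop()
            if node == dst:
                paths.append(path)
                continue
            for neighbour in self.links.get(node, {}):
                if neighbour not in path:  # never revisit a node
                    stack.append((neighbour, path + [neighbour]))
        return sorted(paths)


graph = NetworkGraph()
graph.add_link("A", "B", 100)
graph.add_link("B", "C", 50)
graph.add_link("A", "C", 10)
print(graph.find_paths("A", "C"))
```

In this toy network the optimizer would hand two candidate paths from A to C to the scheduler: the direct link and the route through B.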

When a user submits a transfer request, the optimizer discovers the possible paths between source and destination and passes them to the scheduler. The scheduler calculates possible start times, end times and bandwidth for the transfer, and the user must then choose one of the offered options. The transfer is not scheduled until the user has confirmed an option. The scheduler recalculates bandwidth dynamically: the more transfers share a path, the less bandwidth each of them gets. The bandwidth of a path is usually taken to be the minimum over all active links along it. So if the scheduler places a request on a path that already carries transfers, it scales the bandwidth accordingly in its reply to the user.
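The bandwidth calculation can be sketched as follows. This is an illustration under stated assumptions, not the algorithm from Reference 2: each link is assumed to split its capacity equally among the transfers using it, and the function names path_bandwidth and make_offer, the dictionary layout and the units are hypothetical.

```python
def path_bandwidth(path, capacity, active):
    """Estimate the bandwidth one new transfer would get along path.

    capacity maps a link (frozenset of two nodes) to its total bandwidth.
    active maps a link to the number of transfers already using it.
    Assumption: every link shares its capacity equally among its transfers.
    """
    shares = []
    for a, b in zip(path, path[1:]):
        link = frozenset((a, b))
        shares.append(capacity[link] / (active.get(link, 0) + 1))
    # The slowest link along the path limits the whole transfer.
    return min(shares)


def make_offer(path, size, start, capacity, active):
    """Build one option (start, end, bandwidth) to present to the user."""
    bandwidth = path_bandwidth(path, capacity, active)
    return {
        "path": path,
        "start": start,
        "end": start + size / bandwidth,
        "bandwidth": bandwidth,
    }


capacity = {frozenset(("A", "B")): 100.0, frozenset(("B", "C")): 50.0}
active = {frozenset(("A", "B")): 3}  # three transfers already on link A-B
print(make_offer(["A", "B", "C"], size=500.0, start=0.0,
                 capacity=capacity, active=active))
```

With three transfers already on link A-B, the new transfer gets 100 / 4 = 25 on that link and 50 on B-C, so the offer reports a bandwidth of 25 and an end time of 20.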

Users may have different priorities. In addition, a user may specify a deadline for a transfer request, specify a time before which the transfer should not start, ask for the request to be performed as soon as possible, or ask for a certain bandwidth. A user can also ask for a specified bandwidth to be available at a specified time between certain nodes (for example, to transfer data dynamically while a computation is running).
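One possible way to carry these user constraints is sketched below. The TransferRequest fields and the satisfies helper are hypothetical names chosen for illustration; Reference 2 does not prescribe this data structure, and times and bandwidths are plain numbers in arbitrary units.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransferRequest:
    """Hypothetical container for a user's transfer request."""
    user: str
    size: float
    priority: int = 0                      # higher value = more important
    not_before: Optional[float] = None     # earliest allowed start time
    deadline: Optional[float] = None       # latest allowed end time
    min_bandwidth: Optional[float] = None  # requested bandwidth


def satisfies(request, start, end, bandwidth):
    """Check whether a proposed schedule respects the user's constraints."""
    if request.not_before is not None and start < request.not_before:
        return False
    if request.deadline is not None and end > request.deadline:
        return False
    if request.min_bandwidth is not None and bandwidth < request.min_bandwidth:
        return False
    return True


request = TransferRequest(user="alice", size=500.0, priority=2,
                          not_before=10.0, deadline=60.0, min_bandwidth=20.0)
print(satisfies(request, start=10.0, end=30.0, bandwidth=25.0))
print(satisfies(request, start=0.0, end=20.0, bandwidth=25.0))
```

A scheduler could filter its candidate options with such a check before presenting them, and order pending requests by the priority field.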

Pros

  • The algorithm allows dynamic redistribution of bandwidth among scheduled or running transfers.
  • Priorities are taken into account. User-specified constraints are also supported.

Cons

  • It is only an algorithm; we would have to implement it in our environment.
  • It assumes a centralized service with control over the whole infrastructure.