This wiki is obsolete; see the NorduGrid web pages for up-to-date information.

Parallel jobs

These are notes from an exchange between Olli Tourunen, Daniel Johansson, Michael Gronager and Josva Kleist in early 2009. They came out of a discussion between Olli and Daniel aimed at creating a general model for handling parallel jobs (multi-core, multi-node and mixed jobs).


I'm thinking about how to give everything we talked about in Budapest to the rest of the world. How to convert them to our religion. ;)

Here is what I wrote down on the paper in my hotel room:


General

Model describing resources for Infosys+arc.conf

Shared Memory Unit:
[ max memory per rank (MiB),
  number of cores,
  available memory (MiB)
]

So, this is basically a node in a cluster: it shares memory and is capable of running threaded applications. "Max memory per rank" could be something other than available memory / number of cores, but then you might not be able to run on all CPUs (over-allocating cores in favour of more memory per core).
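The trade-off between memory per rank and usable cores can be made concrete with a small sketch. The class name `SharedMemoryUnit` and the method `usable_ranks` are illustrative assumptions, not part of any ARC interface; only the three fields come from the model above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedMemoryUnit:
    """One node in the model above (all names are illustrative)."""
    max_mem_per_rank_mib: int  # policy limit on memory per rank
    cores: int                 # number of cores in the unit
    available_mem_mib: int     # memory available to jobs

    def usable_ranks(self, mem_per_rank_mib: int) -> int:
        """Ranks that fit on this unit when each rank needs mem_per_rank_mib."""
        if mem_per_rank_mib <= 0:
            raise ValueError("memory per rank must be positive")
        if mem_per_rank_mib > self.max_mem_per_rank_mib:
            return 0  # request exceeds the per-rank limit
        return min(self.cores, self.available_mem_mib // mem_per_rank_mib)


# An 8-core node with 16 GiB, allowing up to 4 GiB per rank.
node = SharedMemoryUnit(max_mem_per_rank_mib=4096, cores=8, available_mem_mib=16384)
print(node.usable_ranks(2048))  # available memory / cores: all 8 cores usable
print(node.usable_ranks(4096))  # more memory per rank: only 4 ranks fit
```

With the limit set above available memory / number of cores, a job asking for the maximum leaves half of the cores idle, which is the case the note warns about.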

Resource:
[ max ranks per job,
  job exclusive nodes (yes/no),
  bandwidth,
  allow_new_brokering (yes/no)
]

This is about general policy for a cluster/other resource. Could be per queue?

max ranks per job: maximum level of parallelism allowed. Should we also, or instead, have a list of the allowed levels? Some sites might only allow e.g. multiples of four (exclusive node allocation, though not necessarily coupled with the attribute below).

job exclusive nodes (yes/no): are we forced to allocate a full node (shared memory unit)?

bandwidth: the bandwidth of the backend

allow_new_brokering (yes/no): Drawing a blank here. Daniel, what was the 'brave new brokering' again?
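To see how the "allowed levels" question could play out, here is a minimal sketch of a policy check. `ResourcePolicy`, `allows` and the `rank_multiple` field are hypothetical; `rank_multiple` is just one possible way to express "only multiples of four", and `bandwidth` and `allow_new_brokering` are left out because their semantics are still open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourcePolicy:
    """Per-resource (or per-queue) policy; names are illustrative."""
    max_ranks_per_job: int
    job_exclusive_nodes: bool
    # Hypothetical extra attribute: only multiples of this value are allowed.
    rank_multiple: Optional[int] = None

    def allows(self, ranks: int) -> bool:
        """True if a job with this many ranks is within the policy."""
        if ranks < 1 or ranks > self.max_ranks_per_job:
            return False
        if self.rank_multiple is not None and ranks % self.rank_multiple != 0:
            return False
        return True


policy = ResourcePolicy(max_ranks_per_job=64, job_exclusive_nodes=True, rank_multiple=4)
print([n for n in (1, 4, 6, 64, 128) if policy.allows(n)])  # [4, 64]
```

Note that `rank_multiple` is kept separate from `job_exclusive_nodes`, matching the remark that the two are not necessarily coupled.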


Thought: How do we enumerate the Shared Memory Units in infosys/arc.conf?

Xrsl

cpu_distribution: specifies the total number of processes/ranks and how many of those we want per node/shared memory unit.
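As a rough illustration of what cpu_distribution implies for the backend, the sketch below turns a total rank count and a ranks-per-node value into a node layout. The function names `nodes_needed` and `layout` are made up for this example, and filling nodes in order is an assumption; the notes do not say how a remainder should be placed.

```python
from typing import List


def nodes_needed(total_ranks: int, ranks_per_node: int) -> int:
    """Number of nodes required, rounding up for a partly filled last node."""
    if total_ranks < 1 or ranks_per_node < 1:
        raise ValueError("rank counts must be positive")
    return -(-total_ranks // ranks_per_node)  # integer ceiling division


def layout(total_ranks: int, ranks_per_node: int) -> List[int]:
    """Ranks placed on each node, filling nodes in order (an assumption)."""
    if total_ranks < 1 or ranks_per_node < 1:
        raise ValueError("rank counts must be positive")
    full, rest = divmod(total_ranks, ranks_per_node)
    return [ranks_per_node] * full + ([rest] if rest else [])


print(nodes_needed(10, 4))  # 3
print(layout(10, 4))        # [4, 4, 2]
```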

arc.conf needs

per queue

max_ranks_per_job
max_mem_per_rank
max_mem_per_node
cores_per_node
job_exclusive_nodes
bandwidth_between_nodes

per cluster

allow_new_brokering
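For discussion, this is roughly how the attributes could look in a configuration file and be read back. Everything here is a sketch: the block names, the quoting style and all values are placeholders, the units of `bandwidth_between_nodes` are undefined in these notes, and Python's `configparser` is only standing in for whatever parser the middleware actually uses.

```python
import configparser
from typing import Dict

# Hypothetical fragment; attribute names are from the lists above,
# block names and values are placeholders.
SAMPLE = """
[cluster]
allow_new_brokering="yes"

[queue/parallel]
max_ranks_per_job="64"
max_mem_per_rank="4096"
max_mem_per_node="16384"
cores_per_node="8"
job_exclusive_nodes="yes"
bandwidth_between_nodes="10000"
"""


def load(text: str) -> Dict[str, Dict[str, str]]:
    """Parse the fragment into {block: {attribute: value}}, dropping quotes."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        section: {key: value.strip('"') for key, value in parser[section].items()}
        for section in parser.sections()
    }


conf = load(SAMPLE)
queue = conf["queue/parallel"]
print(int(queue["max_mem_per_node"]) // int(queue["cores_per_node"]))  # 2048
```

The last line shows that, in this made-up queue, `max_mem_per_rank` (4096) is twice available memory per core, so a job using the maximum cannot fill every core.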

Arc-xrsl

count -> ranks
cpu_distribution = 2ranks/node, maxranks/node

Here we need to be careful with the terms. Rank, thread, process, core...

Infosys

max_ranks_per_job
max_mem_per_rank
max_mem_per_node
cores_per_node
job_exclusive_nodes
bandwidth_between_nodes
allow_new_brokering

Documentation

  • cputime deprecated.
  • walltime = time on clock.
  • memory is per rank

Info about the new stuff. I don't know what e.g. Glue2 says about cputime. It would be nice to have it deprecated, though: having walltime plus published benchmarks is so much clearer.

Summary

Things that need to be updated:

  • Client brokering
  • Publish info to jobs (env-vars)
  • arc.conf attributes
  • Infosys extensions
  • Backend extensions
  • Documentation

Backend extensions also need a way to publish the LRMS allocations to the job itself, if we are not always using runtime environments for that.
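One possible shape for publishing the allocation through env-vars is sketched below. The variable names `GRID_JOB_RANKS` and `GRID_JOB_RANKS_PER_NODE` are invented for this example; nothing in these notes fixes the names or the set of values a backend would export.

```python
import os
from typing import Dict, Mapping


def read_allocation(env: Mapping[str, str] = os.environ) -> Dict[str, int]:
    """Read the allocation a backend might export (variable names are hypothetical)."""
    return {
        "ranks": int(env["GRID_JOB_RANKS"]),
        "ranks_per_node": int(env["GRID_JOB_RANKS_PER_NODE"]),
    }


# Simulate what a backend might set before starting the job.
fake_env = {"GRID_JOB_RANKS": "16", "GRID_JOB_RANKS_PER_NODE": "4"}
alloc = read_allocation(fake_env)
print(alloc)  # {'ranks': 16, 'ranks_per_node': 4}
```

A job script or runtime environment could then pass these values on to its launcher without knowing anything about the LRMS underneath.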


Final thoughts

This is a step in the right direction. However (referring to the coffee table discussion at the NG tech meeting), I think that many applications will benefit from having something more intelligent sitting on the frontend, either hosted by HED as a web service or as a Runtime Environment. To empower the REs in the latter case in a clean way, we would need some improvements in the RE script <-> grid manager interface.

Case not closed, let's continue the discussion.