Parallel jobs
From NorduGrid
These are notes that went between Olli Tourunen, Daniel Johansson, Michael Gronager and Josva Kleist in early 2009. They are the outcome of a discussion between Olli and Daniel aimed at creating a general model for handling parallel jobs (multi-core, multi-node and mixed jobs).
I'm thinking about how to present everything we talked about in Budapest to the rest of the world. How to convert them to our religion. ;)
Here is what I wrote down on paper in my hotel room:
General
Model describing resources for Infosys+arc.conf
Shared Memory Unit: [ max memory per rank (MiB), number of cores, available memory (MiB) ]
So, this is basically a node in a cluster whose cores share memory, capable of running threaded applications. "Max memory per rank" may differ from available memory / number of cores, but if it is larger you might not be able to use all the CPUs (over-allocating cores in favour of more memory per rank).
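To make the Shared Memory Unit concrete, here is a minimal Python sketch of the tuple above, assuming integer MiB values and one core per rank. The names SharedMemoryUnit, ranks_that_fit and idle_cores are hypothetical. It shows how a per-rank memory request larger than available memory / number of cores leaves cores idle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedMemoryUnit:
    """Hypothetical model of one node whose cores share memory."""

    max_mem_per_rank_mib: int  # largest amount of memory one rank may use
    cores: int                 # number of cores on the node
    available_mem_mib: int     # memory available to jobs on the node

    def ranks_that_fit(self, mem_per_rank_mib: int) -> int:
        """Number of ranks this node can host if each needs mem_per_rank_mib."""
        if mem_per_rank_mib <= 0:
            raise ValueError("memory per rank must be positive")
        if mem_per_rank_mib > self.max_mem_per_rank_mib:
            return 0  # the request exceeds the per-rank limit
        # Either the cores or the memory runs out first (one core per rank).
        return min(self.cores, self.available_mem_mib // mem_per_rank_mib)

    def idle_cores(self, mem_per_rank_mib: int) -> int:
        """Cores left unused (over-allocated) so each rank gets more memory."""
        return self.cores - self.ranks_that_fit(mem_per_rank_mib)


node = SharedMemoryUnit(max_mem_per_rank_mib=16384, cores=8, available_mem_mib=16384)
print(node.ranks_that_fit(2048))  # 8: memory / cores is exactly 2048 MiB
print(node.ranks_that_fit(4096))  # 4: memory, not cores, is the limit
print(node.idle_cores(4096))      # 4 cores stay idle
```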
Resource: [ max ranks per job, job exclusive nodes (yes/no), bandwidth, allow_new_brokering (yes/no) ]
This describes the general policy for a cluster or other resource. Could it be per queue?
max ranks per job: the maximum level of parallelism allowed. Should we also (or instead) have a list of the allowed levels? Some sites might only allow, e.g., multiples of four (exclusive node allocation, but not necessarily coupled with the job exclusive nodes attribute below).
job exclusive nodes (yes/no): are we forced to allocate a full node (shared memory unit)?
bandwidth: the bandwidth of the backend interconnect between nodes
allow_new_brokering (yes/no): Drawing a blank here. Daniel, what was the 'brave new brokering' again?
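A minimal sketch of how a site or client might apply the Resource policy, assuming one core per rank and an optional explicit list of allowed levels (the open question above). ResourcePolicy, accepts and cores_allocated are hypothetical names; bandwidth and allow_new_brokering are left out.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ResourcePolicy:
    """Hypothetical policy part of the Resource tuple."""

    max_ranks_per_job: int
    job_exclusive_nodes: bool
    cores_per_node: int
    # Optional explicit list of permitted parallelism levels.
    allowed_levels: Optional[Sequence[int]] = None

    def accepts(self, ranks: int) -> bool:
        """True if a job with this many ranks is allowed by the policy."""
        if not 1 <= ranks <= self.max_ranks_per_job:
            return False
        if self.allowed_levels is not None:
            return ranks in self.allowed_levels
        return True

    def cores_allocated(self, ranks: int, ranks_per_node: int) -> int:
        """Cores taken from the cluster; whole nodes count if nodes are exclusive."""
        if not 1 <= ranks_per_node <= self.cores_per_node:
            raise ValueError("ranks_per_node must be between 1 and cores_per_node")
        if not self.job_exclusive_nodes:
            return ranks  # assumed: one core per rank
        nodes = math.ceil(ranks / ranks_per_node)
        return nodes * self.cores_per_node


# A site that only hands out whole 4-core nodes: multiples of four up to 64.
site = ResourcePolicy(max_ranks_per_job=64, job_exclusive_nodes=True,
                      cores_per_node=4, allowed_levels=range(4, 65, 4))
print(site.accepts(16), site.accepts(6), site.accepts(128))  # True False False
print(site.cores_allocated(8, ranks_per_node=2))             # 16
```

Running 8 ranks at 2 per node on exclusive nodes occupies 4 whole nodes, so the site gives up 16 cores for 8 ranks.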
Thought: How do we enumerate the Shared Memory Units in infosys/arc.conf?
Xrsl
cpu_distribution: this is for specifying the total number of processes/ranks and how many of those we want per node (shared memory unit).
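One way to read cpu_distribution, sketched under the assumption that ranks are packed onto shared memory units in order and only the last node is partially filled; a real LRMS may place ranks differently. distribute_ranks is a hypothetical helper.

```python
from typing import List


def distribute_ranks(total_ranks: int, ranks_per_node: int) -> List[int]:
    """Hypothetical helper: split total_ranks into per-node counts, filling nodes in order.

    Every shared memory unit gets ranks_per_node ranks except possibly the
    last one, which takes the remainder.
    """
    if total_ranks < 1 or ranks_per_node < 1:
        raise ValueError("rank counts must be positive")
    full_nodes, remainder = divmod(total_ranks, ranks_per_node)
    layout = [ranks_per_node] * full_nodes
    if remainder:
        layout.append(remainder)
    return layout


print(distribute_ranks(10, 4))  # [4, 4, 2]
print(distribute_ranks(8, 2))   # [2, 2, 2, 2]
```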
arc.conf needs
per queue
- max_ranks_per_job
- max_mem_per_rank
- max_mem_per_node
- cores_per_node
- job_exclusive_nodes
- bandwidth_between_nodes
per cluster
- allow_new_brokering
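A sketch of how the proposed attributes might be read, assuming unquoted key = value lines so that Python's configparser can parse them (the real arc.conf syntax may differ). The [cluster] and [queue/...] blocks are meant to mirror the arc.conf layout, but the queue name, values and units are made up for illustration; the bandwidth unit in particular is not fixed in these notes. read_parallel_settings is a hypothetical name.

```python
import configparser

# Hypothetical arc.conf fragment. Key names are the ones proposed in these
# notes; the queue name, values and units are invented for illustration.
ARC_CONF = """
[cluster]
allow_new_brokering = no

[queue/parallel]
max_ranks_per_job = 256
max_mem_per_rank = 2048
max_mem_per_node = 16384
cores_per_node = 8
job_exclusive_nodes = yes
bandwidth_between_nodes = 10000
"""

INT_KEYS = ("max_ranks_per_job", "max_mem_per_rank", "max_mem_per_node",
            "cores_per_node", "bandwidth_between_nodes")


def read_parallel_settings(text: str, queue: str) -> dict:
    """Return the per-cluster and per-queue parallel-job settings as Python values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    q = parser[queue]
    settings = {key: q.getint(key) for key in INT_KEYS}
    settings["job_exclusive_nodes"] = q.getboolean("job_exclusive_nodes")
    settings["allow_new_brokering"] = parser["cluster"].getboolean("allow_new_brokering")
    return settings


print(read_parallel_settings(ARC_CONF, "queue/parallel"))
```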
Arc-xrsl
- count -> ranks
- cpu_distribution = 2ranks/node, maxranks/node
Here we need to be careful with the terms: rank, thread, process, core...
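The two example values of cpu_distribution could be turned into a ranks-per-node number as follows. This is a sketch of a hypothetical grammar, "<n>ranks/node" or "maxranks/node", with "max" assumed to mean one rank per core; the notes do not pin down the syntax, and ranks_per_node is an invented name. Ranks and cores are kept as separate parameters in line with the warning about terms.

```python
import re

# Hypothetical grammar for the example values: "<n>ranks/node" or "maxranks/node".
_RANKS_PER_NODE = re.compile(r"^(?P<n>\d+|max)\s*ranks/node$")


def ranks_per_node(spec: str, cores_per_node: int) -> int:
    """Turn a cpu_distribution value into a concrete ranks-per-node count."""
    match = _RANKS_PER_NODE.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognised cpu_distribution value: {spec!r}")
    if match.group("n") == "max":
        return cores_per_node  # assumed: one rank on every core of the node
    n = int(match.group("n"))
    if not 1 <= n <= cores_per_node:
        raise ValueError(f"{n} ranks/node does not fit on {cores_per_node} cores")
    return n


print(ranks_per_node("2ranks/node", cores_per_node=8))    # 2
print(ranks_per_node("maxranks/node", cores_per_node=8))  # 8
```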
Infosys
- max_ranks_per_job
- max_mem_per_rank
- max_mem_per_node
- cores_per_node
- job_exclusive_nodes
- bandwidth_between_nodes
- allow_new_brokering
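With these attributes published, client brokering could filter queues before submission. A minimal sketch, assuming memory is per rank (as proposed under Documentation) and that the client already knows the job's ranks per node; PublishedQueue, JobRequest, matches and candidate_queues are hypothetical names, and job_exclusive_nodes, bandwidth_between_nodes and allow_new_brokering are not used here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PublishedQueue:
    """Hypothetical queue attributes as a client might read them from the infosys."""

    name: str
    max_ranks_per_job: int
    max_mem_per_rank: int  # MiB
    max_mem_per_node: int  # MiB
    cores_per_node: int


@dataclass(frozen=True)
class JobRequest:
    ranks: int           # total ranks (xRSL count)
    mem_per_rank: int    # MiB, since memory is per rank
    ranks_per_node: int  # from cpu_distribution


def matches(queue: PublishedQueue, job: JobRequest) -> bool:
    """Client-side check that a queue can run the job at all."""
    if job.ranks > queue.max_ranks_per_job:
        return False
    if job.mem_per_rank > queue.max_mem_per_rank:
        return False
    if job.ranks_per_node > queue.cores_per_node:
        return False
    # All ranks placed on one node must share that node's memory.
    return job.ranks_per_node * job.mem_per_rank <= queue.max_mem_per_node


def candidate_queues(queues: List[PublishedQueue], job: JobRequest) -> List[str]:
    return [q.name for q in queues if matches(q, job)]


queues = [
    PublishedQueue("small", max_ranks_per_job=16, max_mem_per_rank=4096,
                   max_mem_per_node=8192, cores_per_node=4),
    PublishedQueue("big", max_ranks_per_job=256, max_mem_per_rank=2048,
                   max_mem_per_node=16384, cores_per_node=8),
]
print(candidate_queues(queues, JobRequest(ranks=32, mem_per_rank=2048, ranks_per_node=4)))  # ['big']
print(candidate_queues(queues, JobRequest(ranks=8, mem_per_rank=4096, ranks_per_node=2)))   # ['small']
```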
Documentation
- cputime is deprecated.
- walltime = wall-clock time.
- memory is per rank.
- info about the new stuff.
I don't know what, e.g., Glue2 says about cputime. It would be nice to have it deprecated, though; having walltime plus published benchmarks is so much clearer.
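A worked example of what "memory is per rank" and walltime imply for a whole job, assuming one single-threaded rank per core; job_totals is a hypothetical helper. The same 57600 s of CPU time (16 core-hours) could be one rank running for 16 hours or eight ranks running for two, which is one reason a single cputime limit says little about a parallel job.

```python
def job_totals(ranks: int, mem_per_rank_mib: int, walltime_s: int) -> dict:
    """Hypothetical helper: whole-job totals from per-rank memory and a wall-clock limit.

    Assumes one single-threaded rank per core, so CPU time can reach at most
    ranks * walltime.
    """
    return {
        "total_mem_mib": ranks * mem_per_rank_mib,
        "max_cputime_s": ranks * walltime_s,
    }


print(job_totals(ranks=8, mem_per_rank_mib=2048, walltime_s=2 * 3600))
# {'total_mem_mib': 16384, 'max_cputime_s': 57600}
```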
Summary
Things that need to be updated:
- Client brokering
- Publish info to jobs (env-vars)
- arc.conf attributes
- Infosys extensions
- Backend extensions
- Documentation
Backend extensions also need a way to publish the LRMS allocations to the job itself, if we are not always using runtime environments for that.
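Until the backends publish allocations uniformly, a job can read them from the LRMS directly. A minimal sketch for PBS/Torque, which sets PBS_NODEFILE to a file listing one host name per allocated slot; other batch systems expose this differently (SLURM, for instance, sets SLURM_JOB_NODELIST), and count_slots and allocation_from_pbs are hypothetical names.

```python
import os
from collections import Counter
from typing import Dict, Iterable


def count_slots(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: count slots per host in a one-host-per-slot node file."""
    return dict(Counter(line.strip() for line in lines if line.strip()))


def allocation_from_pbs() -> Dict[str, int]:
    """Read the allocation from the file named by PBS_NODEFILE (PBS/Torque)."""
    path = os.environ.get("PBS_NODEFILE")
    if not path:
        raise RuntimeError("PBS_NODEFILE is not set; not running under PBS/Torque?")
    with open(path) as nodefile:
        return count_slots(nodefile)


print(count_slots(["node1\n", "node1\n", "node2\n", "\n"]))  # {'node1': 2, 'node2': 1}
```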
Final thoughts
This is a step in the right direction. However (referring to the coffee table discussion at the NG tech meeting), I think that many applications would benefit from having something more intelligent sitting on the frontend, either hosted by HED as a web service or as a Runtime Environment. To empower the REs in the latter case in a clean way, we would need some improvements to the RE script <-> grid manager interface.
Case not closed, let's continue the discussion.