This wiki is obsolete, see the NorduGrid web pages for up to date information.

LRMS Backends/workshop2015



time: 10/9 2015 10:00 - 17:00

Location: NBI UCPH - Lille frokoststue (across from the canteen, Blegdamsvej 19)

fun: YES!

Provisional agenda:

10:00 - 10:30 localtransfer

10:30 - 12:00 common lrms calls between infosys and jobcontrol backends


13:00 - 15:00 re-integrating python-lrms


15:30 - 17:00 API and Ze Future



localtransfer should be removed from the backends, together with the old uploader and downloader. We should keep the option of reimplementing the feature if there are requests. The option should stay in arc.conf, and the backends should log a suitable warning if it is turned on.


CUS - remove from the backends and the documentation, add warning, etc.

AW - remove uploaders and downloaders from distribution

Oxana - register as known issue
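
The warning behaviour could look roughly like the sketch below. It assumes arc.conf lines of the form key="value" and that the option is enabled with the value yes; the function names and the wording of the warning are made up for illustration and are not taken from the ARC code.

```python
import logging
import re

log = logging.getLogger("lrms-backend")

# Assumed arc.conf line shape: localtransfer="yes" (quotes optional).
_OPTION = re.compile(r'^\s*localtransfer\s*=\s*"?([^"\s]*)"?\s*$')


def localtransfer_enabled(conf_text):
    """Return True if any non-comment line sets localtransfer to yes."""
    for line in conf_text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = _OPTION.match(line)
        if match and match.group(1).lower() == "yes":
            return True
    return False


def warn_if_localtransfer(conf_text):
    """Log a warning (hypothetical wording) when the removed feature is on."""
    enabled = localtransfer_enabled(conf_text)
    if enabled:
        log.warning("localtransfer is set in arc.conf but is no longer "
                    "supported by the backends; the option is ignored")
    return enabled
```

A backend would call warn_if_localtransfer once at start-up with the contents of arc.conf and otherwise carry on as if the option were off.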

common lrms commands

We need a process that uses batch system commands to fill in a table with full job and queue information, i.e. a full batch system state. This should be all the information needed for scan and the information system. When the process is done, it moves the information to a common file used by both the backends and the information system.


We are targeting SLURM and CONDOR.

List all information needed for infosystem and scan - Florido, Chrulle

Decide on a "format" for the common file - Florido, Chrulle

Get the information from lrms - Florido, Chrulle
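
The collect-then-publish step could be sketched as follows. JSON is only a placeholder here, since the format of the common file is still to be decided, and the function names and state layout are hypothetical. The collector writes to a temporary file in the same directory and renames it into place, so that the backends and the information system never read a half-written file.

```python
import json
import os
import tempfile


def publish_state(state, path):
    """Write the batch system state to `path` atomically.

    `state` is a plain dict, e.g. {"jobs": {...}, "queues": {...}};
    that layout is an assumption made for this sketch.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".lrms-state-")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace is atomic when source and target share a filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_state(path):
    """Load the last complete state published by the collector."""
    with open(path) as handle:
        return json.load(handle)
```

Readers only ever call read_state, so scan and the information system see either the previous full state or the new one.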

Reintegrating python


Add ssh option to gridmanager configuration section - Martin

Add new backend options to a-rex, conf and move over backends from branch - Christian?

Create separate package for python backends containing the inline python - AW


Finish overview table. Exactly who is writing/reading what in the control dir - Christian


  • Investigate whether localtransfer is used by any sites - Christian
 It does not seem to be in use anywhere. The following sites are not using it: UCPH, IJS, Arnes, HPC2N, UiO, CSC, Glasgow, Bristol, RAL, Brunel, CERN, UA, UNIBE
 No sites have reported it in use.
  • Create a full overview of the "API" i.e. all information flowing between arc and the backends - Christian, Florido
information | description | source | comment
joboption_lrms | Which lrms is configured | backends set this | should not have joboption prefix. It is not a joboption.
  • Investigate what lrms functions are called and with which options - Christian, Florido
Backend jobcontrol infosystem

LoadLeveler
  jobcontrol:
    llcancel <jobid>
    llq -r %st %id <jobids>
    llq -l <jobid>
    llclass -l
    llclass -l <queuename>
    llsub <jobscript>
  infosystem:
    llclass -l <queuename>
    llq -c <queuename>
    llq -l -x <jobid>
    llstatus -f %sta
    llstatus -l
    llstatus -R
    llstatus -r %cpu %sta
    llstatus -v


Condor
  jobcontrol:
    condor_rm <joboption_jobid>%.`hostname -f`
    condor_submit <jobscript>
  infosystem:
    condor_q -constraint "NiceUser == False" -format "ClusterId = %V\n" ClusterId -format "ProcId = %V\n" ProcId -format "JobStatus = %V\n" JobStatus -format "CurrentHosts = %V\n" CurrentHosts -format "LastRemoteHost = %V\n" LastRemoteHost -format "RemoteHost = %V\n" RemoteHost -format "ImageSize = %V\n" ImageSize -format "RemoteWallClockTime = %V\n" RemoteWallClockTime -format "RemoteUserCpu = %V\n" RemoteUserCpu -format "RemoteSysCpu = %V\n" RemoteSysCpu -format "JobTimeLimit = %V\n" JobTimeLimit -format "JobCpuLimit = %V\n\n" JobCpuLimit
    condor_status -format "%s\n" Machine
    condor_status -format "Name = %V\n" Name -format "Machine = %V\n" Machine -format "State = %V\n" State -format "Cpus = %V\n" Cpus -format "TotalCpus = %V\n" TotalCpus -format "SlotType = %V\n\n" SlotType
  Uses find to collect job ids from the controldir: find $controldir/processing -maxdepth 1 -name 'job.??????????*.status'
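
The condor_q and condor_status calls above print one "Key = Value" line per attribute and end each job or slot with an empty line. A minimal sketch of turning that output into records for a common state table is shown here; the function name is hypothetical, and stripping the double quotes that %V may put around strings is a convenience chosen for this example.

```python
def parse_condor_records(output):
    """Split `Key = Value` output into one dict per blank-line-separated record."""
    records = []
    current = {}
    for line in output.splitlines():
        if not line.strip():
            # An empty line closes the current job or slot record.
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(" = ")
        if sep:
            # Strings may arrive in double quotes; drop them for convenience.
            current[key.strip()] = value.strip().strip('"')
    if current:
        records.append(current)
    return records
```

All values stay strings, including undefined, so the caller decides how to convert them.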


PBS
  jobcontrol:
    qdel <jobid>
    qstat -a
    qsub -r n -S /bin/bash -m n < <jobscript>
  infosystem:
    pbsnodes -a
    qmgr -c "list server"
    qstat -f
    qstat -Q
    qstat -Q -f <queuename>
    showbf -u <userid>


LSF
  jobcontrol:
    bkill <jobid>
    bjobs -a -u all
    bjobs -W -w <jobid>
    bsub < <jobscript>
    bparams -a
  infosystem:
    bhosts -w
    bjobs -W -w <jobid>
    bqueues -w
    bqueues -w <userid> <queue name>
    bqueues -l <userid> <queue name>
    lshosts -w
    lsid -V


SGE
  jobcontrol:
    qdel <jobid>
    qstat -j <jobid>
    qstat -j <jobid> -f <briefaccttempfile>
    qstat -u '*'
    qsub -S @posix_shell@ < <jobscript>
    qconf -spl
    qconf -sc
  infosystem:
    qconf -sconf global
    qconf -sg <array/list of queue names>
    qconf -sql
    qhost -f -H
    qhost -xml
    qstat -f
    qstat -help
    qstat -j <jobid>
    qstat -u '*' with either -F or -f, for compatibility with other versions


SLURM
  jobcontrol:
    scancel <jobid>
    squeue -a -h -o "%i:%T" -t all -j <jobids>
    sacct -j <jobid>.batch -o ExitCode -P
    scontrol -o show job <jobid>
    sacct -j <localid>.batch -o NCPUS,NNODES,CPUTime,Start,End,ExitCode,State -P
    sbatch <jobscript>
  infosystem:
    scontrol show config
    scontrol show node --oneliner
    sinfo -a -h -o "cpuinfo=%C"
    sinfo -a -h -o "PartitionName=%P TotalCPUs=%C TotalNodes=%D MaxTime=%l"
    squeue -a -h -t all -o "JobId=%i TimeUsed=%M Partition=%P JobState=%T ReqNodes=%D ReqCPUs=%C TimeLimit=%l Name=%j NodeList=%N"
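
Both squeue formats above are easy to parse into the kind of table a common state file would need. The sketch below is only an illustration with hypothetical function names: one helper maps job ids to states from the "%i:%T" output, the other splits a "Key=Value" line on the keys used in the format string, so that job names containing spaces are kept intact.

```python
import re

# Keys used in the squeue -o format string listed above.
SQUEUE_KEYS = ("JobId", "TimeUsed", "Partition", "JobState", "ReqNodes",
               "ReqCPUs", "TimeLimit", "Name", "NodeList")
_KEY_RE = re.compile(r"(?:^|\s)(" + "|".join(SQUEUE_KEYS) + r")=")


def parse_job_states(output):
    """Map job id to state from `squeue -o "%i:%T"` output."""
    states = {}
    for line in output.splitlines():
        jobid, sep, state = line.strip().partition(":")
        if sep and jobid:
            states[jobid] = state
    return states


def parse_job_line(line):
    """Split one `Key=Value ...` line into a dict.

    Values run up to the next known key, so a job name containing
    spaces survives; a name that itself contains ` JobId=` or similar
    would still confuse this sketch.
    """
    matches = list(_KEY_RE.finditer(line))
    record = {}
    for index, match in enumerate(matches):
        if index + 1 < len(matches):
            end = matches[index + 1].start()
        else:
            end = len(line)
        record[match.group(1)] = line[match.end():end].strip()
    return record
```

Running squeue once and feeding the result to both scan and the information system would avoid each side issuing its own query.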


DGBridge
  wsclient -e <endpoint URL> -m status -j <jobid>


fork
  code in
  ps -e -o ppid,pid,vsz,time,etime,user,comm
  ulimit -t

# profile code and write database to ./nytprof.out
perl -d:NYTProf

# convert database into a set of html files, e.g., ./nytprof/index.html
# and open a web browser on the nytprof/index.html file
nytprofhtml --open

# or into comma separated files, e.g., ./nytprof/*.csv
  • Investigate ssh in HED - Silje, Martin
  • inline python - Jon
Available in EPEL and Debian (it turned out that gregor herrmann added it to Debian in July...)