Ideas

From NorduGrid

Ideas and thoughts for future development

In this page we should collect summaries of various discussions during NG meetings. Summaries must be short and concise, identify the key issues, and give suggestions.


Clients

From nordugrid-discuss:

arcrsync: a tool that does an rsync-like job of updating the destination directory tree to look like the source, where one (or both) of src and dst is an SE.
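The core of such a tool is the rsync-style decision step: compare listings of the two trees and derive what to copy and what to delete. A minimal sketch of that logic, assuming the listings (e.g. obtained via arcls, or os.walk locally) have already been turned into maps from relative path to (size, checksum) — the map format and function name are illustrative assumptions, not an existing ARC API:

```python
def sync_plan(src, dst):
    """Compute an rsync-like plan for making dst look like src.

    src, dst: dicts mapping relative path -> (size, checksum),
    as could be built from arcls output or a local directory walk.
    Returns (to_copy, to_delete) as sorted lists of paths:
    to_copy   - files missing from dst or whose metadata differs,
    to_delete - files present in dst but not in src.
    """
    to_copy = sorted(p for p, meta in src.items() if dst.get(p) != meta)
    to_delete = sorted(p for p in dst if p not in src)
    return to_copy, to_delete
```

The actual transfers would then be delegated to the existing data-movement machinery (e.g. arccp), keeping the tool a thin planner on top of the client libraries.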

From NG2014:

Usability is the key issue in the client design.

Highlights:

  • Error handling should be done in a smarter way, e.g. to compensate for the lack of information from underlying libraries. A dedicated client component for error handling is needed.

  • xRSL parsing needs a validator; users never understand what is wrong. A parser defined using off-the-shelf software might give better results. Some examples here: http://en.wikipedia.org/wiki/Parsing

  • Client status (what is the client doing?) must be reported in a user-friendly way. Users do not understand where they are allowed to run. Example: a search of available matching sites should answer the questions "Where am I allowed to run? Is there any place where I am already running jobs?"
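To illustrate the validator point above: even before a full grammar-based parser exists, the client could at least tell users *where* a job description is malformed instead of failing silently. A tiny sketch of such a checker (an assumption for illustration, not an existing ARC component; it only checks the leading '&' and parenthesis balance, and does not skip quoted strings as a real validator would):

```python
def validate_xrsl(text):
    """Minimal xRSL sanity checks: verify the description starts with
    '&' (attribute conjunction) and that parentheses balance, reporting
    the offset of the first problem. Returns a list of error messages;
    an empty list means the text passed these basic checks."""
    errors = []
    if not text.lstrip().startswith("&"):
        errors.append("job description should start with '&' (attribute conjunction)")
    depth = 0
    for pos, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # report the exact position instead of a generic failure
                errors.append("unmatched ')' at offset %d" % pos)
                depth = 0
    if depth > 0:
        errors.append("%d unclosed '(' at end of input" % depth)
    return errors
```

A grammar built with off-the-shelf parser tooling could then replace these ad-hoc checks while keeping the same "list of readable errors" interface.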

From NG2013-Visegrad retreat:

No internals should be shown:

  • zero configuration, i.e. automation of setup of:
    • Certificate stuff (own certificate, CA, CRLs)
      • fetch-crl? - cache the last CRL run and have ARC tools check it
      • where to get CAs?
  • indexes "bootstrap" (should be easy to configure for a VO)
    • maybe configuration "per VO"


Dataset awareness: Many different science domains have their own definitions of a dataset. It would be useful to be able to instruct ARC clients what the dataset looks like, so that the user community does not have to re-learn ways of interacting with it.

  • "create your input dataset" generates a way for arc clients to access data
  • parameter sweep
  • automagically manipulate such a dataset. Example: create xRSL for jobs working on different items in datasets, based on common tasks on datasets
  • define output dataset: describe what it is like, define location
    • can be another dataset
    • can be just results
  • define some kind of structure that allows "groups" in datasets, so the user can select them in an easy way. Example groups: Analysis2014, Analysis2013...

How?: identify known patterns of data usage

  • files in a directory
  • datasets that need grouping
  • one output file for multiple input files


Semi-automatic generation of xRSL for defined tasks

  • templating xrsl generation
    • might be done with a webpage
  • manipulate parameter sweep (also multiple)

How?: Consider and implement simple cases:

  • inputfiles, executables, output files;
  • automatic generation of structure for multiple input files
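The simple cases above can be sketched with plain templating: one xRSL description per input file of a parameter sweep. The template below is an assumption for illustration (the executable name, URLs, and attribute values would come from the user's task definition), but the attributes used — executable, arguments, inputFiles, outputFiles, jobName — are standard xRSL:

```python
# Illustrative xRSL template for a one-input-file-per-job sweep.
XRSL_TEMPLATE = """&(executable="run.sh")
(arguments="{item}")
(inputFiles=("{item}" "{url}"))
(outputFiles=("{item}.out" ""))
(jobName="sweep-{index}")"""


def sweep_jobs(items):
    """Generate one xRSL job description per dataset item.

    items: iterable of (filename, source_url) pairs describing the
    input dataset. Returns a list of xRSL strings, one per job.
    """
    return [XRSL_TEMPLATE.format(item=name, url=url, index=i)
            for i, (name, url) in enumerate(items)]
```

A web page front-end (as suggested above) would only need to collect the template fields and the item list, then feed the generated descriptions to the usual submission tools.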

Job management:

  • for a batch of jobs, manage job resubmission (can be automatic or manual)

How?: aCT is a possible solution for the above.

Job visualization/presentation/reporting:

  • "visualize" job processing
  • group or order them by different filters
  • visualize status
  • produce "reports" based on filters and groups
  • Job "groupings", i.e. jobs operating on the same dataset

How?:

  • maybe revising the arcstat output is enough
  • maybe manipulating/visualizing data stored by aCT?
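Whichever data source is used (parsed arcstat output or the aCT database), the grouping/reporting part reduces to counting job records per filter key. A minimal sketch, assuming job records have already been parsed into dicts with 'dataset' and 'status' keys (an illustrative record format, not an existing ARC structure):

```python
from collections import Counter


def job_report(jobs):
    """Summarize job records into counts per (dataset, status) group,
    e.g. to show how many jobs of Analysis2014 are FINISHED vs FAILED.

    jobs: iterable of dicts with 'dataset' and 'status' keys, as might
    be parsed from arcstat output or read from an aCT database.
    """
    return Counter((j["dataset"], j["status"]) for j in jobs)
```

The same keyed counts could then drive both a text report and a visualization, and "groupings" by other attributes just means swapping the key function.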

Workflows

Manipulation of workflows requires a higher-level client.


Conclusion: All these features require implementing a new client. The client should be built on top of an API/SDK.