Benchmarking results post-processing #70

Open · 1 of 3 tasks
ilectra opened this issue Jan 6, 2023 · 5 comments


@ilectra (Collaborator) commented Jan 6, 2023

When a benchmark is run, it generates a number of outputs: perflogs and environment logs. Those files have to be moved to an appropriate location (see #16), and then we'd like to provide some generic tools/scripts for users to process them and produce tables and plots. The intention is for the scripts provided in this repository to be quite basic; users would clone/fork it and add their own. Scripts needed:

  • fetch appropriate data from remote location
  • read in all available info into one big pandas dataframe (see the first sketch after this list). This info would come from:
    • the perflog file path. It seems to be something like sysname/partition/environment/testname - see def parse_path_metadata(path). The system name and partition are variables needed for reports and plots, and the testname includes the name of the app and the parameters used to run the specific benchmark.
      • understand what this path means, where it comes from and how to define/change it: it's the field prefix, just above format in the handlers_perflog block - see below for details and links.
    • the perflog content. The format of this content is defined by handlers_perflog.format in 'handlers_perflog': [ (note the last note here). Because this format is defined by the framework that generates the benchmarks, the post-processing tools unfortunately have to live in this repo and be kept in sync. The good news is that a lot of the functionality for reading and working with those perflogs exists already.
      The content of this file is most of what we need to present in a performance report: things like the date the benchmark was run (for regression-testing purposes), the number of nodes and threads used, the benchmark parameters (possibly mangled into some info string, but it's there), and the FOM values for each test.
    • Some environment variables have to be added as well, things like compilers and their versions, MPI implementations, the benchmark app version, etc. Some of those might be parameters of the benchmarks that could be fished out of the perflogs; some will need more work. Some functionality must exist already...
  • Once all the data is in a DF, there are various use cases that would need scripts for the generation of tables and/or plots:
    • Single benchmark app run with various parameters (e.g. castep). Each app can have different levels of things to create outputs/visualisations for, e.g. different FOMs vs number of nodes for different systems, or for different input parameters, or both. The generic script should take the names of the FOMs/other parameters to plot and fish them out of the DF (see the plotting sketch after this list).
    • Single benchmark app performance (i.e. FOM) vs time - see the FEniCS repo. Since the benchmark timestamp is one of the contents of the DF, this is technically a subset of the above case, but it might benefit from its own simplified script that takes only the name of the app and the FOM as input.
    • Several benchmark apps, run on the same machine. Again, technically the same as the first case, but it might benefit from a simplified script.
    • Other use cases?
  • A good solution for presenting the output is GitHub Pages, and we have two examples of doing that: the above FEniCS one, and our own (see Use github-pages to publish a website with data visualisation #18).
  • System/environment info: a whole different can of worms (or not 😄 ) that we have to think about separately. This might be relevant/useful.
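
A minimal sketch of what the "one big pandas dataframe" step could look like, assuming a perflog layout of perflogs/<system>/<partition>/<testname>.log as in the example further down this thread (the real layout may also include the environment) and the pipe-separated key=value record format shown there. The function names and column handling are illustrative, not the repo's actual code:

```python
# Sketch only: build one pandas DataFrame from a tree of perflog files,
# assuming <perflog_root>/<system>/<partition>/<testname>.log and
# pipe-separated "key=value" records (hypothetical helper names).
from pathlib import Path

import pandas as pd


def parse_path_metadata(path: Path) -> dict:
    """Recover system, partition and test name from the perflog file path."""
    return {
        "system": path.parent.parent.name,
        "partition": path.parent.name,
        "test": path.stem,
    }


def parse_perflog_line(line: str) -> dict:
    """Split one 'key=value|key=value|...' perflog record into a dict.

    Fields without an '=' (e.g. the leading timestamp and ReFrame version)
    are kept under positional names.
    """
    record = {}
    for i, field in enumerate(line.strip().split("|")):
        key, sep, value = field.partition("=")
        record[key if sep else f"field_{i}"] = value if sep else key
    return record


def load_perflogs(perflog_root: str) -> pd.DataFrame:
    """Read every *.log file under perflog_root into one DataFrame."""
    rows = []
    for path in Path(perflog_root).rglob("*.log"):
        meta = parse_path_metadata(path)
        for line in path.read_text().splitlines():
            if line.strip():
                rows.append({**meta, **parse_perflog_line(line)})
    return pd.DataFrame(rows)
```

With something like that in place, each perflog line becomes one DataFrame row, with the path metadata added as extra columns.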
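
And a companion sketch for the first plotting use case above (one FOM against task/node count, one line per system). The column names ("flops", "num_tasks", "system") follow the perflog example further down this thread and would need adjusting for other FOMs/benchmarks:

```python
# Sketch only: plot one FOM column against a scale column, one line per
# system, from the DataFrame produced by the loader sketched above.
import matplotlib.pyplot as plt
import pandas as pd


def plot_fom(df: pd.DataFrame, fom: str, x: str = "num_tasks",
             group_by: str = "system") -> plt.Axes:
    """Plot df[fom] against df[x], one line per value of df[group_by]."""
    _fig, ax = plt.subplots()
    for label, group in df.groupby(group_by):
        group = group.sort_values(x)
        # perflog fields are read in as strings, so convert for plotting
        ax.plot(group[x].astype(float), group[fom].astype(float),
                marker="o", label=str(label))
    ax.set_xlabel(x)
    ax.set_ylabel(fom)
    ax.legend(title=group_by)
    return ax


# Example usage (column names are assumptions):
# df = load_perflogs("perflogs")
# plot_fom(df, fom="flops", x="num_tasks")
# plt.savefig("flops_vs_num_tasks.png")
```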
@tkoskela (Member)
@t-young31 has done more work on this in the DiRAC project than I had realised. As far as I can tell, he has already written a lot of the code needed for parsing the perflogs and plotting data for different use cases. His feedback was:

  1. It was difficult for the plotting tool to be generic because there are too many variables to plot (compiler version, library versions, clusters, env variables, etc.)
  2. It was not clear what the use cases were

@ilectra (Collaborator, Author) commented Jan 25, 2023

We need to save the Spack spec as well. There's a hash for every spec: save the hash in the perflog as a field, and the spec itself in a separate file (named the same as the hash for simplicity), which will be updated only when something changes.
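
A minimal sketch of that idea, assuming the spec string has already been captured somewhere (e.g. from spack spec output); the sha256 here is only illustrative - Spack's own DAG hash would normally be the thing to store:

```python
# Sketch only: store a short hash of the Spack spec as a perflog field and
# keep the full spec in a file named after the hash, so the file is only
# (re)written when the spec changes. The sha256 stands in for Spack's own
# DAG hash; `spec` is assumed to have been captured elsewhere.
import hashlib
from pathlib import Path


def record_spack_spec(spec: str, spec_dir: str = "spack_specs") -> str:
    """Write the full spec to <spec_dir>/<hash>.txt and return the hash."""
    spec_hash = hashlib.sha256(spec.encode()).hexdigest()[:10]
    out = Path(spec_dir) / f"{spec_hash}.txt"
    out.parent.mkdir(parents=True, exist_ok=True)
    if not out.exists():
        out.write_text(spec)
    return spec_hash  # e.g. added to the perflog as spack_spec_hash=<hash>
```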

@ilectra (Collaborator, Author) commented Mar 3, 2023

Different ways to print results in the perflog:

  • The loggable attributes check_... in the logging.handlers_perflog.format.
    • the check_info variable is a message reporting the test name, the current partition and the current programming environment that the test is currently executing on, according to the docs
  • Environment variables: check_variables dictionary in the above list (changed to check_env_vars in ReFrame version 4.0)
  • Tags
  • Parameters. That's non-trivial. Try check_display_name and other variants of check_...name..., and see what is printed out.

After some tests to see what's printed to the perflog for parameters and the various names, the content of perflogs/myriad/compute-node/SombreroBenchmark_One_5.log is:
2023-03-07T16:41:24|reframe 3.12.0|SombreroBenchmark %param_test1=One %param_test2=5 @myriad:compute-node+default|jobid=3420395|flops=1.07|num_tasks=1|num_cpus_per_task=1|num_tasks_per_node=1|ref=1|lower=-0.2|upper=null|units=Gflops/seconds|spack_spec=sombrero@2021-08-16|name=SombreroBenchmark_One_5|display_name=SombreroBenchmark %param_test1=One %param_test2=5|short_name=null|unique_name=SombreroBenchmark_One_5|descr=SombreroBenchmark %param_test1=One %param_test2=5|variables={"OMP_NUM_THREADS": "1"}|tags=
Note: This naming scheme changes with ReFrame version 4.0, see https://reframe-hpc.readthedocs.io/en/stable/manpage.html#test-naming-scheme
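
A minimal sketch of pulling the parameters back out of the display_name field shown above; the %key=value convention follows the ReFrame output in that example, and the function/return names are illustrative:

```python
# Sketch only: recover the benchmark parameters from the display_name
# field above, e.g. "SombreroBenchmark %param_test1=One %param_test2=5".
import re


def parse_display_name(display_name: str) -> dict:
    """Return the bare test name and a dict of its %key=value parameters."""
    params = dict(re.findall(r"%(\w+)=(\S+)", display_name))
    test_name = display_name.split(" %", 1)[0]
    return {"test_name": test_name, "params": params}


# parse_display_name("SombreroBenchmark %param_test1=One %param_test2=5")
# -> {'test_name': 'SombreroBenchmark',
#     'params': {'param_test1': 'One', 'param_test2': '5'}}
```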

@ilectra (Collaborator, Author) commented Mar 3, 2023

For posterity, some of Tom's DiRAC work is in Lokesh's repo - the part that reads in the perflogs.

@ilectra (Collaborator, Author) commented Mar 8, 2023

Created a bunch of sub-issues to this one: #104, #105, #106, #107, #108, #109, #110

@pineapple-cat linked a pull request Mar 27, 2023 that will close this issue.
@tkoskela removed a link to a pull request Jun 6, 2023.