Benchmarking results post-processing #70

Open · 1 of 3 tasks
ilectra opened this issue Jan 6, 2023 · 5 comments


@ilectra (Collaborator) commented Jan 6, 2023

When a benchmark is run, it generates a number of outputs: perflogs and environment logs. Those files have to be moved to an appropriate location (see #16), and then we'd like to provide some generic tools/scripts for users to process them and produce tables and plots. The intention is for the scripts provided in this repository to be quite basic; users would clone/fork it and add their own. Scripts needed:

  • fetch appropriate data from remote location
  • read in all available info into one big pandas dataframe (see the first sketch after this list). This info would come from:
    • the perflog file path. It seems to be something like sysname/partition/environment/testname - see def parse_path_metadata(path). The system name and partition are variables needed for reports and plots, and the testname includes the name of the app and the parameters used to run the specific benchmark.
      • understand what this path means, where it comes from and how to define/change it: it's the field prefix, just above format in the handlers_perflog block - see below for details and links.
    • the perflog content. The format of this content is defined by handlers_perflog.format in 'handlers_perflog': [ (note the last note here). Because this format is defined by the framework that generates the benchmarks, the post-processing tools unfortunately have to live in this repo and be kept in sync. The good news is that a lot of the functionality for reading and working with those perflogs exists already.
      The content of this file is most of what we need to present in a performance report: things like the date the benchmark was run (for regression-testing purposes), the number of nodes and threads used, the benchmark parameters (possibly mangled into some info string, but it's there), and the FOM values for each test.
    • Some environment variables have to be added as well, things like compilers and their versions, MPI implementations, the benchmark app version, etc. Some of those might be parameters of the benchmarks that could be fished out of the perflogs; some will need more work. Some functionality must exist already...
  • Once all the data is in a DF, there are various use cases that would need scripts for the generation of tables and/or plots:
    • Single benchmark app run with various parameters (e.g. castep). Each app can have different levels of things to create outputs/visualisations for, e.g. different FOMs vs number of nodes for different systems, or for different input parameters, or both. The generic script should take the names of the FOMs/other parameters to plot and fish them out of the DF (see the plotting sketch after this list).
    • Single benchmark app performance (i.e. FOM) vs time - see the FEniCS repo. Since the benchmark timestamp is one of the contents of the DF, this is technically a subset of the above case, but it might benefit from its own simplified script that takes only the name of the app and the FOM as input.
    • Several benchmark apps, run on the same machine. Again, technically the same as the first case, but it might benefit from a simplified script.
    • Other use cases?
  • A good solution for presenting the output is GitHub Pages, and we have two examples of doing that: the above FEniCS one, and our own (see Use github-pages to publish a website with data visualisation #18).
  • System/environment info: a whole different can of worms (or not 😄 ) that we have to think about separately. This might be relevant/useful.
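
A minimal sketch of what the "one big pandas dataframe" step could look like, assuming a perflog layout of perflogs/<system>/<partition>/<testname>.log as in the example further down this thread (the real layout may also include the environment) and the pipe-separated key=value record format shown there. The function names and column handling are illustrative, not the repo's actual code:

```python
# Sketch only: build one pandas DataFrame from a tree of perflog files,
# assuming <perflog_root>/<system>/<partition>/<testname>.log and
# pipe-separated "key=value" records (hypothetical helper names).
from pathlib import Path

import pandas as pd


def parse_path_metadata(path: Path) -> dict:
    """Recover system, partition and test name from the perflog file path."""
    return {
        "system": path.parent.parent.name,
        "partition": path.parent.name,
        "test": path.stem,
    }


def parse_perflog_line(line: str) -> dict:
    """Split one 'key=value|key=value|...' perflog record into a dict.

    Fields without an '=' (e.g. the leading timestamp and ReFrame version)
    are kept under positional names.
    """
    record = {}
    for i, field in enumerate(line.strip().split("|")):
        key, sep, value = field.partition("=")
        record[key if sep else f"field_{i}"] = value if sep else key
    return record


def load_perflogs(perflog_root: str) -> pd.DataFrame:
    """Read every *.log file under perflog_root into one DataFrame."""
    rows = []
    for path in Path(perflog_root).rglob("*.log"):
        meta = parse_path_metadata(path)
        for line in path.read_text().splitlines():
            if line.strip():
                rows.append({**meta, **parse_perflog_line(line)})
    return pd.DataFrame(rows)
```

With something like that in place, each perflog line becomes one DataFrame row, with the path metadata added as extra columns.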
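
And a companion sketch for the first plotting use case above (one FOM against task/node count, one line per system). The column names ("flops", "num_tasks", "system") follow the perflog example further down this thread and would need adjusting for other FOMs/benchmarks:

```python
# Sketch only: plot one FOM column against a scale column, one line per
# system, from the DataFrame produced by the loader sketched above.
import matplotlib.pyplot as plt
import pandas as pd


def plot_fom(df: pd.DataFrame, fom: str, x: str = "num_tasks",
             group_by: str = "system") -> plt.Axes:
    """Plot df[fom] against df[x], one line per value of df[group_by]."""
    _fig, ax = plt.subplots()
    for label, group in df.groupby(group_by):
        group = group.sort_values(x)
        # perflog fields are read in as strings, so convert for plotting
        ax.plot(group[x].astype(float), group[fom].astype(float),
                marker="o", label=str(label))
    ax.set_xlabel(x)
    ax.set_ylabel(fom)
    ax.legend(title=group_by)
    return ax


# Example usage (column names are assumptions):
# df = load_perflogs("perflogs")
# plot_fom(df, fom="flops", x="num_tasks")
# plt.savefig("flops_vs_num_tasks.png")
```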
@tkoskela (Member)
@t-young31 has done more work on this in the DiRAC project than I had realised. As far as I can tell, he has already written a lot of the code needed for parsing the perflogs and plotting data for different use cases. His feedback was:

  1. It was difficult for the plotting tool to be generic because there are too many variables to plot (compiler version, library versions, clusters, env variables, etc.)
  2. It was not clear what the use cases were

@ilectra (Collaborator, Author) commented Jan 25, 2023

We need to save the Spack spec as well. There's a hash for every spec: save the hash in the perflog as a field, and the spec itself in a separate file (named the same as the hash for simplicity), which will be updated only when something changes.
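
A minimal sketch of that idea, assuming the spec string has already been captured somewhere (e.g. from spack spec output); the sha256 here is only illustrative - Spack's own DAG hash would normally be the thing to store:

```python
# Sketch only: store a short hash of the Spack spec as a perflog field and
# keep the full spec in a file named after the hash, so the file is only
# (re)written when the spec changes. The sha256 stands in for Spack's own
# DAG hash; `spec` is assumed to have been captured elsewhere.
import hashlib
from pathlib import Path


def record_spack_spec(spec: str, spec_dir: str = "spack_specs") -> str:
    """Write the full spec to <spec_dir>/<hash>.txt and return the hash."""
    spec_hash = hashlib.sha256(spec.encode()).hexdigest()[:10]
    out = Path(spec_dir) / f"{spec_hash}.txt"
    out.parent.mkdir(parents=True, exist_ok=True)
    if not out.exists():
        out.write_text(spec)
    return spec_hash  # e.g. added to the perflog as spack_spec_hash=<hash>
```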

@ilectra (Collaborator, Author) commented Mar 3, 2023

Different ways to print results in the perflog:

  • The loggable attributes check_... in the logging.handlers_perflog.format.
    • the check_info variable is a message reporting the test name, the current partition and the current programming environment that the test is currently executing on, according to the docs
  • Environment variables: check_variables dictionary in the above list (changed to check_env_vars in ReFrame version 4.0)
  • Tags
  • Parameters. That's non-trivial. Try check_display_name and other variants of check_...name..., and see what is printed out.

After some tests to see what's printed to the perflog for parameters and the various names, the content of perflogs/myriad/compute-node/SombreroBenchmark_One_5.log is:
2023-03-07T16:41:24|reframe 3.12.0|SombreroBenchmark %param_test1=One %param_test2=5 @myriad:compute-node+default|jobid=3420395|flops=1.07|num_tasks=1|num_cpus_per_task=1|num_tasks_per_node=1|ref=1|lower=-0.2|upper=null|units=Gflops/seconds|spack_spec=sombrero@2021-08-16|name=SombreroBenchmark_One_5|display_name=SombreroBenchmark %param_test1=One %param_test2=5|short_name=null|unique_name=SombreroBenchmark_One_5|descr=SombreroBenchmark %param_test1=One %param_test2=5|variables={"OMP_NUM_THREADS": "1"}|tags=
Note: This naming scheme changes with ReFrame version 4.0, see https://reframe-hpc.readthedocs.io/en/stable/manpage.html#test-naming-scheme
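
A minimal sketch of pulling the parameters back out of the display_name field shown above; the %key=value convention follows the ReFrame output in that example, and the function/return names are illustrative:

```python
# Sketch only: recover the benchmark parameters from the display_name
# field above, e.g. "SombreroBenchmark %param_test1=One %param_test2=5".
import re


def parse_display_name(display_name: str) -> dict:
    """Return the bare test name and a dict of its %key=value parameters."""
    params = dict(re.findall(r"%(\w+)=(\S+)", display_name))
    test_name = display_name.split(" %", 1)[0]
    return {"test_name": test_name, "params": params}


# parse_display_name("SombreroBenchmark %param_test1=One %param_test2=5")
# -> {'test_name': 'SombreroBenchmark',
#     'params': {'param_test1': 'One', 'param_test2': '5'}}
```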

@ilectra (Collaborator, Author) commented Mar 3, 2023

For posterity, some of Tom's DiRAC work is in Lokesh's repo - the part that reads in the perflogs.

@ilectra (Collaborator, Author) commented Mar 8, 2023

Created a bunch of sub-issues to this one: #104, #105, #106, #107, #108, #109, #110

@pineapple-cat linked a pull request Mar 27, 2023 that will close this issue.
@tkoskela removed a link to a pull request Jun 6, 2023.