Add data transformations for post-processing plot data #226

pineapple-cat · 2023-10-23T16:17:30Z

Addresses #183 and #205.

Fix categorical sorting.
Add unit tests.
Update documentation.

asifsamiarain · 2023-12-04T11:32:09Z

There seems to be an issue after updating the filters to match this branch, whereas this was at least working on the main branch.

The perflogs for one app and one tag are attached for your perusal (this seems OK at my end too, @pineapple-cat):
ssne.tar.gz

Here is the config:

title: sphng_single_node
x_axis:
  value: "job_completion_time"
  units:
    custom: null
y_axis:
  value: "elapsed_time_value"
  units:
    column: "elapsed_time_unit"
filters:
  and: [["test_name", "==", "Sphng_Single_Node_evolution"]]
  or: []
series: [[num_tasks_per_node, 1], [num_tasks_per_node, 2], [num_tasks_per_node, 4], [num_tasks_per_node, 8], [num_tasks_per_node, 16], [num_tasks_per_node, 32], [num_tasks_per_node, 64], [num_tasks_per_node, 128]]
column_types: # e.g. str/string/object, int/int64, float/float64, datetime/datetime64
  job_completion_time: "datetime"
  elapsed_time_value: "float"
  elapsed_time_unit: "str"
  test_name: "str"
  num_tasks_per_node: "int"
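
For readers unfamiliar with the `and`/`or` filter lists, here is a minimal sketch of how such filters could be turned into a pandas row mask. The helper `build_mask` and the `OPS` table are hypothetical names, not the code in post_processing.py, and the way the two lists are combined (every `and` filter must hold, plus at least one `or` filter if any are given) is an assumption.

```python
import operator

import pandas as pd

# Comparison operators that a filter triple such as
# ["test_name", "==", "Sphng_Single_Node_evolution"] might use.
OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
}


def build_mask(df, and_filters, or_filters):
    """Hypothetical helper: combine AND and OR filters into one row mask."""
    mask = pd.Series(True, index=df.index)
    for column, op, value in and_filters:
        mask &= OPS[op](df[column], value)
    # Assumption: OR filters only restrict the rows when at least one is given.
    if or_filters:
        any_match = pd.Series(False, index=df.index)
        for column, op, value in or_filters:
            any_match |= OPS[op](df[column], value)
        mask &= any_match
    return mask


# Made-up rows for illustration; not taken from the attached perflogs.
df = pd.DataFrame({
    "test_name": ["Sphng_Single_Node_evolution", "Other_test",
                  "Sphng_Single_Node_evolution"],
    "num_tasks_per_node": [1, 2, 4],
})
mask = build_mask(
    df,
    and_filters=[["test_name", "==", "Sphng_Single_Node_evolution"]],
    or_filters=[],
)
filtered = df[mask]
```

With the filters from the config above, only the rows whose `test_name` matches are kept.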

But when we use the data from all the apps (all_apps.tar.gz), the plot looks like this:

tkoskela · 2023-12-04T12:09:17Z

There might be a bug in series scaling. In SiWeakScaling.log with

title: Si Weak Scaling

x_axis:
  value: "num_cores"
  units:
    custom: null

y_axis:
  value: "Runtime_value"
  units:
    column: "Runtime_unit"

series: [["num_threads",1],["num_threads",2],["num_threads",4],["num_threads",8]]

filters:
  and: []
  or: []

column_types:
  num_cores: "int"
  num_threads: "int"
  Runtime_value: "float"
  Runtime_unit: "str"

I have

When I add scaling of the y-axis by the first series,

  scaling:
    column:
      name: "Runtime_value"
      series: 0

If I've understood correctly, the value at each x should be divided by the corresponding value at the same x in the num threads = 1 series. The scaled results I get are

The value of the num threads = 1 series is 1 for all x values, which looks like what I'd expect. The other scaled series look incorrect, however. For example, the num cores = 8 value in num threads = 2 should be greater than 1.
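
To make the expectation concrete, here is a minimal pandas sketch of scaling by the first series. The runtimes are made up for illustration (they are not from SiWeakScaling.log), and this is not the implementation on the branch, only the arithmetic I would expect it to reproduce.

```python
import pandas as pd

# Made-up runtimes for illustration only; not taken from SiWeakScaling.log.
df = pd.DataFrame({
    "num_cores":     [8, 16, 8, 16],
    "num_threads":   [1, 1, 2, 2],
    "Runtime_value": [10.0, 20.0, 15.0, 25.0],
})

# The first series in the config (series: 0) is num_threads == 1.
baseline = (
    df[df["num_threads"] == 1]
    .set_index("num_cores")["Runtime_value"]
)

# Look up the baseline value with the same num_cores for every row,
# so the division is aligned on x rather than on row position.
df["scaled"] = df["Runtime_value"] / df["num_cores"].map(baseline)
```

In this toy data the num threads = 1 rows scale to 1, and the num threads = 2 rows scale to 1.5 and 1.25, both greater than 1.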

pineapple-cat · 2023-12-04T15:40:20Z

I still can't replicate exactly what's going wrong with Asif's example, but the scaling issue Tuomas found was caused by a num_cores mismatch, which I fixed by sorting the dataframe before scaling.
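
A small sketch of how a num_cores mismatch of this kind can arise, using made-up numbers: if one series is divided by another purely by row position, the result depends on row order, and sorting first lines the rows up. The helper `scale_by_position` is hypothetical and only illustrates the effect; it is not the code on the branch.

```python
import pandas as pd

# Made-up data in which the two series list num_cores in different orders.
df = pd.DataFrame({
    "num_cores":     [8, 16, 16, 8],
    "num_threads":   [1, 1, 2, 2],
    "Runtime_value": [10.0, 20.0, 25.0, 15.0],
})


def scale_by_position(frame):
    """Divide the num_threads == 2 rows by the num_threads == 1 rows, in row order."""
    base = frame.loc[frame["num_threads"] == 1, "Runtime_value"].to_numpy()
    other = frame.loc[frame["num_threads"] == 2, "Runtime_value"].to_numpy()
    return (other / base).tolist()


# Without sorting, the 16-core value is divided by the 8-core baseline and vice versa.
unsorted_result = scale_by_position(df)

# Sorting by series and then by x pairs up rows with the same num_cores.
sorted_result = scale_by_position(df.sort_values(["num_threads", "num_cores"]))
```

Here the unsorted division gives 2.5 and 0.75, while the sorted one gives 1.5 and 1.25.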

Edit: I've figured out how to more-or-less replicate the first problem. It appears to also be related to dataframe sorting in some way, so I'll continue to investigate now that I have a lead.

Edit 2: Problem fixed by moving sorting before filtering to avoid filter mask interference.

Requested bugfixes.
Sorting QoL fixes (default sort, colour assignment, legend label sort).
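
As an illustration only, here is one way sorting and a filter mask can get out of step. This is a hypothetical mechanism with made-up data, not a confirmed description of the bug fixed here: a positional mask built against one row order is applied after the rows have been reordered.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "test_name": ["b", "a", "b"],
    "value": [3, 1, 2],
})

# Mask built against the original row order: rows 0 and 2 have test_name == "b".
mask = (df["test_name"] == "b").to_numpy()

# Sorting afterwards and renumbering the rows changes which row sits at each position,
# so the old mask now selects the wrong rows.
sorted_late = df.sort_values("value", ignore_index=True)
stale = sorted_late[mask]

# Sorting first, then building the mask from the sorted frame, keeps the two in step.
sorted_first = df.sort_values("value", ignore_index=True)
fresh = sorted_first[(sorted_first["test_name"] == "b").to_numpy()]
```

The stale mask lets an "a" row through, while the mask built after sorting keeps only the "b" rows.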

asifsamiarain · 2023-12-06T10:13:16Z

Thanks @pineapple-cat for the fix. The plot now looks like this:

I still wonder how to order by series rather than by x-axis.

Below is an example that gives a quick look at the perflogs data:

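import glob
import os

import pandas as pd
from pivottablejs import pivot_ui

# Collect every .log file two directory levels below ./perflogs.
path = os.getcwd()
files = glob.glob(os.path.join(path, "perflogs/*/*/*.log"))

# Perflogs are pipe-delimited; read each one and stack them into a single dataframe.
df_list = (pd.read_csv(file, delimiter="|") for file in files)
df = pd.concat(df_list, ignore_index=True)

# Write an interactive pivot table to pivottablejs.html (the default output file).
pivot_ui(df)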

The code above generates a browser-viewable pivottablejs.html file. The same Sphng Single Node data for the 20230707, 20231013 and 20231124 job completion times, grouped and ordered, looks as shown below:

tkoskela

The documentation makes sense to me. A couple of suggestions:

It might be easier to understand if the examples were in figures instead of tables.
Instead of "A note on X", I would just have "X" as the subtitle.

While playing with Asif's logfiles, I noticed one more bit of odd behaviour. When I use the full datetime as the x axis, the series don't get sorted by num_tasks_per_node (or rather, it looks like they are ordered by the x axis value of the first element in the series).

If I drop the time from the datetimes, so that my series get grouped together on the x axis, I get the series sorted by the numerical value of num_tasks_per_node in the legend, but in the plot they seem to be sorted by the string representation of num_tasks_per_node (i.e. 8 is the last entry).

Is this a bug or a feature?
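
For reference, the difference between the two orderings is easy to reproduce in plain Python with the num_tasks_per_node values from Asif's config:

```python
# Series values from the config above.
tasks = [1, 2, 4, 8, 16, 32, 64, 128]

# Sorting the integers gives the order seen in the legend.
numeric_order = sorted(tasks)

# Sorting their string representations compares character by character,
# so "128" and "16" come before "2", and "8" ends up last.
string_order = sorted(str(t) for t in tasks)
```

The string ordering comes out as 1, 128, 16, 2, 32, 4, 64, 8, which matches what I see in the plot.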

Should we include some general formatting options in the config file? Something to think about for the future. Things I always end up hacking by hand include:

orientation of the x axis labels
export to png

pineapple-cat · 2023-12-15T17:32:42Z

Bokeh has its own x-axis sorting method that I need to undermine at every step if we want non-string data to be sorted properly on a categorical plot. I've fixed this for (x, series) groupings in the commits below, but this will need to be revisited if we want to expand to (x, series1, series2) groupings. Here's what the plot should look like now:

Ascending

Descending
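
A rough sketch of the idea behind the (x, series) fix, with made-up data: sort the pairs while the series values are still numbers, and only convert them to strings afterwards for the categorical axis. The helper `grouped_factors` is a hypothetical name rather than the function in post_processing.py, and having `descending` reverse the whole ordering is an assumption.

```python
# Made-up (x, series) pairs; x is a date string and the series value is an int.
pairs = [("20231013", 16), ("20230707", 8), ("20231013", 8),
         ("20230707", 128), ("20230707", 16)]


def grouped_factors(pairs, descending=False):
    """Sort by x first, then numerically by series, then stringify for a categorical axis."""
    # Assumption: descending simply reverses the whole (x, series) ordering.
    ordered = sorted(pairs, key=lambda p: (p[0], p[1]), reverse=descending)
    return [(x, str(series)) for x, series in ordered]


factors = grouped_factors(pairs)
# With Bokeh installed, a list like this could be handed to
# figure(x_range=FactorRange(*factors)), which keeps the factor order it is given.
```

Because the sort happens before the string conversion, 8 comes before 16 and 128 within each x group.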

Additionally, there's no need to hack anything to produce a PNG of the graph; this feature is available through the 'Save' button in the Bokeh toolbar:

Of course, we could save someone a button click by including this as a setting in the config, and it's true that having the option of vertical x-axis group labels would also be a good addition.

Edit: Sorting is always done by x-axis first and then by series. If it's preferable to order by series, like in Asif's example with the pivot table, consider whether you could just swap your series and x-axis columns to achieve the effect you're looking for:
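
In dataframe terms, the swap amounts to changing which column is the primary sort key. A minimal pandas sketch with made-up rows (the column names follow Asif's config; the data does not come from the perflogs):

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({
    "job_completion_time": ["20231013", "20230707", "20231013", "20230707"],
    "num_tasks_per_node":  [8, 16, 16, 8],
})

# Default described above: x-axis first, then series.
by_x = df.sort_values(["job_completion_time", "num_tasks_per_node"])

# Swapping the roles of the two columns groups the rows by series instead.
by_series = df.sort_values(["num_tasks_per_node", "job_completion_time"])
```

With the swap, all rows for 8 tasks per node come first, ordered by completion time, followed by the rows for 16.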

pineapple-cat added 2 commits October 23, 2023 17:07

Separated row filters into OR and AND categories.

706816f

Updated existing tests to account for filtering change.

9a777d3

This was linked to issues Oct 23, 2023

Add data transformations in config and high-level script #183

Closed

Add OR and AND functionality to filtering #205

Closed

pineapple-cat added 7 commits October 24, 2023 15:20

Updated filter documentation.

a60bad2

Slight filter mask code adjustment.

44c0b82

Added OR filter functionality unit test.

8fe70c2

Removed series implementation information from filter documentation.

c7c92e4

Added ability to scale axis values by a column.

380d2bc

Added column scaling unit tests.

35c211e

Added preliminary functionality to scale by specific value in a given column.

a78b813

ilectra reviewed Nov 7, 2023

post-processing/post_processing.py (outdated, resolved)

pineapple-cat added 14 commits November 8, 2023 17:19

Added ability to scale axis values by one custom value.

ca2deeb

Added custom value scaling unit tests.

1c47a1a

Added preliminary functionality to scale by a series.

4daef87

Minor fixes + making axis label clearer.

2d6db1e

Bug fix for legend labels of plots without series.

b5f24d4

Updated read_config errors.

2576475

Added check to ensure custom scaling value cannot be zero.

d9fde62

Added initial attempt at sorting categorical x-axis. FIXME: dataframe sorting not reflected in bokeh graph.

a1a4b96

Added more data transform unit tests.

556dca2

Updated documentation to explain scaling and possible data transformations.

608fd07

Making use of titlecase library in graph labels to preserve acronyms.

b0ec251

Fixed simple categorical x-axis sorting.

c9ae3f2

Added note on sorting categorical x-axis.

b2d7ad9

Fixed stray missing detail in unit test.

562f4aa

pineapple-cat requested a review from ilectra December 1, 2023 16:07

Updated dataframe sorting and fixed scaling mismatch by sorting before scaling.

4ecbb82

pineapple-cat added 2 commits December 4, 2023 15:09

Updated x-axis sorting to work as expected for non-string values.

e10e5be

Fixed sorting for graphs without series.

5073413

pineapple-cat added 3 commits December 5, 2023 16:49

Moved sorting to not interfere with filter mask.

385e155

Changed default categorical x-axis sort from descending to ascending.

e952108

Adjusted graph colour sorting.

697d476

pineapple-cat added 2 commits December 8, 2023 14:39

Adjusted legend label sorting + fixed default data sorting order.

32d4334

Style fixes (trimming long lines) + restored accidentally removed units check.

0a65375

tkoskela reviewed Dec 11, 2023

pineapple-cat added 3 commits December 15, 2023 17:36

Fixed grouped (x, series) sorting for non-string data.

28deebd

Adjusted grouped (x, series) sorting to ensure series sorting is secondary to x-value sorting.

7e24217

Added some README clarifications + a config template.

7119ce7

ilectra approved these changes Dec 19, 2023

Rehomed note on replaced reframe columns.

49884d8

pineapple-cat merged commit ed45fd2 into main Dec 19, 2023
4 checks passed

pineapple-cat deleted the post-processing_data-transform branch December 19, 2023 16:34

ilectra mentioned this pull request Dec 19, 2023

Investigate pivottable as a possibility for visualisation #257

Closed
