Add data transformations for post-processing plot data #226

Merged
merged 35 commits into from
Dec 19, 2023
Changes from 34 commits
Commits
706816f
Separated row filters into OR and AND categories.
pineapple-cat Oct 23, 2023
9a777d3
Updated existing tests to account for filtering change.
pineapple-cat Oct 23, 2023
a60bad2
Updated filter documentation.
pineapple-cat Oct 24, 2023
44c0b82
Slight filter mask code adjustment.
pineapple-cat Oct 24, 2023
8fe70c2
Added OR filter functionality unit test.
pineapple-cat Oct 25, 2023
c7c92e4
Removed series implementation information from filter documentation.
pineapple-cat Nov 2, 2023
380d2bc
Added ability to scale axis values by a column.
pineapple-cat Nov 6, 2023
35c211e
Added column scaling unit tests.
pineapple-cat Nov 6, 2023
a78b813
Added preliminary functionality to scale by specific value in a given…
pineapple-cat Nov 6, 2023
ca2deeb
Added ability to scale axis values by one custom value.
pineapple-cat Nov 8, 2023
1c47a1a
Added custom value scaling unit tests.
pineapple-cat Nov 8, 2023
4daef87
Added preliminary functionality to scale by a series.
pineapple-cat Nov 9, 2023
2d6db1e
Minor fixes + making axis label clearer.
pineapple-cat Nov 10, 2023
b5f24d4
Bug fix for legend labels of plots without series.
pineapple-cat Nov 10, 2023
2576475
Updated read_config errors.
pineapple-cat Nov 17, 2023
d9fde62
Added check to ensure custom scaling value cannot be zero.
pineapple-cat Nov 17, 2023
a1a4b96
Added initial attempt at sorting categorical x-axis. FIXME: dataframe…
pineapple-cat Nov 20, 2023
556dca2
Added more data transform unit tests.
pineapple-cat Nov 30, 2023
608fd07
Updated documentation to explain scaling and possible data transforma…
pineapple-cat Dec 1, 2023
b0ec251
Making use of titlecase library in graph labels to preserve acronyms.
pineapple-cat Dec 1, 2023
c9ae3f2
Fixed simple categorical x-axis sorting.
pineapple-cat Dec 1, 2023
b2d7ad9
Added note on sorting categorical x-axis.
pineapple-cat Dec 1, 2023
562f4aa
Fixed stray missing detail in unit test.
pineapple-cat Dec 1, 2023
4ecbb82
Updated dataframe sorting and fixed scaling mismatch by sorting befor…
pineapple-cat Dec 4, 2023
e10e5be
Updated x-axis sorting to work as expected for non-string values.
pineapple-cat Dec 4, 2023
5073413
Fixed sorting for graphs without series.
pineapple-cat Dec 4, 2023
385e155
Moved sorting to not interfere with filter mask.
pineapple-cat Dec 5, 2023
e952108
Changed default categorical x-axis sort from descending to ascending.
pineapple-cat Dec 5, 2023
697d476
Adjusted graph colour sorting.
pineapple-cat Dec 5, 2023
32d4334
Adjusted legend label sorting + fixed default data sorting order.
pineapple-cat Dec 8, 2023
0a65375
Style fixes (trimming long lines) + restored accidentally removed uni…
pineapple-cat Dec 8, 2023
28deebd
Fixed grouped (x, series) sorting for non-string data.
pineapple-cat Dec 15, 2023
7e24217
Adjusted grouped (x, series) sorting to ensure series sorting is seco…
pineapple-cat Dec 15, 2023
7119ce7
Added some README clarifications + a config template.
pineapple-cat Dec 18, 2023
49884d8
Rehomed note on replaced reframe columns.
pineapple-cat Dec 19, 2023
190 changes: 184 additions & 6 deletions post-processing/README.md
@@ -4,14 +4,16 @@

The post-processing scripts provided with the ExCALIBUR tests package are intended to give users a quick starting point for visualising benchmark results with basic graphs and tables. Their components can also be used inside users' custom scripts.

-There are three main post-processing components:
There are four main post-processing components:
- **`Perflog parsing`:**
  - Data from benchmark performance logs are stored in a pandas DataFrame.
- **`Data filtering`:**
  - If more than one perflog is used for plotting, DataFrames from individual perflogs are concatenated together into one DataFrame.
  - The DataFrame is then filtered, keeping only relevant rows and columns.
- **`Data transformation`:**
  - Axis value columns in the DataFrame are scaled according to user specifications.
- **`Plotting`:**
-  - A filtered DataFrame is passed to a plotting script, which produces a graph and embeds it in a simple HTML file.
  - A filtered and transformed DataFrame is passed to a plotting script, which produces a graph and embeds it in a simple HTML file.
  - Users may run the plotting script to generate a generic bar chart. Graph settings should be specified in a configuration YAML file.

### Installation
@@ -34,49 +36,112 @@ Run `post_processing.py -h` for more information (including debugging flags).

### Configuration Structure

-Before running post-processing, create a config file including all necessary information for graph generation (you must specify at least plot title, x-axis, y-axis, and column types). See below for an example.
Before running post-processing, create a config file including all necessary information for graph generation (you must specify at least plot title, x-axis, y-axis, and column types). See below for a template, an example, and some clarifying notes.

- `title` - Plot title.
- `x_axis`, `y_axis` - Axis information.
  - `value` - Axis data points. Specified with a column name.
  - `units` - Axis units. Specified either with a column name or a custom label (may be null).
  - `scaling` - (Optional.) Scale axis values by either a column or a custom value.
  - `sort` - (Optional.) Sort categorical x-axis in descending order (otherwise values are sorted in ascending order by default).
- `filters` - (Optional.) Filter data rows based on specified conditions. (Specify an empty list if no filters are required.)
  - `and` - Filter mask is determined from a logical AND of conditions in list.
  - `or` - Filter mask is determined from a logical OR of conditions in list.
  - `Format: [column_name, operator, value]`
  - `Accepted operators: "==", "!=", "<", ">", "<=", ">="`
- `series` - (Optional.) Display several plots in the same graph and group x-axis data by specified column values. (Specify an empty list if there is only one series.)
  - `Format: [column_name, value]`
- `column_types` - Pandas dtype for each relevant column (axes, units, filters, series). Specified with a dictionary.
  - `Accepted types: "str"/"string"/"object", "int"/"int64", "float"/"float64", "datetime"/"datetime64"`
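
As a rough illustration of the minimum requirements listed above, the sketch below checks a config that has already been loaded into a Python dict. The helper name `missing_required_fields` and the exact set of keys it checks are assumptions based on the field list, not the validation the scripts actually perform.

```python
# Hypothetical helper: not part of the post-processing scripts.
REQUIRED_TOP_LEVEL = ("title", "x_axis", "y_axis", "column_types")


def missing_required_fields(config):
    """Return dotted names of required fields that are absent from config."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in config]
    for axis in ("x_axis", "y_axis"):
        # Assumption: every axis that is present must name a 'value' column.
        if axis in config and "value" not in (config[axis] or {}):
            missing.append(f"{axis}.value")
    return missing


config = {
    "title": "Plot Title",
    "x_axis": {"value": "x_axis_col", "units": {"custom": "unit_label"}},
    "y_axis": {"units": {"column": "unit_col"}},
}
print(missing_required_fields(config))
```

Here the config lacks `column_types` and a `value` for the y-axis, so both are reported.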

### Complete Config Template

This template includes all possible config fields, some of which are optional or mutually exclusive (e.g. `column` and `custom`).

```yaml
title: <custom_label>

x_axis:
  value: <column_name>
  # use one of 'column' or 'custom'
  units:
    column: <column_name>
    custom: <custom_label>
  # optional (default: ascending)
  sort: "descending"

y_axis:
  value: <column_name>
  # use one of 'column' or 'custom'
  units:
    column: <column_name>
    custom: <custom_label>
  # optional (default: no data transformation)
  # use one of 'column' or 'custom'
  scaling:
    column:
      name: <column_name>
      series: <index>
      x_value: <column_value>
    custom: <custom_value>

# optional (default: include all data)
# entry format: [<column_name>, <operator>, <column_value>]
# accepted operators: ==, !=, <, >, <=, >=
filters:
  and: <condition_list>
  or: <condition_list>

# optional (default: no x-axis grouping, one plot per graph)
# entry format: [<column_name>, <column_value>]
series: <series_list>

# include types for each column that is used in the config
# accepted types: string/object, int, float, datetime
column_types:
  <column_name>: <column_type>
```

### Example Config

This example more accurately illustrates what an actual config file may look like.

```yaml
title: "Plot Title"

x_axis:
  value: "x_axis_col"
  units:
    custom: "unit_label"
  sort: "descending"

y_axis:
  value: "y_axis_col"
  units:
    column: "unit_col"
  scaling:
    column:
      name: "scaling_col"
      series: 0
      x_value: "x_val_s"

-filters: [["filter_col_1", "<=", filter_val_1], ["filter_col_2", "!=", filter_val_2]]
filters:
  and: [["filter_col_1", "<=", filter_val_1], ["filter_col_2", "!=", filter_val_2]]
  or: []

series: [["series_col", "series_val_1"], ["series_col", "series_val_2"]]

column_types:
  x_axis_col: "str"
  y_axis_col: "float"
  unit_col: "str"
  scaling_col: "float"
  filter_col_1: "datetime"
  filter_col_2: "int"
  series_col: "str"
```

-#### A Note on X-axis Grouping
#### X-axis Grouping

The settings above will produce a graph that will have its x-axis data grouped based on the values in `x_axis_col` and `series_col`. (`Note: only groupings with one series column are currently supported.`) If we imagine that `x_axis_col` has two unique values, `"x_val_1"` and `"x_val_2"`, there will be four groups (and four bars) along the x-axis:

@@ -85,7 +150,120 @@ The settings above will produce a graph that will have its x-axis data grouped b
- (`x_val_2`, `series_val_1`)
- (`x_val_2`, `series_val_2`)
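
The grouping described above amounts to pairing every x-axis value with every series value. A minimal sketch in plain Python (the plotting script itself works on a pandas DataFrame; the variable names here are only illustrative):

```python
from itertools import product

x_values = ["x_val_1", "x_val_2"]
series_values = ["series_val_1", "series_val_2"]

# One group (and one bar) per (x value, series value) pair. The x-axis
# value varies slowest, so the series order is secondary within each x value.
groups = list(product(x_values, series_values))
for group in groups:
    print(group)
```

Two x-axis values and two series values give the four groups mentioned in the text.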

-#### A Note on Column Types
#### Scaling

When axis values are scaled, they are all divided by a number or a list of numbers. If using more than one number for scaling, the length of the list must match the length of the axis column being scaled. (`Note: scaling is currently only supported for y-axis data, as graphs with a non-categorical x-axis are still a work in progress.`)

**Custom Scaling**

Manually specify one value to scale axis values by.

```yaml
y_axis:
  value: "y_axis_col"
  units:
    column: "unit_col"
  scaling:
    custom: 2
```

In the snippet above, all y-axis values are to be divided by 2.

|y_axis_col||scaled_y_axis_col|
|-|-|-|
|3.2|3.2 / 2.0 =|1.6|
|5.4|5.4 / 2.0 =|2.7|
|2.4|2.4 / 2.0 =|1.2|
|5.0|5.0 / 2.0 =|2.5|
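
The arithmetic in the table can be reproduced with a few lines of plain Python. This is a sketch rather than the script's own code: `scale_by_custom` is a hypothetical name, and the zero check mirrors the commit in this PR that forbids a custom scaling value of zero.

```python
def scale_by_custom(values, custom):
    """Divide every axis value by one user-supplied number."""
    if custom == 0:
        raise ValueError("custom scaling value cannot be zero")
    return [value / custom for value in values]


y_axis_col = [3.2, 5.4, 2.4, 5.0]
scaled_y_axis_col = scale_by_custom(y_axis_col, 2)
print(scaled_y_axis_col)
```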

**Column Scaling**

Specify one column to scale axis values by.

```yaml
y_axis:
  value: "y_axis_col"
  units:
    column: "unit_col"
  scaling:
    column:
      name: "scaling_col"
```

In the snippet above, all y-axis values are to be divided by the corresponding values in the scaling column.

|y_axis_col|scaling_col||scaled_y_axis_col|
|-|-|-|-|
|3.2|**`1.6`**|3.2 / 1.6 =|2.0|
|5.4|**`2.0`**|5.4 / 2.0 =|2.7|
|2.4|**`0.6`**|2.4 / 0.6 =|4.0|
|5.0|**`2.5`**|5.0 / 2.5 =|2.0|
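
Column scaling is an element-wise division, which is why the two columns must have the same length. A small sketch under that assumption (`scale_by_column` is a hypothetical name; the real scripts operate on pandas columns):

```python
def scale_by_column(values, scaling_values):
    """Divide each axis value by the value in the same row of the scaling column."""
    if len(values) != len(scaling_values):
        raise ValueError("scaling column length must match the axis column length")
    return [value / scale for value, scale in zip(values, scaling_values)]


y_axis_col = [3.2, 5.4, 2.4, 5.0]
scaling_col = [1.6, 2.0, 0.6, 2.5]
scaled_y_axis_col = scale_by_column(y_axis_col, scaling_col)
print(scaled_y_axis_col)
```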

**Series Scaling**

Specify one series to scale axis values by. This is done with an index, which is used to find the correct series from a list.

In the case of the list of series from the example config above, index 0 would select a scaling series of `["series_col", "series_val_1"]`, while index 1 would scale by `["series_col", "series_val_2"]`.

```yaml
y_axis:
  value: "y_axis_col"
  units:
    column: "unit_col"
  scaling:
    column:
      name: "scaling_col"
      series: 0
```

In the snippet above, all y-axis values are to be split by series and divided by the corresponding values in the scaling series.

|y_axis_col|scaling_col|series_col||scaled_y_axis_col|
|-|-|-|-|-|
|3.2|**`1.6`**|`series_val_1`|3.2 / 1.6 =|2.0|
|5.4|**`2.0`**|`series_val_1`|5.4 / 2.0 =|2.7|
|2.4|0.6|series_val_2|2.4 / 1.6 =|1.5|
|5.0|2.5|series_val_2|5.0 / 2.0 =|2.5|
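
One way to picture series scaling is sketched below in plain Python. It assumes that rows are matched to scaling values by their position within their own series (first row of each series with the first scaling value, and so on); the README does not spell out how the scripts align rows, so treat that matching rule and the name `scale_by_series` as assumptions.

```python
rows = [
    {"y_axis_col": 3.2, "scaling_col": 1.6, "series_col": "series_val_1"},
    {"y_axis_col": 5.4, "scaling_col": 2.0, "series_col": "series_val_1"},
    {"y_axis_col": 2.4, "scaling_col": 0.6, "series_col": "series_val_2"},
    {"y_axis_col": 5.0, "scaling_col": 2.5, "series_col": "series_val_2"},
]
series = [["series_col", "series_val_1"], ["series_col", "series_val_2"]]


def scale_by_series(rows, series, index, y_col, scaling_col):
    """Divide each row's y value by the matching value from the chosen series."""
    column, value = series[index]
    # Scaling values come only from rows that belong to the chosen series.
    scaling_values = [row[scaling_col] for row in rows if row[column] == value]
    seen = {}  # number of rows already processed per series value
    scaled = []
    for row in rows:
        position = seen.get(row[column], 0)
        seen[row[column]] = position + 1
        scaled.append(row[y_col] / scaling_values[position])
    return scaled


scaled_y_axis_col = scale_by_series(rows, series, 0, "y_axis_col", "scaling_col")
print(scaled_y_axis_col)
```

With index 0, both series are divided by the `scaling_col` values of `series_val_1` (1.6 and 2.0), matching the table up to floating-point rounding.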

**Selected Value Scaling**

Specify one value from a column to scale axis values by.

```yaml
y_axis:
  value: "y_axis_col"
  units:
    column: "unit_col"
  scaling:
    column:
      name: "scaling_col"
      series: 0
      x_value: "x_val_s"
```

In the snippet above, all y-axis values are to be divided by the scaling value found by filtering the scaling column by both series and x-axis value.

|x_axis_col|y_axis_col|scaling_col|series_col||scaled_y_axis_col|
|-|-|-|-|-|-|
|x_val_1|3.2|1.6|series_val_1|3.2 / 2.0 =|1.6|
|`x_val_s`|5.4|**`2.0`**|`series_val_1`|5.4 / 2.0 =|2.7|
|x_val_2|2.4|0.7|series_val_2|2.4 / 2.0 =|1.2|
|x_val_s|5.0|2.5|series_val_2|5.0 / 2.0 =|2.5|

(`Note: if series are not present and x-axis values are all unique, it is enough to specify just the column name and x-value.`)
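
The lookup behind selected value scaling can be sketched as a filter that must leave exactly one scaling value. The function name `select_scaling_value` and the error raised on an ambiguous match are assumptions made for illustration; the optional `series` argument reflects the note above about configs without series.

```python
rows = [
    {"x_axis_col": "x_val_1", "y_axis_col": 3.2, "scaling_col": 1.6, "series_col": "series_val_1"},
    {"x_axis_col": "x_val_s", "y_axis_col": 5.4, "scaling_col": 2.0, "series_col": "series_val_1"},
    {"x_axis_col": "x_val_2", "y_axis_col": 2.4, "scaling_col": 0.7, "series_col": "series_val_2"},
    {"x_axis_col": "x_val_s", "y_axis_col": 5.0, "scaling_col": 2.5, "series_col": "series_val_2"},
]


def select_scaling_value(rows, scaling_col, x_col, x_value, series=None):
    """Find the single scaling value matching the x value (and series, if given)."""
    matches = [
        row[scaling_col]
        for row in rows
        if row[x_col] == x_value
        and (series is None or row[series[0]] == series[1])
    ]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one scaling value, found {len(matches)}")
    return matches[0]


scale = select_scaling_value(
    rows, "scaling_col", "x_axis_col", "x_val_s", ["series_col", "series_val_1"]
)
scaled_y_axis_col = [row["y_axis_col"] / scale for row in rows]
print(scale, scaled_y_axis_col)
```

Omitting the series here would match two rows with `x_val_s`, which is why the series is needed unless x-axis values are unique.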

#### Filters

A condition list for filtering has entries in the format `[<column_name>, <operator>, <column_value>]`. AND filters and OR filters are combined with a logical AND to produce the final filter mask applied to the DataFrame prior to graphing. For example:

- `and_filters` = `cond1`, `cond2`
- `or_filters` = `cond3`, `cond4`

The filters above would produce the final filter `mask` = (`cond1` AND `cond2`) AND (`cond3` OR `cond4`).
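
The mask logic can be sketched without pandas as follows. The column names (`nodes`, `tasks`, `system`) are invented for the example, and treating an empty OR list as "no constraint" is an assumption suggested by the example config's `or: []`, not something the README states outright.

```python
import operator

OPERATORS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
}


def row_matches(row, condition):
    """Evaluate one [column_name, operator, value] condition against a row."""
    column, op, value = condition
    return OPERATORS[op](row[column], value)


def filter_mask(rows, and_filters, or_filters):
    """Return one boolean per row: (all AND conditions) AND (any OR condition)."""
    mask = []
    for row in rows:
        and_ok = all(row_matches(row, cond) for cond in and_filters)
        # Assumption: an empty OR list places no constraint on the row.
        or_ok = not or_filters or any(row_matches(row, cond) for cond in or_filters)
        mask.append(and_ok and or_ok)
    return mask


rows = [
    {"nodes": 1, "tasks": 4, "system": "a"},
    {"nodes": 2, "tasks": 8, "system": "b"},
    {"nodes": 4, "tasks": 8, "system": "c"},
    {"nodes": 8, "tasks": 16, "system": "a"},
]
and_filters = [["nodes", ">=", 2], ["tasks", "<=", 8]]
or_filters = [["system", "==", "a"], ["system", "==", "b"]]
mask = filter_mask(rows, and_filters, or_filters)
print(mask)
```

Only the second row passes both AND conditions and at least one OR condition.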

#### Column Types

Types must be specified for all columns included in the config in the format `<column_name>:<column_type>`. Accepted types include `string/object`, `int`, `float`, and `datetime`.

All user-specified types are internally converted to their nullable incarnations. As such:
