Updated parser script and readme (#71)
* Updated to NV internal version 5.2 (5d662c51)

19 Nov, 2020 
Changelist for v5.2
- Fixed bugs related to naming convention in graphing function (power data was being stored into a loadgen variable).
- Made clearer that a workload name is needed in the loadgen directory structure and will warn the user
-- However, if *summary.txt or *detail.txt is by itself, there is currently no warning in place that the data parse is skipped


18 Nov, 2020 
Changelist for v5.1
- Added energy calculation to data sets that have "watts" or "power" in the label name.
- *Note* unit labels are not handled and the user must be aware and responsible for the units of the data
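To illustrate the energy calculation mentioned in the v5.1 notes, here is a minimal sketch that integrates power samples over time with the trapezoidal rule. The function name `energy_joules` and the sample data are hypothetical, and this commit does not show the script's actual method. The sketch also assumes the samples are in watts, which, per the note above, is the user's responsibility to confirm.

```python
from datetime import datetime

def energy_joules(samples):
    """Integrate (timestamp, watts) pairs with the trapezoidal rule.

    Assumes power is in watts, so the result is in joules.
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        dt = (t1 - t0).total_seconds()
        total += 0.5 * (p0 + p1) * dt
    return total

# Hypothetical samples: unevenly spaced readings.
samples = [
    (datetime(2020, 11, 18, 12, 0, 0), 100.0),
    (datetime(2020, 11, 18, 12, 0, 1), 110.0),
    (datetime(2020, 11, 18, 12, 0, 3), 90.0),
]
print(energy_joules(samples))  # 305.0
```

If the logged values were in milliwatts instead, the same code would silently return millijoules, which is why the unit caveat matters.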


15 Sep, 2020 
Changelist for v5 
- Bug fix to graph filtering by run ID
- Bug fix to stats parsing
- Graph function polishing
- Expanded stats & graph window controls (as internal variable controls)
- Added deskew to command-line parameters.  Allows for manual timing adjustments.
- Added verbose to command-line parameters.  Moved most script messages to this flag.  Only error messages will be displayed by default.
- Added stats to command-line parameters.  If the data is numeric, outputs data statistics to stdout.  Takes an optional list of case-sensitive strings as a sub-parameter to filter for specific data.
- Added csv to command-line parameters.  Used with stats parameter to save statistics to a CSV file.  Optional filename sub-parameter.
- Updated graph command-line parameter to take in an optional list of case-sensitive strings as a sub-parameter to filter for specific data.
- Added workload to command-line parameters.  Requires a list of case-sensitive strings, used either as a filter or to specify an unsupported workload.  Default list of workloads: resnet, ssd-large, ssd-small, mobilenet, gnmt.
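The case-sensitive string filters accepted by the graph and stats parameters could behave roughly as sketched here. The helper `filter_labels` is hypothetical. Substring matching and combining keywords with OR are assumptions (the v4 notes describe keyword filters as OR'ed), and so is the rule that an empty list keeps every label.

```python
def filter_labels(labels, keywords):
    """Return the labels that contain at least one keyword (case-sensitive).

    An empty or missing keyword list keeps every label (an assumption).
    """
    if not keywords:
        return list(labels)
    return [label for label in labels if any(k in label for k in keywords)]

# Hypothetical data labels.
labels = ["Watts", "Volts", "Amps", "GPU Power"]
print(filter_labels(labels, ["Watts", "Power"]))  # ['Watts', 'GPU Power']
print(filter_labels(labels, ["watts"]))           # [] because case differs
```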


01 Jun, 2020
Changelist for v4 (let's skip v3)
- Moved to using Pandas DataFrames for better data handling and statistical calculations.
- Graphing can now handle any set of CSV data.  However, explicit 'Date' and 'Time' columns are required.
- The DASH webpage is now more interactive and responds to various user inputs.
  * Filtering by keywords (OR'ed together), AND'ed with run IDs (OR'ed together), affects all figures
  * Hiding/showing traces via the legend box triggers updates to the Loadgen stats table
  * Using selection tool (box or lasso) will generate statistics for the selected points
- pytz has been dropped in favor of a user-specified timedelta setting
- Loadgen CSV generation has been modified to output a proper CSV table.
  Previous CSVs generated by older versions of this script will no longer work with the graphing function.
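As a rough illustration of the 'Date' and 'Time' column requirement, this sketch reads a CSV and combines the two columns into one timestamp per row. The script itself uses pandas per the notes above; this example uses only the standard library so it runs anywhere. The sample data, the timestamp format string, and the helper name `read_timestamped_rows` are all assumptions.

```python
import csv
import io
from datetime import datetime

# Hypothetical CSV content with the two required columns.
SAMPLE = """Date,Time,Watts
2020-06-01,12:00:00.000,101.5
2020-06-01,12:00:01.000,99.0
"""

def read_timestamped_rows(text):
    """Return (datetime, remaining_columns) pairs for each CSV row."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Remove 'Date' and 'Time' from the row and merge them into one stamp.
        stamp = datetime.strptime(
            f"{row.pop('Date')} {row.pop('Time')}", "%Y-%m-%d %H:%M:%S.%f"
        )
        rows.append((stamp, dict(row)))
    return rows

for stamp, data in read_timestamped_rows(SAMPLE):
    print(stamp, data)
```

A file missing either column raises `KeyError` here, which mirrors the idea that both columns must be present explicitly.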


21 Apr, 2020
Changelist for v2
- Changed date/time to a unified ISO format (YYYY-mm-DD HH:MM:SS.fff)
- Added graphing feature (use -g/--graph)
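For reference, the unified timestamp format above can be produced with the standard library as sketched here. The function name `to_unified_timestamp` is hypothetical, and the script may build the string differently.

```python
from datetime import datetime

def to_unified_timestamp(dt):
    """Format a datetime as 'YYYY-mm-DD HH:MM:SS.fff' (millisecond precision)."""
    return dt.isoformat(sep=" ", timespec="milliseconds")

stamp = to_unified_timestamp(datetime(2020, 4, 21, 13, 5, 9, 123456))
print(stamp)  # 2020-04-21 13:05:09.123
```

Note that `timespec="milliseconds"` truncates the microseconds instead of rounding them.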

* Updated to reflect v5

* Update parse_mlperf.py to v6

Changelist in v6:
- Removed ability to adjust statistical window for calculations
- Added colored window to highlight loadgen window found and used for statistics (plotly v4.12 or later required)

* Updated to reflect v6

* Update parse_mlperf.py

Minor bug fix:
- Fixed issue where loadgen timestamps are present but no associated data for drawing the loadgen window.  Therefore both loadgen and data need to be present
nv-eric-hw authored Jan 23, 2021
1 parent 4e814dc commit efe4069
Showing 2 changed files with 938 additions and 392 deletions.
102 changes: 68 additions & 34 deletions log_parsers/README.md
@@ -3,62 +3,96 @@

# Dependencies

Developed under Python 3.
Developed under Python 3 for Windows.
Other OS environments should work.

Latest versions should work, but not actively tested. The versions below were used for development.

The graphing feature uses plotly.
To install:
```
pip install plotly==4.6.0
pip install dash==1.18.1
pip install plotly==4.14.1
```

Data handling uses pandas & numpy
To install:
```
pip install pandas==1.0.5
pip install numpy==1.19.1
```

Timezone adjustment features uses pytz
Date parsing uses dateutil (the `python-dateutil` package)
To install:
```
pip install pytz
pip install python-dateutil
```


# Script In-Line Parameters

Inside the parser script are some global variables/options.

The following variables are for timing offsets between Host (PTDaemon, usually uses local time) and DUT (usually in UTC).
```
g_power_tz = None # pytz.timezone( 'US/Pacific' )
# Refer to pytz for timezone list. This sets the timezone for the Host system
g_loadgen_tz = None # pytz.utc
# Refer to pytz for timezone list. This sets the timezone for the DUT system. Typically does not need to be set.
g_power_add_td = timedelta(seconds=3600)
# This parameter will add the variable of seconds to the timedelta between loadgen and powerlog timestamps
g_power_sub_td = timedelta(seconds=0)
# This parameter will subtract the variable of seconds to the timedelta between loadgen and powerlog timestamps
# This is b/c timedelta does not use negatives in the seconds place, only in the days place.
```
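The comment about negative seconds refers to how Python normalizes `timedelta` values: a negative duration is stored with a negative `days` field and a non-negative `seconds` field. The short sketch below shows that behavior using only the standard library. The variable names are illustrative and are not taken from the script.

```python
from datetime import datetime, timedelta

# A negative duration is normalized to negative days plus positive seconds.
neg = timedelta(seconds=-10)
print(neg.days, neg.seconds)  # -1 86390
print(neg.total_seconds())    # -10.0

# Adding a negative offset equals subtracting the matching positive one.
t = datetime(2020, 6, 1, 12, 0, 0)
print(t + neg == t - timedelta(seconds=10))  # True
```

Separate add and subtract settings therefore keep each value readable, even though the constructor does accept a negative argument.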

The following variables are for modifying the graphing and statistical windows.
```
g_power_window_before_td = timedelta(seconds=30)
g_power_window_after_td = timedelta(seconds=30)
# These parameters will collate additional data the variable of seconds BEFORE loadgen's BEGIN time
# and the variable of seconds AFTER loadgen's END time
g_power_stats_begin_td = timedelta(seconds=3) # not implemented yet
g_power_stats_end_td = timedelta(seconds=3) # not implemented yet
# g_power_window* : adjusts the time around POWER_BEGIN and POWER_END of the loadgen timestamps to show data in graph.
typically used to hide or show setup or settling behavior for further analysis
g_power_window_before_add_td = timedelta(seconds=0)
g_power_window_before_sub_td = timedelta(seconds=0)
g_power_window_after_add_td = timedelta(seconds=0)
g_power_window_after_sub_td = timedelta(seconds=10)
```

# Command-line Parameters

```
-lgi, --loadgen_in : Directory of loadgen log files to parse
-pli, --power_in : PTDaemon power log file to parse
-lgo, --loadgen_out : loadgen CSV output filename to write to
-plo, --power_out : power log CSV output filename to write to
-g, --graph : graph the data, if possible. Uses lgo and plo filenames as input
-h, --help show this help message and exit
-lgi LOADGEN_IN, --loadgen_in LOADGEN_IN
Specify directory of loadgen log files to parse from
-spl SPECPOWER_IN, --specpower_in SPECPOWER_IN
Specify PTDaemon power log file (in custom PTD format)
-pli POWERLOG_IN, --powerlog_in POWERLOG_IN
Specify power or data input file (in CSV format)
-lgo LOADGEN_OUT, --loadgen_out LOADGEN_OUT
Specify loadgen CSV output file (default:
loadgen_out.csv)
-plo POWERLOG_OUT, --powerlog_out POWERLOG_OUT
Specify power or data CSV output file (default:
power_out.csv)
-g [GRAPH [GRAPH ...]], --graph [GRAPH [GRAPH ...]]
Draw/output graphable data over time using the lgi/lgo
and pli/plo as input. (Optional) Input a list of
strings to filter data
-s [STATS [STATS ...]], --stats [STATS [STATS ...]]
Outputs statistics between loadgen & power/data
timestamps using lgi/lgo and pli/plo as inputs.
(Optional) Input a list of strings to filter data
-csv [CSV], --csv [CSV]
Outputs statistics to a CSV file (optional parameter,
default: stats_out.csv) instead of stdout.
-w WORKLOAD [WORKLOAD ...], --workload WORKLOAD [WORKLOAD ...]
Parse for workloads other than [mobilenet, gnmt,
resenet50|resnet, ssd-large|ssdresnet34, or ssd-
small|ssdmobilenet]
-v, --verbose
-deskew DESKEW, --deskew DESKEW
Adjust timing skew between loadgen and power/data logs
(in seconds)
```
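The bracket notation in the help text maps onto `argparse`'s `nargs` settings. The sketch below is a hypothetical reconstruction of a few of these options, inferred only from the help output above and not copied from `parse_mlperf.py`. The defaults for options whose help text states none, and the `float` type for `--deskew`, are assumptions.

```python
import argparse

def build_parser():
    """Build a parser covering a subset of the options listed above."""
    parser = argparse.ArgumentParser(description="Hypothetical option subset")
    # [GRAPH [GRAPH ...]]: zero or more filter strings; None if the flag is absent.
    parser.add_argument("-g", "--graph", nargs="*", default=None)
    parser.add_argument("-s", "--stats", nargs="*", default=None)
    # [CSV]: optional filename; the const value applies when the flag has no value.
    parser.add_argument("-csv", "--csv", nargs="?", const="stats_out.csv", default=None)
    # WORKLOAD [WORKLOAD ...]: at least one workload name is required.
    parser.add_argument("-w", "--workload", nargs="+")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("-deskew", "--deskew", type=float, default=0.0)
    return parser

args = build_parser().parse_args(["-g", "Watts", "Volts", "-csv", "-deskew", "2.5"])
print(args.graph, args.csv, args.deskew)  # ['Watts', 'Volts'] stats_out.csv 2.5
```

With this layout, `-g` alone yields an empty list (graph everything), while omitting it yields `None` (do not graph), so the two cases stay distinguishable.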

# Graph/Plots

When using the graph (**-g**) option, the script will loop into server mode.
Use a browser to connect to http://localhost:8050 (if running on the same system) or to the IP of the system running the script to view the graph(s).

To terminate the server (and script), press Ctrl-C (or equivalent).


# Future plans

- Incorporate Dash (from plotly) for better presentation of data and statistics
- Maybe move global variables to command-line
- Possible performance enhancements