Fixes for 2.0 submission.
- Set VERSION.
- Updated README files (eventually HPC group will need to do the same)
- Fixed repo checker to run with python < 3.7
emizan76 committed May 12, 2022
1 parent 4272f73 commit b16366c
Showing 9 changed files with 52 additions and 29 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -32,6 +32,7 @@ pip uninstall mlperf-logging
- [rcp_checker](mlperf_logging/rcp_checker): utility running convergence checks in submission directories
- [package_checker](mlperf_logging/package_checker): top-level checker for a package, it calls compliance checker, system desc checker, and rcp checker
- [result_summarizer](mlperf_logging/result_summarizer): utility that parses package and prints out result summary
- [repo_checker](mlperf_logging/repo_checker): utility that checks source code files for github compliance

## Instructions

2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.0.0
2.0.0-rc3
47 changes: 31 additions & 16 deletions mlperf_logging/compliance_checker/README.md
@@ -8,27 +8,41 @@ The checker works with both python2 and python3, requires PyYaml package.

To check a log file for compliance:

python -m mlperf_logging.compliance_checker [--config YAML] [--ruleset MLPERF_EDITION] FILENAME
python -m mlperf_logging.compliance_checker [--config YAML] [--usage training/hpc] [--ruleset MLPERF_EDITION] FILENAME

By default, 1.0.0 edition rules are used and the default config is set to `1.0.0/common.yaml`.
By default, 2.0.0 training edition rules are used and the default config is set to `2.0.0/common.yaml`.
This config will check all common keys and enqueue benchmark specific config to be checked as well.
Old editions, still supported are 0.7.0 amd 0.6.0
Older training editions that are still supported: 1.1.0, 1.0.0, 0.7.0, and 0.6.0.

Prints `SUCCESS` when no issues were found. Otherwise will print error details.

As log examples use [NVIDIA's v0.6 training logs](https://github.com/mlperf/training_results_v0.6/tree/master/NVIDIA/results).
To check hpc compliance rules (only 1.0.0 ruleset is supported), set --usage hpc --ruleset 1.0.0.
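
A minimal sketch of assembling the command line above from Python. The helper name `build_checker_cmd` and the log file name are hypothetical; only the `--config`, `--usage` and `--ruleset` flags shown above are used, and the command is built but not executed here.

```python
import sys


def build_checker_cmd(filename, usage="training", ruleset="2.0.0", config=None):
    """Build the argv list for the compliance checker (hypothetical helper)."""
    if usage not in ("training", "hpc"):
        raise ValueError("usage must be 'training' or 'hpc'")
    cmd = [
        sys.executable, "-m", "mlperf_logging.compliance_checker",
        "--usage", usage,
        "--ruleset", ruleset,
    ]
    # --config is optional; the checker falls back to its default config.
    if config is not None:
        cmd += ["--config", config]
    cmd.append(filename)
    return cmd


# HPC logs need both the usage and the ruleset set explicitly.
hpc_cmd = build_checker_cmd("result_0.txt", usage="hpc", ruleset="1.0.0")
```

The resulting list can be passed to `subprocess.run` once `mlperf_logging` is installed.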

### Existing config files
Prints `SUCCESS` when no issues were found. Otherwise will print error details.

1.0.0/common.yaml - currently the default config file, checks common fields complience and equeues benchmark-specific config file
1.0.0/resnet.yaml
1.0.0/ssd.yaml
1.0.0/minigo.yaml
1.0.0/maskrcnn.yaml
1.0.0/rnnt.yaml
1.0.0/unet3d.yaml
1.0.0/bert.yaml
1.0.0/dlrm.yaml
For example logs, use [NVIDIA's training logs](https://github.com/mlperf/training_results_v{0.6,0.7,1.0,1.1}/tree/master/NVIDIA/results).

### Existing config files for training submissions

2.0.0/common.yaml - currently the default config file, checks common fields compliance and enqueues benchmark-specific config file
2.0.0/closed_common.yaml - the common rules file for closed submissions. These rules apply to all benchmarks
2.0.0/open_common.yaml - the common rules file for open submissions. These rules apply to all benchmarks
2.0.0/closed_resnet.yaml - Per-benchmark rules, closed submissions.
2.0.0/closed_ssd.yaml
2.0.0/closed_minigo.yaml
2.0.0/closed_maskrcnn.yaml
2.0.0/closed_rnnt.yaml
2.0.0/closed_unet3d.yaml
2.0.0/closed_bert.yaml
2.0.0/closed_dlrm.yaml
2.0.0/open_resnet.yaml - Per-benchmark rules, open submissions.
2.0.0/open_ssd.yaml
2.0.0/open_minigo.yaml
2.0.0/open_maskrcnn.yaml
2.0.0/open_rnnt.yaml
2.0.0/open_unet3d.yaml
2.0.0/open_bert.yaml
2.0.0/open_dlrm.yaml
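
The list above appears to follow a `RULESET/DIVISION_BENCHMARK.yaml` naming pattern. A short sketch that enumerates the expected names, assuming that pattern holds for every entry (the function name `expected_configs` is hypothetical and is not part of the checker):

```python
BENCHMARKS = ["resnet", "ssd", "minigo", "maskrcnn", "rnnt", "unet3d", "bert", "dlrm"]


def expected_configs(ruleset="2.0.0"):
    """Return the config file names implied by the naming pattern (assumption)."""
    names = ["{}/common.yaml".format(ruleset)]
    for division in ("closed", "open"):
        # One common rules file per division, then one file per benchmark.
        names.append("{}/{}_common.yaml".format(ruleset, division))
        names.extend(
            "{}/{}_{}.yaml".format(ruleset, division, bench) for bench in BENCHMARKS
        )
    return names


configs = expected_configs()
```

Comparing this list against the files actually present in the config directory is one way to spot a misnamed file.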

### Existing config files for HPC submissions

### Implementation details
Compliance checking is done following below algorithm.
@@ -160,6 +174,7 @@ Tested and confirmed working using the following software versions:
- Python 2.7.12 + PyYAML 3.11
- Python 3.6.8 + PyYAML 5.1
- Python 2.9.2 + PyYAML 5.3.1
- Python 3.9.10 + PyYAML 5.4.1

### How to install PyYaML

5 changes: 3 additions & 2 deletions mlperf_logging/package_checker/README.md
@@ -10,7 +10,7 @@ To check an organization's submission package for compliance:
python3 -m mlperf_logging.package_checker FOLDER USAGE RULESET
```

Currently, USAGE in ["training"] and RULESET in ["0.6.0", "0.7.0", "1.0.0"] are supported.
Currently, USAGE in ["training"] and RULESET in ["0.6.0", "0.7.0", "1.0.0", "1.1.0", "2.0.0"] are supported.
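
A sketch of how the USAGE and RULESET arguments could be validated against the supported values listed above. The `SUPPORTED` table and `validate_args` are hypothetical names for illustration, not the package checker's actual code.

```python
# Supported combinations, copied from the README sentence above.
SUPPORTED = {
    "training": ["0.6.0", "0.7.0", "1.0.0", "1.1.0", "2.0.0"],
}


def validate_args(usage, ruleset):
    """Raise ValueError for an unsupported usage or ruleset (hypothetical helper)."""
    if usage not in SUPPORTED:
        raise ValueError("unsupported usage: {!r}".format(usage))
    if ruleset not in SUPPORTED[usage]:
        raise ValueError("unsupported ruleset for {}: {!r}".format(usage, ruleset))
    return usage, ruleset


checked = validate_args("training", "2.0.0")
```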

The package checker checks:
1. The number of result files for each benchmark matches the required count. If
@@ -25,4 +25,5 @@ The package checker checks:
Tested and confirmed working using the following software versions:

Python 3.7.7
Pythin 3.9.2
Python 3.9.2
Python 3.9.10
8 changes: 5 additions & 3 deletions mlperf_logging/rcp_checker/README.md
@@ -8,9 +8,10 @@ Run Reference Convergence Point checks for a submission directory.
This consists of testing whether a submission does not converge
statistically faster than the reference.

RCPs are loaded from directory mlperf_logging/rcp_checker/1.0.0/*.json
For training, RCPs are loaded from directory mlperf_logging/rcp_checker/training_2.0.0/*.json

The RCP checker supports only the 1.0.0 version.
The RCP checker supports version 1.0.0 onwards.
The current training version is 2.0.0.
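
As an illustration of loading every `*.json` file from an RCP directory such as the one named above, here is a minimal sketch. `load_rcps` is a hypothetical helper; the RCP checker's own loading code and the JSON schema are not shown in this commit, so the files are treated as opaque JSON.

```python
import glob
import json
import os


def load_rcps(rcp_dir):
    """Load each *.json file in rcp_dir into a dict keyed by file name."""
    rcps = {}
    for path in sorted(glob.glob(os.path.join(rcp_dir, "*.json"))):
        with open(path) as f:
            rcps[os.path.basename(path)] = json.load(f)
    return rcps
```

Calling `load_rcps("mlperf_logging/rcp_checker/training_2.0.0")` from the repository root would then return one entry per RCP file.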

## Usage

@@ -30,6 +31,7 @@ python3 -m pip install numpy scipy

## Tested software versions

python v3.9.2
python 3.9.2
python 3.9.10


4 changes: 2 additions & 2 deletions mlperf_logging/repo_checker/README.md
@@ -12,7 +12,7 @@ review process.
python3 -m mlperf_logging.repo_checker FOLDER USAGE RULESET
```

Currently, USAGE in ["training"] and RULESET in ["0.6.0", "0.7.0", "1.0.0"] are supported.
Currently, only USAGE "training" and RULESET "2.0.0" are supported.

The repo checker checks:
1. Whether the repo contains filenames that github does not like, e.g. files with spaces,
@@ -22,4 +22,4 @@ The repo checker checks:
## Tested software versions
Tested and confirmed working using the following software versions:

Python 3.9.9
Python 3.9.10
5 changes: 3 additions & 2 deletions mlperf_logging/repo_checker/repo_checker.py
@@ -37,8 +37,9 @@ def _check_file_sizes(submission_dir):
"-size",
"+50M",
],
capture_output=True,
text=True,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
universal_newlines=True,
)
if len(out.stdout) != 0:
logging.error('Files > 50MB: %s', out.stdout)
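
The change above swaps `capture_output=True` and `text=True`, both added to `subprocess.run` in Python 3.7, for their older equivalents. A self-contained sketch of the same pattern, using a trivial child command in place of the `find` call (the wrapper name `run_captured` is hypothetical):

```python
import subprocess
import sys


def run_captured(cmd):
    """Run cmd and capture stdout/stderr as text, without Python 3.7-only arguments."""
    return subprocess.run(
        cmd,
        stdout=subprocess.PIPE,    # equivalent to capture_output=True ...
        stderr=subprocess.PIPE,    # ... which sets both pipes
        universal_newlines=True,   # older spelling of text=True
    )


# A child Python process stands in for the real `find` invocation.
out = run_captured([sys.executable, "-c", "print('hello')"])
```

As in `_check_file_sizes`, an empty `out.stdout` then means the command reported nothing.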
6 changes: 4 additions & 2 deletions mlperf_logging/result_summarizer/README.md
@@ -12,7 +12,7 @@ python3 -m mlperf_logging.result_summarizer FOLDER USAGE RULESET

Alternatively, multiple organizations' submissions can be processed:

Currently, USAGE in ["training"] and RULESET in ["0.6.0", "0.7.0", "1.0.0"] are supported.
Currently, USAGE in ["training"] and RULESET in ["0.6.0", "0.7.0", "1.0.0", "1.1.0", "2.0.0"] are supported.
FOLDER can be a single organization's submission folder like
`/path/to/training_results_v0.6/COMPANY_NAME`. For example,
```sh
@@ -34,4 +34,6 @@ corresponding to one row of a table like the
## Tested software versions
Tested and confirmed working using the following software versions:

Python 3.7.7, 3.9.2
Python 3.7.7
Python 3.9.2
Python 3.9.10
3 changes: 2 additions & 1 deletion mlperf_logging/system_desc_checker/README.md
@@ -10,7 +10,7 @@ To check a system description json file for compliance:
python3 -m mlperf_logging.system_desc_checker FILENAME USAGE RULESET
```

Currently, USAGE in ["training"] and RULESET in ["0.6.0", "0.7.0", 1.0.0] are supported.
Currently, USAGE in ["training"] and RULESET in ["0.6.0", "0.7.0", "1.0.0", "1.1.0", "2.0.0"] are supported.

Prints SUCCESS when no issues were found. Otherwise, will print FAILURE with error details.

@@ -19,3 +19,4 @@ Tested and confirmed working using the following software versions:

Python 2.7.18
Python 3.7.7
Python 3.9.10
