why did the specified data precision not work for MLPerf inference? #2006

Open
Bob123Yang opened this issue Dec 31, 2024 · 4 comments

@Bob123Yang

I ran Resnet50 in the docker container without the precision parameter; the run completed successfully and the result (measurements.json) shows that the data type is int8.

I then ran Resnet50 again with precision=float16, as below, also successfully, but measurements.json still reports the data type as int8.

It seems the precision=float16 parameter didn't take effect. How can I conveniently run the model with a different data precision in MLPerf?

cm run script --tags=run-mlperf,inference,_r4.1-dev \
   --model=resnet50 \
   --precision=float16 \
   --implementation=nvidia \
   --framework=tensorrt \
   --category=edge \
   --scenario=Offline \
   --execution_mode=valid \
   --device=cuda \
   --division=closed \
   --rerun \
   --quiet

$ cat measurements.json
{
  "starting_weights_filename": "https://zenodo.org/record/2592612/files/resnet50_v1.onnx",
  "retraining": "no",
  "input_data_types": "int8",
  "weight_data_types": "int8",
  "weight_transformations": "no"
}
@arjunsuresh
Contributor

Hi @Bob123Yang, for the Nvidia implementation this is expected behaviour: the precision is chosen automatically by the implementation, often the best one that satisfies the MLPerf accuracy requirement. We don't have an option to change this.

@Bob123Yang
Author

Thank you @arjunsuresh, so do you mean non-Nvidia implementations allow changing the precision via the parameter for MLPerf?

@arjunsuresh
Contributor

You're welcome @Bob123Yang. Actually, what I said also applies to the other vendor implementations like Intel, AMD, Qualcomm, etc. The reference implementations usually do have fp16 and fp32 options, especially for the PyTorch models; an example command is sketched below.
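
For example, a run of the reference implementation with an explicit precision might look roughly like this (a sketch only; --implementation=reference, --framework=onnxruntime and --precision=float32 are illustrative choices here, and the precision values actually accepted depend on the model and framework):

cm run script --tags=run-mlperf,inference,_r4.1-dev \
   --model=resnet50 \
   --implementation=reference \
   --framework=onnxruntime \
   --precision=float32 \
   --scenario=Offline \
   --device=cuda \
   --quiet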

@Bob123Yang
Author

Oh, that's a pity, thanks! @arjunsuresh

Could you help confirm one more question about the NVIDIA multi-GPU scenario: how do I run MLPerf inference on multiple GPUs connected with NVLink? Is there a parameter dedicated to that scenario, or is it enough to have the physical connection (such as NVLink) working between the GPUs, with the MLPerf run then automatically using all available GPU resources?
