
Update Output Comparison for pytest #146

Status: Closed. PhaneeshB wants to merge 5 commits.

Conversation

@PhaneeshB (Contributor) commented Apr 2, 2024:

  • This PR changes the result comparison method from iree-run-module's built-in comparison to a numpy-based comparison of iree_outputs and golden_outputs.

Progress on iree-org/iree#16674

@PhaneeshB requested a review from @ScottTodd on April 2, 2024 at 16:27.
@@ -28,6 +28,7 @@
*.safetensors
*.gguf
*.vmfb
iree_tests/onnx/node/generated/test_*/iree_output_*.npy
Member commented:

The iree_tests/ folder has its own .gitignore file to keep the top level file cleaner: https://github.com/nod-ai/SHARK-TestSuite/blob/main/iree_tests/.gitignore
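
For illustration, the entry would move into iree_tests/.gitignore with the path made relative to that folder (a sketch of the suggested change, not the actual committed file):

    # iree_tests/.gitignore
    onnx/node/generated/test_*/iree_output_*.npy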

Comment on lines +338 to +351
# TODO: add support for comparing dtypes that numpy does not support,
# using iree-run-module for the numerical error check.
self.test_numerical_accuracy()

def test_numerical_accuracy(self):
    # `load` is numpy.load; `assert_allclose` is numpy.testing.assert_allclose.
    num_iree_output_files = len(list(self.test_cwd.glob("iree_output_*.npy")))
    num_output_files = len(list(self.test_cwd.glob("output_*.npy")))
    if num_iree_output_files != num_output_files:
        raise AssertionError(
            f"Number of golden outputs ({num_output_files}) and "
            f"iree outputs ({num_iree_output_files}) don't match"
        )

    for i in range(num_output_files):
        # Compare each IREE output file against its golden counterpart elementwise.
        iree_output = load(self.test_cwd / f"iree_output_{i}.npy")
        golden_output = load(self.test_cwd / f"output_{i}.npy")
        assert_allclose(iree_output, golden_output, atol=self.atol, rtol=self.rtol, equal_nan=False)
Member commented:

I'd like to continue using --expected_output, at least until we can prove that comparison in C++ is no longer sufficient.

  • We have data types that numpy does not support. For those, we can use binary files instead of .npy.

This style of testing aims to have as thin a test runner as possible, leaning mostly on the native tools themselves. Right now, all the test runner does is:

  1. discover test cases
  2. run native tools (iree-compile, iree-run-module) with flags
  3. check the return codes

By having a thin test runner with a narrow set of responsibilities, other test runner implementations are possible and results are easier to reproduce outside of the test environment.
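
As a rough sketch of that shape (not the suite's actual code; the flagfile names and file layout here are assumptions for illustration), a thin runner amounts to:

    import subprocess
    from pathlib import Path

    def run_test_case(case_dir: Path) -> None:
        """Hypothetical thin runner: invoke native tools, check return codes."""
        # 1. Compile the model with iree-compile, flags taken from the test case.
        compile_flags = (case_dir / "compile_flags.txt").read_text().split()
        subprocess.run(
            ["iree-compile", *compile_flags, "-o", str(case_dir / "module.vmfb")],
            check=True,  # a non-zero return code fails the test
        )
        # 2. Run the module; iree-run-module performs the output comparison
        #    natively via --expected_output, so only the return code matters here.
        run_flags = (case_dir / "run_flags.txt").read_text().split()
        subprocess.run(
            ["iree-run-module", f"--module={case_dir / 'module.vmfb'}", *run_flags],
            check=True,
        )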

  • Someone could write a ctest (or Bazel, or something else) test runner that uses these tools. Those runners may not have as direct access to Python utilities like numpy. We should also be able to run tests on systems without Python (Android, the web, etc.)
  • Test case reproducers are just commands to run. This change makes that more complicated: "to reproduce, run iree-run-module ... then run this numpy code"

How about we first see if we can modify https://github.com/openxla/iree/blob/main/runtime/src/iree/tooling/comparison.cc to be more permissive with numpy data type mismatches, or switch the expected outputs from numpy to binary files?
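
If the binary-file route is taken, the conversion on the Python side is small. A minimal sketch (hypothetical file names; assumes f32 data, since raw binary carries no shape or dtype header):

    import numpy as np

    # Dump a golden .npy output to a raw little-endian binary file that a
    # C++ comparator could read without numpy support.
    golden = np.load("output_0.npy")
    golden.astype("<f4").tofile("output_0.bin")
    # Shape and dtype must be recorded out of band (e.g. in the flagfile),
    # because .bin files have no self-describing header.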

Comment on lines +298 to +299
self.atol = 1e-05
self.rtol = 1e-06
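
For context (not part of the diff): numpy.testing.assert_allclose passes elementwise when |actual - desired| <= atol + rtol * |desired|, so atol dominates near zero and rtol dominates at large magnitudes. For example:

    import numpy as np
    from numpy.testing import assert_allclose

    # |1.000001 - 1.0| = 1e-06 <= 1e-05 + 1e-06 * 1.0, so this passes.
    assert_allclose(np.array([1.000001]), np.array([1.0]), atol=1e-05, rtol=1e-06)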
Member commented:

We can put comparison thresholds in the flagfiles themselves rather than making them a property of the test runner.
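
For example, a per-test run flagfile might carry the tolerance alongside the expected outputs (a sketch; --expected_output is the existing iree-run-module flag, while the threshold flag name below is an assumption about the tooling):

    --expected_output=@output_0.npy
    --expected_f32_threshold=1e-05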

@PhaneeshB (Contributor, Author) commented:
An alternate approach that keeps using --expected_output was added in #212.

@ScottTodd (Member) commented:
Closing in favor of #212

@ScottTodd ScottTodd closed this May 22, 2024