
Update Output Comparison for pytest #146

Status: Closed. PhaneeshB wants to merge 5 commits.

Conversation

@PhaneeshB (Contributor) commented Apr 2, 2024:

  • This PR changes the result comparison method from iree-run-module's built-in comparison to a numpy-based comparison of iree_outputs and golden_outputs.

Progress on iree-org/iree#16674

@PhaneeshB requested a review from @ScottTodd on April 2, 2024 at 16:27.
@@ -28,6 +28,7 @@
*.safetensors
*.gguf
*.vmfb
iree_tests/onnx/node/generated/test_*/iree_output_*.npy
Member commented:

The iree_tests/ folder has its own .gitignore file to keep the top level file cleaner: https://github.com/nod-ai/SHARK-TestSuite/blob/main/iree_tests/.gitignore
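
For illustration, the entry would move into iree_tests/.gitignore with the path made relative to that folder (a sketch of the suggested change, not the actual committed file):

    # iree_tests/.gitignore
    onnx/node/generated/test_*/iree_output_*.npy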

Comment on lines +338 to +351
# TODO: add support for comparing dtypes that numpy does not support,
# using iree-run-module for the numerical error check.
self.test_numerical_accuracy()

def test_numerical_accuracy(self):
    # `load` is numpy.load; `assert_allclose` is numpy.testing.assert_allclose.
    num_iree_output_files = len(list(self.test_cwd.glob("iree_output_*.npy")))
    num_output_files = len(list(self.test_cwd.glob("output_*.npy")))
    if num_iree_output_files != num_output_files:
        raise AssertionError(
            f"Number of golden outputs ({num_output_files}) and "
            f"iree outputs ({num_iree_output_files}) don't match"
        )

    for i in range(num_output_files):
        # Compare each IREE output file against its golden counterpart elementwise.
        iree_output = load(self.test_cwd / f"iree_output_{i}.npy")
        golden_output = load(self.test_cwd / f"output_{i}.npy")
        assert_allclose(iree_output, golden_output, atol=self.atol, rtol=self.rtol, equal_nan=False)
Member commented:

I'd like to continue using --expected_output, at least until we can prove that comparison in C++ is no longer sufficient.

  • We have data types that numpy does not support. For those, we can use binary files instead of .npy.

This style of testing aims to have as thin a test runner as possible, leaning mostly on the native tools themselves. Right now, all the test runner does is:

  1. discover test cases
  2. run native tools (iree-compile, iree-run-module) with flags
  3. check the return codes

By having a thin test runner with a narrow set of responsibilities, other test runner implementations are possible and results are easier to reproduce outside of the test environment.
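
As a rough sketch of that shape (not the suite's actual code; the flagfile names and file layout here are assumptions for illustration), a thin runner amounts to:

    import subprocess
    from pathlib import Path

    def run_test_case(case_dir: Path) -> None:
        """Hypothetical thin runner: invoke native tools, check return codes."""
        # 1. Compile the model with iree-compile, flags taken from the test case.
        compile_flags = (case_dir / "compile_flags.txt").read_text().split()
        subprocess.run(
            ["iree-compile", *compile_flags, "-o", str(case_dir / "module.vmfb")],
            check=True,  # a non-zero return code fails the test
        )
        # 2. Run the module; iree-run-module performs the output comparison
        #    natively via --expected_output, so only the return code matters here.
        run_flags = (case_dir / "run_flags.txt").read_text().split()
        subprocess.run(
            ["iree-run-module", f"--module={case_dir / 'module.vmfb'}", *run_flags],
            check=True,
        )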

  • Someone could write a ctest (or Bazel, or something else) test runner that uses these tools. Those runners may not have as direct access to Python utilities like numpy. We should also be able to run tests on systems without Python (Android, the web, etc.)
  • Test case reproducers are just commands to run. This change makes that more complicated: "to reproduce, run iree-run-module ... then run this numpy code"

How about we first see if we can modify https://github.com/openxla/iree/blob/main/runtime/src/iree/tooling/comparison.cc to be more permissive with numpy data type mismatches, or switch the expected outputs from numpy to binary files?
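
If the binary-file route is taken, the conversion on the Python side is small. A minimal sketch (hypothetical file names; assumes f32 data, since raw binary carries no shape or dtype header):

    import numpy as np

    # Dump a golden .npy output to a raw little-endian binary file that a
    # C++ comparator could read without numpy support.
    golden = np.load("output_0.npy")
    golden.astype("<f4").tofile("output_0.bin")
    # Shape and dtype must be recorded out of band (e.g. in the flagfile),
    # because .bin files have no self-describing header.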

Comment on lines +298 to +299
self.atol = 1e-05
self.rtol = 1e-06
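
For context (not part of the diff): numpy.testing.assert_allclose passes elementwise when |actual - desired| <= atol + rtol * |desired|, so atol dominates near zero and rtol dominates at large magnitudes. For example:

    import numpy as np
    from numpy.testing import assert_allclose

    # |1.000001 - 1.0| = 1e-06 <= 1e-05 + 1e-06 * 1.0, so this passes.
    assert_allclose(np.array([1.000001]), np.array([1.0]), atol=1e-05, rtol=1e-06)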
Member commented:

We can put comparison thresholds in the flagfiles themselves rather than making them a property of the test runner.
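
For example, a per-test run flagfile might carry the tolerance alongside the expected outputs (a sketch; --expected_output is the existing iree-run-module flag, while the threshold flag name below is an assumption about the tooling):

    --expected_output=@output_0.npy
    --expected_f32_threshold=1e-05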

@PhaneeshB (Contributor, Author) commented:
An alternate approach that keeps using --expected_output was added in #212.

@ScottTodd (Member) commented:
Closing in favor of #212

@ScottTodd ScottTodd closed this May 22, 2024