CI: parametrize over Array API modules #298
base: master
Conversation
force-pushed from f742ae2 to 429b32a
force-pushed from 21bed6b to cf262c8
Keep testing with array-api-strict, add numpy, torch and jax.numpy
force-pushed from cf262c8 to 9f499ef
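The commit above is what the parametrization amounts to in practice: the same test run repeated against array-api-strict, numpy, torch and jax.numpy. As a rough illustration only (a minimal sketch, not the suite's actual conftest), the fixture below parametrizes over those namespaces and skips backends that aren't installed; the fixture name `xp` and the smoke test are assumptions made for this example.

```python
# Minimal sketch: parametrize tests over several Array API modules.
# The module list mirrors the ones added in this PR; anything not installed
# in the CI environment is skipped rather than reported as a failure.
import pytest

ARRAY_MODULES = ["array_api_strict", "numpy", "torch", "jax.numpy"]

@pytest.fixture(params=ARRAY_MODULES)
def xp(request):
    # importorskip both imports the module and skips the test if it is missing.
    return pytest.importorskip(request.param)

def test_namespace_has_asarray(xp):
    # Trivial smoke check against whichever namespace is being exercised.
    assert hasattr(xp, "asarray")
```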
This seems to be conceptually wrong. Now, what do we actually want to be testing here:
I'm sure this was discussed before, so I'm going to stop for now :-). EDIT: or maybe something like this can be useful for reporting, if we ever want to autogenerate compatibility reports for bare Array API namespaces.
The thing we really want to test here is that the tests themselves don't break whenever we update them or write new ones. This is especially important if we want to do some more serious refactoring here. The test suite is a little odd in that we often "break" upstream by adding a test that fails, because the upstream library is not conformant and the behaviour was simply never tested before. What we don't want is to "break" upstream by adding a test that's actually wrong, or that tests something in a way that isn't actually required by the standard (for example, a floating-point closeness test that is too strict).

So when it comes down to it, what should go on CI here is really a matter of developer convenience. The test suite will eventually be run upstream anyway, by the upstream projects themselves (and compat is probably the primary upstream, since it tests four different libraries). On the one hand, testing more libraries here will catch more things. On the other hand, to do that effectively we would have to maintain a set of skips/xfails here, because many failures are expected; this is the case even for compat. We could reuse the xfail files from the compat repo, although that would effectively amount to duplicating the compat repo's CI here. The benefit would be catching failures faster. But also consider that this repo doesn't have releases: if something breaks, you can very easily revert it.

The strict library is nice because it needs almost no xfails. Plain NumPy 2.1 is like that as well, although that could change whenever new functions get added to the standard.

For now, I've made heavy use of local testing against various libraries when doing development here. If you feel that having that on CI would be better, then you should set it up; it really comes down to what's most convenient. That would mean either maintaining a set of xfails, or ignoring failures entirely and just using the CI as a vehicle to run the tests. One way or the other, you definitely should check any updated test against the major libraries.
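To make the "maintain a set of xfails" option concrete, here is a minimal sketch (assuming a plain-text `xfails.txt` file of test node IDs, one per line; the file name and hook usage are illustrative, not the mechanism the compat repo actually uses) of marking known failures in a `conftest.py` so that expected non-conformance doesn't turn CI red:

```python
# Sketch of a conftest.py hook that turns listed test IDs into xfails.
from pathlib import Path

import pytest

def load_xfail_ids(path="xfails.txt"):
    """Return the set of test node IDs listed in the xfails file, ignoring comments."""
    p = Path(path)
    if not p.exists():
        return set()
    return {
        line.strip()
        for line in p.read_text().splitlines()
        if line.strip() and not line.startswith("#")
    }

def pytest_collection_modifyitems(config, items):
    # Mark listed tests as expected failures instead of letting them fail the run.
    xfail_ids = load_xfail_ids()
    for item in items:
        if item.nodeid in xfail_ids:
            item.add_marker(pytest.mark.xfail(reason="known upstream non-conformance"))
```

Whether that maintenance cost is worth it here, versus just letting CI be a convenience run whose failures are eyeballed, is exactly the trade-off described above.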
Regarding reporting, there was a lot of work on this previously (see reporting.py), and there was much discussion about whether the tests should be run here or in the upstream projects. The idea at the time was to run the tests in the upstream projects and have them upload reports. That work will probably start up again at some point. For what it's worth, I now feel that running the reporting tests in the upstream repos was getting too complicated, and I would instead suggest running them here, or in some other data-apis-controlled repo, as the better approach.
towards #235