We don't know where ufuncs are from! #70

Open
saulshanabrook opened this issue Aug 18, 2020 · 6 comments
Labels
bug Something isn't working


@saulshanabrook
Contributor

During tracing, it's helpful to know not only which methods on the ufunc class are called (__call__, reduce, etc.) but also which ufuncs themselves are used (add, multiply, etc.).

Currently, we present these results not as the product of those two features, but as their union. That is, we show stats for the reduce method on the ufunc class, but we don't show how many times reduce was called on add vs. multiply. That's one "issue", but the other, more pressing one is that we don't know where ufuncs come from!
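To make the union/product distinction concrete, here is a minimal sketch over a hypothetical list of traced events, each a `(ufunc name, method)` pair. The event data is made up for illustration and is not output from our tracer. Counting each feature separately gives the union view; keying the counter by the pair gives the product view.

```python
from collections import Counter

# Hypothetical trace events: (ufunc name, method name) pairs.
events = [
    ("add", "reduce"),
    ("add", "__call__"),
    ("multiply", "reduce"),
    ("add", "reduce"),
]

# Union view: each feature counted on its own, so the link between them is lost.
by_method = Counter(method for _, method in events)
by_ufunc = Counter(name for name, _ in events)

# Product view: one count per (ufunc, method) combination.
by_pair = Counter(events)

print(by_method["reduce"])              # 3, but how is it split between add and multiply?
print(by_pair[("add", "reduce")])       # 2
print(by_pair[("multiply", "reduce")])  # 1
```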

All we know is their names. Up until now, I had been assuming they were all defined in the numpy module. However, SciPy, for example, has many that are not.

We should find a way to determine where they were defined, or which module they were imported from.

I guess to do this, we would have to do some kind of traversal of the imported modules to find where they are defined? This could also help with the related problem of recording which module exports a certain type, rather than which module defines it.
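One way to do the traversal is to scan every module already in `sys.modules` and collect those that hold the exact ufunc object as a top-level attribute. This is a minimal sketch, not something the project implements; `find_exporting_modules` is a hypothetical helper name. It only sees modules that have already been imported, and it reports every module that re-exports the object, so it answers "which modules export this?" rather than "where was it defined?".

```python
import sys
import types


def find_exporting_modules(obj):
    """Hypothetical helper: sorted names of loaded modules that expose `obj`
    (compared by identity) as a top-level attribute."""
    found = []
    # Copy the items: iterating sys.modules directly can fail if imports happen meanwhile.
    for name, module in list(sys.modules.items()):
        if not isinstance(module, types.ModuleType):
            continue  # skip placeholder entries such as None
        # vars() reads the module __dict__ directly, without triggering a module __getattr__.
        for value in list(vars(module).values()):
            if value is obj:
                found.append(name)
                break
    return sorted(found)
```

For example, after `import numpy`, `find_exporting_modules(numpy.add)` should include `"numpy"`, along with whichever internal NumPy submodules also hold a reference to the same object.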

@saulshanabrook saulshanabrook added the bug Something isn't working label Aug 18, 2020
@saulshanabrook
Contributor Author

Another issue, which I touched on briefly above, is that we don't differentiate between the call signatures of different ufuncs.

The real issue is that we are trying to represent the calls to different ufuncs. So we want to say something like:

'When the ufunc name is "sin", we called "call" with these args.'

But how do we write a type definition for that? How do we represent that in our current type hierarchy, where we talk about classes and methods?
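One possible answer, sketched here under our own assumptions, is to add a level below the ufunc class: a record keyed by where the ufunc was found, its name, and the method called, with the observed argument types attached. `UFuncCall` and `signatures_for` are hypothetical names, and argument types are reduced to type-name strings for simplicity.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class UFuncCall:
    """One observed call, keyed by ufunc identity as well as by method."""
    module: str                 # where the ufunc was found, e.g. "numpy"
    name: str                   # the ufunc's __name__, e.g. "sin"
    method: str                 # "__call__", "reduce", "accumulate", ...
    arg_types: Tuple[str, ...]  # type names of the positional arguments


def signatures_for(calls: Iterable[UFuncCall], module: str, name: str,
                   method: str) -> Set[Tuple[str, ...]]:
    """All argument-type signatures seen for one (module, ufunc, method) key."""
    return {
        c.arg_types
        for c in calls
        if (c.module, c.name, c.method) == (module, name, method)
    }


calls = [
    UFuncCall("numpy", "sin", "__call__", ("ndarray",)),
    UFuncCall("numpy", "sin", "__call__", ("float",)),
    UFuncCall("numpy", "add", "reduce", ("ndarray",)),
]
print(signatures_for(calls, "numpy", "sin", "__call__"))
```

The design choice is that "sin called via `__call__`" becomes a single key, so its signatures are never merged with those recorded for `add`.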

@saulshanabrook
Contributor Author

For example, the type stubs for ufuncs are similar to the ones we generate, but they are not separated by ufunc name: https://github.com/numpy/numpy-stubs/pull/44/files#diff-542b8065c42915076a70d8a091c6f08c

@Zac-HD

Zac-HD commented Sep 16, 2020

"We should somehow figure out how to understand where [ufuncs] were defined, or what module they were imported from."

This is tricky when ufunc objects don't have a __module__ attribute! The best solution we've got so far in Hypothesis is simply to check the known modules that might define it. This actually works pretty well, but a more principled way to do it would be nice.
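The "check known modules" approach can be sketched as follows. This is our own minimal version, not Hypothesis's actual code; `known_module_for` is a hypothetical name, and the candidate list is supplied by the caller (for ufuncs it might be something like `["numpy", "scipy.special"]`). Since there is no `__module__` to consult, it relies on the object's `__name__` plus an identity check.

```python
import importlib


def known_module_for(func, candidates):
    """Hypothetical helper: return the first candidate module name whose
    attribute called func.__name__ is `func` itself, else None."""
    name = getattr(func, "__name__", None)
    if name is None:
        return None
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # candidate not installed; try the next one
        # Identity check, so a different object with the same name doesn't match.
        if getattr(module, name, None) is func:
            return module_name
    return None
```

The order of `candidates` matters: the first match wins, so listing public modules before private ones yields import paths suitable for generated code.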

@saulshanabrook
Contributor Author

@Zac-HD Ha, yeah, we will have to do something similar, though ideally in a way that isn't hard-coded.

Since we already have tracing, I think we can do this by watching every import and every getattr (on modules) executed in the bytecode: when one of them returns a ufunc, we know where it came from.
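The idea of watching module attribute lookups can be approximated without bytecode tracing by swapping a module's class for a `types.ModuleType` subclass that records lookups (Python allows assigning `__class__` on a module object to such a subclass). This is a hypothetical sketch, not the project's tracer: `record_origins` is an illustrative name, it assumes the recorded values are hashable, and it only sees lookups made after it is installed.

```python
import types
from typing import Any, Callable, Dict, Set


def record_origins(module: types.ModuleType,
                   predicate: Callable[[Any], bool],
                   origins: Dict[Any, Set[str]]) -> None:
    """Hypothetical helper: from now on, every attribute lookup on `module`
    whose result satisfies `predicate` adds the module's name to
    origins[value]. Values are assumed to be hashable."""

    class _RecordingModule(types.ModuleType):
        def __getattribute__(self, attr):
            value = super().__getattribute__(attr)
            if predicate(value):
                # Look up __name__ via super() so this lookup is not itself recorded.
                module_name = super().__getattribute__("__name__")
                origins.setdefault(value, set()).add(module_name)
            return value

    # Python permits reassigning __class__ on a module to a ModuleType subclass.
    module.__class__ = _RecordingModule
```

With NumPy, the predicate would be something like `lambda v: isinstance(v, numpy.ufunc)`, installed on each module of interest before the traced code runs.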

I'm curious: what do you use the module names for?

@Zac-HD

Zac-HD commented Sep 17, 2020

I needed module names to write import statements for the Hypothesis Ghostwriter, which outputs the source code for a property-based test! Here's an actual output example:

$ hypothesis write numpy.matmul
import hypothesis.extra.numpy as npst
import numpy
from hypothesis import given, strategies as st

@given(
    data=st.data(),
    shapes=npst.mutually_broadcastable_shapes(signature="(n?,k),(k,m?)->(n?,m?)"),
    types=st.sampled_from(numpy.matmul.types).filter(lambda sig: "O" not in sig),
)
def test_gufunc_matmul(data, shapes, types):
    input_shapes, expected_shape = shapes
    input_dtypes, expected_dtype = types.split("->")
    array_st = [npst.arrays(d, s) for d, s in zip(input_dtypes, input_shapes)]

    a, b = data.draw(st.tuples(*array_st))
    result = numpy.matmul(a, b)
    assert result.shape == expected_shape
    assert result.dtype.char == expected_dtype

@saulshanabrook
Contributor Author

saulshanabrook commented Sep 17, 2020

@Zac-HD Wow, that is amazing!
