Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow compatible mapped values in __setitem__ #7

Merged
merged 3 commits into from
Jun 10, 2024

Conversation

SimonHeybrock
Copy link
Member

This was not implemented initially for simplicity, but downstream use showed that is it essential.


@staticmethod
def try_from(obj: Any, *, axis_zero: int = 0) -> ScippDataArrayAdapter | None:
try:
import scipp

if isinstance(obj, scipp.Variable):
return ScippDataArrayAdapter(scipp.DataArray(obj))
return ScippDataArrayAdapter(scipp.DataArray(obj), scipp=scipp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit awkward that we have to pass the module as an arg everywhere...
Was it to avoid importing scipp in multiple places?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, making the code cleaner.

Comment on lines +756 to +759
{
'a': sc.array(dims=['x'], values=[1, 2, 3]),
'b': sc.array(dims=['x'], values=[11, 12, 13]),
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm a little confused because the name above is a ScippDataArrayAdapter, so I thought it would be a sc.DataArray, like the pandas DataFrame above.
But I guess you just need a mapping of key to ArrayLike? Does this mean that you never use the .data in the DataArray?
I assume you need a structure that can be sliced/indexed. Could it be a DataGroup instead of a DataArray internally?

I realize this is besides the point of the current PR...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScippDataArrayAdapter also handles scipp.Variable, interpreting the latter as an data array without coords.

I assume you need a structure that can be sliced/indexed. Could it be a DataGroup instead of a DataArray internally?

Using a dictcurrently, could in principle add support for Dataset and DataGroup. But not "instead of": The DataArray holds the values for a single node, the dict (or DataGroup, ...) maps to multiple nodes, just like the columns of pandas.DataFrame vs. its rows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, I see now


graph = cb.Graph(g)
mapped = graph.map(node_values).reduce('c', name='d')
mapped['x'] = mapped['d']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, the 'x' here is unrelated to the 'x' dimension in

        {
            'a': sc.array(dims=['x'], values=[1, 2, 3]),
            'b': sc.array(dims=['x'], values=[11, 12, 13]),
        }

above.

If so, can we use a different name here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was asking because at first it confused me, as I thought they were related.

g.add_edge('a', 'b')
graph = cb.Graph(g)
mapped1 = graph.map({'a': [1, 2]}).reduce('b', name='d')
mapped2 = graph.map({'a': sc.array(dims=('x',), values=[1, 2])}).reduce(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand why this is different from the above? In mapped1 you have [1, 2] and in mapped2 a Scipp Variable. Isn't it the same as with the numpy array, that the types are different?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it actually raises for different reasons, this test is to ensure that both code paths work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my question was why is it raising for a different reason, I would have expected to raise with the same reason.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is an artifact of the usual problem of having two checks in a particular order, so depending on the exact properties of the input you get one exception or the other.

@nvaytet nvaytet self-assigned this Jun 6, 2024
@jl-wynen
Copy link
Member

jl-wynen commented Jun 7, 2024

If you still want my review, can you explain what this PR is supposed to do? This is hard to tell from the diff alone.

Base automatically changed from fix-ancestor-removal-logic to main June 10, 2024 03:18
@SimonHeybrock
Copy link
Member Author

If you still want my review, can you explain what this PR is supposed to do? This is hard to tell from the diff alone.

It allows compatible mapped values in __setitem__. Previously it was impossible to use this it the graph being set, originated, e.g., from a subgraph of the same graph, and map was applied beforehand. This shows up in practice, e.g., when mapping a reduction workflow graph over a filename, with subsequent reduce calls in several places, to be recombined into a single graph.

@pytest.mark.parametrize(
'node_values',
[
{'a': [1, 2, 3], 'b': [11, 12, 13]},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that {'a': [1, 2, 3]} is sufficient to make the test fail with the code on main. So there seems to be no need to have multiple mapped nodes to test this behaviour. Can you add a test that only maps one node?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

src/cyclebane/node_values.py Outdated Show resolved Hide resolved
src/cyclebane/node_values.py Outdated Show resolved Hide resolved
@SimonHeybrock SimonHeybrock merged commit 3b67e2d into main Jun 10, 2024
3 checks passed
@SimonHeybrock SimonHeybrock deleted the allow-compat-indicies-and-values-insetitem branch June 10, 2024 12:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants