Method to check that two Dask Series have the same index #1112

Open
MarcoGorelli opened this issue Jul 27, 2024 · 1 comment

Comments

@MarcoGorelli

Hi, I discussed this a bit with @phofl, as I'm aiming to build a zero-cost abstraction around Dask DataFrame in Narwhals.

One thing I'd like to know is how to check that two Dask Series have the same index or, more precisely, that concatenating them would not trigger any index alignment.

Patrick pointed me to dask_expr._expr.are_co_aligned, which seemed to work great for me until I tried using __getitem__. Here's an example:

In [21]: df = dd.from_pandas(pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]}))

In [22]: dask_expr._expr.are_co_aligned(df._expr, df['a'][[1,2,0]]._expr)
Out[22]: True

This isn't quite what I was expecting: if I compute df['a'][[1,2,0]], the index has indeed been reordered with respect to df.
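For reference, the reordering and the kind of strict check being asked about can be reproduced in plain pandas. This is only a sketch of the desired semantics on an in-memory frame, not a Dask solution. The variable names are made up for illustration, and .loc is used in place of the bare [[1, 2, 0]] only to make the label-based selection explicit.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Label-based selection in a different order: the index becomes [1, 2, 0].
reordered = df["a"].loc[[1, 2, 0]]

# Index.equals is order-sensitive, so the same labels in a different
# order do not count as the same index.
same_index = df.index.equals(reordered.index)

# Arithmetic between the two aligns on labels rather than on position,
# which is the silent alignment the check is meant to rule out.
aligned_sum = df["b"] + reordered

print(list(reordered.index))
print(same_index)
print(aligned_sum.to_dict())
```

Here the two indexes hold the same labels, yet combining the Series still goes through label alignment, which is the case the check should flag.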

Is this a bug in are_co_aligned? If not, is there another way to check that index alignment does not happen?

Thanks 🙏

@phofl
Collaborator

phofl commented Jul 27, 2024

are_co_aligned only checks that the partitions are properly aligned, not the actual index values within each partition.

That said, most methods that change the index values also cause are_co_aligned to return False.

I don’t have a good solution for this off the top of my head, unfortunately. Let me think about it a little. are_co_aligned is a bit weaker than what you are looking for.
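To illustrate the gap between a partition-level check and a value-level check, here is a toy model in pure Python. It is not how Dask or are_co_aligned is implemented: ToySeries, partitions_line_up, and index_matches_exactly are hypothetical names invented for this sketch, and each "partition" is just a list of index labels.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToySeries:
    # Hypothetical stand-in for a partitioned series:
    # one list of index labels per partition.
    partitions: List[List[int]]


def partitions_line_up(left: ToySeries, right: ToySeries) -> bool:
    """Weak check: same partition count and same labels per partition, in any order."""
    if len(left.partitions) != len(right.partitions):
        return False
    return all(
        sorted(l) == sorted(r)
        for l, r in zip(left.partitions, right.partitions)
    )


def index_matches_exactly(left: ToySeries, right: ToySeries) -> bool:
    """Strict check: labels must also appear in the same order within each partition."""
    return left.partitions == right.partitions


original = ToySeries(partitions=[[0, 1, 2]])
shuffled = ToySeries(partitions=[[1, 2, 0]])

print(partitions_line_up(original, shuffled))     # True
print(index_matches_exactly(original, shuffled))  # False
```

In this toy setting the weak check passes while the strict one fails, which mirrors the situation in the example from the issue: the partitions correspond, but the values inside them are ordered differently.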
