Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes #3329: hstack to match numpy #4105

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

1RyanK
Copy link
Collaborator

@1RyanK 1RyanK commented Feb 21, 2025

Since hstack uses concatenate in numpy, I redid concatenate on server side. It seems to work with normal pdarrays but not Strings or Categorical. I made some modifications on other things that also rely on ConcatenateMsg.chpl.

I also updated vstack because I somewhat doubt that functioned properly before. I noticed that the code where the vstack and delete tests were (tests/array_manipulation_tests.py) was not included in the pytest.ini, so I added it. The tests for delete failed, but I didn't touch it, so I suspect that was ongoing prior to my adding it.

While trying to clean things up a bit, I noticed some interesting behavior. Some things rely on the concatenate with ordered=False (meaning that when concatenating two arrays, the result may have the two arrays mixed together, rather than a whole block where you have the first, followed by everything in the second). Specifically, when comparing two Categorical objects together (i.e. for equality), it would concatenate the Strings of categories and also two aranges (which were, I guess, meant to associate the categories to numbers). Basically, it seemed to rely on the fact that they would get concatenated in the same way. This seems very sketchy to me. At some point we should probably further investigate set_categories in arkouda/categorical.py and the functions it calls. It takes a shockingly long time to test equality on two Categoricals with three things, but different categories. Definitely feels like optimizations are possible. Also ConcatenateMsg.chpl could probably benefit from optimization for Strings.

Closes #3329: hstack to match numpy

@1RyanK
Copy link
Collaborator Author

1RyanK commented Feb 26, 2025

I keep getting arkouda/pdarraysetops.py:427: error: Cannot determine type of "shape" [has-type] approximately on the line shape1 = arrays[i - 1].shape. Wherever the first reference is to .shape mypy decides that it doesn't know what type that is, and that that's a problem. However, there is no problem in hstack's version. I tried printing out some info about the shape and got this:

(3,)
<class 'tuple'>
<class 'int'>

Note that I printed out the shape, type(shape), and then type(shape[0]).
I would greatly appreciate help resolving this and then after I add some tests for hstack and vstack it should be all set to be merged in.
Another thing I wasn't sure about came from here: #2802
Should I set max_bits to the minimum or maximum of the positive max_bits across the input arrays? In the example there's only one. I currently have it set to minimum, which would throw away bits. Kind of like in significant figures you don't pretend you have precision that you don't actually have.

@1RyanK 1RyanK marked this pull request as ready for review February 27, 2025 17:26
@1RyanK 1RyanK changed the title Draft PR: will close #3329 hstack to match numpy Closes #3329: hstack to match numpy Feb 27, 2025
@1RyanK 1RyanK added this to the Numpy Alignment milestone Feb 27, 2025
…for both. Created new

concatenate on server side because the old one didn't allow for
multi-dimensional things.
@1RyanK 1RyanK force-pushed the 3329_hstack_to_match_numpy branch from 9e355de to 4f878b9 Compare February 27, 2025 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

hstack to match numpy
1 participant