Closes #3329: hstack to match numpy #4105
Open
+477
−63
Since numpy's hstack is built on concatenate, I reworked concatenate on the server side. It seems to work with regular pdarrays but not with Strings or Categorical. I also modified other code that relies on ConcatenateMsg.chpl.
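For reference, here is a minimal sketch of the numpy rule this PR is matching: hstack joins 1-D inputs along the first axis and higher-dimensional inputs along the second axis. It uses plain nested lists rather than pdarrays, and `is_1d` and `hstack_lists` are hypothetical names for illustration only, not Arkouda or numpy functions.

```python
def is_1d(a):
    # Treat an empty list or a list of scalars as 1-D.
    return len(a) == 0 or not isinstance(a[0], list)


def hstack_lists(arrays):
    """Toy hstack on nested lists: 1-D inputs join along axis 0, 2-D along axis 1."""
    arrays = list(arrays)
    if not arrays:
        raise ValueError("need at least one array to stack")
    if all(is_1d(a) for a in arrays):
        # 1-D case: plain end-to-end concatenation.
        out = []
        for a in arrays:
            out.extend(a)
        return out
    if any(is_1d(a) for a in arrays):
        raise ValueError("all inputs must have the same number of dimensions")
    n_rows = len(arrays[0])
    if any(len(a) != n_rows for a in arrays):
        raise ValueError("2-D inputs must have the same number of rows")
    # 2-D case: join row i of every input, so the result gets wider, not taller.
    return [sum((a[i] for a in arrays), []) for i in range(n_rows)]


flat = hstack_lists([[1, 2], [3, 4, 5]])
wide = hstack_lists([[[1, 2], [3, 4]], [[5], [6]]])
print(flat)  # [1, 2, 3, 4, 5]
print(wide)  # [[1, 2, 5], [3, 4, 6]]
```

The 2-D case is why hstack needs a concatenate that accepts an axis: the inputs must agree on every dimension except the second.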
I also updated vstack, because I doubt it worked properly before. The file containing the vstack and delete tests (tests/array_manipulation_tests.py) was not listed in pytest.ini, so I added it. The delete tests fail, but I didn't touch delete, so I suspect they were already failing before I added the file.
While cleaning things up, I noticed some interesting behavior. Some code relies on concatenate with ordered=False, meaning the result may have the two input arrays mixed together rather than the whole first array followed by the whole second. Specifically, when comparing two Categorical objects (e.g. for equality), it concatenates the Strings of categories and also two aranges (which, I guess, are meant to map the categories to numbers). It appears to rely on both concatenations mixing their inputs in the same way, which seems very sketchy to me. At some point we should investigate set_categories in arkouda/categorical.py and the functions it calls. It takes a shockingly long time to test equality on two Categoricals with three elements each but different categories, so optimizations definitely seem possible. ConcatenateMsg.chpl could also probably benefit from optimization for Strings.
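A toy model of the concern, in plain Python: two parallel arrays (categories and their codes) stay aligned after an unordered concatenate only if both calls happen to interleave their inputs identically. `interleaved_concat` and its `chunk` parameter are hypothetical stand-ins, not Arkouda's actual ordered=False behavior, which depends on how the data is distributed on the server.

```python
def interleaved_concat(a, b, chunk):
    """Toy stand-in for an unordered concatenate: alternate fixed-size chunks of a and b."""
    out = []
    for start in range(0, max(len(a), len(b)), chunk):
        out.extend(a[start:start + chunk])
        out.extend(b[start:start + chunk])
    return out


cats_a, cats_b = ["x", "y", "z", "w"], ["p", "q", "r", "s"]
codes_a, codes_b = [0, 1, 2, 3], [4, 5, 6, 7]
# Ground-truth pairing of each category with its code.
lookup = dict(zip(cats_a + cats_b, codes_a + codes_b))

cats = interleaved_concat(cats_a, cats_b, chunk=2)
codes_same = interleaved_concat(codes_a, codes_b, chunk=2)  # same interleaving
codes_diff = interleaved_concat(codes_a, codes_b, chunk=1)  # different interleaving

aligned_same = all(lookup[c] == k for c, k in zip(cats, codes_same))
aligned_diff = all(lookup[c] == k for c, k in zip(cats, codes_diff))
print(aligned_same)  # True
print(aligned_diff)  # False
```

If the two concatenations ever interleave differently, for example because the two arrays are laid out differently, the category-to-code pairing silently breaks. An ordered concatenate, or carrying the pairs through a single operation, would avoid depending on that coincidence.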