Use arrow-csv #646
Yes! I think in general it's better to use Arrow tools where possible and add the spatial part on top, rather than relying on row-based geozero and doing the conversion to Arrow ourselves.
Yeah... we have some known issues around union types. For example, I got stuck on something quite similar in #498. I should check with @paleolimbot and @jorisvandenbossche on how we want to handle union types canonically: should GeoArrow Geometry arrays always contain all fields, or just the minimal set of fields? I added a comment on geoarrow/geoarrow#43 about this.
@kylebarron any movement on the union type issue? I ask because I'm running into it with Parquet writing ATM. It looks like you have some skeleton code in `geoarrow-rs/src/io/parquet/writer/metadata.rs`, lines 244 to 266 (commit ccc636c).
Do you reckon that's valid, at least as a guess at where things will settle?
I don't think that code itself is relevant. You're probably hitting union type issues somewhere else. Do you have a specific traceback?
Yeah:
This is happening on a simple roundtrip test in stac-geoparquet (write a mixed-geometry table to some bytes, then read it back).
But can you remind me which line of code that's coming from? It's not in the writer code you linked to.
#714: that'll be a big PR to work on next week.
Sweet... guessing you've got it, but if need be, a minimal reproducible example is in draft PR #717.
### Change list

- Fix `DataType` creation to match the spec, with hard-coded type ids.
- Don't include geoarrow metadata on underlying arrays when exporting to arrow-rs. Only include geoarrow metadata on the top-level `geoarrow.geometry` array.
- We no longer need a `map` attribute in the struct, because the ordering of the fields is now guaranteed by the spec.
- Don't store underlying arrays under an `Option`.

Closes #717, closes #714

Unblocks #646

Progress towards #679
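The last two items in the change list can be illustrated with a small std-only Rust sketch (not the geoarrow-rs code). If the spec fixes each geometry kind's type id and the order of the union's children, a row's child array can be located by arithmetic on its type id, with no lookup `map`, and every child can be stored unconditionally instead of behind an `Option`. `GeometryKind`, `MixedArraySketch`, and the id values 1..=7 (modeled on ISO WKB geometry codes) are assumptions for illustration; the children hold WKT strings as placeholders for real coordinate buffers.

```rust
/// Geometry kinds in a mixed-geometry union. The discriminants are an
/// ASSUMPTION modeled on ISO WKB geometry codes, not taken from the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GeometryKind {
    Point = 1,
    LineString = 2,
    Polygon = 3,
    MultiPoint = 4,
    MultiLineString = 5,
    MultiPolygon = 6,
    GeometryCollection = 7,
}

impl GeometryKind {
    /// The (assumed) spec-defined union type id.
    fn type_id(self) -> i8 {
        self as i8
    }

    /// Inverse of `type_id`; unknown ids are rejected.
    fn from_type_id(id: i8) -> Option<Self> {
        use GeometryKind::*;
        match id {
            1 => Some(Point),
            2 => Some(LineString),
            3 => Some(Polygon),
            4 => Some(MultiPoint),
            5 => Some(MultiLineString),
            6 => Some(MultiPolygon),
            7 => Some(GeometryCollection),
            _ => None,
        }
    }

    /// The child position follows from the type id because the field order
    /// is fixed, so no per-array `map` from type id to child index is needed.
    fn child_index(self) -> usize {
        (self.type_id() - 1) as usize
    }
}

/// Hypothetical dense-union-like container. Every child always exists
/// (no `Option`), even when it holds no rows.
#[derive(Default)]
struct MixedArraySketch {
    children: [Vec<String>; 7], // WKT placeholders for real child arrays
    type_ids: Vec<i8>,          // one type id per row
    offsets: Vec<usize>,        // each row's position within its child
}

impl MixedArraySketch {
    fn push(&mut self, kind: GeometryKind, wkt: &str) {
        let child = &mut self.children[kind.child_index()];
        self.offsets.push(child.len());
        child.push(wkt.to_string());
        self.type_ids.push(kind.type_id());
    }

    fn get(&self, row: usize) -> Option<(GeometryKind, &str)> {
        let kind = GeometryKind::from_type_id(*self.type_ids.get(row)?)?;
        let value = self.children[kind.child_index()].get(self.offsets[row])?;
        Some((kind, value.as_str()))
    }
}

fn main() {
    let mut arr = MixedArraySketch::default();
    arr.push(GeometryKind::Point, "POINT (1 2)");
    arr.push(GeometryKind::Polygon, "POLYGON ((0 0, 1 0, 1 1, 0 0))");
    arr.push(GeometryKind::Point, "POINT (3 4)");
    // Row 2 is the second point: type id 1, offset 1 in the point child.
    println!("{:?}", arr.get(2));
}
```

The design choice being modeled is that the type id alone determines where a value lives, so the struct has nothing extra to keep in sync with the data.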
@gadomski can you try again from latest main?
@kylebarron looks like it's fixed for me over at stac-geoparquet! I'll see if I can update this PR as well 🙇🏼
Closing in favor of #826
To get more familiar with things around here, I'm taking a stab at #613. I'm opening this draft PR to ask two questions:

`MixedGeometryArray::into_arrow`. Specifically, `into_arrow` produces a "minimal" union (only the types actually present in the mixed array), while `ChunkedMixedGeometryArray::data_type` produces a superset union of all possible types. I've provided some test output below for more info.

Related issues
More information
The code in the PR works, but it feels wrong to me. The "correct" code (to my naive eyes) produces a schema mismatch.
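Why a minimal union and a superset union cannot both be used for the same column can be shown with a std-only Rust model (not arrow-rs code). In arrow-rs, `DataType::Union` carries its `UnionFields` (type ids paired with child fields), so two unions built from different subsets of geometry kinds are different data types. Below, a union "schema" is reduced to its ordered `(type id, child name)` pairs. `minimal_union`, `superset_union`, and the type ids 1..=7 are hypothetical and chosen for illustration only.

```rust
use std::collections::BTreeSet;

/// Every geometry kind with its (ASSUMED) union type id, in spec order.
const ALL_KINDS: [(i8, &str); 7] = [
    (1, "point"),
    (2, "linestring"),
    (3, "polygon"),
    (4, "multipoint"),
    (5, "multilinestring"),
    (6, "multipolygon"),
    (7, "geometrycollection"),
];

/// A union "schema" reduced to its ordered (type id, child name) pairs.
type UnionSchema = Vec<(i8, &'static str)>;

/// "Minimal" union: only the kinds that actually occur in `row_type_ids`.
fn minimal_union(row_type_ids: &[i8]) -> UnionSchema {
    let present: BTreeSet<i8> = row_type_ids.iter().copied().collect();
    ALL_KINDS
        .iter()
        .copied()
        .filter(|(id, _)| present.contains(id))
        .collect()
}

/// "Superset" union: every kind, independent of the data.
fn superset_union() -> UnionSchema {
    ALL_KINDS.to_vec()
}

fn main() {
    // Per-row type ids for two chunks of the same hypothetical column.
    let chunk_a: &[i8] = &[1, 1, 3]; // points and a polygon
    let chunk_b: &[i8] = &[2, 6]; // a linestring and a multipolygon

    // Minimal unions depend on the data, so the two chunks disagree,
    // and neither matches a chunked type built as the superset.
    println!("chunk A: {:?}", minimal_union(chunk_a));
    println!("chunk B: {:?}", minimal_union(chunk_b));
    println!(
        "minimal A == superset? {}",
        minimal_union(chunk_a) == superset_union()
    );
}
```

Whichever convention is chosen, every chunk (and the chunked array's declared type) has to use the same one for the schemas to match; a minimal union only coincides with the superset when all seven kinds are present.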
Code and test output
Test output: