-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coord buffer refactor #844
base: main
Are you sure you want to change the base?
Conversation
/// | ||
/// In the case of interleaved coordinates, each slot will be a clone of the same | ||
/// reference-counted buffer. | ||
pub(crate) buffers: [ScalarBuffer<f64>; 4], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't scrutinized the entire PR, but I'd expect any big perf wins from using generics to be here - doing something like [ScalarBuffer<f64>; T]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For more background, this was a port of @paleolimbot's C implementation here and here.
In GeoArrow we have two ways to store coordinate sequences. Either interleaved
in one big Vec with all dimensions or separated
with a separate vec for each dimension.
Currently GeoArrow uses an enum CoordBuffer with Interleaved and Separated buffers as variants. This means that there's an enum dispatch on every single coordinate access. In theory, this should be slow, or at least slower than not having an enum dispatch, I think.
@paleolimbot's C implementation above is able to do this without branching in a clever way. It always stores 4 underlying buffers, but not all 4 of these may have valid coordinates. For interleaved coordinates, it'll store a reference to the same buffer in each position.
Note that here ScalarBuffer
wraps Buffer
, and Buffer
has an Arc
under the hood. So storing four of the same cloned buffer should be cheap.
And then here's the meat for coordinate access:
geoarrow-rs/rust/geoarrow/src/scalar/coord/scalar.rs
Lines 20 to 38 in 9e6a953
impl<'a> CoordTrait for Coord<'a> { | |
type T = f64; | |
fn dim(&self) -> geo_traits::Dimensions { | |
self.buffer.dim.into() | |
} | |
fn nth_unchecked(&self, n: usize) -> Self::T { | |
self.buffer.buffers[n][self.i * self.buffer.coords_stride + n] | |
} | |
fn x(&self) -> Self::T { | |
self.buffer.buffers[0][self.i * self.buffer.coords_stride] | |
} | |
fn y(&self) -> Self::T { | |
self.buffer.buffers[1][self.i * self.buffer.coords_stride + 1] | |
} | |
} |
It no longer needs any branching, regardless of whether the underlying Arrow storage is interleaved or separated, by using these "cloned references" and a stride.
I assumed this would be faster, but then the total_bounds
bench was reliably slower on this PR. Which makes me guess that maybe rustc/llvm is able to do effective branch prediction when it repeatedly calls the same enum dispatch within a single array?
In discussion with Dewey around this PR #844, I realized that regardless of whether we remove the underlying `CoordBuffer` enum, we can remove the const generic on the CoordBuffer to determine its physical dimension. Even if this is slightly slower, I think it's very important for maintainability. I realize now that this is what b4l was trying to argue for in #822, but I couldn't see what he meant there without an example. For #822, for #801
Total bounds is 30% slower on this branch (on
ns_water_line
from https://geoarrow.org/data):But area is 20% faster on this branch:
This is a big PR but the new CoordBuffer is defined here:
geoarrow-rs/rust/geoarrow/src/array/coord/array.rs
Lines 16 to 47 in 9e6a953
and the new coordinate access is now here:
geoarrow-rs/rust/geoarrow/src/scalar/coord/scalar.rs
Lines 20 to 38 in 9e6a953