Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coord buffer refactor #844

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

Coord buffer refactor #844

wants to merge 3 commits into from

Conversation

kylebarron
Copy link
Member

@kylebarron kylebarron commented Oct 30, 2024

Total bounds is 30% slower on this branch (on ns_water_line from https://geoarrow.org/data):

total_bounds            time:   [47.531 ms 47.877 ms 48.375 ms]
                        change: [+28.606% +33.051% +36.096%] (p = 0.00 < 0.05)
                        Performance has regressed.

But area is 20% faster on this branch:

area                    time:   [43.739 µs 43.991 µs 44.328 µs]
                        change: [-22.455% -19.528% -17.738%] (p = 0.00 < 0.05)
                        Performance has improved.

This is a big PR but the new CoordBuffer is defined here:

/// A GeoArrow coordinate buffer
#[derive(Clone, Debug)]
pub struct CoordBuffer {
/// We always store 4 buffers in an array, but not all 4 of these may have valid coordinates.
/// The number of valid buffers is stored in `num_buffers` and matches the physical size of the
/// dimension. For example, `XY` coordinates will only have two valid buffers, in slots `0` and
/// `1`. `XYZ` and `XYM` will have three valid buffers and `XYZM` will have four valid buffers.
///
/// In the case of interleaved coordinates, each slot will be a clone of the same
/// reference-counted buffer.
pub(crate) buffers: [ScalarBuffer<f64>; 4],
/// The number of coordinates in this buffer
pub(crate) num_coords: usize,
/// The number of valid buffers in the buffers array. (i.e., number of physical dimensions)
/// TODO: unsure if this is needed since we also store the logical dimension? Maybe still
/// faster than doing the Dimension enum lookup on coord access.
pub(crate) num_buffers: usize,
/// The number of elements to advance a given value pointer to the next ordinate.
///
/// - For interleaved coordinates, `coords_stride` will equal `num_buffers`.
/// - For struct coordinates, `coords_stride` will be 1.
pub(crate) coords_stride: usize,
/// The coordinate type of this buffer (interleaved or separated).
pub(crate) coord_type: CoordType,
/// The dimension of this buffer (e.g. `XY`, `XYZ`).
pub(crate) dim: Dimension,
}

and the new coordinate access is now here:

impl<'a> CoordTrait for Coord<'a> {
type T = f64;
fn dim(&self) -> geo_traits::Dimensions {
self.buffer.dim.into()
}
fn nth_unchecked(&self, n: usize) -> Self::T {
self.buffer.buffers[n][self.i * self.buffer.coords_stride + n]
}
fn x(&self) -> Self::T {
self.buffer.buffers[0][self.i * self.buffer.coords_stride]
}
fn y(&self) -> Self::T {
self.buffer.buffers[1][self.i * self.buffer.coords_stride + 1]
}
}

///
/// In the case of interleaved coordinates, each slot will be a clone of the same
/// reference-counted buffer.
pub(crate) buffers: [ScalarBuffer<f64>; 4],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't scrutinized the entire PR, but I'd expect any big perf wins from using generics to be here - doing something like [ScalarBuffer<f64>; T]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For more background, this was a port of @paleolimbot's C implementation here and here.

In GeoArrow we have two ways to store coordinate sequences. Either interleaved in one big Vec with all dimensions or separated with a separate vec for each dimension.

Currently GeoArrow uses an enum CoordBuffer with Interleaved and Separated buffers as variants. This means that there's an enum dispatch on every single coordinate access. In theory, this should be slow, or at least slower than not having an enum dispatch, I think.

@paleolimbot's C implementation above is able to do this without branching in a clever way. It always stores 4 underlying buffers, but not all 4 of these may have valid coordinates. For interleaved coordinates, it'll store a reference to the same buffer in each position.

Note that here ScalarBuffer wraps Buffer, and Buffer has an Arc under the hood. So storing four of the same cloned buffer should be cheap.

And then here's the meat for coordinate access:

impl<'a> CoordTrait for Coord<'a> {
type T = f64;
fn dim(&self) -> geo_traits::Dimensions {
self.buffer.dim.into()
}
fn nth_unchecked(&self, n: usize) -> Self::T {
self.buffer.buffers[n][self.i * self.buffer.coords_stride + n]
}
fn x(&self) -> Self::T {
self.buffer.buffers[0][self.i * self.buffer.coords_stride]
}
fn y(&self) -> Self::T {
self.buffer.buffers[1][self.i * self.buffer.coords_stride + 1]
}
}

It no longer needs any branching, regardless of whether the underlying Arrow storage is interleaved or separated, by using these "cloned references" and a stride.

I assumed this would be faster, but then the total_bounds bench was reliably slower on this PR. Which makes me guess that maybe rustc/llvm is able to do effective branch prediction when it repeatedly calls the same enum dispatch within a single array?

kylebarron added a commit that referenced this pull request Nov 14, 2024
In discussion with Dewey around this PR
#844, I realized that
regardless of whether we remove the underlying `CoordBuffer` enum, we
can remove the const generic on the CoordBuffer to determine its
physical dimension. Even if this is slightly slower, I think it's very
important for maintainability.

I realize now that this is what b4l was trying to argue for in
#822, but I couldn't see
what he meant there without an example.

For #822, for
#801
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants