Coord buffer refactor #844

kylebarron · 2024-10-30T04:26:16Z

`total_bounds` is about 30% slower on this branch (median change +33%, on ns_water_line from https://geoarrow.org/data):

total_bounds            time:   [47.531 ms 47.877 ms 48.375 ms]
                        change: [+28.606% +33.051% +36.096%] (p = 0.00 < 0.05)
                        Performance has regressed.

But `area` is about 20% faster on this branch:

area                    time:   [43.739 µs 43.991 µs 44.328 µs]
                        change: [-22.455% -19.528% -17.738%] (p = 0.00 < 0.05)
                        Performance has improved.

This is a big PR, but the new `CoordBuffer` is defined here:

geoarrow-rs/rust/geoarrow/src/array/coord/array.rs

Lines 16 to 47 in 9e6a953

/// A GeoArrow coordinate buffer
#[derive(Clone, Debug)]
pub struct CoordBuffer {
    /// We always store 4 buffers in an array, but not all 4 of these may have valid coordinates.
    /// The number of valid buffers is stored in `num_buffers` and matches the physical size of the
    /// dimension. For example, `XY` coordinates will only have two valid buffers, in slots `0` and
    /// `1`. `XYZ` and `XYM` will have three valid buffers and `XYZM` will have four valid buffers.
    ///
    /// In the case of interleaved coordinates, each slot will be a clone of the same
    /// reference-counted buffer.
    pub(crate) buffers: [ScalarBuffer<f64>; 4],

    /// The number of coordinates in this buffer
    pub(crate) num_coords: usize,

    /// The number of valid buffers in the buffers array. (i.e., number of physical dimensions)
    /// TODO: unsure if this is needed since we also store the logical dimension? Maybe still
    /// faster than doing the Dimension enum lookup on coord access.
    pub(crate) num_buffers: usize,

    /// The number of elements to advance a given value pointer to the next ordinate.
    ///
    /// - For interleaved coordinates, `coords_stride` will equal `num_buffers`.
    /// - For struct coordinates, `coords_stride` will be 1.
    pub(crate) coords_stride: usize,

    /// The coordinate type of this buffer (interleaved or separated).
    pub(crate) coord_type: CoordType,

    /// The dimension of this buffer (e.g. `XY`, `XYZ`).
    pub(crate) dim: Dimension,
}

and the new coordinate access is now here:

geoarrow-rs/rust/geoarrow/src/scalar/coord/scalar.rs

Lines 20 to 38 in 9e6a953

impl<'a> CoordTrait for Coord<'a> {
    type T = f64;

    fn dim(&self) -> geo_traits::Dimensions {
        self.buffer.dim.into()
    }

    fn nth_unchecked(&self, n: usize) -> Self::T {
        self.buffer.buffers[n][self.i * self.buffer.coords_stride + n]
    }

    fn x(&self) -> Self::T {
        self.buffer.buffers[0][self.i * self.buffer.coords_stride]
    }

    fn y(&self) -> Self::T {
        self.buffer.buffers[1][self.i * self.buffer.coords_stride + 1]
    }
}
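As quoted, `nth_unchecked` adds `n` to the index for both layouts. With `coords_stride == 1` that reads element `i + n` of a separated buffer, so the `+ n` only appears right for interleaved storage. One way to keep access branch-free for both layouts, similar to offsetting a pointer into the shared buffer in C, is a per-slot start offset. A minimal sketch, where `offsets`, `interleaved_xy` and `separated_xy` are hypothetical names and `Arc<[f64]>` stands in for `ScalarBuffer<f64>`:

```rust
use std::sync::Arc;

/// Minimal stand-in for `CoordBuffer`; `offsets` is a hypothetical addition.
struct CoordBuffer {
    buffers: [Arc<[f64]>; 4],
    /// Index of ordinate `n` of coordinate 0 inside `buffers[n]`:
    /// `[0, 1, 2, 3]` for interleaved, all zeros for separated.
    offsets: [usize; 4],
    coords_stride: usize,
}

struct Coord<'a> {
    buffer: &'a CoordBuffer,
    i: usize,
}

impl<'a> Coord<'a> {
    /// One expression for both layouts: no match on the coordinate type.
    fn nth_unchecked(&self, n: usize) -> f64 {
        self.buffer.buffers[n][self.i * self.buffer.coords_stride + self.buffer.offsets[n]]
    }

    fn x(&self) -> f64 {
        self.nth_unchecked(0)
    }

    fn y(&self) -> f64 {
        self.nth_unchecked(1)
    }
}

/// XY interleaved: every slot shares one allocation, stride 2.
fn interleaved_xy(values: Vec<f64>) -> CoordBuffer {
    let buf: Arc<[f64]> = values.into();
    CoordBuffer {
        buffers: [buf.clone(), buf.clone(), buf.clone(), buf],
        offsets: [0, 1, 2, 3],
        coords_stride: 2,
    }
}

/// XY separated: one buffer per dimension, stride 1, unused slots empty.
fn separated_xy(x: Vec<f64>, y: Vec<f64>) -> CoordBuffer {
    let empty: Arc<[f64]> = Vec::<f64>::new().into();
    CoordBuffer {
        buffers: [x.into(), y.into(), empty.clone(), empty],
        offsets: [0; 4],
        coords_stride: 1,
    }
}
```

With arrow's `ScalarBuffer`, a zero-copy `slice` of the shared buffer starting at element `n` could play the same role as the stored offset.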

michaelkirk · 2024-10-30T20:22:58Z

rust/geoarrow/src/array/coord/array.rs

+    ///
+    /// In the case of interleaved coordinates, each slot will be a clone of the same
+    /// reference-counted buffer.
+    pub(crate) buffers: [ScalarBuffer<f64>; 4],


I haven't scrutinized the entire PR, but I'd expect any big perf wins from using generics to be here, doing something like `[ScalarBuffer<f64>; T]` (with `T` a const generic for the number of dimensions).
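A sketch of that const-generic idea, with the number of buffers as a compile-time parameter (called `D` here). `SeparatedCoords` is a hypothetical type and `Vec<f64>` stands in for `ScalarBuffer<f64>`:

```rust
/// Separated coordinates whose physical dimension is fixed at compile time.
struct SeparatedCoords<const D: usize> {
    buffers: [Vec<f64>; D],
}

impl<const D: usize> SeparatedCoords<D> {
    fn len(&self) -> usize {
        self.buffers[0].len()
    }

    /// Returns coordinate `i` as a fixed-size array. `D` is a constant for each
    /// instantiation, so the compiler knows the exact number of ordinates.
    fn coord(&self, i: usize) -> [f64; D] {
        std::array::from_fn(|n| self.buffers[n][i])
    }
}
```

The catch is that every function touching the buffer must also be generic over `D`, and each `D` is monomorphized separately, which is the maintainability cost weighed later in this thread.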

For more background, this was a port of @paleolimbot's C implementation here and here.

In GeoArrow we have two ways to store coordinate sequences: either interleaved, in one big Vec holding all dimensions (`x0, y0, x1, y1, ...`), or separated, with a separate Vec for each dimension.
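For concreteness, here are the two layouts for the XY points (1, 2), (3, 4), (5, 6), with hypothetical helper functions converting between them:

```rust
/// Converts separated buffers (`[x0, x1, ..]`, `[y0, y1, ..]`) into one
/// interleaved buffer (`[x0, y0, x1, y1, ..]`).
fn interleave(dims: &[Vec<f64>]) -> Vec<f64> {
    let num_coords = dims.first().map_or(0, |d| d.len());
    let mut out = Vec::with_capacity(num_coords * dims.len());
    for i in 0..num_coords {
        for dim in dims {
            out.push(dim[i]);
        }
    }
    out
}

/// The inverse: splits an interleaved buffer into one buffer per dimension.
fn separate(values: &[f64], num_dims: usize) -> Vec<Vec<f64>> {
    (0..num_dims)
        .map(|n| values.iter().skip(n).step_by(num_dims).copied().collect())
        .collect()
}
```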

Currently geoarrow-rs uses an enum `CoordBuffer` with `Interleaved` and `Separated` buffers as variants. This means there's an enum dispatch on every single coordinate access. In theory this should be slow, or at least slower than having no enum dispatch, I think.
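A simplified, XY-only sketch of that enum design (not the actual geoarrow-rs definition), where every accessor matches on the variant:

```rust
/// Enum-based coordinate storage: each read dispatches on the layout.
enum CoordBuffer {
    /// `[x0, y0, x1, y1, ...]`
    Interleaved(Vec<f64>),
    /// One `Vec` per dimension: `(xs, ys)`.
    Separated(Vec<f64>, Vec<f64>),
}

impl CoordBuffer {
    fn len(&self) -> usize {
        match self {
            CoordBuffer::Interleaved(v) => v.len() / 2,
            CoordBuffer::Separated(xs, _) => xs.len(),
        }
    }

    fn x(&self, i: usize) -> f64 {
        match self {
            CoordBuffer::Interleaved(v) => v[i * 2],
            CoordBuffer::Separated(xs, _) => xs[i],
        }
    }

    fn y(&self, i: usize) -> f64 {
        match self {
            CoordBuffer::Interleaved(v) => v[i * 2 + 1],
            CoordBuffer::Separated(_, ys) => ys[i],
        }
    }
}
```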

@paleolimbot's C implementation above avoids this branching in a clever way. It always stores 4 underlying buffers, even though not all 4 of these may hold valid coordinates. For interleaved coordinates, it stores a reference to the same buffer in each position.

Note that here `ScalarBuffer` wraps `Buffer`, and `Buffer` holds an `Arc` under the hood, so storing four clones of the same buffer should be cheap.
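A minimal sketch of how the four slots could be filled for each layout, using `Arc<[f64]>` as a stand-in for the reference-counted `ScalarBuffer<f64>`. The `from_interleaved`/`from_separated` constructors and the empty-buffer filler for unused slots are hypothetical choices, not necessarily what the PR does:

```rust
use std::sync::Arc;

/// Stand-in for arrow's `ScalarBuffer<f64>`: a cheaply clonable, reference-counted slice.
type Buf = Arc<[f64]>;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CoordType {
    Interleaved,
    Separated,
}

#[derive(Clone, Debug)]
struct CoordBuffer {
    buffers: [Buf; 4],
    num_coords: usize,
    num_buffers: usize,
    coords_stride: usize,
    coord_type: CoordType,
}

impl CoordBuffer {
    /// Hypothetical constructor: one buffer holding `x0, y0, x1, y1, ...`.
    fn from_interleaved(values: Vec<f64>, num_buffers: usize) -> Self {
        assert!((2..=4).contains(&num_buffers));
        assert_eq!(values.len() % num_buffers, 0);
        let num_coords = values.len() / num_buffers;
        let buf: Buf = values.into();
        // Every slot is a clone of the same Arc: the refcount changes, the data is not copied.
        let buffers = [buf.clone(), buf.clone(), buf.clone(), buf];
        Self { buffers, num_coords, num_buffers, coords_stride: num_buffers, coord_type: CoordType::Interleaved }
    }

    /// Hypothetical constructor: one buffer per dimension.
    fn from_separated(dims: Vec<Vec<f64>>) -> Self {
        let num_buffers = dims.len();
        assert!((2..=4).contains(&num_buffers));
        let num_coords = dims[0].len();
        assert!(dims.iter().all(|d| d.len() == num_coords));
        // Unused slots hold an empty buffer so the array is always fully initialized.
        let empty: Buf = Vec::<f64>::new().into();
        let mut buffers = [empty.clone(), empty.clone(), empty.clone(), empty];
        for (slot, dim) in buffers.iter_mut().zip(dims) {
            *slot = dim.into();
        }
        Self { buffers, num_coords, num_buffers, coords_stride: 1, coord_type: CoordType::Separated }
    }
}
```

Cloning an `Arc` copies a pointer and increments an atomic reference count, so the duplicated interleaved slots all point at one allocation.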

And then here's the meat for coordinate access:

geoarrow-rs/rust/geoarrow/src/scalar/coord/scalar.rs

Lines 20 to 38 in 9e6a953

impl<'a> CoordTrait for Coord<'a> {
    type T = f64;

    fn dim(&self) -> geo_traits::Dimensions {
        self.buffer.dim.into()
    }

    fn nth_unchecked(&self, n: usize) -> Self::T {
        self.buffer.buffers[n][self.i * self.buffer.coords_stride + n]
    }

    fn x(&self) -> Self::T {
        self.buffer.buffers[0][self.i * self.buffer.coords_stride]
    }

    fn y(&self) -> Self::T {
        self.buffer.buffers[1][self.i * self.buffer.coords_stride + 1]
    }
}

By using these "cloned references" and a stride, coordinate access no longer needs any branching, regardless of whether the underlying Arrow storage is interleaved or separated.

I assumed this would be faster, but the total_bounds bench was reliably slower on this PR. That makes me guess that the repeated enum dispatch within a single array is cheap in practice: every coordinate takes the same branch, so the CPU's branch predictor (or LLVM hoisting the match out of the loop) may effectively remove its cost.
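One way to probe that guess is to compare matching on the layout per coordinate against matching once per array and looping inside each arm. The sketch below shows both shapes on a simplified XY enum (all names hypothetical); it makes no claim about which is faster, which is what a benchmark would have to decide:

```rust
/// Simplified XY-only layouts, for illustration.
enum Coords {
    Interleaved(Vec<f64>),
    Separated(Vec<f64>, Vec<f64>),
}

impl Coords {
    fn len(&self) -> usize {
        match self {
            Coords::Interleaved(v) => v.len() / 2,
            Coords::Separated(xs, _) => xs.len(),
        }
    }

    /// Matches on the layout for every single coordinate.
    fn xy(&self, i: usize) -> (f64, f64) {
        match self {
            Coords::Interleaved(v) => (v[2 * i], v[2 * i + 1]),
            Coords::Separated(xs, ys) => (xs[i], ys[i]),
        }
    }
}

/// Empty bounds `[minx, miny, maxx, maxy]`.
const EMPTY: [f64; 4] = [f64::INFINITY, f64::INFINITY, f64::NEG_INFINITY, f64::NEG_INFINITY];

/// Grows the bounds to include (x, y).
fn extend(b: &mut [f64; 4], x: f64, y: f64) {
    b[0] = b[0].min(x);
    b[1] = b[1].min(y);
    b[2] = b[2].max(x);
    b[3] = b[3].max(y);
}

/// Dispatch inside the loop: one match per coordinate.
fn bounds_per_coord(c: &Coords) -> [f64; 4] {
    let mut b = EMPTY;
    for i in 0..c.len() {
        let (x, y) = c.xy(i);
        extend(&mut b, x, y);
    }
    b
}

/// Dispatch hoisted: one match per array, then a loop with no layout match.
fn bounds_hoisted(c: &Coords) -> [f64; 4] {
    let mut b = EMPTY;
    match c {
        Coords::Interleaved(v) => {
            for p in v.chunks_exact(2) {
                extend(&mut b, p[0], p[1]);
            }
        }
        Coords::Separated(xs, ys) => {
            for (&x, &y) in xs.iter().zip(ys) {
                extend(&mut b, x, y);
            }
        }
    }
    b
}
```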

In discussion with Dewey around this PR (#844), I realized that regardless of whether we remove the underlying `CoordBuffer` enum, we can remove the const generic that `CoordBuffer` uses to encode its physical dimension. Even if this is slightly slower, I think it's very important for maintainability. I realize now that this is what b4l was trying to argue for in #822, but I couldn't see what he meant there without an example. Related: #822, #801.
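Without the const generic, the physical size comes from a runtime dimension value, which is also what the `num_buffers` field caches. A sketch with a hypothetical `Dimension` enum mirroring the `XY`/`XYZ`/`XYM`/`XYZM` variants named in the struct's doc comment, and `Vec<f64>` standing in for `ScalarBuffer<f64>`:

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Dimension {
    XY,
    XYZ,
    XYM,
    XYZM,
}

impl Dimension {
    /// Number of physical ordinates, i.e. valid buffer slots.
    fn size(self) -> usize {
        match self {
            Dimension::XY => 2,
            Dimension::XYZ | Dimension::XYM => 3,
            Dimension::XYZM => 4,
        }
    }
}

/// One non-generic type for every dimension: the size is data, not a type parameter.
struct SeparatedCoords {
    buffers: [Vec<f64>; 4],
    dim: Dimension,
    /// Cached `dim.size()`, so coordinate access doesn't match on `dim`.
    num_buffers: usize,
}

impl SeparatedCoords {
    fn new(dim: Dimension, dims: Vec<Vec<f64>>) -> Self {
        assert_eq!(dims.len(), dim.size());
        // Unused slots stay as empty Vecs.
        let mut buffers: [Vec<f64>; 4] = Default::default();
        for (slot, d) in buffers.iter_mut().zip(dims) {
            *slot = d;
        }
        Self { buffers, dim, num_buffers: dim.size() }
    }

    /// Coordinate `i` with only the valid ordinates (allocates; for illustration only).
    fn coord(&self, i: usize) -> Vec<f64> {
        self.buffers[..self.num_buffers].iter().map(|b| b[i]).collect()
    }
}
```

Compared with the const-generic sketch, callers deal with a single concrete type, at the cost of a runtime length instead of a compile-time one.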

kylebarron added 3 commits October 29, 2024 23:43

Coord buffer refactor (5b00fe0)
Add total_bounds bench (ec67b67)
Fix buffer access (9e6a953)

michaelkirk reviewed Oct 30, 2024

kylebarron mentioned this pull request Nov 14, 2024

Remove const generic from coord buffers #845

Merged
