Spatial Index improvements (index-per-graph + kryo) #3026

Open
Aklakan opened this issue Feb 21, 2025 · 0 comments · May be fixed by #3027
Labels
enhancement Incrementally add new feature


Aklakan commented Feb 21, 2025

Version

5.4.0-SNAPSHOT

Feature

This proposal enhances the spatial index with support for one index per graph and improves its serialization by using Kryo, via Apache Sedona's Kryo/JTS implementation.

This is an incremental improvement of the existing JTS-based in-memory implementation - it's not a complete overhaul such as a disk-based, incrementally updated, transaction-aware R-tree (if someone contributed that, then this issue's PR could be discarded 😄).

The impact of this work has been evaluated and presented at last year's GeoLD workshop (proceedings):

Simon Bin, Claus Stadler, Lorenz Bühmann, and Michael Martin
Getting practical with GeoSPARQL and Apache Jena
Slides

The essence is presented on the following slides:

Using an index per graph (unsurprisingly) boosts the performance when multiple graphs have geometries and only a subset is queried (slide 15):

[Image: slide 15]
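To illustrate why a per-graph index helps, here is a minimal, dependency-free sketch. It is not the code from the PR: `PerGraphIndexSketch`, `Entry`, and the plain list of bounding boxes per graph are hypothetical stand-ins for the real JTS-based index. The only point it makes is that a lookup keyed by graph name never touches geometries from other graphs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a spatial index: one collection of bounding boxes per graph name. */
class PerGraphIndexSketch {

    /** Axis-aligned bounding box together with the feature it belongs to (hypothetical type). */
    static final class Entry {
        final String feature;
        final double minX, minY, maxX, maxY;

        Entry(String feature, double minX, double minY, double maxX, double maxY) {
            this.feature = feature;
            this.minX = minX;
            this.minY = minY;
            this.maxX = maxX;
            this.maxY = maxY;
        }

        /** True if this box overlaps (or touches) the query box. */
        boolean intersects(double qMinX, double qMinY, double qMaxX, double qMaxY) {
            return minX <= qMaxX && maxX >= qMinX && minY <= qMaxY && maxY >= qMinY;
        }
    }

    // One "index" per graph; a real implementation would hold a tree here, not a list.
    private final Map<String, List<Entry>> indexByGraph = new HashMap<>();

    void add(String graph, Entry entry) {
        indexByGraph.computeIfAbsent(graph, g -> new ArrayList<>()).add(entry);
    }

    /** Scans only the index of the requested graph; entries of other graphs are never visited. */
    List<String> query(String graph, double minX, double minY, double maxX, double maxY) {
        List<String> hits = new ArrayList<>();
        for (Entry e : indexByGraph.getOrDefault(graph, Collections.<Entry>emptyList())) {
            if (e.intersects(minX, minY, maxX, maxY)) {
                hits.add(e.feature);
            }
        }
        return hits;
    }
}
```

With a single global index, the same query would first collect candidates from all graphs and then have to filter them by graph afterwards, which is the overhead the per-graph layout avoids.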

As for serialization performance (slide 16), while index building became a bit slower, this is outweighed by near-instant loading of the spatial index. The reason for the writing overhead is that the index tree is now serialized as a tree - before, the items were written out as a flat list, and the tree had to be rebuilt from scratch on restart.

[Image: slide 16]
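The difference between the two storage strategies can be sketched as follows. This is only an illustration under simplifying assumptions: it uses plain `java.io` serialization instead of Kryo, and a toy balanced binary tree over numbers instead of a JTS tree over geometries. All names (`IndexSerializationSketch`, `Node`, `saveFlat`, `saveTree`, ...) are hypothetical and do not appear in Jena or Sedona.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class IndexSerializationSketch {

    /** Node of a toy balanced binary tree (stand-in for a spatial tree node). */
    static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final double key;
        final Node left;
        final Node right;

        Node(double key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    /** Builds the tree: sort, then split recursively. A flat-list format repeats this on every load. */
    static Node build(List<Double> items) {
        List<Double> sorted = new ArrayList<>(items);
        Collections.sort(sorted);
        return build(sorted, 0, sorted.size());
    }

    private static Node build(List<Double> sorted, int lo, int hi) {
        if (lo >= hi) {
            return null;
        }
        int mid = (lo + hi) >>> 1;
        return new Node(sorted.get(mid), build(sorted, lo, mid), build(sorted, mid + 1, hi));
    }

    /** Old strategy: persist only the items. */
    static byte[] saveFlat(List<Double> items) {
        return toBytes(new ArrayList<>(items));
    }

    /** Old strategy, load side: read the items and rebuild the tree from scratch. */
    @SuppressWarnings("unchecked")
    static Node loadFlat(byte[] bytes) {
        return build((List<Double>) fromBytes(bytes));
    }

    /** New strategy: persist the tree structure itself (more to write). */
    static byte[] saveTree(Node root) {
        return toBytes(root);
    }

    /** New strategy, load side: the tree comes back ready to use, no rebuild step. */
    static Node loadTree(byte[] bytes) {
        return (Node) fromBytes(bytes);
    }

    static int size(Node n) {
        return n == null ? 0 : 1 + size(n.left) + size(n.right);
    }

    private static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    private static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Both load paths end with an equivalent tree; the difference is that `loadFlat` pays the build cost again on every restart, while `loadTree` only pays deserialization. The sketch shows only the structural difference and is not a performance measurement.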

A new geosparql:indexPerGraph option (boolean) is added to the geosparql:GeosparqlDataset assembler.

The implementation has mainly been done by @LorenzBuehmann - the writing and presentation are the work of @SimonBin - I supported the evaluation.

As for compatibility, I still need to check whether the change is backward compatible, but I think that, due to the change of the serializer, existing spatial indexes would have to be rebuilt.

For reference, a bit of related discussion has happened in #2645.

Are you interested in contributing a solution yourself?

Yes

@Aklakan Aklakan added the enhancement Incrementally add new feature label Feb 21, 2025
@Aklakan Aklakan linked a pull request Feb 21, 2025 that will close this issue