Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use only frozen 32bit roaring bitmaps and search for parents using prefix search vs storing every level. #2

Open
GavinClarke0 opened this issue Dec 5, 2024 · 5 comments

Comments

@GavinClarke0
Copy link
Owner

There are two issues with the current implementation that slow all steps down:

  1. We use frozen 64RoaringBitmaps which require processing to read and thus require de-serialization. Using a custom data structure that can be loaded directly and a different approach will help speed things up.
  2. We store all ancestors cells in the index. This is not needed as we can just search the prefix to determine if a cell is in the index vs storing every value.
@GavinClarke0
Copy link
Owner Author

Work is being preformed on branch https://github.com/GavinClarke0/RoaringGeoMaps/tree/development-v0.2.

The working copy is available on the branch.

Benchmarks are now as follow:

Write benchmark completed in 21436 ms.
Build benchmark completed in 6389 ms.
Init benchmark completed in 35 ms.
Mean query execution time: 86.4558 microseconds
99th percentile (p99) query execution time: 120 microseconds
Query benchmark completed in 1769 ms.

@GavinClarke0
Copy link
Owner Author

Merged pull request with this change here: #3.

Index can be optimized to avoid reading blocks to determine if a value is likely present. This will be of greater importance once index size grows and uses cases where blocks reside compressed on disk before processing emerge.

@GavinClarke0
Copy link
Owner Author

Interesting note:

As we need to query ranges for values, BloomFilters are not optimal. However this data structure https://github.com/christophanneser/FST/blob/master/test/test_fst_ints.cpp may allow us to index a blocks contents compactly without querying it.

@GavinClarke0
Copy link
Owner Author

GavinClarke0 commented Dec 20, 2024

Results from removing locality on writes.

Ordered Keys based s2 locality

100000 circles.
10m wide.

Write benchmark completed in 2717 ms.
Build benchmark completed in 1037 ms.
Init benchmark completed in 8 ms.
Mean query execution time: 36.4156 microseconds
99th percentile (p99) query execution time: 53 microseconds
Query benchmark completed in 748 ms. 20k executions. 

Unordered Keys based on insertion order

Write benchmark completed in 6851 ms.
Build benchmark completed in 1360 ms.
Init benchmark completed in 52 ms.
Mean query execution time: 163.42 microseconds
99th percentile (p99) query execution time: 439 microseconds
Query benchmark completed in 3293 ms. 20k executions

@GavinClarke0
Copy link
Owner Author

Locality Based Writes ( Keys are ordered by max S2 cell in their region cover which ensure keys near each other are placed next to each other in the index.


Bench Mark: [Circles: 100000] [Radius: 1000m]

Write benchmark completed in 2189 ms.
Build benchmark completed in 429 ms.
Init benchmark completed in 2 ms.
Mean query execution time: 527.453 microseconds
99th percentile (p99) query execution time: 858 microseconds

Random keys writes.


Bench Mark: [Circles: 100000] [Radius: 1000m]

Write benchmark completed in 5022 ms.
Build benchmark completed in 192 ms.
Init benchmark completed in 12 ms.
Mean query execution time: 2165.6 microseconds
99th percentile (p99) query execution time: 3343 microseconds

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant