Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Geofence Reducer Performance ⚡ #15

Merged
merged 3 commits into from
Jan 17, 2025

Conversation

InfoLorenzo
Copy link
Collaborator

@InfoLorenzo InfoLorenzo commented Dec 5, 2024

This PR refactors the geofence reducer library by replacing the substring-based uniqueness check with a frequency-based algorithm, improving the efficiency of geohash processing while maintaining input-output consistency.

Summary of Results

Precision Time (Original) Time (Alternative) Speedup Factor Number of Geofences
6 569,936.63 ms 1,212.32 ms ~470.12x 2,226
5 20,601.51 ms 49.96 ms ~412.33x 2,028
4 2,002.47 ms 6.13 ms ~326.91x 2,118

For additional details and extended results, see the Geofence Reducer Library Improvement document.

@CLAassistant
Copy link

CLAassistant commented Dec 5, 2024

CLA assistant check
All committers have signed the CLA.

@filipecorrea
Copy link
Contributor

Hi @InfoLorenzo. Can you share the performance tests and results in this PR description?

@InfoLorenzo
Copy link
Collaborator Author

Hi @InfoLorenzo. Can you share the performance tests and results in this PR description?

Hi @filipecorrea, It’s great to hear from you again! I’m currently on holiday, but I’ll post updates as soon as I’m back.
Thanks for getting involved, I really appreciate it!

@InfoLorenzo
Copy link
Collaborator Author

Hey @filipecorrea,

While testing, I noticed that the same geohash sometimes appears multiple times for a single geofence, which is causing discrepancies in my frequency map. I also checked the vicinityhash library and saw the same geohash show up four times for a single geofence at precision 5.

Which is the reason behind why we're getting these duplicates when “compress” is false? Could it be related to how we mirror coordinates in different quadrants, or is there another reason?

I would love to hear your thoughts on this. Thank you in advance!

@InfoLorenzo
Copy link
Collaborator Author

Hi @filipecorrea,

Just a quick update!

  • Added the performance tests. There’s also a document with the results to explain the changes and shows some results.
  • Updated the unit test to check that the input and output geohashes cover the same areas with the same precision. This makes sure everything is correct.
  • Added a uniquify step to handle the duplicate geohashes from vicinityhash for now, ensuring the results remain consistent.

Let me know what you think when you have time! 😊

@filipecorrea
Copy link
Contributor

Maybe there's an issue with vicinityhash compression part, because it's intent is to remove the geohashes duplicates and use a single, broader, geohash to represent the same area. Can you take a look and see if you can find the issue? I think no one reviewed this code so far and I really appreciate if you could do so.

@filipecorrea
Copy link
Contributor

I can't access your document, unfortunately, but I think I got the picture by reading the code change.

I can't see your updates on the unit tests though. Are you sure you pushed it?

@InfoLorenzo
Copy link
Collaborator Author

InfoLorenzo commented Jan 16, 2025

Maybe there's an issue with vicinityhash compression part, because it's intent is to remove the geohashes duplicates and use a single, broader, geohash to represent the same area. Can you take a look and see if you can find the issue? I think no one reviewed this code so far and I really appreciate if you could do so.

Sure! I did some testing, and the compression works quite well overall. The only issue is that when compression is set to false, the library can sometimes return the same geohash a few times for the same geofence. This happens because we don’t use a Set to handle the geohashes in this case, unlike when compression is set to true.

One potential fix could be to use a Set even when compression is false. However, based on my tests, this approach caused a 30% performance drop, significantly slowing down execution. Another option would be to explore the rest of the code for a more efficient solution.

I can't access your document, unfortunately, but I think I got the picture by reading the code change.

Ah, I see. I thought the document was accessible, but it seems it’s not. I’ve attached a PDF to this comment. The design isn’t great, but all the data is included.

Public - Geofence Reducer Library Improvement.pdf

I can't see your updates on the unit tests though. Are you sure you pushed it?

Just pushed them now! I had a bit of trouble with the GPG keys, classic development hiccup, haha.

@InfoLorenzo
Copy link
Collaborator Author

InfoLorenzo commented Jan 16, 2025

If you'd like to see where I got those numbers from, here’s the full report:

Performance Comparison (precision 4..6).pdf

@filipecorrea
Copy link
Contributor

Thanks for sending the report. You made my day!

You did a great job and the changes are approved from my side, but I don't have push access to this repo anymore. Can you merge it?

@MartinodF MartinodF self-requested a review January 17, 2025 14:03
Copy link
Member

@MartinodF MartinodF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@InfoLorenzo
Copy link
Collaborator Author

Thanks for sending the report. You made my day!

You did a great job and the changes are approved from my side, but I don't have push access to this repo anymore. Can you merge it?

You made my day too! Thank you for the quick review and for getting involved on this 💪

I held off on merging earlier since I wasn’t able to, but now that I can, I’ll go ahead and do it. Thank you @MartinodF!

@InfoLorenzo InfoLorenzo merged commit d7c340e into klarna:master Jan 17, 2025
1 check passed
@InfoLorenzo InfoLorenzo self-assigned this Jan 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants