Improve Geofence Reducer Performance ⚡ #15
Conversation
Hi @InfoLorenzo. Can you share the performance tests and results in this PR description?
Hi @filipecorrea, It’s great to hear from you again! I’m currently on holiday, but I’ll post updates as soon as I’m back.
Hey @filipecorrea, While testing, I noticed that the same geohash sometimes appears multiple times for a single geofence, which is causing discrepancies in my frequency map. I also checked the vicinityhash library and saw the same geohash show up four times for a single geofence at precision 5. What is the reason we're getting these duplicates when “compress” is false? Could it be related to how we mirror coordinates in different quadrants, or is there another cause? I would love to hear your thoughts on this. Thank you in advance!
Hi @filipecorrea, Just a quick update!
Let me know what you think when you have time! 😊
Maybe there's an issue with the vicinityhash compression part, because its intent is to remove duplicate geohashes and use a single, broader geohash to represent the same area. Can you take a look and see if you can find the issue? I think no one has reviewed this code so far, and I'd really appreciate it if you could do so.
I can't access your document, unfortunately, but I think I got the picture by reading the code change. I can't see your updates to the unit tests, though. Are you sure you pushed them?
Sure! I did some testing, and the compression works quite well overall. The only issue is that when compression is set to false, the library can sometimes return the same geohash a few times for the same geofence. This happens because we don’t use a Set to handle the geohashes in this case, unlike when compression is set to true. One potential fix could be to use a Set even when compression is false. However, based on my tests, this approach caused a 30% performance drop, significantly slowing down execution. Another option would be to explore the rest of the code for a more efficient solution.
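To make the trade-off concrete, here is a minimal sketch of the Set-based deduplication idea discussed above. The function name and sample geohashes are illustrative, not part of the actual vicinityhash API:

```javascript
// Hypothetical helper: deduplicate geohashes with a Set.
// A Set keeps only the first occurrence of each geohash, at the cost
// of extra hashing work on every insert (the ~30% slowdown observed
// in testing when applied to the compress === false path).
function dedupeGeohashes(geohashes) {
  return [...new Set(geohashes)]
}

// Example: the same precision-5 geohash returned four times for one geofence.
const raw = ['u09tv', 'u09tv', 'u09tv', 'u09tv', 'u09tw']
console.log(dedupeGeohashes(raw)) // → ['u09tv', 'u09tw']
```

The alternative mentioned above would be to find where the duplicates originate and avoid emitting them in the first place, rather than paying the dedup cost at the end.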
Ah, I see. I thought the document was accessible, but it seems it’s not. I’ve attached a PDF to this comment. The design isn’t great, but all the data is included. Public - Geofence Reducer Library Improvement.pdf
Just pushed them now! I had a bit of trouble with the GPG keys, classic development hiccup, haha.
If you'd like to see where I got those numbers from, here’s the full report:
Thanks for sending the report. You made my day! You did a great job and the changes are approved from my side, but I don't have push access to this repo anymore. Can you merge it? |
Nice!
You made my day too! Thank you for the quick review and for getting involved in this 💪 I held off on merging earlier since I wasn’t able to, but now that I can, I’ll go ahead and do it. Thank you @MartinodF!
This PR refactors the geofence reducer library by replacing the substring-based uniqueness check with a frequency-based algorithm, improving the efficiency of geohash processing while maintaining input-output consistency.
Summary of Results
For additional details and extended results, see the Geofence Reducer Library Improvement document.
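As a rough illustration of the refactor described above, the sketch below contrasts a substring-based uniqueness check with a frequency-map pass. All names are hypothetical and do not reflect the library's actual internals; the point is the complexity difference:

```javascript
// Substring-based check: searching a joined string for each candidate
// costs O(n) per lookup, so processing n geohashes is O(n^2) overall.
function isUniqueSubstring(joined, geohash) {
  return !joined.includes(geohash)
}

// Frequency-based approach: one pass builds a count map in O(n),
// after which each membership or count query is O(1).
function buildFrequencyMap(geohashes) {
  const freq = new Map()
  for (const gh of geohashes) {
    freq.set(gh, (freq.get(gh) || 0) + 1)
  }
  return freq
}

const hashes = ['u09tv', 'u09tw', 'u09tv']
const freq = buildFrequencyMap(hashes)
console.log(freq.get('u09tv')) // → 2
console.log(freq.get('u09tw')) // → 1
```

Because the frequency map preserves counts, it can both deduplicate and report how often each geohash occurred, which the substring check cannot do without extra scans.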