Feat: hot tags filters #199

Closed
SHAcollision opened this issue Nov 18, 2024 · 4 comments · Fixed by #222

@SHAcollision
Collaborator

Somewhat similar to #89

  • Friends/Following/Followers reach setting: see the scores of those in your WoT (web of trust).
  • Timeframe setting: All time / 24h / Last month. Should we explore Redis TimeSeries? We could also use cron jobs and populate many tables.

I believe @amirRamirfatahi was working on this one?

@amirRamirfatahi
Collaborator

For global hot tags, I've come up with multiple solutions, so I thought we could discuss which one to pick here. Initially, I wanted to use multiple sorted sets as different buckets and keep rolling them up into the previous buckets as we move forward, but I've come to realize there are better ways to do it:

  1. Use the "Top-K" data type in Redis instead of sorted sets. It allows for a much lower memory footprint while giving us 99.99% accuracy. As Redis cannot take the union of multiple Top-K keys, we would need predefined windows.
  2. Use "sorted sets" for time buckets, and go with very fine-grained buckets, for example 5 minutes. The last-hour window is then the union of 12 of these buckets, the last day is the union of 24 one-hour periods, and so on.
  3. Caching. Basically we let the query hit the DB, but we cache the whole result in Redis for a period of time, set dynamically based on the requested time window. For example, hot tags for today could be cached for 1 hour, while hot tags for this month could be renewed daily.
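
To make option 2 concrete, here is a minimal sketch of the bucket-union idea. It is not Redis code: a plain Python `Counter` per bucket stands in for a sorted set (roughly what `ZINCRBY` would maintain), and the loop in `hot_tags` plays the role of a `ZUNIONSTORE` over the bucket keys. The class and method names are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict

BUCKET_SECONDS = 5 * 60  # 5-minute buckets, as suggested in option 2


class BucketedTagCounts:
    """In-memory stand-in for one Redis sorted set per time bucket."""

    def __init__(self):
        # bucket index -> Counter(tag -> count); each Counter mimics a sorted set
        self.buckets = defaultdict(Counter)

    def record(self, tag, timestamp):
        """Count one use of `tag` in the bucket that contains `timestamp`."""
        self.buckets[int(timestamp) // BUCKET_SECONDS][tag] += 1

    def hot_tags(self, now, window_seconds, limit=10):
        """Union the buckets covering the last `window_seconds`, then rank."""
        newest = int(now) // BUCKET_SECONDS
        n_buckets = window_seconds // BUCKET_SECONDS
        total = Counter()
        for index in range(newest - n_buckets + 1, newest + 1):
            # .get() avoids creating empty buckets in the defaultdict
            total.update(self.buckets.get(index, Counter()))
        # Highest count first; tag name breaks ties so the order is stable
        return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
```

Note that the newest bucket is only partially filled, so a "last hour" built from 12 buckets covers somewhere between 55 and 60 minutes. Whether that imprecision is acceptable is a product decision this sketch does not settle.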

Which one do you think we should go with? @SHAcollision @tipogi

@tipogi
Collaborator

tipogi commented Nov 25, 2024

I would go with the 3rd option: set up different timeframes, execute the query against the database for each one, and then store the result. I would not over-complicate this feature. If querying the database becomes expensive, we could maybe go with the first option or another implementation? The problem I see is when we add reach filters such as Followers, Following and Friends to that time window.
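
A rough sketch of what option 3 could look like, under some assumptions. A dict stands in for Redis (where this would typically be a `SET` with an `EX` expiry), `query_db` is a hypothetical callable that runs the real query, and the timeframe names are made up. The 1-hour and 1-day TTLs follow the examples given earlier in the thread; the `all_time` TTL is a guess. Putting `reach` and `user_id` into the cache key is one possible way to handle the Followers/Following/Friends filters, not a decided design.

```python
import time

# Hypothetical TTLs per timeframe. "today" and "this_month" follow the
# examples in the thread; the "all_time" value is an assumption.
CACHE_TTL_SECONDS = {
    "today": 60 * 60,
    "this_month": 24 * 60 * 60,
    "all_time": 24 * 60 * 60,
}


class HotTagsCache:
    """Caches whole query results; a dict stands in for Redis keys with expiry."""

    def __init__(self, query_db, clock=time.time):
        # query_db: callable(timeframe, reach, user_id) -> list of (tag, score)
        self.query_db = query_db
        self.clock = clock
        self.entries = {}  # cache key -> (expires_at, result)

    def get(self, timeframe, reach="global", user_id=None):
        """Return cached hot tags, hitting the database only on miss or expiry."""
        key = (timeframe, reach, user_id)
        now = self.clock()
        cached = self.entries.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]
        result = self.query_db(timeframe, reach, user_id)
        self.entries[key] = (now + CACHE_TTL_SECONDS[timeframe], result)
        return result
```

With per-user reach filters the number of cache keys grows with the number of active users, so the hit rate for those entries may be much lower than for the global ones.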

@amirRamirfatahi
Collaborator

@tipogi I'm also leaning toward option 3. For the non-global ones (followers, following, etc.) we're already letting the query hit the database, so option 3 is kind of the only option we have there. For global, options 1 and 2 are possible too.

@tipogi
Collaborator

tipogi commented Nov 25, 2024

For example, in the hot tags module of the app, there is also an option to filter by reach and/or timeline. In such cases, as you say, we should query the graph. But at this point, adding a separate implementation just to handle global hot tags filtered only by timeline (without the reach filter) doesn’t seem reasonable to me.

@SHAcollision SHAcollision linked a pull request Nov 25, 2024 that will close this issue
3 participants