Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: improve influencers endpoint #234

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

amirRamirfatahi
Copy link
Collaborator

Pre-submission Checklist

For tests to work you need a working neo4j and redis instance with the example dataset in docker/db-graph

  • Testing: Implement and pass new tests for the new features/fixes, cargo test.
  • Performance: Ensure new code has relevant performance benchmarks, cargo bench

@amirRamirfatahi amirRamirfatahi marked this pull request as ready for review December 3, 2024 13:48
@amirRamirfatahi amirRamirfatahi self-assigned this Dec 3, 2024
@tipogi tipogi self-requested a review December 6, 2024 09:21
Copy link
Collaborator

@tipogi tipogi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future reference, it would be helpful to avoid merging one PR into another, as it can make things a bit confusing to review

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Influencers struct can remain minimal, as its primary purpose is limited to this module. Unnecessary data structures like the HashMap can be removed, simplifying the model and reducing overhead.

  • The Influencers struct is only used within this module, there are no external dependencies on its structure. The stream module (src/user/stream.rs), only needs pubkys to after create UserViews so we can return the vector of tuples (Vec<(String, u64)>)
  • All required functionality, including filtering and ranking influencers by timeframe, can be achieved directly with Redis Sorted Sets.
  • The cypher query for influencers, filtered by timeframe, can be optimized to return an array of arrays (e.g., [[pubky, score], [...]]). This eliminates the need for Influencers struct conversions.
let result = retrieve_from_graph::<Vec<(String, u64)>>(query, "influencers").await?;
  • All Redis indexes that have an expiration time are prefixed with CACHE keyword for clarity and consistency throughout the project. This prefix denotes their cache nature and helps distinguish them from other keys in Redis

@@ -31,7 +34,10 @@ pub struct UserStreamQuery {
("skip" = Option<usize>, Query, description = "Skip N followers"),
("limit" = Option<usize>, Query, description = "Retrieve N followers"),
("source" = Option<UserStreamSource>, Query, description = "Source of users for the stream."),
("depth" = Option<usize>, Query, description = "User trusted network depth, user following users distance. Numbers bigger than 4, will be ignored")
("source_reach" = Option<StreamReach>, Query, description = "The target reach of the source. Supported in 'influencers' source."),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To continue with the naming conventions, it should be reach

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this reach is related to the source so it's source's reach, hence the naming.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reach parameter name is concise, aligns with existing conventions, and avoids unnecessary redundancy. The swagger description already clarifies its relationship to the source. Another point, source_reach cannot be combined with the other UserStreamSource so better candidate, if you want to add a prefix, would be influencer_reach but I do not see neither...

query.depth,
Some(timeframe),
query.preview,
)
.await
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it does not have any influencers in the selected timeframe, it throws a 500 error which it should be 404 type

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's using the same function we already have? I don't think so as this function is returning a None if the inner user id list is empty. Am I missing something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok fixed it.

use crate::queries;
use crate::RedisOps;

const GLOBAL_INFLUENCERS_PREFIX: &str = "Cached:Influencers";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong cache prefix, it has to be Cache

Comment on lines +16 to +44
#[derive(Deserialize, Serialize, ToSchema, Debug, Clone)]
pub struct Influencer {
pub id: String,
pub score: f64,
}

// Define a newtype wrapper
#[derive(Serialize, Deserialize, Debug, ToSchema, Default, Clone)]
pub struct Influencers(pub Vec<Influencer>);

impl RedisOps for Influencers {}

// Create a Influencers instance directly from an iterator of Influencer items
// Need it in collect()
impl FromIterator<Influencer> for Influencers {
fn from_iter<I: IntoIterator<Item = Influencer>>(iter: I) -> Self {
Influencers(iter.into_iter().collect())
}
}

// Implement Deref so Influencers can be used like Vec<String>
impl Deref for Influencers {
type Target = Vec<Influencer>;

fn deref(&self) -> &Self::Target {
&self.0
}
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of Influencer struct feels unnecessary, as a tuple (score, pubky) seems to capture the same logic effectively. A vector of such tuples, Vec<(score, pubky)>, Influencers struct now, would suffice, reducing complexity without losing clarity

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't call it unnecessary.
Having a struct for it means in the future we can easily add more properties to it if needed, without having to refactor a lot of code.
This isn't hurting readability, performance or consistency, so I don't think it hurts to keep it here.
What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know what I think from the beginning 😁. That data about users will always go to get UserViews so now and in the future, I do not think that we will need that struct. @SHAcollision any opinion about that?

Some(user_id) => {
Influencers::get_influencers_by_reach(
user_id,
reach.unwrap(),
Copy link
Collaborator

@tipogi tipogi Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about if reach is None. Risk of panic

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reach couldn't be none because of previous checks.
Nonetheless, you are correct. will fix it.

Comment on lines -250 to +274
.map(|set| set.into_iter().map(|(user_id, _score)| user_id).collect()),
.map(|result| {
result
.iter()
.map(|influencer| influencer.id.clone())
.collect()
}),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the previous implementation was correct. We do not need to clone() the user pubky plus do not the create new structs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that depends in the Influencers struct 👍

Comment on lines 112 to 122
match UserStream::get_by_id(&UserStreamInput {
user_id: query.user_id.clone(),
viewer_id: query.viewer_id,
skip: Some(skip),
limit: Some(limit),
source: source.clone(),
source_reach: query.source_reach,
depth: query.depth,
timeframe: Some(timeframe),
preview: query.preview,
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function exceeds the recommended maximum of 7 parameters, which may cause a clippy warning

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why it's a struct and the function is just getting one argument.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ups! you are right, I miss that struct haahh

@SHAcollision SHAcollision linked an issue Dec 19, 2024 that may be closed by this pull request
@SHAcollision SHAcollision changed the title Feature/influencers feat (service): add reach timeframe query params to influencer endpoint Dec 19, 2024
use crate::{db::kv::index::sorted_sets::SortOrder, RedisOps};
use crate::{get_neo4j_graph, queries};
use serde::{Deserialize, Serialize};
use tokio::task::spawn;
use utoipa::ToSchema;

pub const USER_MOSTFOLLOWED_KEY_PARTS: [&str; 2] = ["Users", "MostFollowed"];
pub const USER_PIONEERS_KEY_PARTS: [&str; 2] = ["Users", "Pioneers"];
pub const USER_INFLUENCERS_KEY_PARTS: [&str; 2] = ["Users", "Influencers"];
Copy link
Collaborator

@SHAcollision SHAcollision Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change of Index schemas and we do not have a "migration". What is the plan? Is this rename needed just yet? If we update or staging server the feature will stop working.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was fixed like this because the graph is still too small for a reindex to be costly

@amirRamirfatahi amirRamirfatahi changed the title feat (service): add reach timeframe query params to influencer endpoint influencers Dec 20, 2024
@SHAcollision SHAcollision changed the title influencers Feat: improve influencers endpoint Jan 1, 2025
@SHAcollision SHAcollision requested a review from tipogi January 14, 2025 11:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feat: improve pioneers/influencers
3 participants