feat: add tags streams #68
Changes from 6 commits
2a0c393
365ee4d
1b7011d
8260a06
f8b4594
e0ea9ee
6736b5c
f62e0e7
58d55fb
88692b2
2530f8d
3c7b3f2
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -208,3 +208,71 @@ pub fn get_user_following(user_id: &str, skip: Option<usize>, limit: Option<usiz | |
} | ||
query(&query_string).param("user_id", user_id) | ||
} | ||
|
||
// Retrieves popular tags across the entire network | ||
// Results ordered by post count (descending), effectively ranking "hot" tags. | ||
pub fn get_global_hot_tags_scores() -> Query { | ||
query(" | ||
MATCH (u:User)-[tag:TAGGED]->(p:Post) | ||
WITH tag.label AS label, COUNT(DISTINCT p) AS uniquePosts | ||
RETURN COLLECT([toFloat(uniquePosts), label]) AS hot_tags | ||
") | ||
} | ||
|
||
// Retrieves popular hot tags taggers across the entire network | ||
pub fn get_global_hot_tags_taggers(tag_list: &[&str]) -> Query { | ||
query(" | ||
UNWIND $labels AS tag_name | ||
MATCH (u:User)-[tag:TAGGED]->(p:Post) | ||
WHERE tag.label = tag_name | ||
WITH tag.label AS label, COLLECT(DISTINCT u.id) AS userIds | ||
RETURN COLLECT(userIds) AS tag_user_ids | ||
") | ||
.param("labels", tag_list) | ||
} | ||
|
||
// Analyzes tag usage for a specific list of user IDs. Groups tags by name, | ||
// showing for each: label, post count, list of user IDs and total usage count. | ||
// Orders by user count and usage (descending). | ||
Review comment: The ordering seems to be only by usage count (times), not by user count.
||
// Note: Only considers users from the provided ID list. | ||
pub fn get_tags_from_user_ids(user_ids: &[&str]) -> Query { | ||
Review comment: NIT: when naming functions we are typically using
||
query(" | ||
UNWIND $ids AS id | ||
MATCH (u:User)-[tag:TAGGED]->(p:Post) | ||
WHERE u.id = id | ||
WITH tag.label AS label, COLLECT(DISTINCT u.id) AS taggers, COUNT(DISTINCT p) AS uniquePosts, COUNT(*) AS times | ||
WITH { | ||
label: label, | ||
times: times, | ||
tagger_ids: taggers, | ||
post_count: uniquePosts | ||
} AS hot_tag | ||
ORDER BY times DESC | ||
RETURN COLLECT(hot_tag) AS hot_tags | ||
") | ||
.param("ids", user_ids) | ||
} | ||
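The review comment on the doc comment above notes that the query sorts by `times` only. A minimal sketch of query text that would match the documented ordering (user count, then usage, both descending) could look like the following. It is kept as a plain string constant so it compiles without `neo4rs`. The constant name `TAGS_FROM_USER_IDS_ORDERED` and the reading of "user count" as `size(taggers)` are assumptions, not part of this PR.

```rust
/// Hypothetical variant of the Cypher text in `get_tags_from_user_ids`.
/// Assumption: "user count" means the number of distinct taggers per label.
const TAGS_FROM_USER_IDS_ORDERED: &str = "
    UNWIND $ids AS id
    MATCH (u:User)-[tag:TAGGED]->(p:Post)
    WHERE u.id = id
    WITH tag.label AS label,
         COLLECT(DISTINCT u.id) AS taggers,
         COUNT(DISTINCT p) AS uniquePosts,
         COUNT(*) AS times
    ORDER BY size(taggers) DESC, times DESC
    RETURN COLLECT({
        label: label,
        times: times,
        tagger_ids: taggers,
        post_count: uniquePosts
    }) AS hot_tags
";

fn main() {
    // In the PR this text would be passed to `query(...)` with the `ids` parameter.
    println!("{}", TAGS_FROM_USER_IDS_ORDERED.trim());
}
```

Like the original query, this relies on `COLLECT` keeping the order of the rows it receives. If sorting by usage alone is the intended behaviour, correcting the doc comment would be the smaller change.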
|
||
// Finds tags used by specified user IDs (Followers | Following | Friends), then counts their usage across all users. | ||
// Note: Initial tag set from input user list, but final counts include all users. | ||
pub fn get_general_count_tags_from_user_ids(user_ids: &[&str]) -> Query { | ||
query(" | ||
UNWIND $ids AS id | ||
MATCH (u:User)-[tag:TAGGED]->(Post) | ||
WHERE u.id = id | ||
WITH COLLECT(DISTINCT tag.label) AS userTags | ||
UNWIND userTags AS label | ||
Review comment: What is the difference between `WITH COLLECT(DISTINCT tag.label) AS userTags UNWIND userTags AS label` and `WITH DISTINCT tag.label AS label`? It is generally more efficient than the
||
MATCH (u:User)-[tag:TAGGED]->(p:Post) | ||
WHERE tag.label = label | ||
WITH label, COUNT(*) AS times, COLLECT(DISTINCT u.id) AS taggers, COUNT(DISTINCT p) AS uniquePosts | ||
WITH { | ||
label: label, | ||
times: times, | ||
tagger_ids: taggers, | ||
post_count: uniquePosts | ||
} AS hot_tag | ||
ORDER BY times DESC | ||
RETURN COLLECT(hot_tag) AS hot_tags | ||
") | ||
.param("ids", user_ids) | ||
} |
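The review comment inside this function asks whether the `COLLECT(DISTINCT ...)` followed by `UNWIND` can be replaced with a single `WITH DISTINCT`. A sketch of the query text with that substitution is shown below as a plain string constant, so it compiles without `neo4rs`. The constant name `GENERAL_COUNT_TAGS_DISTINCT` is hypothetical, and the switch from `(Post)` to `(:Post)` in the first `MATCH` is an assumption about intent: Cypher reads `(Post)` as a variable named `Post`, not as a label.

```rust
/// Hypothetical variant of the Cypher text in `get_general_count_tags_from_user_ids`.
/// `WITH DISTINCT` yields one row per label directly, so no list is built and unwound.
const GENERAL_COUNT_TAGS_DISTINCT: &str = "
    UNWIND $ids AS id
    MATCH (u:User)-[tag:TAGGED]->(:Post)
    WHERE u.id = id
    WITH DISTINCT tag.label AS label
    MATCH (u:User)-[tag:TAGGED]->(p:Post)
    WHERE tag.label = label
    WITH label, COUNT(*) AS times, COLLECT(DISTINCT u.id) AS taggers, COUNT(DISTINCT p) AS uniquePosts
    WITH {
        label: label,
        times: times,
        tagger_ids: taggers,
        post_count: uniquePosts
    } AS hot_tag
    ORDER BY times DESC
    RETURN COLLECT(hot_tag) AS hot_tags
";

fn main() {
    // Only `label` survives the `WITH DISTINCT`, so `u` and `tag` are rebound by the second MATCH.
    println!("{}", GENERAL_COUNT_TAGS_DISTINCT.trim());
}
```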
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,105 @@ | ||
use neo4rs::Query; | ||
use serde::{Deserialize, Serialize}; | ||
use utoipa::ToSchema; | ||
use std::error::Error; | ||
|
||
use crate::db::kv::index::sorted_sets::Sorting; | ||
use crate::RedisOps; | ||
use crate::{db::connectors::neo4j::get_neo4j_graph, queries}; | ||
use crate::models::user::{UserStream, UserStreamType}; | ||
|
||
pub const GLOBAL_HOT_TAGS: [&str; 3] = ["Tags", "Global", "Hot"]; | ||
|
||
|
||
#[derive(Deserialize, Serialize, ToSchema, Debug)] | ||
pub struct HotTag { | ||
Review comment: Not sure if too late or if it matters. It feels like we have I do not think it matters much at all, but it is a slightly different approach :)
||
label: String, | ||
tagger_ids: Vec<String>, | ||
post_count: u64 | ||
} | ||
|
||
impl HotTag { | ||
fn new(label: String, tagger_ids: Vec<String>, post_count: u64) -> Self { | ||
Review comment: Not strictly needed, although we still have many of these
||
Self { label, tagger_ids, post_count } | ||
} | ||
} | ||
|
||
impl RedisOps for HotTag {} | ||
|
||
type TagList = Vec<String>; | ||
|
||
|
||
|
||
impl HotTag { | ||
pub async fn get_global_tags_stream(skip: Option<usize>, limit: Option<usize>) -> Result<Option<Vec<Self>>, Box<dyn Error + Send + Sync>> { | ||
let hot_tags = match Self::try_from_index_sorted_set( | ||
Review comment: NIT: this one is a matter of taste. Instead of a match we could do
|
||
&GLOBAL_HOT_TAGS, | ||
None, | ||
None, | ||
skip, | ||
limit, | ||
Sorting::Descending | ||
) | ||
.await? { | ||
Some(tags) => tags, | ||
None => return Ok(None) | ||
}; | ||
|
||
let tag_list: Vec<&str> = hot_tags.iter().map(|(label, _)| label.as_ref()).collect(); | ||
let query = queries::get_global_hot_tags_taggers(tag_list.as_slice()); | ||
let tag_user_list = retrieve_hot_tags_from_graph(query).await?.unwrap(); | ||
|
||
let hot_tags_stream: Vec<HotTag> = hot_tags | ||
.into_iter() | ||
.zip(tag_user_list) | ||
.map(|((label, score), tagger_ids)| { | ||
HotTag::new(label, tagger_ids, score as u64) | ||
Review comment: If we delete
||
}).collect(); | ||
|
||
Ok(Some(hot_tags_stream)) | ||
} | ||
|
||
pub async fn get_stream_tags_by_reach(user_id: String, reach: UserStreamType) -> Result<Option<Vec<Self>>, Box<dyn Error + Send + Sync>> { | ||
// We cannot use here limit and skip because we want to get all the users reach by | ||
let users = UserStream::get_user_list_from_reach(&user_id, reach, None, Some(10000)).await?; | ||
match users { | ||
Some(users) => retrieve_users_tags_by_reach(&users).await, | ||
None => Ok(None), | ||
} | ||
} | ||
} | ||
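The NIT on the `match` in `get_global_tags_stream` does not preserve the form the reviewer had in mind. One common alternative for a `Some(x) => x, None => return ...` match is `let ... else`, sketched here on a stand-in function. The function name `labels_from_index` and its input type are hypothetical; the real code awaits `try_from_index_sorted_set` instead.

```rust
use std::error::Error;

type BoxError = Box<dyn Error + Send + Sync>;

/// Hypothetical stand-in for the sorted-set lookup result:
/// `None` means the index is missing, mirroring the `Ok(None)` early return in the diff.
fn labels_from_index(
    cached: Option<Vec<(String, f64)>>,
) -> Result<Option<Vec<String>>, BoxError> {
    // let-else binds on the happy path and returns early otherwise,
    // replacing `match { Some(tags) => tags, None => return Ok(None) }`.
    let Some(tags) = cached else {
        return Ok(None);
    };
    Ok(Some(tags.into_iter().map(|(label, _score)| label).collect()))
}

fn main() -> Result<(), BoxError> {
    let cached = Some(vec![("rust".to_string(), 3.0), ("neo4j".to_string(), 1.0)]);
    println!("{:?}", labels_from_index(cached)?);
    Ok(())
}
```

This needs Rust 1.65 or newer; whether it reads better than the `match` is, as the comment says, a matter of taste.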
|
||
async fn retrieve_users_tags_by_reach(users: &[String]) -> Result<Option<Vec<HotTag>>, Box<dyn Error + Send + Sync>> { | ||
Review comment: We typically use If all of these functions are specific to HotTags, maybe they belong as HotTags methods. We can also avoid using
||
let user_slice = users.iter().map(AsRef::as_ref).collect::<Vec<&str>>(); | ||
let query = queries::get_tags_from_user_ids(user_slice.as_slice()); | ||
retrieve_by_reach_hot_tags(query).await | ||
} | ||
|
||
async fn retrieve_hot_tags_from_graph(query: Query) -> Result<Option<Vec<TagList>>, Box<dyn Error + Send + Sync>> { | ||
Review comment: I didn't think much of this, but maybe we can reduce repetitive code. The code for retrieving results from Neo4j is similar across these functions. Maybe something like this with some fixes works?
|
||
let mut result; | ||
{ | ||
let graph = get_neo4j_graph()?; | ||
|
||
let graph = graph.lock().await; | ||
result = graph.execute(query).await?; | ||
} | ||
if let Some(row) = result.next().await? { | ||
let hot_tags: Vec<TagList> = row.get("tag_user_ids")?; | ||
return Ok(Some(hot_tags)); | ||
} | ||
Ok(None) | ||
} | ||
|
||
async fn retrieve_by_reach_hot_tags(query: Query) -> Result<Option<Vec<HotTag>>, Box<dyn Error + Send + Sync>> { | ||
let mut result; | ||
{ | ||
let graph = get_neo4j_graph()?; | ||
|
||
let graph = graph.lock().await; | ||
result = graph.execute(query).await?; | ||
} | ||
if let Some(row) = result.next().await? { | ||
let hot_tags: Vec<HotTag> = row.get("hot_tags")?; | ||
return Ok(Some(hot_tags)); | ||
} | ||
Ok(None) | ||
} |
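The two `retrieve_*` functions above differ only in the column they read and the type they decode into, which is the repetition the review comment points at. The reviewer's own snippet is not preserved, so the following is only a sketch of the general shape: one generic helper that takes the first row and decodes a named column. `MockRow`, `first_row_column` and the `decode` closure are hypothetical stand-ins so the example runs without `neo4rs`; in the PR the same role would be played by the `row.get(...)` call already used in the diff.

```rust
use std::collections::HashMap;
use std::error::Error;

type BoxError = Box<dyn Error + Send + Sync>;

/// Hypothetical stand-in for a result row: column name -> list of strings.
type MockRow = HashMap<String, Vec<String>>;

/// Takes the first row (if any) and decodes one named column from it.
/// Callers differ only in the column name and the target type `T`.
fn first_row_column<T>(
    rows: &mut impl Iterator<Item = MockRow>,
    column: &str,
    decode: impl Fn(&[String]) -> T,
) -> Result<Option<T>, BoxError> {
    let Some(row) = rows.next() else {
        return Ok(None); // no rows: mirrors the trailing `Ok(None)` in the diff
    };
    let raw = row
        .get(column)
        .ok_or_else(|| format!("missing column: {column}"))?;
    Ok(Some(decode(raw.as_slice())))
}

fn main() -> Result<(), BoxError> {
    let row: MockRow = HashMap::from([(
        "tag_user_ids".to_string(),
        vec!["alice".to_string(), "bob".to_string()],
    )]);
    let mut rows = vec![row].into_iter();
    let ids = first_row_column(&mut rows, "tag_user_ids", |raw| raw.to_vec())?;
    println!("{ids:?}");
    Ok(())
}
```

With a helper of this shape, `retrieve_hot_tags_from_graph` and `retrieve_by_reach_hot_tags` would each reduce to a single call that names its column.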
Review comment: This query does not seem to order the results by descending post count. It would need something like this before the RETURN statement:
Review comment: @tipogi This query still does not sort by descending post count. Maybe it works as intended this way, but the docstring is not correct.
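The snippet the first comment refers to is not preserved. Assuming both comments concern `get_global_hot_tags_scores`, whose doc comment promises ordering by descending post count, a sketch of the query text with an explicit sort before `RETURN` might look like this. The constant name `GLOBAL_HOT_TAGS_SCORES_ORDERED` is hypothetical.

```rust
/// Hypothetical variant of the Cypher text in `get_global_hot_tags_scores`.
/// The only change is the `ORDER BY` line, placed before `RETURN`.
const GLOBAL_HOT_TAGS_SCORES_ORDERED: &str = "
    MATCH (u:User)-[tag:TAGGED]->(p:Post)
    WITH tag.label AS label, COUNT(DISTINCT p) AS uniquePosts
    ORDER BY uniquePosts DESC
    RETURN COLLECT([toFloat(uniquePosts), label]) AS hot_tags
";

fn main() {
    println!("{}", GLOBAL_HOT_TAGS_SCORES_ORDERED.trim());
}
```

If the scores are only written into a sorted set that is later read with `Sorting::Descending`, the order of the query result may not matter, in which case fixing the doc comment alone would address the review.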