feat: add graph return control for all ingestion events #253

tipogi · 2024-12-16T06:56:41Z

Pre-submission Checklist

For the tests to work, you need running Neo4j and Redis instances loaded with the example dataset in docker/db-graph.

Testing: Implement and pass new tests for the new features/fixes (cargo test).
Performance: Ensure new code has relevant performance benchmarks (cargo bench).

PR description

The initial implementation was risky due to the lack of clear feedback when an event failed to index in the graph. This made troubleshooting difficult. We’ve improved observability and can now identify specific failure scenarios, such as missing relationships in the graph. Knowing this, we can avoid ineffective retries and instead take alternative actions for such cases.

One proposed solution is to implement a RetryManager (#247), which would manage retries intelligently by analyzing failures and choosing the best course of action: retrying after the missing dependencies are resolved, or flagging the event for manual intervention. This approach would improve both the reliability and the efficiency of the indexing process.
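As an illustration of the decision described above, here is a minimal sketch of how such a manager might map a failure to an action. It is not the #247 implementation: FailureKind, RetryAction, RetryManager::decide and the max_attempts budget are all hypothetical names and policy chosen for this example.

```rust
/// Why an event failed to index (hypothetical classification).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FailureKind {
    /// A node the event depends on is not in the graph yet (e.g. a reply's parent).
    MissingDependency { uri: String },
    /// Any failure the manager cannot classify.
    Unknown,
}

/// What the manager decides to do with a failed event (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RetryAction {
    /// Re-process the event once the dependency at `uri` has been indexed.
    RetryAfterDependency { uri: String },
    /// Stop retrying and flag the event for manual intervention.
    FlagForManualIntervention,
}

pub struct RetryManager {
    /// Attempts allowed before giving up (assumed policy, not taken from #247).
    pub max_attempts: u32,
}

impl RetryManager {
    pub fn decide(&self, failure: &FailureKind, attempts: u32) -> RetryAction {
        // Give up once the attempt budget is exhausted, whatever the cause.
        if attempts >= self.max_attempts {
            return RetryAction::FlagForManualIntervention;
        }
        match failure {
            // Only retry once the missing parent/target exists in the graph.
            FailureKind::MissingDependency { uri } => {
                RetryAction::RetryAfterDependency { uri: uri.clone() }
            }
            // Blindly retrying unclassified failures is what we want to avoid.
            FailureKind::Unknown => RetryAction::FlagForManualIntervention,
        }
    }
}
```

Keeping the failure reason as data (the dependency URI) is what lets the manager wait for a specific node instead of retrying on a timer.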

SHAcollision · 2025-01-03T12:52:37Z

src/db/graph/queries/put.rs

+    ");
+
+    // Create the generic part of the query depending the post type
+    if action == "replies" {


Let's use a match and, instead of matching the strings "replies" and "repost", make "action" an enum.
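A minimal sketch of that suggestion, assuming a hypothetical PostAction enum (the name and method are illustrative, not the code that was merged). The Cypher clauses are the ones that appear in this diff; an exhaustive match means a misspelled action becomes a compile error rather than a silently skipped branch.

```rust
/// Hypothetical enum replacing the "replies" / "repost" string actions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PostAction {
    Reply,
    Repost,
}

impl PostAction {
    /// Cypher clause linking the new post to its parent for this action.
    pub fn relationship_clause(self) -> &'static str {
        match self {
            PostAction::Reply => "MERGE (new_post)-[:REPLIED]->(reply_parent_post)",
            PostAction::Repost => "MERGE (new_post)-[:REPOSTED]->(repost_parent_post)",
        }
    }
}
```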

SHAcollision

It's looking cleaner!

SHAcollision · 2025-01-06T00:14:11Z

src/db/graph/queries/put.rs

-                new_relationships.push("MERGE (new_post)-[:REPOSTED]->(repost_parent_post)");
-            }
-        }
+    if let Some(_) = &post_relationships.replied {


if post_relationships.replied.is_some() {

SHAcollision · 2025-01-06T00:14:27Z

src/db/graph/queries/put.rs

+        ");
+        new_relationships.push("MERGE (new_post)-[:REPLIED]->(reply_parent_post)");
+    };
+    if let Some(_) = &post_relationships.reposted {


if post_relationships.reposted.is_some() {
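Both suggestions replace a pattern that binds nothing (if let Some(_) = &...) with Option::is_some, which states the intent directly. A small self-contained sketch, using a simplified stand-in for PostRelationships whose two Option<String> fields are assumed from the diff:

```rust
/// Simplified stand-in for the model's PostRelationships (assumed fields).
#[derive(Default)]
struct PostRelationships {
    replied: Option<String>,
    reposted: Option<String>,
}

/// Collect the MERGE clauses needed for a new post.
fn relationship_clauses(rel: &PostRelationships) -> Vec<&'static str> {
    let mut clauses = Vec::new();
    // is_some() only checks presence; no unused binding is needed.
    if rel.replied.is_some() {
        clauses.push("MERGE (new_post)-[:REPLIED]->(reply_parent_post)");
    }
    if rel.reposted.is_some() {
        clauses.push("MERGE (new_post)-[:REPOSTED]->(repost_parent_post)");
    }
    clauses
}
```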

SHAcollision · 2025-01-06T00:27:04Z

src/models/post/relationships.rs

@@ -88,6 +89,23 @@ impl PostRelationships {
        }
    }

+    /// Constructs a `Self` instance by extracting relationships from a `PubkyAppPost` object
+    pub fn get_from_homeserver(post: &PubkyAppPost) -> Self {


Suggest renaming it to from_homeserver() to match the naming convention we use in other models.

SHAcollision · 2025-01-06T00:35:54Z

src/models/post/relationships.rs

+        if let Some(parent_uri) = &post.parent {
+            relationship.replied = Some(parent_uri.to_string());
+        }


This if can be removed and simplified as:

relationship.replied = post.parent.clone();
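A self-contained sketch of the simplified constructor, assuming PubkyAppPost::parent is an Option<String> (in that case .clone() yields the same value as the to_string() copy in the if block). The structs below are minimal stand-ins, not the real models, and repost extraction is left out.

```rust
/// Minimal stand-in for PubkyAppPost: only the field used here (assumption).
struct PubkyAppPost {
    parent: Option<String>,
}

#[derive(Debug, Default, PartialEq)]
struct PostRelationships {
    replied: Option<String>,
    reposted: Option<String>,
}

impl PostRelationships {
    /// Constructs a `Self` instance by extracting relationships from a `PubkyAppPost` object
    fn from_homeserver(post: &PubkyAppPost) -> Self {
        let mut relationship = Self::default();
        // Cloning the Option replaces the `if let Some(parent_uri)` block.
        relationship.replied = post.parent.clone();
        relationship
    }
}
```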

SHAcollision · 2025-01-06T00:39:34Z

src/events/handlers/post.rs

@@ -104,44 +99,55 @@ pub async fn sync_put(
            )
            .await?;
        }
+         // Populate the reply parent keys to after index the reply


// Populate the... to after....

I suggest rewording this comment to:
Define the reply parent key to index the reply later.

The same applies to other comments that contain "to after".

Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.134 to 1.0.135.
- [Release notes](https://github.com/serde-rs/json/releases)
- [Commits](serde-rs/json@v1.0.134...v1.0.135)

---
updated-dependencies:
- dependency-name: serde_json
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

SHAcollision

We are now implicitly using the Option<bool> returned by most graph queries to get stricter control over what really happened when we ran a graph query. However, Option<bool> is not very expressive, even though we use it very systematically. This will be hard to understand for anyone reading the code, and even for us when we add new queries or revisit existing ones in the future.

Do you think there's a way to make this clearer and more obvious in (1) the queries themselves, (2) the way we execute them, and (3) the returned type (something more expressive than Option)?

In addition, exec_boolean_row could be renamed to something that better reflects its new, very specific purpose, as long as we are consistent in how we use this function to execute queries.

SHAcollision · 2025-01-08T10:32:04Z

src/db/graph/exec.rs

@@ -13,18 +13,19 @@ pub async fn exec_single_row(query: Query) -> Result<(), DynError> {
 }

 // Exec a graph query that has a single "boolean" return
-pub async fn exec_boolean_row(query: Query) -> Result<bool, DynError> {
+pub async fn exec_boolean_row(query: Query) -> Result<Option<bool>, DynError> {


Have we documented this well? For example, we could document how we implicitly use this function's return type to detect non-existing objects.

This function is usually used for queries that will return a row where the response is:

None: Some dependency is missing (e.g., a reply's parent).

Some(true): The node/relationship already existed, and from this, we deduce that it is an EDIT.

Some(false): The node/relationship did not exist, so we have created a new node or relationship.

We could use the following enum:

pub enum QueryResult {
    Pending, // None: Some dependency is pending
    Edited,  // Some(true): The node/relationship existed (an edit was performed)
    Created, // Some(false): The node/relationship did not exist (a new one was created)
}
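One way to make the mapping explicit in a single place is a From<Option<bool>> conversion, so the documented convention lives next to the enum instead of at every call site. This is a sketch built on the QueryResult proposal above; the derives and doc comments are additions for the example.

```rust
/// Outcome of a graph write, replacing the implicit Option<bool> convention.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueryResult {
    /// None: some dependency is pending (e.g. a reply's parent is not in the graph).
    Pending,
    /// Some(true): the node/relationship existed, so this was an edit.
    Edited,
    /// Some(false): the node/relationship did not exist and was created.
    Created,
}

impl From<Option<bool>> for QueryResult {
    fn from(flag: Option<bool>) -> Self {
        match flag {
            None => QueryResult::Pending,
            Some(true) => QueryResult::Edited,
            Some(false) => QueryResult::Created,
        }
    }
}
```

During a migration, existing Option<bool> results could be converted with QueryResult::from(value) until every query returns the enum directly.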

SHAcollision · 2025-01-08T18:00:57Z

src/db/graph/exec.rs

+    if let Some(row) = result.next().await? {
+        // The "flag" field indicates a specific condition in the query
+        match row.get("flag")? {
+            true => Ok(OperationOutcome::Updated),
+            false => Ok(OperationOutcome::Created),
+        }
+    } else {
+        Ok(OperationOutcome::Pending)
    }


match result.next().await? {
    Some(row) => match row.get("flag")? {
        true => Ok(OperationOutcome::Updated),
        false => Ok(OperationOutcome::Created),
    },
    None => Ok(OperationOutcome::Pending),
}
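With three named outcomes, a caller can branch on each case explicitly. The sketch below shows how an event handler might react; after_graph_write and its return values are hypothetical, and only the OperationOutcome variant names come from this diff.

```rust
/// Mirrors the OperationOutcome variants used in exec.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OperationOutcome {
    Updated,
    Created,
    Pending,
}

/// Hypothetical follow-up step an event handler might run after a graph write.
fn after_graph_write(outcome: OperationOutcome, event_uri: &str) -> Result<&'static str, String> {
    match outcome {
        // The node/relationship already existed: handle as an edit.
        OperationOutcome::Updated => Ok("edit"),
        // A new node/relationship was created: handle as a new item.
        OperationOutcome::Created => Ok("new"),
        // No row returned: a dependency is missing, so report it instead of
        // treating the event as successfully indexed.
        OperationOutcome::Pending => Err(format!("{event_uri}: missing dependency in graph")),
    }
}
```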

SHAcollision

Woop, woop!! This is a true quality upgrade. Hoping to see far fewer partial synchronization issues from now on 🥳

SHAcollision · 2025-01-09T11:18:51Z

tests/watcher/utils/watcher.rs

+/// # Arguments
+/// * `event_line` - A string slice that represents the URI of the event to be retrieved
+///   from the homeserver. It contains the event type and the homeserver uri
+pub async fn retrieve_event_from_homeserver(event_line: &str) -> Result<(), DynError> {


Suggested change

pub async fn retrieve_event_from_homeserver(event_line: &str) -> Result<(), DynError> {

pub async fn retrieve_and_handle_event_line(event_line: &str) -> Result<(), DynError> {

SHAcollision · 2025-01-09T11:23:14Z

tests/watcher/tags/fail_index.rs

+
+    // Switch OFF the event processor to simulate the pending events to index
+    // In that case, shadow user
+    test = test.remove_event_processing().await;


SHAcollision · 2025-01-09T11:24:10Z

tests/watcher/tags/fail_index.rs

+    test.put(tag_url.as_str(), tag_blob).await?;
+
+    // Create raw event line to retrieve the content from the homeserver. Event processor is deactivated
+    // Like this, we can trigger the error in that test


// Like this, we can trigger the error in that test
Please clarify: like what, and which test?

SHAcollision · 2025-01-09T11:24:59Z

tests/watcher/tags/fail_index.rs

+
+    assert!(
+        sync_fail,
+        "Cannot exist the tag because it is not in sync the graph with events"


Profile control in PUT post

e830dab

tipogi self-assigned this Dec 16, 2024

tipogi added the enhancement (New feature or request), 🔮 nexus and 👀 watcher labels Dec 16, 2024

tipogi added 4 commits December 16, 2024 13:27

add in PUT events, user profile control. File PUT missing

edb7c4e

add in DEL event, user profile control. Files missing

06cdea4

minor changes in file DEL

1b80258

Add note

76f95f1

SHAcollision changed the title ~~feat: Add user profile control for events~~ feat: add graph return control for all ingestion events Jan 1, 2025

post dependency control

bc1be0b

SHAcollision reviewed Jan 3, 2025


tipogi added 2 commits January 3, 2025 20:20

refactor post creation query

2c6a477

Remove the PostInteraction enum in favor of PostRelationships

7c83996

SHAcollision reviewed Jan 6, 2025


tipogi added 4 commits January 7, 2025 12:43

added integration test when the graph is unsync with homeserver

a8a6752

added another test

e2b2b9b

reviewed watcher tests

df02e70

small fixes

9e22f2d

tipogi requested review from amirRamirfatahi and SHAcollision January 7, 2025 17:06

tipogi marked this pull request as ready for review January 7, 2025 17:06

SHAcollision and others added 2 commits January 7, 2025 19:38

deps: bump axum to 0.8.1 (#276)

b7832c5

SHAcollision reviewed Jan 8, 2025


tipogi marked this pull request as draft January 8, 2025 16:27

tipogi removed the request for review from amirRamirfatahi January 8, 2025 16:27

be more idiomatic with the query results

efd64d5

SHAcollision reviewed Jan 8, 2025


change query return values to match OperationOutcome enum

3ecc287

tipogi marked this pull request as ready for review January 9, 2025 06:31

tipogi added 3 commits January 9, 2025 10:24

merge main

f22b4d5

rename return value of the query when adding user tags

b46c375

small fixes

7b290ec

SHAcollision approved these changes Jan 9, 2025


SHAcollision force-pushed the main branch from 2bd53cf to 31fe413 January 9, 2025 13:02

tipogi added 2 commits January 9, 2025 16:46

review fixes

09d17fb

review fixes

5d39d4b

tipogi merged commit 950be28 into main Jan 9, 2025
3 checks passed

tipogi deleted the feat/user-control branch January 9, 2025 15:53

tipogi restored the feat/user-control branch January 9, 2025 18:40
