4911 Introduced fix_rd_broken_links command #5095

Open
wants to merge 5 commits into
base: main

albertisfu
Contributor

@albertisfu albertisfu commented Feb 19, 2025

This PR introduces the Django command outlined in #4911 to fix RECAPDocuments (RDs) affected by the broken links issue.

  • The command checks every docket whose date_modified is on or after --start-date, which should be the date we started our last full re-index. This ensures we fix all links that may have been scrambled.
  • For performance reasons, this query runs against the Docket table, where date_modified is indexed; the same column is not indexed on the DocketEvent table. The query returns the IDs of the dockets that changed on or after this date.
  • That query is then used as a subquery in a DocketEvent query to retrieve the number of PGH events for each of these docket IDs. The pgh_obj_id column is used instead of id for performance reasons, since pgh_obj_id is indexed.
  • Next, for each docket's event count, an additional query retrieves the number of DocketEvent entries for that docket_id whose slug matches the docket's current slug. If the total number of events for a docket equals the number of events carrying the current slug, then every event for the docket has the same slug, meaning that the slug hasn't changed.
  • If the two counts differ, the slug has changed and the docket_id is scheduled for re-indexing. This includes re-indexing all associated RECAPDocuments in bulk so they get fixed.
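
The slug check described above can be illustrated with a small in-memory sketch. This is not the command's code: the real implementation runs Django ORM queries against the Docket and DocketEvent tables, while the `Docket` and `DocketEvent` dataclasses and the `dockets_with_changed_slug` helper below are hypothetical stand-ins that only mirror the counting logic.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Docket:
    # Hypothetical stand-in for a Docket row.
    id: int
    slug: str
    date_modified: date


@dataclass(frozen=True)
class DocketEvent:
    # Hypothetical stand-in for a DocketEvent (PGH) row.
    pgh_obj_id: int  # ID of the docket this event snapshot belongs to
    slug: str        # slug recorded in that snapshot


def dockets_with_changed_slug(dockets, events, start_date):
    """Return sorted IDs of dockets with at least one event under a different slug."""
    # Step 1: candidate dockets, modified on or after the start date.
    candidates = {d.id: d for d in dockets if d.date_modified >= start_date}

    # Step 2: total number of events per candidate docket.
    total = Counter(e.pgh_obj_id for e in events if e.pgh_obj_id in candidates)

    # Step 3: number of events per docket that carry the docket's current slug.
    same_slug = Counter(
        e.pgh_obj_id
        for e in events
        if e.pgh_obj_id in candidates
        and e.slug == candidates[e.pgh_obj_id].slug
    )

    # Differing counts mean some event recorded another slug: the slug changed.
    return sorted(d_id for d_id, n in total.items() if same_slug[d_id] != n)
```

Comparing two counts avoids loading the individual event rows; only dockets whose counts differ need any further work.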

The command runs in a two-step process:

  1. The first step runs the queries and performs the condition checks to determine the total number of dockets that need to be fixed.
  2. The second step runs the queries again, but this time it schedules the tasks that fix the documents. Knowing the total up front allows us to track the command's progress and estimate how long it will run.
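
As a rough illustration of the two-step flow, the sketch below counts first and schedules second. The names `chunked`, `run_two_pass`, `find_affected_ids`, and `schedule_chunk` are hypothetical, and grouping docket IDs into batches of `chunk_size` is an assumption about what --chunk-size controls, not something this PR description states.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def run_two_pass(find_affected_ids, schedule_chunk, chunk_size):
    """Count affected dockets, then schedule them in chunks with progress output.

    find_affected_ids: hypothetical callable returning a fresh iterable of
        docket IDs that need fixing (the queries run once per pass).
    schedule_chunk: hypothetical callable that enqueues one batch of IDs.
    """
    # Pass 1: only count, so the total is known before any work is scheduled.
    total = sum(1 for _ in find_affected_ids())

    # Pass 2: run the queries again and schedule the fixes batch by batch.
    scheduled = 0
    for chunk in chunked(find_affected_ids(), chunk_size):
        schedule_chunk(chunk)
        scheduled += len(chunk)
        print(f"Scheduled {scheduled}/{total} dockets")
    return total
```

Running the queries twice costs extra database time, but it keeps the first pass free of side effects and gives the second pass a denominator for progress reporting.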

The command can be executed as follows:

manage.py fix_rd_broken_links --start-date 2024-03-25 --queue celery --chunk-size 50

March 25, 2024 is the approximate date when the last full re-index for RECAP was started, according to Slack records.
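
For reference, the three options in the invocation above could be parsed along these lines. This is a standalone `argparse` sketch, not the command's actual `add_arguments` implementation; the argument types, defaults, and help strings are assumptions.

```python
import argparse
from datetime import date


def build_parser():
    """Build a parser for the options shown in the example invocation."""
    parser = argparse.ArgumentParser(prog="fix_rd_broken_links")
    parser.add_argument(
        "--start-date",
        type=date.fromisoformat,  # accepts YYYY-MM-DD, e.g. 2024-03-25
        required=True,
        help="Check dockets modified on or after this date.",
    )
    parser.add_argument(
        "--queue",
        default="celery",  # assumed default, taken from the example
        help="Name of the Celery queue to send tasks to.",
    )
    parser.add_argument(
        "--chunk-size",
        type=int,
        default=50,  # assumed default, taken from the example
        help="Batch size used while scheduling.",
    )
    return parser
```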

This command should be run after #5086 is merged.

@albertisfu albertisfu marked this pull request as ready for review February 19, 2025 20:11
@albertisfu albertisfu requested a review from mlissner February 19, 2025 20:11
@mlissner mlissner requested review from ERosendo and removed request for mlissner February 19, 2025 21:51
Labels
None yet
Projects
Status: To Do
2 participants