Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add deduplication step to sync #72

Open
stopfstedt opened this issue Jan 9, 2025 · 4 comments
Open

Add deduplication step to sync #72

stopfstedt opened this issue Jan 9, 2025 · 4 comments
Assignees
Labels

Comments

@stopfstedt
Copy link
Member

stopfstedt commented Jan 9, 2025

Currently, the sync process does not handle duplicate user accounts (duplicate by campus id) at the source.
When the sync runs, all users attached to a given sync source - learner group, cohort, instructor group etc. - are processed individually. this leads to a problem of repeat suspension/reactivation of enrollments for users with duplicate accounts on the Ilios side, where at least one account is enabled and at least one account is disabled.

To fix this, I'm proposing to fix this by adding a user de-duplication step the sync process with the following rules:

  • if all dupe accounts are active, then only the most recent one should be processed.
  • if all dupe accounts are inactive, then only the most recent one should be processed.
  • if there is a mix of active and inactive dupe accounts, then only the most recent active account should be processed.

"most recent" - most recently created, has the max user id value amongst its duplicates.

Edit: tightened up language on the first rule. [ST 2025/01/09]

@stopfstedt stopfstedt self-assigned this Jan 9, 2025
@stopfstedt stopfstedt added the bug label Jan 9, 2025
@michaelchadwick
Copy link

@stopfstedt If "most recently created" is the heuristic used, won't a new account made in 2024 that's actually related to an account with 2023 data be used with its "blank" data, obscuring the 2023 data? That case feels like a data entry issue (when new account is created, make sure identical account doesn't already exist), so not sure what post-processing can feasibly do about it.

@stopfstedt
Copy link
Member Author

stopfstedt commented Jan 9, 2025

@stopfstedt If "most recently created" is the heuristic used, won't a new account made in 2024 that's actually related to an account with 2023 data be used with its "blank" data, obscuring the 2023 data? That case feels like a data entry issue (when new account is created, make sure identical account doesn't already exist), so not sure what post-processing can feasibly do about it.

picking most recent here is essentially a tie breaker, we could also pick earliest with the same result. deduplication happens on campus id, we just have to pick one record. from a Moodle perspective, the Ilios user record is ephemeral, we don't store any linkage to concrete Ilios user records on the Moodle side.
the important decision point in all of this is to let the active user record win out over its inactive duplicate(s).
so far, this seem tight enough to me.
if this proposed solution turns out to be incomplete or flawed then i'll amend the ticket.

@jrjohnson
Copy link
Member

I'd lean towards erroring in this case and fixing the account (when there are two that are active). For historical data reasons we allow duplicate campus IDs, but it's not my favorite thing. If we error we could then clean up the data and ensure there is only one active account for a campus ID.

@ctam
Copy link
Member

ctam commented Jan 15, 2025

I probably don't know enough about Ilios's code now to make any comment. But is there any better way than using the 'max user id value' to determine if an account is 'most recent'? Is there a creation date time field we could use? If not, can that be added? It feels a little code smell with this approach.

"most recent" - most recently created, has the max user id value amongst its duplicates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants