Update artifact collection state - consider download order when updating end time #89

Open
kaidaguerre opened this issue Jan 27, 2025 · 0 comments
@kaidaguerre
Contributor

Artifact collection state has an EndTime property: the time up to which we are confident we have collected ALL data.
The end time is calculated as lastEntryTime - granularity.

i.e. if the granularity is 1 day and the last entry time is Jan 23, we are confident we have collected all data up to Jan 22.

This assumes OnCollected is called in date order

File discovery events are ordered, so assuming the artifact source discovers files in date order, we assume the collection state can track a valid last entry time (and therefore end time).

However, the downloads are async, which means the download events arrive out of order. As a result OnCollected is called out of order, which breaks our logic.

For now we have worked around this by setting a minimum granularity of 1 day and hoping this avoids the problem.

The strategic solution is for the collection state to track discovered events with an artifact state map.
Each time an artifact is discovered, add it to the map.

When an artifact is downloaded, update its state in the map and determine the new end time (i.e. if the downloaded artifact is the EARLIEST in the map, use it to update the end time).

@kaidaguerre kaidaguerre self-assigned this Jan 27, 2025