Tidier cannot succeed when all the oldest data is not in ICAT #111

Open
stuartpullinger opened this issue Jun 30, 2020 · 1 comment
@stuartpullinger
Contributor

I suspect that this may end up as a 'won't fix' but I am noting the issue here so that, if that is the case, we can record the decision. This problem was observed in a preproduction IDS where the connected ICAT has a subset of the production database.

In normal operation, the IDS puts data in its cache. If it finds that the volume of data in its cache has exceeded the high threshold, it requests that the storage plugin walk the filesystem, finding the list of 'old' files which, if deleted, would free up enough space to take it below its low threshold. It then looks up the file locations in ICAT and loops over the results to request that the files are archived. (Any that are currently requested won't be archived because of the logic in the deferredOpsQueue).

So, if none of the 'old' files are found in ICAT, the Tidier never frees up any space. What it doesn't do (and it is debatable whether it should) is select further files for deletion to make up for the ones it skipped.
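To make the failure mode concrete, here is a minimal sketch in Python of one tidying pass as described above. It is not the IDS code: `CachedFile`, `select_old_files` and `tidy_once` are hypothetical names, the cache is assumed to be a list ordered oldest first, and a set of locations stands in for the ICAT lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedFile:
    location: str
    size: int  # bytes


def select_old_files(cache, used, low):
    """Pick oldest-first files whose removal would bring usage down to `low`.

    Stands in for the storage plugin walking the filesystem (assumption:
    `cache` is already ordered oldest first).
    """
    selected = []
    to_free = used - low
    for f in cache:
        if to_free <= 0:
            break
        selected.append(f)
        to_free -= f.size
    return selected


def tidy_once(cache, icat_locations, high, low):
    """Run one tidying pass and return the bytes queued for archiving."""
    used = sum(f.size for f in cache)
    if used <= high:
        return 0  # below the high threshold, nothing to do
    freed = 0
    for f in select_old_files(cache, used, low):
        if f.location in icat_locations:
            freed += f.size  # stands in for queueing an archive request
        # Files unknown to ICAT are skipped silently, and no further
        # candidates are selected to make up for them.
    return freed
```

With three 40-byte files, `high=100` and `low=50`, the two oldest files are selected. If ICAT only knows the newest file, the pass frees nothing, and every later pass selects the same two files again.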

I think the behaviour here may not be optimal but I'm not sure the use case of lots of unknown data sitting in the IDS cache was foreseen or even if it should be accommodated. Could this problem arise in a production environment?

Solutions to this are complicated because the Tidier delegates finding files to the storage plugin and archiving/deleting files to the deferredOpsQueue. Some possible approaches:

  • We issue a warning when an 'old' file is not found in ICAT (and therefore the space it occupies won't be freed)
  • We aggregate the size of all the skipped files in the loop and, if this total is >= the difference between the thresholds, then we know that the disk space will never be freed. We issue an error in this case as the Tidier can no longer prevent the disk from running out of space.
  • We document in the installation instructions that the main storage should be empty or contain only files known to ICAT.
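The first two approaches could be combined roughly as follows. This is a sketch in Python under stated assumptions, not a patch for the Tidier: `archive_candidates` is a hypothetical function, the candidates are assumed to arrive as `(location, size)` pairs from the storage plugin, and a set of locations stands in for the ICAT lookup.

```python
import logging

log = logging.getLogger("tidier_sketch")  # hypothetical logger name


def archive_candidates(candidates, icat_locations, high, low):
    """Split plugin candidates into archivable and skipped files.

    Returns (locations to archive, skipped bytes, cannot_recover), where
    cannot_recover is True when the skipped files alone account for at
    least the gap between the two thresholds.
    """
    to_archive = []
    skipped_bytes = 0
    for location, size in candidates:
        if location in icat_locations:
            to_archive.append(location)
        else:
            # First approach: warn about every file that cannot be archived.
            skipped_bytes += size
            log.warning("%s not found in ICAT; %d bytes will not be freed",
                        location, size)
    # Second approach: compare the skipped total with the threshold gap.
    cannot_recover = skipped_bytes >= high - low
    if cannot_recover:
        log.error("skipped %d bytes, at least the %d byte gap between the "
                  "thresholds; this space will not be freed",
                  skipped_bytes, high - low)
    return to_archive, skipped_bytes, cannot_recover
```

Returning the flag instead of raising keeps the decision about how to react (log only, alert, stop accepting writes) with the caller, which seems appropriate while that decision is still open.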
@stuartpullinger stuartpullinger added the bug Something isn't working label Jun 30, 2020
@RKrahl
Member

RKrahl commented Jun 30, 2020

There are a few fundamental assumptions that the design of IDS is based upon. These assumptions include:

  1. IDS has exclusive access to the main and archive storage. If other processes have access to the storage, those processes must behave, i.e. their actions must be in line with the internal processes within IDS and concurrent access must be protected using locking.

  2. The files in the storage must always be consistent with the content of ICAT, i.e. for every Datafile object in ICAT (having a non-NULL location attribute) there must be a corresponding file in the storage and vice versa.

If these assumptions are not met, this constitutes an out-of-specification use which results in undefined behavior. Based on this, we could consider this bug as invalid.
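The second assumption can be checked mechanically. A minimal sketch in Python, not part of IDS: `find_inconsistencies` is a hypothetical helper, and both inputs are assumed to be plain lists of location strings. How they are obtained from ICAT and from the storage plugin is left out.

```python
def find_inconsistencies(icat_locations, storage_locations):
    """Compare Datafile locations in ICAT with the files found in storage.

    Returns (missing_in_storage, missing_in_icat), each a sorted list.
    Datafiles with a NULL (None) location are ignored, matching the
    assumption as stated.
    """
    icat = {loc for loc in icat_locations if loc is not None}
    storage = set(storage_locations)
    return sorted(icat - storage), sorted(storage - icat)
```

For example, `find_inconsistencies(["a", "b", None], ["b", "c"])` reports `a` as missing in the storage and `c` as unknown to ICAT. The second list corresponds to the files the Tidier cannot archive.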

However, I just checked the sources: error handling in the Tidier is essentially non-existent. So I agree with your first bullet and would even say that the Tidier should log an error for every Dataset or Datafile reported from the plugin that it finds missing in ICAT. In that sense, this issue is a variant of #79. Obviously this would only help if anybody actually checks the error logs. I also agree with your last bullet that the documentation may need improvement.

@RKrahl RKrahl added this to the 2.0.0 milestone Jan 6, 2021
@RKrahl RKrahl modified the milestones: 2.0.0, 3.0.0 Jul 21, 2023