Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Continuing a process from PARSING phase loses retrieved temp folder #4783

Open
chrisjsewell opened this issue Feb 26, 2021 · 2 comments
Open

Comments

@chrisjsewell
Copy link
Member

chrisjsewell commented Feb 26, 2021

Originally posted by @chrisjsewell in #4648 (comment)

@sphuber it seems here:

elif self.data == RETRIEVE_COMMAND:
node.set_process_status(process_status)
# Create a temporary folder that has to be deleted by JobProcess.retrieved after successful parsing
temp_folder = tempfile.mkdtemp()
await self._launch_task(task_retrieve_job, node, transport_queue, temp_folder)
result = self.parse(temp_folder)

the temp_folder is created fresh everytime you reach this part of the code. So if the daemon worker is stopped during the parsing, and a new daemon worker picks up the task when the daemon is restarted, it will get a new temp_folder (and also the old temp_folder will not be removed), but the retrieval will be skipped and you ed up with an empty temp_folder:

if node.get_state() == CalcJobState.PARSING:
logger.warning(f'CalcJob<{node.pk}> already marked as PARSING, skipping task_retrieve_job')
return

so perhaps the temp_folder location should be stored on the node, e.g. something like

temp_folder = node.get_retrieved_temp_folder()
if not temp_folder or not os.path.exists(temp_folder):
    temp_folder = tempfile.mkdtemp()
    node.set_retrieved_temp_folder(temp_folder)
@chrisjsewell
Copy link
Member Author

chrisjsewell commented Feb 26, 2021

or alternatively (as noted by @sphuber) you could re-do the retrieval

This definitely used to work at some point and without storing the temporary folder on the node. Instead after the restart it would generate a new temp folder and retrieve again. This is because you cannot guarantee that after the restart the original temp folder still exists so you have to accept the inefficiency of retrieving again.

@chrisjsewell
Copy link
Member Author

chrisjsewell commented Feb 26, 2021

@sphuber I'm really not sure this ever worked lol, because there is multiple places in the code that would stop this being possible. For example, you would also need to change here:

# If the calculation already has a `retrieved` folder, simply return. The retrieval was apparently already completed
# before, which can happen if the daemon is restarted and it shuts down after retrieving but before getting the
# chance to perform the state transition. Upon reloading this calculation, it will re-attempt the retrieval.
link_label = calculation.link_label_retrieved
if calculation.get_outgoing(FolderData, link_label_filter=link_label).first():
EXEC_LOGGER.warning(
f'CalcJobNode<{calculation.pk}> already has a `{link_label}` output folder: skipping retrieval'
)
return

you would need to rework this so that if the retrieved folder already exists and retrieve_temporary_list is not empty then you only retrieve these files

@chrisjsewell chrisjsewell self-assigned this Feb 26, 2021
@sphuber sphuber modified the milestones: Preparations for the new repository, v2.0.0 Apr 28, 2021
@chrisjsewell chrisjsewell modified the milestones: v2.0.0, Post 2.0 May 5, 2021
@sphuber sphuber removed this from the v2.3.0 milestone Dec 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants