Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add threads for alias and output updates on startup #214

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Atticus1806
Copy link
Contributor

Fix #213

@albertz
Copy link
Member

albertz commented Oct 29, 2024

How thread-safe is Sisyphus? There is no potential issue with that?

@Atticus1806
Copy link
Contributor Author

Since this is just alias and output updates I dont think there is a problem. These ops should be executable any time

@albertz
Copy link
Member

albertz commented Oct 29, 2024

I'm more thinking about all the internal caching mechanisms of jobs, outputs, paths, the graph, etc. All of that must be thread-safe. (But it might be already, as we also use multi-threading elsewhere. I don't recall the details now.)

@Atticus1806
Copy link
Contributor Author

While my impression is they are, I am passing the question to the experts since I dont know for sure.

Copy link
Contributor

@critias critias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any problem running create_aliases in a separate thread, but check_output might be problematic. If it slows down the startup to much you could consider removing it here. It will be called again inside the manager loop anyway.
That should be tested, but I think that's less dangerous than running it in an extra thread.

Side note: It would be nicer to use the already existing manager thread_pool instead of starting new threads.

@Atticus1806
Copy link
Contributor Author

Atticus1806 commented Nov 13, 2024

I updated to use the thread pool.

It will be called again inside the manager loop anyway.

While that is true, its called with other parameters right? So a "full" update is only done at the beginning. I dont mind removing it here and then saying "user needs to call update for a full update", but this should be discussed.
Or do we add a flag Skip_Output_update_on_startup or smth.?
If we want to keep it but thread it: any suggestions on how to check if this actually has no unwanted behavior? Maybe we can call it only for non active outputs or smth?

@critias
Copy link
Contributor

critias commented Nov 14, 2024

Good point, a simple solution might be to just add a semaphore at the start of check_output to ensure that it's not run twice in parallel.

@Atticus1806
Copy link
Contributor Author

I am currently testing the semaphore version, but what I am wondering about is this line:

self.check_output(write_output=False, update_all_outputs=True)

This looks quite redundant too me, since it is called in all relevant paths later aswell. Maybe this can be removed here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Creating Alias and Outputs on startup are slow
4 participants