
Creating Alias and Outputs on startup are slow #213

Open
Atticus1806 opened this issue Oct 29, 2024 · 2 comments · May be fixed by #214

@Atticus1806
Contributor

I am currently running into the issue that my manager startup is slowed down by updating all aliases and outputs every time. I am wondering, is this even required for the manager to work properly? I guess this can potentially cause files to be "missing" from output and alias, but the behaviour itself should be safe, right?

create_aliases(self.sis_graph.jobs())

For context, I am currently looking at 52 secs until config load, 280 secs for aliases and 1370 secs for outputs. While this is probably also related to a slow filesystem and to creating quite a number of outputs/aliases, those are things I cannot easily fix right now. If this does not endanger the behaviour, I would create a PR adding a flag to disable the full update on startup.

My first test shows that this should work, but since I am not that familiar with the manager loop, I want to make sure this does not implicitly break anything. Roughly what I have in mind is sketched below.
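A minimal sketch, just for illustration: the setting name UPDATE_OUTPUTS_ON_STARTUP and the startup_refresh wrapper are made-up names, not existing sisyphus API, and this is not the actual PR.

    # Hypothetical sketch: UPDATE_OUTPUTS_ON_STARTUP and startup_refresh are
    # illustrative names only, not existing sisyphus settings or functions.
    import sisyphus.global_settings as gs
    from sisyphus.manager import create_aliases  # the call referenced above

    def startup_refresh(manager):
        """Refresh aliases and outputs on startup unless the (hypothetical) flag disables it."""
        if not getattr(gs, "UPDATE_OUTPUTS_ON_STARTUP", True):
            return  # skip the expensive full refresh, e.g. on a slow filesystem
        create_aliases(manager.sis_graph.jobs())
        manager.check_output(write_output=manager.link_outputs,
                             update_all_outputs=True, force_update=True)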

@michelwi
Contributor

is this even required for the manager to work properly? I guess this can potentially cause files to be "missing" from output and alias, but the behaviour itself should be safe, right?

I think I would agree; in principle the manager could already start without all outputs in place. Unless, of course, you are a naughty person and define tk.Paths into your output folder.

disable the full update on startup.

I am not sure when else the full update would happen.

I guess in cases where you kill the manager to clear Jobs that went into an error state, there is not much use in updating everything every time.
But when you kill the manager, change your graph and the outputs, and then restart it, we would need the update on startup; otherwise all aliases and outputs would still point to the old versions from before the change, and (the outputs) would only be updated once the manager finishes:

self.check_output(write_output=self.link_outputs, update_all_outputs=True, force_update=True)

(assuming you are not impatient like me and hit ctrl+c a couple of times to get the shell back quicker)

Maybe the update could be pushed into a thread that runs in parallel to the manager loop?
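Roughly what I mean, as a sketch only: it assumes the alias/output refresh is safe to run concurrently with the manager loop (not verified), and start_background_refresh is just an illustrative name.

    # Rough sketch: assumes the refresh can run concurrently with the manager
    # loop; start_background_refresh is an illustrative name, not existing API.
    import logging
    import threading

    from sisyphus.manager import create_aliases  # the call referenced above

    def start_background_refresh(manager):
        """Run the alias/output refresh in a daemon thread so the manager loop
        does not have to wait for it on startup."""
        def refresh():
            try:
                create_aliases(manager.sis_graph.jobs())
                manager.check_output(write_output=manager.link_outputs,
                                     update_all_outputs=True, force_update=True)
            except Exception:
                logging.exception("Background alias/output refresh failed")

        t = threading.Thread(target=refresh, name="startup-output-refresh", daemon=True)
        t.start()
        return t  # the caller could join() before relying on fully updated outputs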

@JackTemaki
Contributor

Maybe the update could be pushed into a thread that runs in parallel to the manager loop?

This sounds like a good idea.
