Sisyphus is too slow #90
Comments
With current master (after #84, #85, #87), I get this flame graph: There is So there is
But then also |
Ok, removing this gets me from 14 secs to 8 secs, so this has quite some impact. Also, the hash does not change for me. But I'm not sure if it might be dangerous in other cases. (@critias ?) |
the |
But your plot shows why it is slow? Due to What puzzles me much more is why |
Yes, why are they slow? They just do some simple string manipulation. This should be very fast. |
This is join for python3.8 on posix-related systems:

    def join(a, *p):
        """Join two or more pathname components, inserting '/' as needed.
        If any component is an absolute path, all previous path components
        will be discarded. An empty last part will result in a path that
        ends with a separator."""
        a = os.fspath(a)
        sep = _get_sep(a)
        path = a
        try:
            if not p:
                path[:0] + sep  #23780: Ensure compatible data type even if p is null.
            for b in map(os.fspath, p):
                if b.startswith(sep):
                    path = b
                elif not path or path.endswith(sep):
                    path += b
                else:
                    path += sep + b
        except (TypeError, AttributeError, BytesWarning):
            genericpath._check_arg_types('join', a, *p)
            raise
        return path

|
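To get a feeling for how much the extra work in join costs compared to plain string formatting, a rough micro-benchmark along these lines could be used. This is only a sketch: the path components are made up, `posixpath.join` is used directly so it behaves the same on every platform, and the timings will vary by machine and Python version. It also shows where the two approaches stop being equivalent.

```python
import posixpath
import timeit

base, name = "work/jobs", "output.txt"  # made-up example components


def with_join():
    # What os.path.join resolves to on POSIX systems.
    return posixpath.join(base, name)


def with_fstring():
    # Plain formatting: no fspath conversion, no handling of absolute parts.
    return f"{base}/{name}"


# For simple relative components both give the same string.
same_result = with_join() == with_fstring()

# They differ once a later component is absolute.
a, b = "work", "/abs/file"
join_absolute = posixpath.join(a, b)  # earlier components are discarded
fstring_absolute = f"{a}/{b}"         # keeps both, with a double slash

n = 200_000
t_join = timeit.timeit(with_join, number=n)
t_fstring = timeit.timeit(with_fstring, number=n)
print(f"posixpath.join: {t_join:.3f}s, f-string: {t_fstring:.3f}s for {n} calls")
```

So a replacement is only safe where the components are known to be relative and free of trailing separators.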
Yes I looked at |
For me it brings about 1 second if I replace it in |
As @JackTemaki mentioned this can have problematic side effects. e.g.:
I would not expect the results to change after a Job was created. So if we want to support this as a speed up option we have to put a big warning sign next to it... |
I didn't expect os.path.join to be that expensive! In any case, it would be nicer to use |
Note that Windows also supports |
Maybe we can still get a lot of the speedup by doing a more clever Some of the big objects I have are |
True, this would be annoying with f-strings. Using os.path.sep was more a matter of principle, but it's probably not worth the effort at this point. Side note: A while ago, when I tried to introduce f-strings, there were complaints from some people still using Python 3.5. I'm considering bumping the minimal Python version to 3.6 to be able to use f-strings 🤔
You could consider implementing a
Copy-on-write logic would be tough to implement: you would have to catch all write access to the original object and to everything nested inside of it. I don't think I would like to rely on it. In any case, if you want to try, I found this attempt to implement CoW (https://github.com/bannsec/pyCoW). I'll take a look at the state update strategy; I already have a few ideas how to clean up and speed up that code. |
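To illustrate why catching all write access is hard, here is a deliberately naive copy-on-write wrapper. `CowDict` and the example data are hypothetical and have nothing to do with the Sisyphus code or with pyCoW. The sketch only shows that intercepting top-level writes is not enough.

```python
import copy


class CowDict:
    """Naive copy-on-write wrapper around a dict (illustration only)."""

    def __init__(self, data):
        self._data = data    # shared with the caller until the first write
        self._owned = False

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        # Copy lazily, only when someone writes through the wrapper.
        if not self._owned:
            self._data = copy.deepcopy(self._data)
            self._owned = True
        self._data[key] = value


original = {"layers": [512, 512], "lr": 0.001}

# Case 1: a top-level write is intercepted, the original stays untouched.
view = CowDict(original)
view["lr"] = 0.01
top_level_write_isolated = original["lr"] == 0.001

# Case 2: a nested write only passes through __getitem__, so no copy is made.
view2 = CowDict(original)
view2["layers"].append(1024)
nested_write_leaked = original["layers"] == [512, 512, 1024]

# Case 3: the caller mutates the original after handing it over.
original["lr"] = 1.0
caller_write_visible = view2["lr"] == 1.0

print(top_level_write_isolated, nested_write_leaked, caller_write_visible)
```

Cases 2 and 3 are the problem: a reliable version would need to wrap every nested container and also observe writes made through the original reference.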
What I measure (what is most relevant for me): startup time of the sis manager, up to the prompt.
My current benchmark: i6_experiments.users.zeyer.experiments.chris_hybrid_2021.test_run. This creates a graph of 526 jobs.
I run this benchmark on a local computer with extremely fast FS, to separate the influence of slow FS at this point.
I would expect that the startup time takes a few millisecs, not more.
It took around 21 seconds the first time I tried this.
Now, via #84, #85, #87, it takes around 14 seconds for me.
For #85, I use GRAPH_WORKER = 1.
So, this issue here is to discuss why this specific case is still so slow, and what we can potentially do to improve it.
I guess other cases will be different, so maybe we should discuss them separately, so as not to mix things up. When the FS is slow, I think we can also improve a lot. I would still expect that this runs in maybe 2-3 seconds max. But let's discuss that separately.
For profiling, I currently use austin, and then I visualize the output in VSCode. Not sure what you would recommend to use instead. I run:
Or just timing:
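As a generic alternative that needs no external profiler, Python's built-in cProfile together with `time.perf_counter` can give both a wall-clock number and a per-function breakdown. This is only a sketch and not the command used in this issue: `build_graph` is a hypothetical stand-in for the real startup work, and its body is made up.

```python
import cProfile
import io
import posixpath
import pstats
import time


def build_graph(n_jobs=526):
    """Hypothetical stand-in for the startup work being measured."""
    return [posixpath.join("work", f"job.{i}", "output") for i in range(n_jobs)]


# Plain wall-clock timing.
start = time.perf_counter()
graph = build_graph()
elapsed = time.perf_counter() - start
print(f"built {len(graph)} entries in {elapsed * 1000:.2f} ms")

# Per-function breakdown, sorted by cumulative time.
profiler = cProfile.Profile()
profiler.enable()
build_graph()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

cProfile is a deterministic profiler and adds noticeable overhead per call, so its absolute numbers are less trustworthy than those of a sampling profiler; it is mainly useful for spotting which functions dominate.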