Fix slow repo_list with many refs #310
Conversation
klaus/repo.py (outdated)

    Cache is invalidated if one of the ref targets changes,
    e.g. a new commit has been made and 'refs/heads/master' was changed.
    """
    if len(self.refs.keys()) > 10000:
10000 still seems like quite a lot; there's no way to meaningfully display that, and it still means 10k random accesses of the repository.
What if there are more than 10000 heads, as is the case with the nix repo?
Ah damn, I thought that most heads are branches, but e.g. GH PRs are also heads
Note that the number used here is not displayed anywhere; only the maximum timestamp of the refs is shown.
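A minimal sketch of the caching idea being discussed (illustrative names, not klaus's actual code): key the cached timestamp on the current ref targets, so any ref update produces a new cache key and the stale entry is simply never hit again.

```python
# Hedged sketch of invalidation-by-key; assumes every ref resolves
# to a commit object.
import functools
from dulwich.repo import Repo

@functools.lru_cache(maxsize=128)
def _max_commit_time(repo_path, ref_targets):
    # One random object access per distinct target; this is the part
    # that becomes expensive with tens of thousands of refs.
    repo = Repo(repo_path)
    return max(repo[sha].commit_time for sha in ref_targets)

def last_updated(repo_path):
    repo = Repo(repo_path)
    # Freeze the ref targets into a hashable, order-independent key:
    # a new commit moving 'refs/heads/master' yields a new key, which
    # transparently invalidates the cached value.
    targets = tuple(sorted(set(repo.get_refs().values())))
    return _max_commit_time(repo_path, targets)
```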
> Ah damn, I thought that most heads are branches, but e.g. GH PRs are also heads

Wait, no, they are not.
That's the initial request, still taking quite a long time (in fact, almost hitting the default gunicorn timeout again). For posterity's sake, the nixpkgs repository itself (which at this point already had its timestamp cached):
Hmm, I wonder why it's still so slow; for me it takes around 1s. Is the repo on a slow hard drive? @jelmer, is there a way to batch-lookup timestamps with Dulwich that's faster?
The main alternative is to iterate over most of the repository using something like Repo.object_store.iterobjects_subset(). That will at least remove the need to reinflate delta bases more than once (which you would probably be doing with random access). That's a new API (in 0.21.0, I think), and not used much outside of dulwich itself yet.
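A minimal sketch of what such a batched lookup could look like (not code from this PR; the helper name and the commit-only filtering are illustrative assumptions, and iterobjects_subset() requires dulwich >= 0.21):

```python
# Hedged sketch: compute the newest commit time across all refs with one
# batched pass over the object store, instead of one random access per ref.
from dulwich.objects import Commit
from dulwich.repo import Repo

def newest_ref_timestamp(path):
    repo = Repo(path)
    targets = set(repo.get_refs().values())  # de-duplicated target SHAs
    newest = 0
    for obj in repo.object_store.iterobjects_subset(targets):
        # Refs may point at annotated tag objects; only commits carry
        # a commit_time, so skip anything else.
        if isinstance(obj, Commit):
            newest = max(newest, obj.commit_time)
    return newest
```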
next(repo.object_store.iterobjects_subset([repo[b"HEAD"].id]))
Am I using it wrong?
That seems right, and it works here (although on a different repo, obviously). Does repo[repo[b'HEAD'].id] work? What version of dulwich are you on? (It's possible this is a bug, but I'm curious what triggers it.)
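Two quick, illustrative sanity checks for the questions above (not snippets from the thread):

```python
# Check the dulwich version in use, and whether a direct lookup of the
# HEAD commit by SHA works at all.
import dulwich
from dulwich.repo import Repo

repo = Repo(".")
print(dulwich.__version__)     # e.g. (0, 21, 0)
print(repo[repo[b"HEAD"].id])  # direct object lookup of the HEAD commit
```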
A slow SSD, which should still be a lot faster than an HDD, but not up to speed with current SSDs.
Unfortunately it is actually slower:
@benaryorg, what's your CPU? I wonder how it can be 20x slower than mine.
Intel(R) Atom(TM) CPU N2800 @ 1.86GHz. This is probably one of my oldest and slowest servers, *sigh*.
@benaryorg please try this, closes #309