Ensure our pre-flight HEAD request is always by-tag if we're pushing a tag #58
Conversation
Ensure our pre-flight HEAD request is always by-tag if we're pushing a tag. This is a latent bug the system has had for large manifests, and that we now have for all manifests instead! 😄 (This also would artificially inflate our speedups, since all the pre-flight HEAD requests would be by-digest, which are infinitely cacheable and thus very, very fast.) Follow-up to #56.
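For context on why by-tag vs. by-digest matters so much here, a minimal sketch of the two pre-flight HEAD shapes against the registry API -- the repository, tag, and digest are placeholders, not values from this change, and real Docker Hub manifest requests would also need a bearer token (omitted):

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// Placeholder repository; real Docker Hub manifest requests also
	// require a bearer token, which this sketch omits.
	const base = "https://registry-1.docker.io/v2/library/hello-world/manifests/"

	// HEAD by *tag*: tags are mutable, so every cache along the way has
	// to revalidate against the upstream registry.
	byTag := base + "latest"

	// HEAD by *digest*: content-addressed and immutable, so responses
	// are cacheable indefinitely -- which is why accidental by-digest
	// pre-flight requests looked artificially fast. (Fake digest below.)
	byDigest := base + "sha256:0000000000000000000000000000000000000000000000000000000000000000"

	for _, url := range []string{byTag, byDigest} {
		resp, err := http.Head(url)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		resp.Body.Close()
		fmt.Println(resp.Status, url)
	}
}
```

Per the title, the fix ensures the first shape is used whenever a tag is being pushed.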
I found this while trying to figure out why my freshly-built […]

(I'm about to test it on that deploy now to see if it fixes it.)

Confirmed, […]
This reverts commit c70a534. (After fixing the implementation bug, this made things worse, not better 😭)
Now that we know #56 does actually make things worse, I've included a partial revert of it here (keeping the addition of the comment).
Confirmed: I rebased my personal pipeline on this change and re-ran while that (twice-run, consistent) 5m run's cache was still hot, and we're back down to the ~3m ballpark.
Do we know why the short-lived #56 made things worse? More requests? More prone to hitting rate limits? Something else?
I have some educated guesses, but it's definitely worth trying again with some tracing to find out with stronger data. 👍
Hmm, the obvious answer would be applying a CPU profile, but this isn't CPU bound, so a CPU profile actually won't tell us anything. We need something that's inclusive of time spent waiting (e.g. on network I/O).

Edit: gonna try playing with https://pkg.go.dev/runtime/trace -- I'm hoping we don't have to go all the way to something low-level like https://blog.golang.org/http-tracing or as all-encompassing as OpenTracing to answer this, but will continue exploring options.
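For reference, wiring up runtime/trace is small enough to sketch here (the traced workload is a placeholder, not code from this repo):

```go
package main

import (
	"log"
	"os"
	"runtime/trace"
)

func main() {
	// Write the execution trace to a file for offline inspection.
	f, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()

	// ... the push workload under investigation would run here ...
}
```

Then `go tool trace trace.out` opens a viewer that includes goroutine blocking and network wait time -- exactly the non-CPU time a CPU profile can't see.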
Ironically, I think it's actually our read-only caching proxy at fault here -- that proxy works really well when we're doing lookups by digest, because it can cache them effectively indefinitely, but when doing a lookup by tag it has to proxy to Docker Hub on every request, and I think what we're seeing is the result of that extra multi-hop latency.
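To make the asymmetry concrete, a toy sketch of the caching rule a read-only registry proxy has to follow (illustrative only, not this proxy's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// cachePolicy illustrates why the proxy behaves so differently for the
// two lookup styles: a by-digest manifest is content-addressed and
// immutable, so it can be cached forever, while a by-tag manifest is
// mutable and has to be fetched from (or revalidated against) Docker Hub
// on every request, adding a full extra network hop.
func cachePolicy(reference string) string {
	if strings.HasPrefix(reference, "sha256:") {
		return "serve from cache indefinitely"
	}
	return "proxy to Docker Hub every time"
}

func main() {
	fmt.Println(cachePolicy("sha256:0000000000000000000000000000000000000000000000000000000000000000"))
	fmt.Println(cachePolicy("latest"))
}
```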
Good news -- I found our proxy had several bad behaviors that exacerbated this: […]

I've now fixed the proxy such that: […]
Before making these fixes, I had tested with the proxy totally bypassed (all requests going directly to Docker Hub), which is how I narrowed down that the proxy was responsible. After making these fixes, I re-tested and got ~exactly the same speed via the proxy as with direct Hub requests, so we should be good to re-apply #56 and see across-the-board speed gains 🥳
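That before/after measurement is easy to reproduce; a rough sketch, with placeholder endpoints (the proxy URL is hypothetical, and real Hub manifest requests need auth that's omitted here):

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// timeHead issues one HEAD request and reports the wall-clock time taken.
func timeHead(url string) {
	start := time.Now()
	resp, err := http.Head(url)
	if err != nil {
		fmt.Println(url, "error:", err)
		return
	}
	resp.Body.Close()
	fmt.Printf("%v\t%s\t%s\n", time.Since(start), resp.Status, url)
}

func main() {
	// Hypothetical proxied vs. direct endpoints for the same manifest.
	timeHead("https://proxy.example.com/v2/library/hello-world/manifests/latest")
	timeHead("https://registry-1.docker.io/v2/library/hello-world/manifests/latest")
}
```

Comparing those two timings is the "same speed via the proxy as direct" check described above.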