
Ensure our pre-flight HEAD request is always by-tag if we're pushing a tag #58

Merged
merged 2 commits into docker-library:main from lookup-by-digest on Jun 26, 2024

Conversation

@tianon (Member) commented Jun 26, 2024

This is a latent bug the system has had for large manifests, which (since #56) we now have for all manifests instead! 😄

(This would also artificially inflate our speedups, since all the pre-flight HEAD requests would be by-digest, which are infinitely cacheable and thus very, very fast.)

Follow-up to #56
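
For context, here is a rough sketch (not this repo's actual code; the function names are illustrative and registry token auth is omitted entirely) of the kind of pre-flight check being described: when we're pushing a tag, the HEAD has to be by-tag so we actually learn what the tag currently points at, not just whether the manifest blob exists.

    package main

    import (
    	"fmt"
    	"net/http"
    )

    // manifestHEAD does HEAD /v2/<repo>/manifests/<reference> and returns the
    // Docker-Content-Digest response header ("" if the manifest isn't there).
    // Auth/token handling is omitted for brevity.
    func manifestHEAD(registry, repo, reference string) (string, error) {
    	url := fmt.Sprintf("https://%s/v2/%s/manifests/%s", registry, repo, reference)
    	req, err := http.NewRequest(http.MethodHead, url, nil)
    	if err != nil {
    		return "", err
    	}
    	req.Header.Set("Accept", "application/vnd.oci.image.index.v1+json, application/vnd.docker.distribution.manifest.list.v2+json")
    	resp, err := http.DefaultClient.Do(req)
    	if err != nil {
    		return "", err
    	}
    	defer resp.Body.Close()
    	if resp.StatusCode != http.StatusOK {
    		return "", nil
    	}
    	return resp.Header.Get("Docker-Content-Digest"), nil
    }

    // needsPush is the crux of this PR: when pushing a tag, the pre-flight HEAD
    // must be by-tag -- a by-digest HEAD only proves the manifest exists, not
    // that the tag currently points at it.
    func needsPush(registry, repo, tag, wantDigest string) (bool, error) {
    	have, err := manifestHEAD(registry, repo, tag)
    	if err != nil {
    		return false, err
    	}
    	return have != wantDigest, nil
    }

    func main() {
    	push, err := needsPush("registry-1.docker.io", "tianon/buildkit", "0.13",
    		"sha256:1b479e0976d87ecf6d76e9d7fff51a24efcdbfef118449bae49b724bd8b14c56")
    	fmt.Println("push needed:", push, err)
    }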

@tianon requested a review from yosifkit as a code owner June 26, 2024 00:14
@tianon (Member, Author) commented Jun 26, 2024

I found this while trying to figure out why my freshly-built tianon/buildkit:0.13 image was not being pushed, even though the output looked good:

Pushing manifest sha256:1b479e0976d87ecf6d76e9d7fff51a24efcdbfef118449bae49b724bd8b14c56:
 - tianon/buildkit:0.13.2
 - tianon/buildkit:0.13

(I'm about to test it on that deploy now to see if it fixes it)

yosifkit previously approved these changes Jun 26, 2024
@tianon (Member, Author) commented Jun 26, 2024

Confirmed: tianon/buildkit:0.13 was stuck at sha256:dab7ad84db6d538ecae96cad77648ee21f47f0c1ecaabf91e5d5f527ecfa1556 before, and is now correctly updated to sha256:1b479e0976d87ecf6d76e9d7fff51a24efcdbfef118449bae49b724bd8b14c56

@tianon (Member, Author) commented Jun 26, 2024

This is dramatically slower, because tag lookups are slow (fair/expected).

My personal no-op deploys were at ~3m30s before #56, dropped to 30s after (and then 10s, once the cache was really hot), and now with this change go back up to ~5m8s, so #56 might actually make things worse, not better 😬 😭

This reverts commit c70a534.

(After fixing the implementation bug, this made things worse, not better 😭)
@tianon (Member, Author) commented Jun 26, 2024

Now that we know #56 does actually make things worse, I've included a partial revert of it here (keeping the addition of the comment).

@tianon (Member, Author) commented Jun 26, 2024

Confirmed: I rebased my personal pipeline on this change and re-ran it while the cache from that (twice-run, consistent) ~5m run was still hot, and we're back down to the ~3m ballpark.

@tianon merged commit d85d365 into docker-library:main Jun 26, 2024
1 check passed
@tianon deleted the lookup-by-digest branch June 26, 2024 00:39
@whalelines (Collaborator) commented:

Do we know why the short-lived #56 made things worse? More requests? More prone to hit rate limiting? Something else?

@tianon (Member, Author) commented Jun 26, 2024

I have some educated guesses, but it's definitely worth trying again with some tracing to find out with stronger data. 👍

@tianon (Member, Author) commented Jun 26, 2024

Hmm, the obvious answer would be a CPU profile, but this isn't CPU-bound, so a CPU profile won't actually tell us anything. We need something that includes the time spent waiting on net/http requests.

Edit: gonna try playing with https://pkg.go.dev/runtime/trace -- I'm hoping we don't have to go all the way to something low-level like https://blog.golang.org/http-tracing or as all-encompassing as OpenTracing to answer this, but will continue exploring options
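
(A minimal sketch of the httptrace angle, purely illustrative and not code from this repo: hanging an httptrace.ClientTrace off the request context shows where the wall-clock time goes -- connection reuse, time-to-first-byte -- which is exactly the blocked-on-network time a CPU profile can't see.)

    package main

    import (
    	"fmt"
    	"net/http"
    	"net/http/httptrace"
    	"time"
    )

    // timedHead issues a request with an httptrace.ClientTrace attached so we
    // can see where the wall-clock time goes while blocked on the network.
    func timedHead(url string) error {
    	start := time.Now()

    	trace := &httptrace.ClientTrace{
    		GotConn: func(info httptrace.GotConnInfo) {
    			fmt.Printf("got conn (reused=%v) after %v\n", info.Reused, time.Since(start))
    		},
    		GotFirstResponseByte: func() {
    			fmt.Printf("first response byte after %v\n", time.Since(start))
    		},
    	}

    	req, err := http.NewRequest(http.MethodHead, url, nil)
    	if err != nil {
    		return err
    	}
    	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

    	resp, err := http.DefaultClient.Do(req)
    	if err != nil {
    		return err
    	}
    	defer resp.Body.Close()
    	fmt.Printf("%s after %v\n", resp.Status, time.Since(start))
    	return nil
    }

    func main() {
    	_ = timedHead("https://registry-1.docker.io/v2/")
    }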

@tianon (Member, Author) commented Jun 26, 2024

Ironically, I think it's actually our read-only caching proxy at fault here -- that proxy works really well when we're doing lookups by digest because it can cache them effectively indefinitely, but when doing a lookup by-tag it has to proxy to Docker Hub on every request, and I think what we're seeing is the result of that extra multi-hop latency.

@tianon (Member, Author) commented Jun 26, 2024

Good news -- I found our proxy had several bad behaviors that exacerbated this:

  • our proxy changes a GET of a tag into a HEAD of the tag that then redirects to a GET of the digest (good for cache!)
  • an unintentional side effect of that was that a HEAD of a tag was also being turned into a redirect, which is what caused the really bad behavior we were seeing here (every HEAD of a tag becomes two HEADs, which is understandably much slower, and the requests are all so tiny they're not really cacheable)
  • additionally, a system in front of our proxy was "helpfully" converting all our HEAD requests into GET requests to the backend, since it's a violation of the spec for HEAD/GET to have differing behavior/responses (and that system is the cache layer, so it would need the response body in order to cache it)

I've now fixed the proxy such that:

  • HEAD does not redirect -- it just returns the upstream HEAD data directly (with munging of cache headers only for cacheable requests like by-digest)
  • GET on a tag still redirects, giving us better cache behavior
  • the cache in front of the proxy will only cache requests with sha256: in them (thus allowing HEAD of a tag to stay HEAD all the way back)

Before making these fixes, I had tested with the proxy totally bypassed (all requests going directly to Docker Hub), which is how I narrowed down that the proxy was responsible. After making these fixes, I re-tested via the proxy and got ~exactly the same speed as the direct Hub requests, so we should be good to re-apply #56 and see across-the-board speed gains 🥳
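
For the record, here is a minimal sketch of the fixed behavior described above. This is not the actual proxy: the upstream host, route matching, and cache handling are illustrative assumptions (the real front cache is a separate layer; here it's only represented by Cache-Control headers), and auth is omitted. It shows HEAD passing through as HEAD, GET-of-a-tag redirecting to GET-of-the-digest, and only by-digest responses being marked cacheable.

    package main

    import (
    	"fmt"
    	"net/http"
    	"net/http/httputil"
    	"net/url"
    	"strings"
    )

    const upstreamHost = "https://registry-1.docker.io" // illustrative; auth omitted

    func main() {
    	upstream, _ := url.Parse(upstreamHost)
    	proxy := httputil.NewSingleHostReverseProxy(upstream)

    	// Only by-digest content is immutable, so only it gets long cache headers;
    	// everything else (notably HEAD of a tag) is explicitly uncacheable.
    	proxy.ModifyResponse = func(resp *http.Response) error {
    		if strings.Contains(resp.Request.URL.Path, "sha256:") {
    			resp.Header.Set("Cache-Control", "public, max-age=31536000, immutable")
    		} else {
    			resp.Header.Set("Cache-Control", "no-store")
    		}
    		return nil
    	}

    	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
    		byDigest := strings.Contains(r.URL.Path, "sha256:")

    		// GET of a tag: look up the digest with an upstream HEAD, then redirect
    		// the client to the immutable by-digest URL (good for the cache).
    		if r.Method == http.MethodGet && !byDigest && strings.Contains(r.URL.Path, "/manifests/") {
    			head, err := http.NewRequest(http.MethodHead, upstreamHost+r.URL.Path, nil)
    			if err != nil {
    				http.Error(w, err.Error(), http.StatusBadGateway)
    				return
    			}
    			head.Header = r.Header.Clone()
    			resp, err := http.DefaultClient.Do(head)
    			if err != nil {
    				http.Error(w, err.Error(), http.StatusBadGateway)
    				return
    			}
    			resp.Body.Close()
    			if digest := resp.Header.Get("Docker-Content-Digest"); digest != "" {
    				dir := r.URL.Path[:strings.LastIndex(r.URL.Path, "/")+1]
    				http.Redirect(w, r, dir+digest, http.StatusTemporaryRedirect)
    				return
    			}
    		}

    		// Everything else (HEAD of a tag, anything by-digest) is proxied as-is,
    		// so a HEAD stays a HEAD all the way to the upstream.
    		proxy.ServeHTTP(w, r)
    	})

    	fmt.Println(http.ListenAndServe(":8080", nil))
    }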
