Slow resolution from Azure DevOps feed #11379

thomasaarholt · 2025-02-10T08:50:56Z

Summary

Hi!
I'm trying to find the best way to debug why installing from my work's package index is so slow. The dependency resolution is the slow part, at 25 sec for our index vs 0.3 sec for pypi.

I have followed the uv instructions for alternative package indexes. We mostly mirror pypi, and for these packages I don't expect there to be any real difference, except that it should be a bit faster since we probably host fewer packages.

There is no authentication window that I need to click on, everything happens automatically. We've added keyring to uv tool with uv tool install keyring --with artifacts-keyring.

Summary logs

# From pypi
❯ uv pip install pandas numpy scikit-learn --no-cache
Resolved 10 packages in 379ms
Prepared 10 packages in 2.44s
Installed 10 packages in 617ms
# (package versions printed here)

# From our index
uv pip install pandas numpy scikit-learn --no-cache --keyring-provider=subprocess --index-url=<redacted>
⠴ Resolving dependencies...                                                                                                                                                                                  [Information] [CredentialProvider]VstsCredentialProvider - Acquired bearer token using 'REDACTED'
[Information] [CredentialProvider]VstsCredentialProvider - Attempting to exchange the bearer token for an Azure DevOps session token.
Resolved 10 packages in 25.31s # <-------------------- 25 sec vs 0.3 sec!
Prepared 10 packages in 3.70s
Installed 10 packages in 985ms
# (package versions printed here)

Verbose logs

I include full logs when appending --verbose to the calls for the pypi and other index-url case.

Platform

Windows 11 x86_64

Version

uv 0.5.25 (9c07c3f 2025-01-28)

Python version

3.12.8

The text was updated successfully, but these errors were encountered:

konstin · 2025-02-10T10:32:23Z

When resolving, uv needs the metadata of every version used in the resolution. This information we get from a file called METADATA inside the <name>-<version>.dist-info directory of the wheel. PyPI exposes this file directly in the API (PEP 658). For most other server we can use HTTP Range requests to read only the METADATA file from the zip without downloading and unpacking the entire file. The index you're using does neither support PEP 658 nor HTTP Range requests, so we have to download each wheel file to extract its metadata.

The hint is in this log message:

WARN Range requests not supported for numpy-2.2.2-cp311-cp311-win_amd64.whl; streaming wheel

thomasaarholt · 2025-02-10T16:35:18Z

Thank you so much for this quick response, this makes perfect sense. I’m contacting those responsible for adding support for this.

I’m closing this, but any pointers or advice for adding support for this sort of thing to a tracker is very much appreciated! I’m very pro not spending unnecessary energy boiling the oceans, especially in a CI!

thomasaarholt · 2025-02-12T12:41:39Z

Reopening this because I have some new information and am trying to find a minimum repro.

Full disclosure: I actually work at Microsoft, but as a Data Scientist and not affiliated with the Azure teams. My opinions are my own.

I raised this internally within Microsoft, and learnt that we do not support PEP 658 yet, but we do support HTTP range requests. However, we think that uv might not be handling the fact that we redirect from Azure Storage to Azure Front Door, a caching service.

My intention is now to see if we can work out why uv doesn't handle the redirect for HTTP range requests.

I have created a public tracker on Azure DevOps (ADO) where I can add specific python packages from PyPI to make them available to the public on that tracker. I have added PyPI as an "upstream provider", which lets an authenticated user to the ADO project (me) add any PyPI package version to the feed for public consumption. I do that via pip install <package>==<version> while authenticating via artifacts-keyring (I install it directly in the venv, pip didn't pick up the uv tool). There's actually a (possibly redirect-related?) bug with uv here, as I can't get uv to authenticate correctly for packages that aren't already in the index. Will make a new issue on that, but that requires authentication to the server to debug.

export INDEX='https://pkgs.dev.azure.com/thomasaarholt/thomasaarholt-python/_packaging/thomas-python-feed/pypi/simple/'
uv venv --seed --quiet
source .venv/bin/activate.fish # activate however you normally do
uv pip install numpy pandas scikit-learn --index-url=$INDEX --no-cache -v  &| grep stream # I pipe stdout and stderr into grep

I've run the above a few times now, and always get at least one package warning, sometimes two or three:

❯ export INDEX='https://pkgs.dev.azure.com/thomasaarholt/thomasaarholt-python/_packaging/thomas-python-feed/pypi/simple/'
  uv venv --seed --quiet
  source .venv/bin/activate.fish
  uv pip install numpy pandas scikit-learn --index-url=$INDEX --no-cache -v  &| grep stream # I pipe stdout and stderr into grep
# WARN Range requests not supported for scikit_learn-1.6.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl; streaming wheel

# rerun
❯ export INDEX='https://pkgs.dev.azure.com/thomasaarholt/thomasaarholt-python/_packaging/thomas-python-feed/pypi/simple/'
  uv venv --seed --quiet
  source .venv/bin/activate.fish
  uv pip install numpy pandas scikit-learn --index-url=$INDEX --no-cache -v  &| grep stream # I pipe stdout and stderr into grep
# WARN Range requests not supported for pandas-2.2.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl; streaming wheel
# WARN Range requests not supported for scikit_learn-1.6.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl; streaming wheel
# WARN Range requests not supported for numpy-2.2.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl; streaming wheel

Is there any chance I could get some help debugging this from the uv team?

zanieb · 2025-02-12T14:01:52Z

There's actually a (possibly redirect-related?) bug with uv here, as I can't get uv to authenticate correctly for packages that aren't already in the index. Will make a new issue on that, but that requires authentication to the server to debug.

I'm not seeing the opt-in to using the keyring and a username in the index URL.

On the range requests...

Is it feasible to create a simple nginx container or something that replicates the setup?

Can you reproduce with async_http_range_reader directly?

cc @konstin I think you have some familiarity with async_http_range_reader?

thomasaarholt · 2025-02-12T16:33:24Z

I'm not seeing the opt-in to using the keyring and a username in the index URL.

Yes, I'm showing the regular index url here since the user shouldn't need authentication when installing packages that are on the feed. When trying to get uv to authenticate to add packages from PyPI to my public feed, I was using:

export UV_INDEX='https://[email protected]/thomasaarholt/thomasaarholt-python/_packaging/thomas-python-feed/pypi/simple/'

uv venv --seed --quiet
source .venv/bin/activate.fish
uv pip install numpy==2.1.1 --index-url=$UV_INDEX --keyring-provider='subprocess' -v

Here's the log for that in a gist.

To clear one possible point of confusion: One must be an authenticated user in order to install an upstream (PyPI) package through an ADO feed. This is by design (purple box).

On the range requests...

I might be able to. Never worked with nginx before, and only a bit with rust. I will try if I can find the time.

zanieb · 2025-02-12T16:40:33Z

So a range request does work with curl if you provide the -L flag to follow the redirects

❯ curl -H "Range: bytes=0-1048575" "https://pkgs.dev.azure.com/thomasaarholt/5e230929-47ac-48ad-97db-15ab1d2aa0c5/_packaging/56fc1e3a-f592-420b-bd14-a8c967a65c28/pypi/download/pandas/2.2.3/pandas-2.2.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl#sha256=c124333816c3a9b03fbeef3a9f230ba9a737e9e5bb4060aa2107a86cc0a497fc" -f -L > chunk

zanieb · 2025-02-12T16:45:36Z

But... if you make a HEAD request, the index returns a HTTP 405 method not allowed

❯ curl --head -L "https://pkgs.dev.azure.com/thomasaarholt/5e230929-47ac-48ad-97db-15ab1d2aa0c5/_packaging/56fc1e3a-f592-420b-bd14-a8c967a65c28/pypi/download/pandas/2.2.3/pandas-2.2.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
HTTP/2 405

which is what we use to determine if range requests are supported

uv/crates/uv-client/src/registry_client.rs

Lines 711 to 753 in 748582e

    
           let req = self 
        
               .uncached_client(url) 
        
               .head(url.clone()) 
        
               .header( 
        
                   "accept-encoding", 
        
                   http::HeaderValue::from_static("identity"), 
        
               ) 
        
               .build() 
        
               .map_err(|err| ErrorKind::from_reqwest(url.clone(), err))?; 
        
           // Copy authorization headers from the HEAD request to subsequent requests 
        
           let mut headers = HeaderMap::default(); 
        
           if let Some(authorization) = req.headers().get("authorization") { 
        
               headers.append("authorization", authorization.clone()); 
        
           } 
        
           // This response callback is special, we actually make a number of subsequent requests to 
        
           // fetch the file from the remote zip. 
        
           let read_metadata_range_request = |response: Response| { 
        
               async { 
        
                   let mut reader = AsyncHttpRangeReader::from_head_response( 
        
                       self.uncached_client(url).clone(), 
        
                       response, 
        
                       url.clone(), 
        
                       headers.clone(), 
        
                   ) 
        
                   .await 
        
                   .map_err(|err| ErrorKind::AsyncHttpRangeReader(url.clone(), err))?; 
        
                   trace!("Getting metadata for {filename} by range request"); 
        
                   let text = wheel_metadata_from_remote_zip(filename, url, &mut reader).await?; 
        
                   let metadata = 
        
                       ResolutionMetadata::parse_metadata(text.as_bytes()).map_err(|err| { 
        
                           Error::from(ErrorKind::MetadataParseError( 
        
                               filename.clone(), 
        
                               url.to_string(), 
        
                               Box::new(err), 
        
                           )) 
        
                       })?; 
        
                   Ok::<ResolutionMetadata, CachedClientError<Error>>(metadata) 
        
               } 
        
               .boxed_local() 
        
               .instrument(info_span!("read_metadata_range_request", wheel = %filename)) 
        
           };

https://github.com/prefix-dev/async_http_range_reader/blob/c2a1b5a63712a34bca661cc94389fb03fb0dcddd/src/lib.rs#L310-L318

zanieb · 2025-02-12T16:48:44Z

I would be curious to hear why they don't support HEAD requests — as that seems to be the crux of the problem here.

You could set RUST_LOG=trace to get verbose logs from the network layer and confirm the HTTP 405 is occurring.

thomasaarholt · 2025-02-12T16:52:57Z

That was quick! I've brought it up internally now. Thank you!

konstin · 2025-02-12T16:53:46Z

For additional context, we should be following redirects in the HEAD request, but we need the initial HEAD requests to determine whether the server supports range requests (https://github.com/prefix-dev/async_http_range_reader/blob/9c1e2d3082745f3a75d8065a3dd145cd51b2715a/src/lib.rs#L310-L318).

thomasaarholt · 2025-02-12T17:05:38Z

You could set RUST_LOG=trace to get verbose logs from the network layer and confirm the HTTP 405 is occurring.

✅

export INDEX='https://pkgs.dev.azure.com/thomasaarholt/thomasaarholt-python/_packaging/thomas-python-feed/pypi/simple/'
export RUST_LOG=trace
uv venv --seed --quiet
source .venv/bin/activate.fish # activate however you normally do
uv pip install numpy --index-url=$INDEX --no-cache -v  &> log.txt

# Ctrl+F 405
TRACE Attempting to retry error: Error { kind: WrappedReqwestError(Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("pkgs.dev.azure.com")), port: None, path: "/thomasaarholt/5e230929-47ac-48ad-97db-15ab1d2aa0c5/_packaging/56fc1e3a-f592-420b-bd14-a8c967a65c28/pypi/download/numpy/2.2.2/numpy-2.2.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", query: None, fragment: Some("sha256=02935e2c3c0c6cbe9c7955a8efa8908dd4221d7755644c59d1bba28b94fd334f") }, WrappedReqwestError(Reqwest(reqwest::Error { kind: Status(405), url: "https://pkgs.dev.azure.com/thomasaarholt/5e230929-47ac-48ad-97db-15ab1d2aa0c5/_packaging/56fc1e3a-f592-420b-bd14-a8c967a65c28/pypi/download/numpy/2.2.2/numpy-2.2.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl#sha256=02935e2c3c0c6cbe9c7955a8efa8908dd4221d7755644c59d1bba28b94fd334f" }))) }

thomasaarholt added the bug Something isn't working label Feb 10, 2025

konstin added question Asking for clarification or support upstream An upstream dependency is involved and removed bug Something isn't working labels Feb 10, 2025

thomasaarholt closed this as completed Feb 10, 2025

thomasaarholt reopened this Feb 12, 2025

thomasaarholt changed the title ~~Slow package resolution from authenticated package index~~ Slow resolution from Azure DevOps feed Feb 12, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Slow resolution from Azure DevOps feed #11379

Slow resolution from Azure DevOps feed #11379

thomasaarholt commented Feb 10, 2025 •

edited

Loading

konstin commented Feb 10, 2025

thomasaarholt commented Feb 10, 2025

thomasaarholt commented Feb 12, 2025 •

edited

Loading

zanieb commented Feb 12, 2025

thomasaarholt commented Feb 12, 2025 •

edited

Loading

zanieb commented Feb 12, 2025

zanieb commented Feb 12, 2025 •

edited

Loading

zanieb commented Feb 12, 2025

thomasaarholt commented Feb 12, 2025

konstin commented Feb 12, 2025

thomasaarholt commented Feb 12, 2025

Slow resolution from Azure DevOps feed #11379

Slow resolution from Azure DevOps feed #11379

Comments

thomasaarholt commented Feb 10, 2025 • edited Loading

Summary

Summary logs

Verbose logs

Platform

Version

Python version

konstin commented Feb 10, 2025

thomasaarholt commented Feb 10, 2025

thomasaarholt commented Feb 12, 2025 • edited Loading

zanieb commented Feb 12, 2025

thomasaarholt commented Feb 12, 2025 • edited Loading

zanieb commented Feb 12, 2025

zanieb commented Feb 12, 2025 • edited Loading

zanieb commented Feb 12, 2025

thomasaarholt commented Feb 12, 2025

konstin commented Feb 12, 2025

thomasaarholt commented Feb 12, 2025

thomasaarholt commented Feb 10, 2025 •

edited

Loading

thomasaarholt commented Feb 12, 2025 •

edited

Loading

thomasaarholt commented Feb 12, 2025 •

edited

Loading

zanieb commented Feb 12, 2025 •

edited

Loading