IMPALA-10994: Normalize the pip package name part of the download URL.

According to PEP 503, a pip repository server is not required to support access via unnormalized package names in URLs. Some package names in 'infra/python/deps/*requirements.txt' are unnormalized, e.g. 'Cython', and pip_download.py concatenates $PYPI_MIRROR and the package name directly to build the download URL, so the resulting URL may be unnormalized. Fix this by normalizing the package name in the download URL using the method recommended in PEP 503.

Change-Id: I479df0ad7acf3c650b8f5317372261d5e2840864
Reviewed-on: http://gerrit.cloudera.org:8080/17987
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
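
For illustration, the normalization described above can be sketched as a small standalone script. This is a minimal sketch, not the code in pip_download.py: the helper names `normalize_name` and `simple_index_url` and the mirror value `https://pypi.org` are assumptions made for the example. Only the regular expression comes from PEP 503 and from the change itself.

```python
import re

# Hypothetical mirror URL, used only for this example; the real script
# takes its mirror from PYPI_MIRROR.
EXAMPLE_MIRROR = "https://pypi.org"


def normalize_name(pkg_name):
    """Normalize a package name as recommended by PEP 503.

    Runs of '-', '_' and '.' collapse to a single '-', and the result is
    lowercased.
    """
    return re.sub(r"[-_.]+", "-", pkg_name).lower()


def simple_index_url(mirror, pkg_name):
    """Build the simple-index URL for a package from its normalized name."""
    return "{0}/simple/{1}/".format(mirror, normalize_name(pkg_name))


if __name__ == "__main__":
    for name in ("Cython", "Flask_SQLAlchemy", "zope.interface"):
        print("{0} -> {1}".format(name, simple_index_url(EXAMPLE_MIRROR, name)))
```

With this sketch, 'Cython' maps to `cython`, 'Flask_SQLAlchemy' to `flask-sqlalchemy`, and 'zope.interface' to `zope-interface`, so every requested URL uses the normalized form regardless of how the name is spelled in the requirements files.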
DarvenDuan · Nov 26, 2021 · f566e7d
1 parent da53428
commit f566e7d
Showing 1 changed file with 2 additions and 1 deletion.
diff --git a/infra/python/deps/pip_download.py b/infra/python/deps/pip_download.py
@@ -82,7 +82,8 @@ def get_package_info(pkg_name, pkg_version):
   # to sort them and return the first value in alphabetical order. This ensures that the
   # same result is always returned even if the ordering changed on the server.
   candidates = []
-  url = '{0}/simple/{1}/'.format(PYPI_MIRROR, pkg_name)
+  normalized_name = re.sub(r"[-_.]+", "-", pkg_name).lower()
+  url = '{0}/simple/{1}/'.format(PYPI_MIRROR, normalized_name)
   print('Getting package info from {0}'.format(url))
   # The web page should be in PEP 503 format (https://www.python.org/dev/peps/pep-0503/).
   # We parse the page with regex instead of an html parser because that requires