This repository has been archived by the owner on Sep 5, 2023. It is now read-only.
The here-location-services package currently unconditionally depends on pandas, which depends on numpy, pytz and python-dateutil. On x86-64 Linux (for Python 3.9), these end up being very large (~130MB), with all of the rest of the dependencies being ~5MB. However, pandas is only used for converting the result for two functions associated with the matrix routing API:
It seems unfortunate to require these huge dependencies to be installed for only these two functions when many people are likely to not be calling them anyway, and when the dependencies seemingly aren't required for any additional functionality within this client library.
Potential alternatives
Have pandas be an optional dependency (for example, via extras_require={"pandas": ["pandas"]} in setup.py), and import it on demand in the individual functions that need it. For example:
    def to_distnaces_matrix(self):
        """Return distances matrix in a dataframe."""
        # Import lazily so pandas is only needed when this method is called.
        try:
            from pandas import DataFrame
        except ImportError as e:
            raise ImportError(
                "pandas is not installed, run `pip install here-location-services[pandas]`"
            ) from e
        # ... existing implementation as before ...
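The same pattern can be factored into a small helper so that every function needing an optional package shares one error message. This is a minimal sketch: the helper name `require_optional` is hypothetical and not part of here-location-services, and it assumes the `pandas` extra proposed above exists.

```python
import importlib


def require_optional(module_name, extra, package="here-location-services"):
    """Import an optional dependency on demand, or explain how to install it.

    Hypothetical helper for illustration; not part of the library.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"{module_name} is not installed, run `pip install {package}[{extra}]`"
        ) from e


# Usage inside a method that needs pandas:
#     pd = require_optional("pandas", extra="pandas")
#     return pd.DataFrame(nested_distances)
```

Because the import happens inside the function, users who never call the DataFrame-returning methods never need pandas installed.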
For an example of prior art, this option is what the popular Pydantic library does:
Remove the pandas dependency totally, and have the functions return the nested lists (nested_distances) without converting to a DataFrame. A user who wants to use pandas can still convert to a DataFrame themselves: DataFrame(result.to_distnaces_matrix()) (the columns= argument seems to be unnecessary, as doing that call gives the same result AFAICT).
Both of these are probably best considered as breaking changes.
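The second option could look roughly like the sketch below. It assumes the matrix response stores distances as one flat, row-major list alongside the origin and destination counts; the function name `nest_distances` and its parameters are hypothetical, not the library's API.

```python
def nest_distances(flat, num_origins, num_destinations):
    """Split a flat row-major list into one row per origin.

    Hypothetical helper: the real response layout may differ.
    """
    if len(flat) != num_origins * num_destinations:
        raise ValueError("flat list length does not match matrix dimensions")
    return [
        flat[i * num_destinations:(i + 1) * num_destinations]
        for i in range(num_origins)
    ]


# Made-up distances for 2 origins and 3 destinations.
nested = nest_distances([0, 120, 340, 115, 0, 410], num_origins=2, num_destinations=3)
print(nested)
```

A caller who wants a DataFrame can then pass the nested lists to `pandas.DataFrame(nested)` themselves, so the library itself never imports pandas.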
There are various ways to provide more code beyond the size limits (layers or docker images), but this provides some context for why someone might care about the size of a package and its dependencies. (Those methods are fiddly enough and the cold start impact large enough that we've actually switched away from using this client library for now.)
Package size details
Here are some commands I used to investigate the size impact, leveraging pip install --target to install a set of packages to a specific directory:
uname -a # Linux 322c9a327f85 5.10.104-linuxkit #1 SMP PREEMPT Wed Mar 9 19:01:25 UTC 2022 x86_64 GNU/Linux
python --version # Python 3.9.10
pip install --target=everything here-location-services
pip install --target=deps-no-pandas requests geojson flexpolyline pyhocon requests_oauthlib
pip install --target=deps-pandas requests geojson flexpolyline pyhocon requests_oauthlib pandas
du -sh everything # 135M
du -sh deps-pandas # 134M
du -sh deps-no-pandas # 5.1M
du -sh everything/here_location_services # 484K
That is, without pandas, the total installed package size would be 5.1M (deps-no-pandas) + 484K (everything/here_location_services) = ~5.6MB, down from 135MB (everything).
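For environments without `du`, a rough equivalent can be scripted. This is a sketch only: it sums apparent file sizes, whereas `du` reports disk usage, so the totals will differ slightly. The function name `dir_size_bytes` is illustrative.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under `path` (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


if __name__ == "__main__":
    # Directory names match the `pip install --target` commands above.
    for target in ("everything", "deps-pandas", "deps-no-pandas"):
        if os.path.isdir(target):
            print(f"{target}: {dir_size_bytes(target) / 2**20:.1f} MiB")
```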
Summary of individual packages (reported by du -sh everything/*, ignoring the $package.dist-info directories that are mostly less than 50k anyway):
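| package | size | only required for pandas? |
| --- | --- | --- |
| pandas | 58M | yes |
| numpy.libs | 35M | yes |
| numpy | 33M | yes |
| pytz | 2.8M | yes |
| oauthlib | 1.4M | |
| urllib3 | 872K | |
| dateutil | 748K | yes |
| idna | 496K | |
| here_location_services | 484K | |
| 8 others | 1.5M | |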
Upvoting this. We just ran into this with our AWS Lambda as well. Are there any alternatives to make this work with Lambda without too much of a workaround?
Thank you for publishing a client library!
Issue
pandas is only used for converting the result of two functions associated with the matrix routing API, defined in here-location-services-python/here_location_services/responses.py, lines 151 to 182 in 325b4c0.
Prior art for the optional-dependency approach, from Pydantic:
Declaring python-dotenv as an optional extra: https://github.com/samuelcolvin/pydantic/blob/8846ec4685e749b93907081450f592060eeb99b1/setup.py#L134-L137
Importing dotenv within a function (not at the top level) and catching the ImportError to provide additional help to the user: https://github.com/samuelcolvin/pydantic/blob/8846ec4685e749b93907081450f592060eeb99b1/pydantic/env_settings.py#L297-L300
Context
We were attempting to use this package in an AWS Lambda, which has strict limits on the size of the code asset; exceeding them results in errors like 'Unzipped size must be smaller than 262144000 bytes' when deploying (relevant docs, under "Deployment package (.zip file archive)": https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html#function-configuration-deployment-and-execution). Additionally, larger packages result in slower cold starts: https://mikhail.io/serverless/coldstarts/aws/ .
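To make the limit concrete, the quoted 262144000 bytes is 250 MiB. A quick sketch of the headroom arithmetic, assuming the sizes reported by `du -sh` are in MiB and ignoring the function's own code and any other dependencies:

```python
LAMBDA_UNZIPPED_LIMIT = 262_144_000  # bytes, from the deployment error message
MIB = 1024 * 1024


def headroom_bytes(installed_mib, limit=LAMBDA_UNZIPPED_LIMIT):
    """Bytes left under the limit after installing `installed_mib` MiB."""
    return limit - int(installed_mib * MIB)


print(LAMBDA_UNZIPPED_LIMIT // MIB)  # limit in MiB
print(headroom_bytes(135) // MIB)    # headroom with pandas installed
print(headroom_bytes(5.6) // MIB)    # headroom without pandas
```

The package alone fits under the limit either way; the concern is how much of the budget it consumes before any application code or other dependencies are added.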