Heroku-24: Remove Python from the run image #276

Merged: edmorley merged 1 commit into main from edmorley/h24-python-build-only on Mar 21, 2024

@edmorley edmorley commented Mar 20, 2024

(This is stacked on top of #275.)

Since:

  • This system Python installation is not a full Python installation - it does not include the package manager Pip and the venv (virtual environment) + ensurepip modules don't work.
  • As such no Python packages can be installed with it, and the only thing usable is the Python standard library. Notably, the Python standard library does not contain a production-ready web server. (The http.server module is insecure for production use - on Heroku, we recommend Python apps use Gunicorn or Uvicorn.)
  • Therefore Python apps will (or should) be using Python provided by the Python buildpack instead.
  • Non-Python buildpacks/apps typically don't need Python at runtime (or if they really do, then adding the Python buildpack is still preferred).
  • Having Python in the run image has caused confusion in support tickets where the Python buildpack wasn't present (such as it being accidentally replaced when adding a second buildpack), since at runtime apps then fail with a less obvious ModuleNotFoundError or ImportError instead of python: command not found.
  • None of our other officially supported languages (that have their own buildpacks) are also installed as system packages in the base image.
  • Until now it's not been possible for us to remove system Python from the run image, since Python was also pulled in via transitive dependencies of other system packages (such as lsb_release). However, as of Ubuntu 24.04 lsb_release no longer depends upon Python, so we're finally able to remove system Python if we wish.
  • Once Heroku-24 GAs we can't remove packages (since doing so would break backwards compatibility given stack rebasing). However, we can add packages, so we should err on the side of trying out removing packages now.
  • (As an added bonus) Removing Python reduces the run image size by 34 MB, and in a CNB world image size is a much bigger concern, so we need to be more selective about what packages we include.
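
As a rough illustration of the first point above, a check along these lines could report whether an interpreter is able to import the packaging-related modules. This is only a sketch: the function name `check_python_packaging` is hypothetical, and it assumes the interpreter path is passed as the first argument.

```shell
# Hypothetical helper (not part of this PR): reports whether the given
# Python interpreter can import the modules needed to install packages.
check_python_packaging() {
  py="$1"
  for module in pip ensurepip; do
    # A non-zero exit status is treated as "module cannot be imported".
    if "$py" -c "import ${module}" >/dev/null 2>&1; then
      echo "${module}: available"
    else
      echo "${module}: missing"
    fi
  done
}

# Example: check_python_packaging /usr/bin/python3
```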

For more details, see:
https://salesforce.quip.com/aNvHAENM0GdT

Python is still in the build image since various non-Python use-cases need it (for example Node.js packages that use node-gyp require Python at install time), plus several other system packages in the build image depend on python3 anyway (so it's going to be installed regardless).

I've intentionally removed the python-is-python3 package entirely (rather than still including it in the build image), since the vast majority of tooling will (or should) be checking for the presence of python3 directly (given that's the default name on Ubuntu unless the backward compat package is installed; node-gyp does, for example). And for most end-user/app use-cases we would prefer they use the Python buildpack (rather than system Python), so a python: command not found will nudge them in that direction. We can always add python-is-python3 back later if this turns out to be a bigger issue than expected.

Before (once the other PRs are merged):

-----> Size breakdown...
       heroku/heroku:24         424MB
       heroku/heroku:24-build   1.11GB

After:

-----> Size breakdown...
       heroku/heroku:24         390MB  (34 MB reduction)
       heroku/heroku:24-build   1.11GB (unchanged)

Note: The classic PHP buildpack does use Python in its heroku-php-apache2 and heroku-php-nginx scripts. However, it's only used when realpath doesn't exist (eg macOS), so it is unused on Heroku. The buildpack will need to adjust for the python-is-python3 removal, but arguably should have done that previously (given that the major version of python changed during the Python 2 -> 3 transition). (If it needs to support environments where only the command python exists, and not python3, then it can use something like: PYTHON=$(which python3 || which python))
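
The interpreter fallback suggested in the note can be sketched as below. This is an illustration rather than the buildpack's actual code: the function name `find_python` is hypothetical, and it swaps `which` for the POSIX `command -v`, since `which` is not guaranteed to be installed everywhere.

```shell
# Hypothetical helper: prints the first Python interpreter found on PATH,
# preferring python3 and falling back to python. Returns non-zero (and
# prints nothing) when neither command exists.
find_python() {
  command -v python3 || command -v python
}

if PYTHON="$(find_python)"; then
  echo "Using interpreter: ${PYTHON}"
else
  echo "No python3 or python found on PATH" >&2
fi
```

Checking the exit status explicitly lets a script print its own error message instead of failing later with an empty `$PYTHON`.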

There is also a Direwolf test or two that will need updating, but (a) migrating those will be fairly easy (add the Python buildpack or switch to another alternative), and (b) we shouldn't be bloating the run image for internal testing use-cases.

Towards #266.
GUS-W-15159536.

@edmorley edmorley self-assigned this Mar 20, 2024
@edmorley edmorley marked this pull request as ready for review March 20, 2024 17:43
@edmorley edmorley requested a review from a team March 20, 2024 17:43
@edmorley edmorley force-pushed the edmorley/h24-python-build-only branch from 52b41fd to 7139860 Compare March 21, 2024 15:12

@colincasey colincasey left a comment

@edmorley this seems fine but, out of curiosity, why was python added to the run image in the first place?

edmorley commented Mar 21, 2024

out of curiosity, why was python added to the run image in the first place?

Looking at the Git log, the following early commits (and changed patch lines) reference Python:

$ git-content-search python | tail -n 20
cb0fdbf Add script for bootstrapping new stack image
A	heroku-16-build/bin/heroku-16-build.sh
A	heroku-16/bin/heroku-16.sh
abee9fd don't write S3 credentials to disk
M	Vagrantfile
b264bba Remove script for deprecated stack image
D	bin/cedar.sh
4fd988e resolves syncing with build script and brings into parity with cedarish. also makes the image slightly smaller.
M	Dockerfile
55328da Install Python and Ruby header files
M	Dockerfile
M	bin/cedar-14.sh
40d01e2 Add Dockerfile
A	Dockerfile
95ea6e3 Install Python package
M	bin/cedar-14.sh
3795a42 Copy script for bootstrapping stack
A	bin/cedar-14.sh
47881eb Initial commit.
A	bin/cedar.sh

(This makes use of an alias I have in my bash profile: alias git-content-search='git log --oneline --name-status -S')
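
For use inside scripts, where bash does not expand aliases by default, the same search could be written as a function. This is a sketch only, and the name `git_content_search` is hypothetical. The `-S` option of `git log` (the "pickaxe") selects commits whose changes alter the number of occurrences of the given string.

```shell
# Hypothetical function equivalent of the alias above.
# Lists commits (one line each, with the status of changed files) whose
# patch adds or removes occurrences of the search string.
git_content_search() {
  git log --oneline --name-status -S "$1"
}

# Example: git_content_search python | tail -n 20
```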

It appears that 10 years ago the Cedar-10 images (which existed only as a single image; there weren't build vs run images then) used to install Python, Ruby, Erlang etc from source:
47881eb#diff-78c6d69a91f336508001acf79059aa7a0968cb50b586997b00891673ec0c2139R40-R62

This was then switched to using Python from APT in:
95ea6e3

And then when Heroku-16 was created (which was the first stack version to have split build and run images), this was copy-pasted into the run image (rather than the build image) without any explanation:
cb0fdbf#diff-644665e3a544eefddf93bee1148791f2f0688e257d956cd151625f39a1ef4701R119

(Ruby was also copied into the Heroku-16 run image from Cedar-14, but Ruby was later removed when we created Heroku-22 in #209)

So Python's inclusion seems more like a leftover from the days before build vs run images (and a time when image size was less of a concern in general), rather than something needed for a specific purpose in the run image.

@edmorley edmorley force-pushed the edmorley/h24-python-build-only branch from 7139860 to aff47f8 Compare March 21, 2024 16:26
Base automatically changed from edmorley/h24-rm-stunnel to main March 21, 2024 16:52
Since:
- Python apps will (or should) be using Python provided by the
  Python buildpack instead.
- Non-Python buildpacks/apps typically don't need Python at runtime.
- Having Python in the run image has caused confusion in support tickets
  where the Python buildpack wasn't present (such as it being
  accidentally replaced when adding a second buildpack), since at runtime
  apps then fail with a less obvious `ModuleNotFoundError` instead of
  `python: command not found`.
- None of our other officially supported languages (that have their own
  buildpacks) are also installed as system packages in the base image.
- Removing Python reduces the run image size by 34 MB, and in a CNB
  world image size is a much bigger concern, so we need to be more
  selective about what packages we include.
- Once Heroku-24 GAs we can't remove packages (since it will break
  backwards compatibility given stack rebasing), however, we can add
  packages - so we should err on the side of trying out removing
  packages now.

Python is still in the build image since various non-Python use-cases
need it (for example Node.js packages that use node-gyp require Python
at install time), plus several other system packages in the build image
depend on it anyway.

I've intentionally removed the `python-is-python3` package entirely
(rather than still including it in the build image), since the vast
majority of tooling will (or should) be checking for the presence of
`python3` directly (given that's the default name on Ubuntu unless the
backward compat package is installed). And for most end-user/app
use-cases we would prefer they use the Python buildpack (rather than
system Python), so a `python: command not found` will nudge them in that
direction. We can always add `python-is-python3` back later if this
turns out to be a bigger issue than expected.

Note: The classic PHP buildpack does use Python in its
`heroku-php-apache2` and `heroku-php-nginx` scripts, however, it's only
used when `realpath` doesn't exist (eg macOS), so is unused on Heroku.
The buildpack will need to adjust for the `python-is-python3` removal,
but arguably should have done that previously (given during the Python
2 -> 3 transition the major version of `python` changed). (If it needs
to support environments where only the command `python` exists, and not
`python3`, then it can use something like:
`PYTHON=$(which python3 || which python)`)


Before (once the other PRs are merged):

```
-----> Size breakdown...
       heroku/heroku:24         424MB
       heroku/heroku:24-build   1.11GB
```

After:

```
-----> Size breakdown...
       heroku/heroku:24         390MB  (34 MB reduction)
       heroku/heroku:24-build   1.11GB (unchanged)
```

Towards #266.
GUS-W-15159536.
@edmorley edmorley force-pushed the edmorley/h24-python-build-only branch from aff47f8 to e033b35 Compare March 21, 2024 16:52
@edmorley edmorley enabled auto-merge (squash) March 21, 2024 16:53
@edmorley edmorley merged commit f6a1b29 into main Mar 21, 2024
4 checks passed
@edmorley edmorley deleted the edmorley/h24-python-build-only branch March 21, 2024 16:57