
Dockerfile improvements for final image size #92

Open · wants to merge 3 commits into base: main
Conversation

dombarnes

Changes to reduce production image size

  1. Remove build-essential from final image
  2. Set Bundler to only install production gems

These changes reduce the final output image from around 591MB to 226MB.
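For reference, a minimal multi-stage sketch of the two changes (the base image, stage names, and paths are illustrative, not the exact Dockerfile in this repo):

```dockerfile
# Build stage: the compiler toolchain lives here only
FROM ruby:3.4-slim AS build
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential libpq-dev
WORKDIR /app
COPY Gemfile Gemfile.lock ./
# Only install production gems
RUN bundle config set without "development test" && bundle install

# Final stage: no build-essential, so no compilers ship in the image
FROM ruby:3.4-slim
WORKDIR /app
COPY --from=build /usr/local/bundle /usr/local/bundle
COPY . .
```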

@nickjj
Owner

nickjj commented Dec 29, 2024

Hi,

Thanks for the suggestions. Here's a bit of feedback:

build-essential

To be honest I'm not sure how this image built without build-essential. The bigdecimal gem is written in C and it's used by Rails as a dependency. It requires a C compiler and other system level libraries to build it.

I have a larger project that uses this example app and I removed build-essential from it and bigdecimal fails to be installed with:

11.88 Gem::Ext::BuildError: ERROR: Failed to build gem native extension.
11.88
11.88     current directory: /usr/local/bundle/gems/bigdecimal-3.1.8/ext/bigdecimal
11.88 /usr/local/bin/ruby extconf.rb
11.88 checking for __builtin_clz()... *** extconf.rb failed ***
11.88 Could not create Makefile due to some reason, probably lack of necessary
11.88 libraries and/or headers.  Check the mkmf.log file for more details.  You may
11.88 need configuration options.

I included it out of convenience because tracking down a bunch of isolated C dependencies for gems that need to be compiled can be tricky for folks not experienced with Linux, and there are quite a few of them in larger projects.

libpq-dev

That is another C dependency to compile the pg gem. I believe this technically works on your machine and CI because the gem author published an amd64 binary of the gem based on the naming conventions shown here: https://rubygems.org/gems/pg/versions

I don't see an arm64 binary which means the gem will need to be compiled on arm64 devices and that will fail without libpq-dev. That would mean anyone using an M1, M2, M3, etc. or arm64 device / server couldn't build images.
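One way to keep arm64 builds working without shipping the headers is to split the dependency across stages. This is a sketch; the Debian package names below are the usual ones but worth double-checking against your base image:

```dockerfile
# Build stage: libpq-dev provides the headers needed to compile the pg gem
# from source (required on arm64, where no prebuilt binary exists)
FROM ruby:3.4-slim AS build
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential libpq-dev

# Final stage: only the runtime client library, not the -dev headers
FROM ruby:3.4-slim
RUN apt-get update && apt-get install -y --no-install-recommends libpq5 \
    && rm -rf /var/lib/apt/lists/*
```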

bundler changes

--deployment flag

I haven't tested this but will --deployment put all of the gems in vendor/bundle? That means when we volume mount the app, it will be reflected back out to our Docker host. We wouldn't want all of the gems sitting in a vendor folder on the host's file system.

That change would also mean binary paths fail to resolve, since that location isn't on the system path. I think that's why your CI build is currently failing because puma isn't found anymore.

BUNDLE_PATH

On the main branch of this project, that environment variable is unset. Does setting any of those flags in this PR somehow define this env var?

--without development test

Without installing development and test gems, how will we run this locally and tests in CI? Ideally we would want builds to work with all environments.

Have you compared the before and after in image size with only this change?

Assorted paths being removed

The Ruby cache and .git paths you provided don't exist, so they will always be empty, but rm -rf won't let you know the command failed.

For example /usr/local/lib/ruby/gems/3.4.0/cache has a few files but your path is "${BUNDLE_PATH}"/ruby/*/cache. The same type of scenario happens with the .git path, although even in the real path there's no .git directories.
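This silent-failure behavior is easy to demonstrate: rm -rf exits 0 even when its target doesn't exist, so a mistyped cleanup path in a Dockerfile RUN step removes nothing and reports nothing (the path below is illustrative):

```shell
# rm -rf returns exit code 0 even when the path does not exist,
# so a wrong cache path in a cleanup step fails silently.
rm -rf /tmp/nonexistent-gem-cache-path
echo "exit code: $?"   # prints "exit code: 0"
```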

--no-cache

If the cache directory were deleted then there would be nothing to cache. Based on the help menu for bundler, it sounds like this only comes into play if existing cache files exist?

--no-clean

I couldn't find a reference to this flag in the help menu and bundler also mentions the --clean flag is deprecated. What does this flag do? Is it the opposite of the clean command?

--retry

This is a good idea for resiliency; luckily for us, bundler already defaults to 3.

Here's the reference in the docs:

retry (BUNDLE_RETRY): The number of times to retry failed network requests. Defaults to 3.

-j

You set -j4 which is good to improve install speed, but bundler defaults to the number of CPU cores your system has. This allows it to be dynamic. For example if you have a 6 or 12 core work machine, setting 4 could slow builds down.

Here's the reference in the docs:

jobs (BUNDLE_JOBS): The number of gems Bundler can install in parallel. Defaults to the number of available processors.

Do you have benchmark results where setting 4 is faster than a higher number on a high CPU core machine? My workstation only has 4 cores so I can't test that.
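If you do want to pass --jobs explicitly, deriving it from the machine keeps bundler's dynamic behavior rather than pinning it at 4 (nproc is from GNU coreutils; the bundle command is only printed here, not run):

```shell
# Count the available processors instead of hardcoding -j4
jobs="$(nproc)"
echo "bundle install --jobs ${jobs}"
```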

RUN mkdir .yarn public log tmp

These directories will always exist.

@dombarnes
Author

build-essential usually only needs to be in the asset stage, not the app stage. It's just needed for bundle install; once that's done, clean up the rubbish. It's still on L10 for gem building, so I'm not sure why you got errors there.

I've tried to port over stuff from my current Dockerfile (which is a bit more bespoke, with other entrypoints, and built to run on various Azure platforms). For CI, I usually allow RAILS_ENV and BUNDLE_WITHOUT as ARGs (amongst a bunch of other ones I need for my projects) so I can override them during build phases in CI or locally.
I saw the deployment flag is gone in favour of bundle config set deployment true but haven't gone through and updated all mine yet. Similarly I do BUNDLE_PATH="/usr/local/bundle" and copy that over from the build phase too, so I have some extra cleanup steps to remove waste and keep my image small (my main app is about 577MB).
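The ARG override pattern mentioned here might look something like this sketch (the variable names match the comment; the defaults and BUNDLE_PATH value are assumptions):

```dockerfile
# Overridable at build time, e.g.:
#   docker build --build-arg BUNDLE_WITHOUT="" .   # include all gem groups
ARG RAILS_ENV=production
ARG BUNDLE_WITHOUT="development test"
ENV RAILS_ENV=${RAILS_ENV} \
    BUNDLE_WITHOUT=${BUNDLE_WITHOUT} \
    BUNDLE_PATH=/usr/local/bundle
RUN bundle install
```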

The mkdir command, yeah, that's redundant in yours because you're copying all your files in from source during the app phase. I'm more used to copying from my build stage because I still use bootsnap, so I have an extra step for precompiling all the Ruby code in my build stage and then copy that over, which needs some other steps to work around since I need tmp in my destination but not tmp from my source.

Trying not to over-impose my own cases! I'll clean up the bundle commands and update if you'd like.

@nickjj
Owner

nickjj commented Dec 29, 2024

so not sure why you got errors there.

Oh I didn't catch that it still existed in the asset stage. I removed it from both stages in my other project. Now it makes sense.

Similarly I do BUNDLE_PATH="/usr/local/bundle" and copy that over from build phase too

The default setup uses /usr/local/bundle.

For CI, I usually allow RAILS_ENV and BUNDLE_WITHOUT as ARGs (amongst a bunch of other ones I need for my projects) so I can override during build phases in CI or locally.

Can you do a before and after in image size on this project?

The concern I have with this is if you build separate images for different environments you run the risk of an untested image getting shipped to production. That's mostly why I've avoided this pattern.

For example, with the way things are now you can build your image + run your tests + push that image to your registry and run it in production. IMO that convenience is worth paying a 50MB image size tax or whatever it ends up being, of course it will vary with how many dev and test dependencies you have.

Also unless you're running on something like AWS Fargate where images aren't cached, you can leverage Docker's layer caching to avoid pulling the whole image on every deploy. You would only pull your dependency layer if your Gemfile changes, otherwise Docker will happily only transfer the diff on your app's code layer.
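The layer-caching behavior described here comes from ordering: copying the lock files before the app code means the expensive bundle install layer is only invalidated when dependencies change. A common-pattern sketch:

```dockerfile
WORKDIR /app

# These layers are only rebuilt when the Gemfile or lock file change...
COPY Gemfile Gemfile.lock ./
RUN bundle install

# ...while day-to-day code changes only invalidate this cheap layer,
# so deploys typically transfer just the code diff.
COPY . .
```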

With these proposed changes you need your test dependencies to run your tests and then you would need to re-build a separate image with only production gems.

Separate to that, you can run parallel Docker builds to at least avoid having to wait twice as long on every deploy, but most hosted CI servers have pretty weak CPUs (2 cores, ~2ghz, etc.), so if you're running parallel bundles for 2 Docker images in parallel you may end up still getting dinged pretty hard for build times.

@dombarnes
Author

Personally I'd rather keep my production image as clean as I can and reduce the attack surface by having fewer gems available.
I've just reviewed my other Dockerfiles. I have NO idea where --no-clean came from; that's not in any of my projects!
I see your point on the retries and cores for install.
--deployment is probably good practice to ensure your lock file is present, but I guess it comes down to user preference.

Similar for yarn install --frozen-lockfile to ensure you don't change versions.

Since this is all CI builds, I'm personally not too bothered by run times, or duplicate runs. My CI agent will have layers cached and can reuse them during builds to speed it up if they're run in the same job (not sure if GH runs the same as ADO).
In terms of running an untested image, my concern is that my app code is tested, and I run staging and remote dev environments (PR and new feature branches) as it is, so the images get booted and tested before a production release anyway. I guess that comes down to personal choice over workflows.

Sounds like you're happy with just the build-essential change so feel free to just close this off.

@nickjj
Owner

nickjj commented Dec 30, 2024

Sounds like you're happy with just the build-essential change so feel free to just close this off.

For now, I think so. That seems like low hanging fruit to decrease the production image size without changing any behavior.

Did you compare the size before and after just for that change?

If it's substantial do you want to open a smaller PR just for that?


I think --frozen-lockfile is good and bundler has a --frozen flag to just opt into that behavior instead of needing to take in everything --deployment has. Although --frozen is deprecated in favor of the frozen setting.
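To opt into just the lock-file check, the frozen setting can be set via bundler's config or its corresponding environment variable (both are standard bundler config mechanisms; sketched here without actually running bundle install):

```shell
# Opt into frozen behavior without --deployment's vendor/bundle side effects.
# Either via bundler config:
#   bundle config set frozen true
# or via the environment variable bundler reads:
export BUNDLE_FROZEN=true
echo "BUNDLE_FROZEN=${BUNDLE_FROZEN}"   # prints "BUNDLE_FROZEN=true"
```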

Keep in mind there's the ./run bundle:install and ./run yarn:install commands that are used to keep lock files reflected back to your Docker host so they stay in sync.

I've never been crazy about that pattern, but it does work. It essentially runs bundle install twice (once during the build and again at runtime so volumes kick in). Another option is using docker compose cp to copy the lock file out, changing how bundle:install works so that bundle install isn't called twice. The same could be done for yarn. That's a completely separate matter which I could do at some point, but I wouldn't mind reviewing the need for frozen after that potential refactor.
