Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make build layers read only for subsequent buildpacks #155

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

sambhav
Copy link
Member

@sambhav sambhav commented Apr 16, 2021

Signed-off-by: Sambhav Kothari [email protected]

Readable

Related RFC #163

text/0000-build-write-flag.md Outdated Show resolved Hide resolved
@hone
Copy link
Member

hone commented Apr 28, 2021

I think this a good idea, but it does add both complexity and encourages coupling. Another buildpack needs to know how to fetch or find the directory. This can be done via an env var like the android SDK example or just accidental like the Python one. We have the working dir + slices which is the current place that shared writing can happen. I think being able to show why we don't want to pollute this directory could go a long way. There is an inherit idea today that a layer created by a buildpack is "owned" by it and this changes that.

@sambhav sambhav force-pushed the build-write-layers branch 2 times, most recently from 7e42239 to 5d067af Compare May 5, 2021 15:51
@sambhav sambhav requested a review from hone May 5, 2021 15:51
text/0000-build-write-flag.md Outdated Show resolved Hide resolved
text/0000-build-write-flag.md Outdated Show resolved Hide resolved
@sambhav
Copy link
Member Author

sambhav commented May 19, 2021

Per the last RFC deep dive meeting, I have converted this RFC to only cover the "layerization" part. I will create a sibling RFC for "shared-layers" which should be implemented as an atomic change along with this RFC.

@sambhav sambhav changed the title Add a build-write flag for layers Make build layers read only for subsequent buildpacks May 19, 2021
@sambhav sambhav changed the title Make build layers read only for subsequent buildpacks [BLOCKED] Make build layers read only for subsequent buildpacks May 19, 2021
@sambhav sambhav force-pushed the build-write-layers branch 2 times, most recently from e5b1cbe to 5099221 Compare May 21, 2021 21:32
@sambhav sambhav changed the title [BLOCKED] Make build layers read only for subsequent buildpacks Make build layers read only for subsequent buildpacks May 21, 2021
@ekcasey ekcasey self-requested a review May 26, 2021 19:11
@sambhav sambhav force-pushed the build-write-layers branch 2 times, most recently from 16a896b to d3aacd4 Compare July 1, 2021 07:18
text/0000-read-only-build-layers.md Show resolved Hide resolved
text/0000-read-only-build-layers.md Outdated Show resolved Hide resolved
text/0000-read-only-build-layers.md Outdated Show resolved Hide resolved
Copy link
Member

@jkutner jkutner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm removing my approval for now, because I thought we were working toward a solution that had an escape hatch.

@jkutner jkutner self-requested a review July 24, 2021 14:19
# What it is
[what-it-is]: #what-it-is

Updates to the lifecycle so that `build` layers created by buildpacks should be layerized immediately after the buildpack that provided the layer is finished with its build process. This way any modification made by the future buildpacks would be ignore while still making this layer available to future buildpacks.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a buildpack indicate that a layer should not be layerized until all buildpacks have run?

If a subsequent buildpack modified a layer from an existing buildpack, can we save that as a new layer? This may be inefficient overall, but it makes it much easy to develop buildpacks in some cases.

@jkutner
Copy link
Member

jkutner commented Jul 24, 2021

I'm revisiting this RFC as I'm struggling to get pip to install packages outside of PYTHONHOME, while keeping pip as a distinct buildpack. Is it possible? Yes absolutely, but this puts a lot of burden on new buildpack authors who are used to working with these tools (pip, etc) the way they are meant to be used. By forcing layers to be read-only we are going against the grain.

@hone
Copy link
Member

hone commented Jul 28, 2021

When deemed "ready", a team member will propose a "motion for final comment period (FCP)" along with a disposition of the outcome (merge, close, or postpone).

The proposed changes due to perf reasons are pretty drastic. This RFC seems to have stalled without a decision one way or another. I propose we "postpone" this and create an issue on lifecycle to try to at least warn users when this is happening. It may be expensive perf wise, but it maybe worth spiking and measure how much it would cost. Thoughts @ekcasey @natalieparellano ?

@buildpack-bot
Copy link
Member

Maintainers,

As you review this RFC please queue up issues to be created using the following commands:

/queue-issue <repo> "<title>" [labels]...
/unqueue-issue <uid>

Issues

(none)

@hone hone added the status/blocked RFC is blocked by some other outstanding dependency. label Sep 8, 2021
@hone
Copy link
Member

hone commented Sep 15, 2021

This is blocked currently on this spike: buildpacks/lifecycle#703.

@hone
Copy link
Member

hone commented Sep 29, 2021

Since @ekcasey is on leave, I'd like to reassign this. Do you mind if I assign you @natalieparellano?

@sambhav
Copy link
Member Author

sambhav commented Feb 9, 2022

Coverting this to a draft while it is still blocked.

@sambhav sambhav marked this pull request as draft February 9, 2022 19:23
@natalieparellano
Copy link
Member

The proposed changes due to perf reasons are pretty drastic. This RFC seems to have stalled without a decision one way or another. I propose we "postpone" this and create an issue on lifecycle to try to at least warn users when this is happening. It may be expensive perf wise, but it maybe worth spiking and measure how much it would cost. Thoughts @ekcasey @natalieparellano ?

@hone just to clarify, the spike is measuring the performance impact of notifying or failing if mod times change? Or something else?

@mboldt
Copy link
Contributor

mboldt commented Apr 6, 2022

I ran a couple experiments for buildpacks/lifecycle#703. Detecting which buildpacks modify which layers using a file system watcher was pretty efficient (added ~0.5 seconds to a ~12-second build of Spring Petclinic REST with Paketo buildpacks). More details and results in the experiment repo.

So, it seems feasible for lifecycle to detect and report a buildpack changing another buildpack's layer.

On the broader topic, I've had some conversations with @yaelharel who is interested and has some good ideas to explore for read-only layers and change detection.

@yaelharel
Copy link
Contributor

While @mboldt ran some experiments with a hash function and a file system watcher (as described in this issue), we talked about some ideas for this RFC:

  • Hash - although the experiment that Mikey ran took too long, maybe it’s worth trying a different hash function.
  • Git - if we would have Git on our file system, and everything is committed, we would be able to run git diff and see if anything was changed after each buildpacks run.
  • Virtualization - use it to run the buildpacks in a way so they won’t have write access to the layers.
  • Copy on write - use this mechanism for each buildpack run and then delete the changes so the actual layers won’t change.
  • Search based on timestamps - record the timestamp before a buildpack starts running. At the end of the run, check the timestamp of the relevant directories, and if they changed, we can do a BFS/DFS search in order to find out what directories have changed. Probably it’s not a very good idea since buildpacks can modify the mtime information on files.

One tricky part is knowing ahead of time which buildpack should “own” a given layer. From a fresh build, it’s easy to see which buildpack creates a layer. But, with caching/restore, it’s a bit trickier as the layers already exist. We would need to track that somewhere but we think it’s possible to do so.

Another concern about some of the hard-stop methods is that, although layers should be read-only per spec, it is not enforced today, and so could break some buildpacks. If we don’t want to create a breaking change, we think that we can create some kind of a flag that will tell us whether we should only warn or error based on the version.

@mboldt
Copy link
Contributor

mboldt commented Apr 11, 2022

One tricky part is knowing ahead of time which buildpack should “own” a given layer. From a fresh build, it’s easy to see which buildpack creates a layer. But, with caching/restore, it’s a bit trickier as the layers already exist. We would need to track that somewhere but we think it’s possible to do so.

This was my ignorance...the platform API does include the buildpack ID in the layer path, so this is not a problem as I first thought.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.