Crossbuild: use host platform as container runtime #40330

Merged
merged 12 commits into from
Aug 14, 2024

Conversation

moukoublen
Member

@moukoublen moukoublen commented Jul 24, 2024

Proposed commit message

Disable using target platform as container runtime and resolve strip command based on target architecture.

An earlier PR introduced a change that made the cross-build container use the build target platform as the container runtime platform.

So -for example- in a host machine with amd64 architecture, when the target binary is of amd64, the amd64 container runtime will be used. But when the target binary is of arm64 type, the arm64 container runtime (and thus container image) will be used.

While this fixed the issue mentioned in the original PR it has two drawbacks worth pointing out.

  • The host machine that runs the cross-build must have virtualization capabilities installed (e.g., qemu-user-static and binfmt-support) to be able to run arm64 binaries on an amd64 system and vice versa. (This was the issue we originally had in cloudbeat, which we solved by changing the Buildkite OS image. You can read more here.)
  • The compilation is much slower since virtualization is used. In cloudbeat's case, the DRA creation process went from 20 minutes to 1 hour (link).

With this PR, the strip command to use is being resolved based on the target binary that is going to be stripped.

  • For stripping binaries of architecture linux/arm64, aarch64-linux-gnu-strip will be used
  • For stripping binaries of architecture linux/amd64, x86_64-linux-gnu-strip will be used

In the linux/amd64 image of golang-crossbuild:<go version>-arm, both aarch64-linux-gnu-strip and x86_64-linux-gnu-strip exist, so stripping will succeed.

In the linux/arm64 image, x86_64-linux-gnu-strip is missing, but the linux/arm64 VM seems to be used only to produce arm64 docker images, so this is safe.
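
As an illustration of the mapping described above, here is a minimal Go sketch. The function name `stripCommandForTarget` and the error returned for unknown platforms are assumptions made for this example; they are not taken from the merged code.

```go
package main

import "fmt"

// stripCommandForTarget returns the strip binary to use for a target
// platform such as "linux/arm64". The name and the error behavior are
// illustrative assumptions, not the implementation merged in this PR.
func stripCommandForTarget(platform string) (string, error) {
	switch platform {
	case "linux/amd64":
		return "x86_64-linux-gnu-strip", nil
	case "linux/arm64":
		return "aarch64-linux-gnu-strip", nil
	default:
		return "", fmt.Errorf("no strip command known for platform %q", platform)
	}
}

func main() {
	for _, p := range []string{"linux/amd64", "linux/arm64"} {
		cmd, err := stripCommandForTarget(p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", p, cmd)
	}
}
```

Because the command depends only on the target, the same lookup works regardless of which platform the container itself runs on, as long as the chosen binary exists in that image.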

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

N/A

Author's Checklist

  • [ ]

How to test this PR locally

Run SNAPSHOT=true PLATFORMS="linux/amd64,linux/arm64" mage package before and after

Related issues

Use cases

Screenshots

Logs

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jul 24, 2024
Contributor

mergify bot commented Jul 24, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @moukoublen? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@cmacknz cmacknz added Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jul 24, 2024
@cmacknz
Member

cmacknz commented Jul 24, 2024

So -for example- in a host machine with amd64 architecture, when the target binary is of amd64, the amd64 container runtime will be used. But when the target binary is of arm64 type, the arm64 container runtime (and thus container image) will be used.

What I would expect to happen in an ideal world:

  1. The arch of the container image always matches the arch of the host.
  2. Cross compilation uses the correct toolchain for the target arch regardless of the host arch.

You shouldn't need to virtualize an arm64 CPU to build an arm64 target from an amd64 host. Where is this requirement coming from? Looking at the linked PR I see:

// This fixes an issue where during arm64 linux build for the currently used docker image
// docker.elastic.co/beats-dev/golang-crossbuild:1.21.9-arm the image for amd64 arch is pulled
// and causes problems when using native arch tools on the binaries that are built for arm64 arch.

This makes me think the actual problem is we don't have both toolchains available and we are using the host toolchain unconditionally. In the C++ universe, on an amd64 host cross-compiling an arm64 target we would use the arm64 cross compiler's strip that is aware it is dealing with an arm64 binary. This does not require virtualization.

My gut feeling is the tooling in the cross build container is missing, or our build system isn't actually using it correctly.

CC @aleksmaus since he made this change and might know the answer.

@aleksmaus
Member

aleksmaus commented Jul 24, 2024

My gut feeling is the tooling in the cross build container is missing, or our build system isn't actually using it correctly.

CC @aleksmaus since he made this change and might know the answer.

My memories are vague. As far as I remember there was tooling arch mismatch with our builders docker images.

Here is the quick check with that image on macOS M1, with and without platform flag

Screenshot 2024-07-24 at 10 28 26 AM

I'm fine with the change as long as osqueryd binary is still getting stripped successfully during the build.

@moukoublen
Member Author

moukoublen commented Jul 24, 2024

@aleksmaus If I am not mistaken, if the image is already downloaded (for a specific tag) using a particular architecture, then this is used if no --platform is defined afterwards.

So for example in an amd64 linux if I first run with --platform linux/arm64 and then without, then the arm64 runtime will be used for both runs.
Screenshot 2024-07-24 at 6 00 26 PM

But if no --platform is given the first time the image is downloaded, then the host's platform is requested/downloaded.
Screenshot 2024-07-24 at 6 06 32 PM

Perhaps that explains the random selection you mentioned in your PR (what platform was downloaded first).

Having said that, I believe a more correct approach for my PR is to change the flag to UseHostPlatform. It would default to false, which falls back to "use target platform as container runtime"; if enabled, the host platform is requested explicitly by providing --platform, like this:

		args = append(args,
			"--platform", runtime.GOOS+"/"+runtime.GOARCH,
		)
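
For readers who want to try the fragment above in isolation, the following is a self-contained sketch. The helper name `withPlatform` and the starting argument list are assumptions for illustration only. Note that `runtime.GOOS` is `darwin` on a macOS host, so real code may need to handle the OS part separately; that is not addressed here.

```go
package main

import (
	"fmt"
	"runtime"
)

// withPlatform appends a --platform flag to a docker argument list.
// The helper name is hypothetical; only the appended flag mirrors the
// fragment in the comment above.
func withPlatform(args []string, goos, goarch string) []string {
	return append(args, "--platform", goos+"/"+goarch)
}

func main() {
	// A made-up starting point for the docker invocation.
	args := []string{"run", "--rm"}
	args = withPlatform(args, runtime.GOOS, runtime.GOARCH)
	fmt.Println(args)
}
```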

@moukoublen moukoublen force-pushed the cloudbeat_mage_additions branch 2 times, most recently from a496b25 to e21b94e Compare July 24, 2024 15:22
@aleksmaus
Member

@moukoublen Yep, this makes sense. Thanks for digging into this!

@moukoublen moukoublen marked this pull request as ready for review July 24, 2024 15:22
@moukoublen moukoublen requested a review from a team as a code owner July 24, 2024 15:22
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@moukoublen moukoublen changed the title Crossbuild: make container runtime platform change optional Crossbuild: add option to use host platform Jul 25, 2024
@pierrehilbert pierrehilbert requested review from AndersonQ and rdner and removed request for belimawr July 25, 2024 12:35
@AndersonQ
Copy link
Member

@moukoublen one question, I'm not seeing how to set UseHostPlatform. Or is this PR just to add the option and in a future PR to use it? Or am I missing something?

@moukoublen
Member Author

moukoublen commented Jul 29, 2024

@AndersonQ this PR adds the option to use the host platform, mostly for cloudbeat.

For example something like this will be added to cloudbeat:

func CrossBuild() error {
	return devtools.CrossBuild(devtools.UseHostPlatform())
}

Apart from that, I don't know if any of the beats would need to use it. I am not entirely sure whether the initial change ("use target platform as runtime") was introduced for a specific beat and, as a side effect, is applied to all of them, or whether it was meant to be used by all in the first place.
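
The call `devtools.CrossBuild(devtools.UseHostPlatform())` shown in the comment above has the shape of a functional option. The sketch below shows how such an option could work in general. Every type and function in it (`crossBuildParams`, `CrossBuildOption`, `newParams`, and this local `UseHostPlatform`) is a hypothetical stand-in and is not the devtools package's actual code.

```go
package main

import "fmt"

// crossBuildParams is a hypothetical settings struct for a cross-build run.
type crossBuildParams struct {
	useHostPlatform bool
}

// CrossBuildOption is a hypothetical functional option that mutates the settings.
type CrossBuildOption func(*crossBuildParams)

// UseHostPlatform returns an option that opts in to the host platform.
// This is a local stand-in, not devtools.UseHostPlatform.
func UseHostPlatform() CrossBuildOption {
	return func(p *crossBuildParams) { p.useHostPlatform = true }
}

// newParams applies the options on top of the defaults (flag off).
func newParams(opts ...CrossBuildOption) crossBuildParams {
	var p crossBuildParams
	for _, opt := range opts {
		opt(&p)
	}
	return p
}

func main() {
	fmt.Println("default:", newParams().useHostPlatform)
	fmt.Println("with option:", newParams(UseHostPlatform()).useHostPlatform)
}
```

With this pattern, callers that pass no option keep the existing behavior, which matches the opt-in intent described in the comment.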

@cmacknz
Member

cmacknz commented Jul 29, 2024

Taking a look at what is in the arm container, we only have the host strip (strip with no target triple) and the aarch64-linux-gnu-strip cross-toolchain command. The latter is the arm64 strip, which targets the same architecture as the host, so it is pointless there. This is the cross-compilation setup for an X86 image, but in an ARM image.

docker run -it --entrypoint /bin/bash docker.elastic.co/beats-dev/golang-crossbuild:1.22.5-arm
root@f355e76ee67c:/app# find / -name '*strip*'
/usr/bin/strip
/usr/bin/aarch64-linux-gnu-strip

What I think happened is when we added support for arm64 containers we didn't add the X86 cross toolchain into them. We would want x86_64-linux-gnu-strip to exist here so you can cross compile and strip X86_64 CGO dependent binaries from an ARM64 base image.

This can be fixed in the golang-crossbuild image with a lot more work, which is probably the right way to deal with this.

If Cloudbeat doesn't even need CGo, an even better option is to just use the Go toolchain directly and not use the crossbuild images at all. They aren't doing anything for you really. Several of the Beats don't have CGo dependencies so this would probably be an even better optimization. If you use a container at all just use the Go container to allow pinning the Go version easily or something.
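
To make the "use the Go toolchain directly" suggestion concrete, here is a minimal sketch of cross-compiling by setting `GOOS` and `GOARCH`. It assumes a project with no cgo dependencies (hence `CGO_ENABLED=0`); the helper name `crossBuildEnv` and the `./...` build target are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// crossBuildEnv returns the current environment plus the variables the Go
// toolchain reads to select a target. CGO_ENABLED=0 assumes the project has
// no cgo dependencies. The helper name is hypothetical.
func crossBuildEnv(goos, goarch string) []string {
	return append(os.Environ(),
		"GOOS="+goos,
		"GOARCH="+goarch,
		"CGO_ENABLED=0",
	)
}

func main() {
	// The command is only constructed and printed here, not executed.
	cmd := exec.Command("go", "build", "./...")
	cmd.Env = crossBuildEnv("linux", "arm64")
	fmt.Println(cmd.Args, cmd.Env[len(cmd.Env)-3:])
}
```

No container or emulation is involved in this path, since the Go compiler can emit binaries for other platforms natively when cgo is not required.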

The change here is a quick work around, which I'm not opposed to since it is optional, but it isn't addressing the real root cause, which is that our compilation support isn't actually setup right and/or is done unnecessarily.

@moukoublen
Member Author

moukoublen commented Jul 30, 2024

@cmacknz Yes, it seems that:

on arm64 platform

$ docker run --pull always  --interactive --tty --rm --platform linux/arm64  --entrypoint bash docker.elastic.co/beats-dev/golang-crossbuild:1.22.5-arm -c "find / -name '*strip*'"

1.22.5-arm: Pulling from beats-dev/golang-crossbuild
Digest: sha256:c06dcf2e7106749aed4fb0b2c0adf9d709ec14574e699a305d9e930fe4958637
Status: Downloaded newer image for docker.elastic.co/beats-dev/golang-crossbuild:1.22.5-arm
/usr/bin/aarch64-linux-gnu-strip
/usr/bin/strip
/usr/lib/aarch64-linux-gnu/systemd-tests/test-strip-tab-ansi
/usr/lib/git-core/git-stripspace
/usr/share/man/man1/strip.1.gz
/usr/share/man/man1/aarch64-linux-gnu-strip.1.gz
/usr/share/man/man1/git-stripspace.1.gz

on x86_64 platform

$ docker run --pull always  --interactive --tty --rm --platform linux/amd64  --entrypoint bash docker.elastic.co/beats-dev/golang-crossbuild:1.22.5-arm -c "find / -name '*strip*'"

1.22.5-arm: Pulling from beats-dev/golang-crossbuild
Digest: sha256:c06dcf2e7106749aed4fb0b2c0adf9d709ec14574e699a305d9e930fe4958637
Status: Downloaded newer image for docker.elastic.co/beats-dev/golang-crossbuild:1.22.5-arm
/usr/bin/aarch64-linux-gnu-strip
/usr/bin/x86_64-linux-gnu-strip
/usr/bin/strip
/usr/lib/aarch64-linux-gnu/systemd-tests/test-strip-tab-ansi
/usr/lib/git-core/git-stripspace
/usr/share/man/man1/aarch64-linux-gnu-strip.1.gz
/usr/share/man/man1/strip.1.gz
/usr/share/man/man1/git-stripspace.1.gz
/usr/share/man/man1/x86_64-linux-gnu-strip.1.gz
/usr/aarch64-linux-gnu/bin/strip

Perhaps because of the fact I mentioned here, the arm64 platform was sometimes used, and other times the x86_64 platform was used (depending on which image was downloaded first).

I am wondering, though, whether forcing the host platform to be always used would solve this issue since the x86_64 platform of the 1.22.5-arm image contains both the aarch64-linux-gnu-strip and the x86_64-linux-gnu-strip tools.

From what I see in the current Buildkite pipelines, an Ubuntu x86_64 machine is used to produce these platforms "+all linux/amd64 linux/arm64 windows/amd64 darwin/amd64 darwin/arm64", and a separate arm64 VM image is used to produce the docker package for arm64 specifically.

It seems that in both cases, forcing the host platform would give us a container with the necessary tools (since the arm64 VM image is used to produce only arm64 binaries).

@cmacknz
Member

cmacknz commented Jul 30, 2024

I am wondering, though, whether forcing the host platform to be always used would solve this issue since the x86_64 platform of the 1.22.5-arm image contains both the aarch64-linux-gnu-strip and the x86_64-linux-gnu-strip tools.

👍 Good observation. I hadn't looked at what CI was doing: it isn't actually cross compiling in this case, we are just using the container to get repeatable versions of anything we link against.

@moukoublen
Member Author

moukoublen commented Jul 31, 2024

Ok, I think I have a glimpse of what happened.

The mentioned PR introduced the ability to strip the osqueryd binary for linux. The command used to do that was strip (link).

But this does not seem to work for cross-platform builds, because strip is a symlink to the native strip of the platform the container runs as.

$ docker run --pull always  --interactive --tty --rm --platform linux/amd64  --entrypoint bash docker.elastic.co/beats-dev/golang-crossbuild:1.22.5-arm -c "ls -la /usr/bin/strip"

lrwxrwxrwx 1 root root 22 May 10  2017 /usr/bin/strip -> x86_64-linux-gnu-strip


$ docker run --pull always  --interactive --tty --rm --platform linux/arm64  --entrypoint bash docker.elastic.co/beats-dev/golang-crossbuild:1.22.5-arm -c "ls -la /usr/bin/strip"

lrwxrwxrwx 1 root root 23 May 10  2017 /usr/bin/strip -> aarch64-linux-gnu-strip

So -if the host platform is used as docker runtime platform (which initially was used)- this means that we need to use strip to strip x86_64 binaries and aarch64-linux-gnu-strip to strip arm64 binaries (having x86_64 as host platform and container runtime).

golang-crossbuild:1.22.5-arm

| runtime platform | target binary | strip command to use |
|------------------|---------------|----------------------|
| linux/amd64 | linux/amd64 | strip |
| linux/amd64 | linux/arm64 | aarch64-linux-gnu-strip |
| linux/arm64 | linux/amd64 | N/A (x86_64-linux-gnu-strip, which is required in this case, is not included in the arm64 image) |
| linux/arm64 | linux/arm64 | strip |

So, I think the solution that was chosen is to use the target platform as the runtime platform so that the plain strip command can be used safely (enforcing the first and last rows of the table).

Another possible approach to avoid virtualization over the compilation/package process could be to create a function to resolve which strip command to use based on the runtime and target platforms, according to the table above.
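
A resolver driven by the table above could look roughly like the following sketch. The type `platformPair`, the map `stripByPlatforms`, and the function `resolveStrip` are hypothetical names chosen for this example; the entries come from the table, and the unsupported linux/arm64 to linux/amd64 case is reported as an error.

```go
package main

import "fmt"

// platformPair is a hypothetical lookup key: the container runtime platform
// and the platform of the binary to strip.
type platformPair struct {
	runtimePlatform string
	targetPlatform  string
}

// stripByPlatforms encodes the rows of the table that have a usable command.
var stripByPlatforms = map[platformPair]string{
	{"linux/amd64", "linux/amd64"}: "strip",
	{"linux/amd64", "linux/arm64"}: "aarch64-linux-gnu-strip",
	{"linux/arm64", "linux/arm64"}: "strip",
}

// resolveStrip returns the strip command for a runtime/target combination,
// or an error when the table has no usable entry.
func resolveStrip(runtimePlatform, targetPlatform string) (string, error) {
	cmd, ok := stripByPlatforms[platformPair{runtimePlatform, targetPlatform}]
	if !ok {
		return "", fmt.Errorf("no strip command for target %s on runtime %s",
			targetPlatform, runtimePlatform)
	}
	return cmd, nil
}

func main() {
	cmd, err := resolveStrip("linux/amd64", "linux/arm64")
	fmt.Println(cmd, err)
	_, err = resolveStrip("linux/arm64", "linux/amd64")
	fmt.Println(err)
}
```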

Perhaps @aleksmaus could verify if this was the reason for the change or if I am mistaken here.

If this was the case, then if you all agree, I could change this PR to force the use of the host platform always and resolve the strip command based on the table above, as I mentioned.

Thank you all for the information and feedback.

@cmacknz
Member

cmacknz commented Jul 31, 2024

Then if you all agree, I could move forward by doing the change I mentioned

Sounds good to me, choosing the correct toolchain like you are suggesting is how cross-compiling is supposed to work so you can avoid unnecessary virtualization. This is why cross toolchains exist in the first place (i.e. why we have an x86 executable called aarch64-linux-gnu-strip to target aarch64 binaries).

@rdner rdner removed their request for review August 1, 2024 12:45
@moukoublen moukoublen requested a review from a team as a code owner August 1, 2024 13:36
@moukoublen moukoublen changed the title Crossbuild: add option to use host platform Crossbuild: use host platform as container runtime Aug 1, 2024
@moukoublen
Member Author

So, I changed the PR according to what we discussed; the strip command is being resolved based on the target platform and I used the host platform as container runtime to avoid virtualization.

I changed the title accordingly.

Feel free to check the new approach.

Thank you all for the feedback.

@moukoublen moukoublen force-pushed the cloudbeat_mage_additions branch 5 times, most recently from 872a729 to bd23efd Compare August 6, 2024 07:17
@cmacknz
Member

cmacknz commented Aug 9, 2024

LGTM, I took a look at the osquery package build sizes locally and they definitely look stripped.

You will need an approval from @elastic/sec-deployment-and-devices who own osquerybeat as well.

@cmacknz cmacknz requested a review from aleksmaus August 12, 2024 15:53
Contributor

@pkoutsovasilis pkoutsovasilis left a comment

LGTM! I would appreciate a review from @aleksmaus

Member

@aleksmaus aleksmaus left a comment

Thank you!
Tested both archs builds, osqueryd is stripped alright.

@moukoublen moukoublen merged commit c4c402d into elastic:main Aug 14, 2024
123 checks passed
Labels
enhancement Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
6 participants