-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Crossbuild: use host platform as container runtime #40330
Crossbuild: use host platform as container runtime #40330
Conversation
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
|
What I would expect to happen in an ideal world:
You shouldn't need to virtualize an arm64 CPU to build an arm64 target from an amd64 host. Where is this requirement coming from? Looking at the linked PR I see: // This fixes an issue where during arm64 linux build for the currently used docker image
// docker.elastic.co/beats-dev/golang-crossbuild:1.21.9-arm the image for amd64 arch is pulled
// and causes problems when using native arch tools on the binaries that are built for arm64 arch. This makes me think the actual problem is we don't have both toolchains available and we are using the host toolchain unconditionally. In the C++ universe, on an amd64 host cross-compiling an arm64 target we would use the arm64 cross compiler's strip that is aware it is dealing with an arm64 binary. This does not require virtualization. My gut feeling is the tooling in the cross build container is missing, or our build system isn't actually using it correctly. CC @aleksmaus since he made this change and might know the answer. |
My memories are vague. As far as I remember there was tooling arch mismatch with our builders docker images. Here is the quick check with that image on macOS M1, with and without platform flag I'm fine with the change as long as osqueryd binary is still getting stripped successfully during the build. |
@aleksmaus If I am not mistaken, if the image is already downloaded (for a specific tag) using a particular architecture, then this is used if no So for example in an amd64 linux if I first run with But if no Perhaps that explains the random selection you mentioned in your PR (what platform was downloaded first). Having said that I believe a more correct approach for my pr is to change the flag to args = append(args,
"--platform", runtime.GOOS+"/"+runtime.GOARCH,
) |
a496b25
to
e21b94e
Compare
@moukoublen Yep, this makes sense. Thanks for digging into this! |
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
@moukoublen one question, I'm not seeing how to set |
@AndersonQ this pr add the option to use the host platform mostly for cloudbeat. For example something like this will be added to cloudbeat: func CrossBuild() error {
return devtools.CrossBuild(devtools.UseHostPlatform())
} Apart from that I don't know if any of the beats would need to use it, I am not entirely sure if the initial change "use target platform as runtime" was introduced for a specific beat and, as a side effect is used to all or it was meant to be used by all in the first place. |
f5b8780
to
fee1ff1
Compare
Taking a look at what is in the arm container, we only have the host strip (strip with no target triple) and docker run -it --entrypoint /bin/bash docker.elastic.co/beats-dev/golang-crossbuild:1.22.5-arm
root@f355e76ee67c:/app# find / -name '*strip*'
/usr/bin/strip
/usr/bin/aarch64-linux-gnu-strip What I think happened is when we added support for arm64 containers we didn't add the X86 cross toolchain into them. We would want This can be fixed in the golang-crossbuild image with a lot more work, which is probably the right way to deal with this. If Cloudbeat doesn't even need CGo, an even better option is to just use the Go toolchain directly and not use the crossbuild images at all. They aren't doing anything for you really. Several of the Beats don't have CGo dependencies so this would probably be an even better optimization. If you use a container at all just use the Go container to allow pinning the Go version easily or something. The change here is a quick work around, which I'm not opposed to since it is optional, but it isn't addressing the real root cause, which is that our compilation support isn't actually setup right and/or is done unnecessarily. |
@cmacknz Yes its seems that: on arm64 platform
on x86_64 platform
Perhaps because of the fact I mentioned here, the arm64 platform was sometimes used, and other times the x86_64 platform was used (depending on which image was downloaded first). I am wondering, though, whether forcing the host platform to be always used would solve this issue since the x86_64 platform of the From what I see in the current buildkite pipelines an ubuntu x86_64 is used to produce these platforms It seems that in both cases, force using the host platform would contain the necessary tools (since the arm64 VM image is used to produce only arm64 binaries). |
👍 good observation I hadn't looked at what CI was doing, it isn't actually cross compiling in this case we are just using the container to get repeatable versions of anything we link against. |
Ok, I think I have a glimpse of what happened. The mentioned PR introduced the ability to strip osqueryd binary for linux. To do that the command it was used was But this seems to not be useful on cross-platform because
So -if the host platform is used as docker runtime platform (which initially was used)- this means that we need to use
So, I think that the solution was given is to use the target platform as runtime platform in order for the command Another possible approach to avoid virtualization over the compilation/package process could be to create a function to resolve which strip command to use based on the runtime and target platforms, according to the table above. Perhaps @aleksmaus could verify if this was the reason for the change or if I am mistaken here. If this was the case, then if you all agree, I could change this PR to force the use of the host platform always and resolve the strip command based on the table above, as I mentioned. Thank you all for the information and feedback. |
Sounds good to me, choosing the correct toolchain like you are suggesting is how cross-compiling is supposed to work so you can avoid unnecessary virtualization. This is why cross toolchains exist in the first place (i.e. why we have an x86 executable called aarch64-linux-gnu-strip to target aarch64 binaries). |
fee1ff1
to
57afb7f
Compare
So, I changed the PR according to what we discussed; the strip command is being resolved based on the target platform and I used the host platform as container runtime to avoid virtualization. I changed the title accordingly. Feel free to check the new approach. Thank you all for the feedback. |
872a729
to
bd23efd
Compare
bd23efd
to
daa6e46
Compare
LGTM, I took a look at the osquery package build sizes locally and they definitely look stripped. You will need an approval from @elastic/sec-deployment-and-devices who own osquerybeat as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I would appreciate a review from @aleksmaus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Tested both archs builds, osqueryd is stripped alright.
Proposed commit message
Disable using target platform as container runtime and resolve
strip
command based on target architecture.This PR brought a change, enabling the cross-build container to use the build target platform as the container runtime platform.
So -for example- in a host machine with amd64 architecture, when the target binary is of amd64, the amd64 container runtime will be used. But when the target binary is of arm64 type, the arm64 container runtime (and thus container image) will be used.
While this fixed the issue mentioned in the original PR it has two drawbacks worth pointing out.
With this PR, the strip command to use is being resolved based on the target binary that is going to be stripped.
linux/arm64
theaarch64-linux-gnu-strip
will be usedlinux/amd64
thex86_64-linux-gnu-strip
will be usedIn
linux/amd64
image ofgolang-crossbuild:<go version>-arm
bothaarch64-linux-gnu-strip
andx86_64-linux-gnu-strip
exist, so the strip will happen successfully.In
linux/arm64
thex86_64-linux-gnu-strip
is missing but it seems that thelinux/arm64
VM is used only to produce arm64 docker images so it is safe.Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.Disruptive User Impact
N/A
Author's Checklist
How to test this PR locally
Run
SNAPSHOT=true PLATFORMS="linux/amd64,linux/arm64" mage package
before and afterRelated issues
Use cases
Screenshots
Logs