--buildtype=release doesn't pass optimizer flag to linker #14236

Open
alanc opened this issue Feb 11, 2025 · 6 comments

Comments

@alanc
Contributor

alanc commented Feb 11, 2025

Describe the bug
Running CFLAGS="-O3" meson setup --buildtype=plain can produce more optimized binaries than meson setup --buildtype=release, because the former passes -O3 to the link stage and the latter does not. With gcc at least, passing -O3 at the link stage enables some link-time optimizations.

(I originally noticed this while using diffoscope to check an autoconf-to-meson conversion, comparing the output of autoconf built with CFLAGS="-O3" against the output of meson built with --buildtype=release. I fixed that by switching the meson build to CFLAGS="-O3" meson setup --buildtype=plain for a better comparison.)

To Reproduce

  • Download https://www.x.org/releases/individual/lib/libXau-1.0.12.tar.gz on a Linux or Unix-like system with gcc installed
  • meson setup --buildtype=release release ; meson compile -C release -v
  • CFLAGS="-O3" meson setup --buildtype=plain plain ; meson compile -C plain -v
  • compare release/libXau.so.6.0.0 to plain/libXau.so.6.0.0

In the plain build, the .gnu.hash section is smaller than in the release build. Looking at the flags passed to gcc at the linking step, only the plain build passes -O3. (This is a small, simple library, but it is enough to show a difference; a larger, more complex library would presumably show more.)
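
The comparison in the last reproduction step can be made concrete by diffing per-section sizes. This is only a minimal sketch, assuming GNU binutils `size` is on PATH and that the two files come from the release and plain build directories used above; the helper names (parse_size_output, section_sizes, diff_sections) are hypothetical and not part of meson or binutils.

```python
import subprocess
import sys


def parse_size_output(text):
    """Parse `size -A` (SysV format) output into {section: size_in_bytes}."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Section rows have exactly three fields: name, size, address.
        # The header row ("section size addr") is skipped because the
        # name does not start with a dot.
        if len(fields) == 3 and fields[0].startswith("."):
            try:
                sizes[fields[0]] = int(fields[1])
            except ValueError:
                continue
    return sizes


def section_sizes(path):
    """Run `size -A` on an ELF file (assumes GNU binutils is installed)."""
    result = subprocess.run(
        ["size", "-A", path], check=True, capture_output=True, text=True
    )
    return parse_size_output(result.stdout)


def diff_sections(a, b):
    """Return {section: (size_in_a, size_in_b)} for sections that differ."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


if __name__ == "__main__" and len(sys.argv) == 3:
    left, right = sys.argv[1], sys.argv[2]
    changed = diff_sections(section_sizes(left), section_sizes(right))
    for name, (size_left, size_right) in changed.items():
        print(f"{name}: {size_left} -> {size_right}")
```

Saved under a name of your choosing, it would be invoked with the two library paths from the steps above, for example release/libXau.so.6.0.0 and plain/libXau.so.6.0.0. It reports only size differences; diffoscope remains the better tool for seeing what actually changed.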

Expected behavior
I expected --buildtype=release not to skip the simple link-time optimizations that gcc -O3 normally includes.

system parameters

  • I saw this first in the freedesktop.org gitlab CI environment, using
    • Arch Linux on x64
    • core/python 3.13.1-1
    • core/gcc 14.2.1+r753+g1cd744a6828f-1
    • extra/meson 1.7.0-1
    • extra/ninja 1.12.1-2
  • I reproduced in a VM locally, using
    • Oracle Linux 9 on x64
    • python3-3.9.21-1.el9_5.x86_64
    • gcc-11.5.0-2.0.1.el9.x86_64
    • meson-0.58.2-1.el9.noarch
    • ninja-build-1.10.2-6.el9.x86_64
@eli-schwartz
Member

It's an interesting thought. The binutils linker documents that the level is insignificant (-O1 is equivalent to -O3), and also that the option currently does nothing for executables, only shared libraries.

It makes sense to pass it along, I suppose. Certainly, we don't seem to document -D optimization=3 as being "compiler optimizations" but not "linker optimizations", so it is fair game to change this.

@thesamesam
Collaborator

bfd documents:

  -O level
      If level is a numeric values greater than zero ld optimizes the output. This might take significantly longer and therefore probably should only be enabled for the final
      binary. At the moment this option only affects ELF shared library generation. Future releases of the linker may make more use of this option. Also currently there is no
      difference in the linker’s behaviour for different non-zero values of this option. Again this may change with future releases.

lld documents:

Optimize output file size. value may be:
  0  Disable string merging.
  1  Enable string merging.
  2  Enable string tail merging.
-O1 is the default.

I don't have access to other linkers to check.

I'll note that gas actually has an optimisation option as well:

  -O0 | -O | -O1 | -O2 | -Os
      Optimize instruction encoding with smaller instruction size. -O and -O1 encode 64-bit register load instructions with 64-bit immediate as 32-bit register load instructions
      with 31-bit or 32-bits immediates, encode 64-bit register clearing instructions with 32-bit register clearing instructions, encode 256-bit/512-bit VEX/EVEX vector register
      clearing instructions with 128-bit VEX vector register clearing instructions, encode 128-bit/256-bit EVEX vector register load/store instructions with VEX vector register
      load/store instructions, and encode 128-bit/256-bit EVEX packed integer logical instructions with 128-bit/256-bit VEX packed integer logical.

      -O2 includes -O1 optimization plus encodes 256-bit/512-bit EVEX vector register clearing instructions with 128-bit EVEX vector register clearing instructions. In 64-bit mode
      VEX encoded instructions with commutative source operands will also have their source operands swapped if this allows using the 2-byte VEX prefix form instead of the 3-byte
      one. Certain forms of AND as well as OR with the same (register) operand specified twice will also be changed to TEST.

      -Os includes -O2 optimization plus encodes 16-bit, 32-bit and 64-bit register tests with immediate as 8-bit register test with immediate. -O0 turns off this optimization.

@thesamesam
Collaborator

thesamesam commented Feb 12, 2025

But wait. Is Alan talking about -Wl,-On, or is he talking about replicating all compile-time optimisation options at link-time (as GCC documents one should do when doing LTO, because of the issues around option merging)? I think he means the former but I'd like to check, and I should probably file a bug for the latter.

@eli-schwartz
Member

An explicit reproducer case was provided which involves configuring libXau using

CFLAGS="-O3" meson setup --buildtype=plain plain 

No LTO mentioned. I haven't checked but I doubt the project is forcing b_lto=true in default_options.

@thesamesam
Collaborator

thesamesam commented Feb 12, 2025

The reason I ask is that I wasn't aware that GCC ever passed down -Wl,-On on your behalf, so I wouldn't expect a bare -O3 (as opposed to -Wl,-O3) to make any difference there.

@eli-schwartz
Member

I can't reproduce a difference, anyway. Actually, I just realized that we use get_optimization_link_args() and pass -Wl,-O1 to the link stage, so it turns out we are already doing what I suggested we should maybe do.
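
The distinction discussed in this thread, between a bare -O3 given to the compiler driver and a -Wl,-O1 forwarded to the linker, can be checked mechanically against a link command copied from meson compile -v output. The following is a rough sketch only: the function name split_opt_flags is hypothetical, and it assumes linker options are forwarded with -Wl, (it does not handle -Xlinker).

```python
import shlex


def split_opt_flags(command):
    """Split -O flags on a link command into (driver_flags, linker_flags).

    driver_flags: bare -O<level> arguments seen by the compiler driver.
    linker_flags: -O<level> options forwarded to the linker via -Wl,...
    """
    driver, linker = [], []
    for arg in shlex.split(command):
        if arg.startswith("-Wl,"):
            # -Wl,a,b forwards both a and b to the linker.
            forwarded = arg[len("-Wl,"):].split(",")
            linker.extend(opt for opt in forwarded if opt.startswith("-O"))
        elif arg.startswith("-O"):
            # Case-sensitive, so the output option -o is not matched.
            driver.append(arg)
    return driver, linker
```

Running it over the link lines of the release and plain builds would show whether each one carries a driver-level flag, a linker-level flag, both, or neither.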
