Adding ROOT_jll #9300

Merged
merged 30 commits into from
Feb 19, 2025

peremato
Contributor

This is a new attempt to cross-compile ROOT in order to produce ROOT_jll. It starts from #7666
The current status is:

  • It builds for the host platform (x86_64-linux-musl) 😄 - not tested the produced binaries
  • It also builds for gnu platform (x86_64-linux-gnu-cxx11) 😄 and creates the tarball
    • There are some errors when checking the produced shared libraries (although I have disabled dlopening them). For example:
  ERROR in cling::CIFactory::createCI(): cannot extract standard library include paths!
  Invoking:
    LC_ALL=C x86_64-linux-gnu-g++   -xc++ -E -v /dev/null 2>&1 | sed -n -e '/^.include/,${' -e '/^ \/.*++/p' -e '}'
  Results was:
  With exit code 0
  <built-in>:6:9: warning: '__STDC_LIMIT_MACROS' macro redefined [-Wmacro-redefined]
  #define __STDC_LIMIT_MACROS 1
          ^
  /opt/x86_64-linux-musl/bin/../lib/gcc/x86_64-linux-musl/11.1.0/include/stdint.h:5:11: note: previous definition is here
  #  define __STDC_LIMIT_MACROS
            ^
  <built-in>:7:9: warning: '__STDC_CONSTANT_MACROS' macro redefined [-Wmacro-redefined]
  #define __STDC_CONSTANT_MACROS 1
          ^
  /opt/x86_64-linux-musl/bin/../lib/gcc/x86_64-linux-musl/11.1.0/include/stdint.h:7:11: note: previous definition is here
  #  define __STDC_CONSTANT_MACROS
            ^
  input_line_2:6:18: error: exception specification in declaration does not match previous declaration
  extern "C++" int at_quick_exit(void(*f)())  throw ()  { return __cxa_atexit((void(*)(void*))f, 0, __dso_handle); }
                   ^
  /opt/x86_64-linux-musl/x86_64-linux-musl/sys-root/usr/include/stdlib.h:48:5: note: previous declaration is here
  int at_quick_exit (void (*) (void));
      ^
  Replaced symbol at_quick_exit cannot be found in JIT!
  In file included from input_line_3:38:
  In file included from /../lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/cassert:44:
  /usr/include/assert.h:37:28: error: function-like macro '__GNUC_PREREQ' is not defined
  #if defined __cplusplus && __GNUC_PREREQ (2,95)
  • I can start the root executable once deployed 😄. To reproduce:
julia build_tarballs.jl --verbose --debug   x86_64-linux-gnu-cxx11
julia build_tarballs.jl --deploy=local --skip-build  x86_64-linux-gnu-cxx11

What I get is:

 $ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.0-rc2 (2024-07-29)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(@v1.11) pkg> dev /build/mato_sftnight/.julia/dev/ROOT_jll
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.11/Project.toml`
  No Changes to `~/.julia/environments/v1.11/Manifest.toml`

julia> using ROOT_jll
Precompiling ROOT_jll...
  1 dependency successfully precompiled in 3 seconds. 67 already precompiled.

julia> run(ROOT_jll.root())
ERROR in cling::CIFactory::createCI(): cannot extract standard library include paths!
Invoking:
  LC_ALL=C x86_64-linux-gnu-g++   -xc++ -E -v /dev/null 2>&1 | sed -n -e '/^.include/,${' -e '/^ \/.*++/p' -e '}'
Results was:
With exit code 0
<built-in>:6:9: warning: '__STDC_LIMIT_MACROS' macro redefined [-Wmacro-redefined]
#define __STDC_LIMIT_MACROS 1
        ^
/opt/x86_64-linux-musl/bin/../lib/gcc/x86_64-linux-musl/11.1.0/include/stdint.h:5:11: note: previous definition is here
#  define __STDC_LIMIT_MACROS
          ^
<built-in>:7:9: warning: '__STDC_CONSTANT_MACROS' macro redefined [-Wmacro-redefined]
#define __STDC_CONSTANT_MACROS 1
        ^
/opt/x86_64-linux-musl/bin/../lib/gcc/x86_64-linux-musl/11.1.0/include/stdint.h:7:11: note: previous definition is here
#  define __STDC_CONSTANT_MACROS
          ^
input_line_2:6:18: error: exception specification in declaration does not match previous declaration
extern "C++" int at_quick_exit(void(*f)())  throw ()  { return __cxa_atexit((void(*)(void*))f, 0, __dso_handle); }
                 ^
/opt/x86_64-linux-musl/x86_64-linux-musl/sys-root/usr/include/stdlib.h:48:5: note: previous declaration is here
int at_quick_exit (void (*) (void));
    ^
Replaced symbol at_quick_exit cannot be found in JIT!
In file included from input_line_3:38:
In file included from /../lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/cassert:44:
/usr/include/assert.h:37:28: error: function-like macro '__GNUC_PREREQ' is not defined
#if defined __cplusplus && __GNUC_PREREQ (2,95)
                           ^
   ------------------------------------------------------------------
  | Welcome to ROOT 6.32.02                        https://root.cern |
  | (c) 1995-2024, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for linuxx8664gcc on Jan 01 1970, 00:00:00                 |
  | From heads/master@tags/v6-32-02                                  |
  | With                                                             |
  | Try '.help'/'.?', '.demo', '.license', '.credits', '.quit'/'.q'  |
   ------------------------------------------------------------------
root [0] .q
  • I have not managed yet to fully build for MacOS. 😢
    • It fails when compiling the generated dictionaries produced by rootcling because I think the generated code is not compilable by clang.
    • To reproduce:
julia build_tarballs.jl --verbose --debug   aarch64-apple-darwin-cxx11

The errors are of this form:

[17:24:51] ninja: job failed: /opt/bin/aarch64-apple-darwin20-libgfortran5-cxx11/aarch64-apple-darwin20-clang++ --sysroot=/opt/aarch64-apple-darwin20/aarch64-apple-darwin20/sys-root -DR__ARC4_STDLIB -D_COMPLEX_H -D__CLANG_STDATOMIC_H -D__COMPLEX_H__ -D__STDC_NO_COMPLEX__ -I/workspace/build/include -I/workspace/build/ginclude -I/workspace/srcdir/root-6.32.02/core/base/inc -I/workspace/srcdir/root-6.32.02/core/base/v7/inc -I/workspace/srcdir/root-6.32.02/core/clib/inc -I/workspace/srcdir/root-6.32.02/core/clingutils/inc -I/workspace/srcdir/root-6.32.02/core/clingutils/res -I/workspace/srcdir/root-6.32.02/core/cont/inc -I/workspace/srcdir/root-6.32.02/core/foundation/inc -I/workspace/srcdir/root-6.32.02/core/foundation/v7/inc -I/workspace/srcdir/root-6.32.02/core/foundation/res -I/workspace/srcdir/root-6.32.02/core/gui/inc -I/workspace/srcdir/root-6.32.02/core/meta/inc -I/workspace/srcdir/root-6.32.02/core/rint/inc -I/workspace/srcdir/root-6.32.02/core/textinput/inc -I/workspace/srcdir/root-6.32.02/core/textinput/src -I/workspace/srcdir/root-6.32.02/core/thread/inc -I/workspace/srcdir/root-6.32.02/core/zip/inc -I/workspace/srcdir/root-6.32.02/core/lzma/inc -I/workspace/srcdir/root-6.32.02/core/lz4/inc -I/workspace/srcdir/root-6.32.02/core/zstd/inc -I/workspace/srcdir/root-6.32.02/core/macosx/inc -I/workspace/srcdir/root-6.32.02/core/unix/inc -I/workspace/srcdir/root-6.32.02/core/unix/../clib/res -fcolor-diagnostics -Wc++11-narrowing -Wsign-compare -Wsometimes-uninitialized -Wconditional-uninitialized -Wheader-guard -Warray-bounds -Wcomment -Wtautological-compare -Wstrncat-size -Wloop-analysis -Wbool-conversion -m64 -pipe -W -Wall -Woverloaded-virtual -fsigned-char -fno-common -Qunused-arguments -pthread -stdlib=libc++ -O3 -DNDEBUG -mmacosx-version-min=11.0 -fPIC -std=c++17 -MD -MT core/CMakeFiles/G__Core.dir/G__Core.cxx.o -MF core/CMakeFiles/G__Core.dir/G__Core.cxx.o.d -o core/CMakeFiles/G__Core.dir/G__Core.cxx.o -c /workspace/build/core/G__Core.cxx
[17:24:51] /workspace/build/core/G__Core.cxx:312:81: error: use of undeclared identifier '__gnu_cxx'
[17:24:51]    static TGenericClassInfo *GenerateInitInstanceLocal(const ::reverse_iterator<__gnu_cxx::__normal_iterator<TString*,vector<TString> > >*)
[17:24:51]                                                                                 ^
[17:24:51] /workspace/build/core/G__Core.cxx:312:110: error: 'TString' does not refer to a value
[17:24:51]    static TGenericClassInfo *GenerateInitInstanceLocal(const ::reverse_iterator<__gnu_cxx::__normal_iterator<TString*,vector<TString> > >*)
[17:24:51]                                                                                                              ^
[17:24:51] /workspace/srcdir/root-6.32.02/core/base/inc/TVirtualX.h:43:7: note: declared here
[17:24:51] class TString;
[17:24:51]       ^
[17:24:51] /workspace/build/core/G__Core.cxx:312:118: error: expected expression
[17:24:51]    static TGenericClassInfo *GenerateInitInstanceLocal(const ::reverse_iterator<__gnu_cxx::__normal_iterator<TString*,vector<TString> > >*)
[17:24:51]                                                                                                                      ^
[17:24:51] /workspace/build/core/G__Core.cxx:312:135: error: expected ')'
[17:24:51]    static TGenericClassInfo *GenerateInitInstanceLocal(const ::reverse_iterator<__gnu_cxx::__normal_iterator<TString*,vector<TString> > >*)
[17:24:51]                                                                                                                                       ^
[17:24:51] /workspace/build/core/G__Core.cxx:312:55: note: to match this '('
[17:24:51]    static TGenericClassInfo *GenerateInitInstanceLocal(const ::reverse_iterator<__gnu_cxx::__normal_iterator<TString*,vector<TString> > >*)
[17:24:51]                                                       ^
[17:24:51] /workspace/build/core/G__Core.cxx:314:26: error: use of undeclared identifier '__gnu_cxx'
[17:24:51]       ::reverse_iterator<__gnu_cxx::__normal_iterator<TString*,vector<TString> > > *ptr = nullptr;
[17:24:51]                          ^
[17:24:51] /workspace/build/core/G__Core.cxx:314:55: error: 'TString' does not refer to a value
[17:24:51]       ::reverse_iterator<__gnu_cxx::__normal_iterator<TString*,vector<TString> > > *ptr = nullptr;
[17:24:51]                                                       ^

@peremato peremato mentioned this pull request Aug 23, 2024
@vgvassilev

Cling needs to ask the system compiler about the include paths to libstdc++. Do you have a c++ compiler installed?

@giordano
Member

Do you have a c++ compiler installed?

We have both a compiler for the host platform and a cross-compiler for the target one, which one should we use for this?

@vgvassilev

Do you have a c++ compiler installed?

We have both a compiler for the host platform and a cross-compiler for the target one, which one should we use for this?

I believe the one you compile with. It should be the same as the one you will have on the platform you deploy.

@peremato
Contributor Author

@giordano, what @vgvassilev is referring to is having the compiler that was used for the build available on the [user] system where the ROOT_jll package is installed. This is not going to be easy unless the compiler itself is a _jll package. If there were a way to set the include path to libstdc++, we would not need to have the compiler installed, would we?
@vgvassilev, for the problem on MacOS, is there a way to tell rootcling to generate code that is compilable with clang? In other words, how do we tell rootcling the target platform?

@vgvassilev

@vgvassilev, for the problem on MacOS, is there a way to tell rootcling to generate code that is compilable with clang? In other words, how do we tell rootcling the target platform?

I suspect that if we are able to point cling to the right compiler, rootcling will just work. Note that ROOT has been developed for many years without cross-compilation in mind, and even if it builds it is not necessarily going to run. You can see all of the ifdef X in the codebase. I believe that most of them are harmless, but there are a few in cling and rootcling that need to be rewritten to ask about the target triple rather than relying on the compiler macro.

@giordano
Member

This is not going to be easy unless the compiler itself is a _jll package. If there were a way to set the include path to libstdc++, we would not need to have the compiler installed, would we?

We do have Clang compilers packaged up in JLLs (Clang_jll), but I don't think we ship libcxx (nor libstdc++) anywhere.

@grasph
Contributor

grasph commented Sep 24, 2024

Hello,

I had a look at the new version of the recipe. As a reminder, the recipe was already working for musl, but the package cannot be used due to this issue. So I will focus on glibc and limit myself to the simplest case, x86_64.

The error about conflicting atexit and at_quick_exit prototypes when generating the dictionary in the ROOT build process was solved in this new version by using the host (x86_64/gcc+musl) headers for the C and C++ standard libraries instead of the target ones (x86_64/gcc+glibc). There are differences between the glibc and musl headers, which I expect are the source of the error. I'm not sure it is correct to use the host headers here, but at least it compiles, as Pere reported.

While building works, we now have a similar error when executing ROOT. The execution is done outside the Alpine container, so only the glibc headers are available. Note that I solved the error about the compiler not being found by creating a link with the expected name, and the at_quick_exit prototype error is still there.
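
The "link with the expected name" workaround could look roughly like the sketch below. It assumes the expected name is the `x86_64-linux-gnu-g++` shown in the Cling error message; the function name `provide_compiler_alias` and its arguments are hypothetical and not part of the recipe.

```shell
#!/bin/sh
# Hypothetical helper: expose an existing compiler under the name that
# Cling tries to invoke, by creating a symbolic link in a chosen directory.
#   $1: path to the real compiler
#   $2: name Cling expects (e.g. x86_64-linux-gnu-g++)
#   $3: directory that will hold the link
provide_compiler_alias() {
    real_compiler=$1
    expected_name=$2
    alias_dir=$3
    mkdir -p "$alias_dir" && ln -sf "$real_compiler" "$alias_dir/$expected_name"
}
```

After calling it, for example as `provide_compiler_alias "$(command -v g++)" x86_64-linux-gnu-g++ "$HOME/cling-bin"`, the directory would still need to be prepended to `PATH` before starting `root`.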

In addition, I get the error IncrementalExecutor::executeFunction: symbol '_ZN5cling7runtime6gClingE' unresolved while linking [cling interface function]! , when entering commands on the ROOT prompt.

As a reminder, in the absence of a --sysroot option for rootcling, we use the environment variable CPLUS_INCLUDE_PATH to set the list of include paths that the compiler returns when executed by rootcling without this option. Nevertheless, this does not seem to be enough. Some remnant of the include paths of the host on which rootcling or root was compiled seems to remain. @vgvassilev, do you have an idea where it comes from?

Philippe.
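
For readers who want to see what Cling extracts, here is a minimal sketch built on the `sed` program quoted in the error message above. The two function names are hypothetical, and joining the result into `CPLUS_INCLUDE_PATH` is only an illustration of the approach described in this comment, not the recipe's actual code.

```shell
#!/bin/sh
# Filter the verbose preprocessor output of a GCC-like compiler (read from
# stdin) down to the C++ standard library include directories. The sed
# program is the one printed in the Cling error message: from the
# "#include ... search starts here" line onwards, keep the lines that
# start with " /" and contain "++".
extract_cxx_include_paths() {
    sed -n -e '/^.include/,${' -e '/^ \/.*++/p' -e '}'
}

# Strip the leading blanks and join the directories (one per line on
# stdin) into a single colon-separated list.
join_with_colons() {
    sed -e 's/^ *//' | paste -s -d ':' -
}
```

With a working compiler on the `PATH`, the list could be produced with `LC_ALL=C g++ -xc++ -E -v /dev/null 2>&1 | extract_cxx_include_paths | join_with_colons` and exported as `CPLUS_INCLUDE_PATH`. Whether that is sufficient is exactly the open question raised above.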

@vgvassilev

In addition, I get the error IncrementalExecutor::executeFunction: symbol '_ZN5cling7runtime6gClingE' unresolved while linking [cling interface function]! , when entering commands on the ROOT prompt.

This very likely means that for some reason Cling cannot find its runtime. There is RuntimeUniverse.h, which is supposed to be included at start-up time of Cling. Then the symbol should be provided by libCling.so. Can you find that symbol with nm?

@grasph
Contributor

grasph commented Sep 24, 2024

The gCling symbol is in libCling.so, but nm shows it without the cling::runtime:: namespace prefix and with the "U" type (undefined). libCling.so from a working installation displays the same (no namespace and with the U type).
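
A check of this kind can be scripted along the following lines. This is only a sketch: `symbol_status` is a hypothetical helper that classifies a symbol from `nm` output read on stdin, and it expects the plain (mangled) names that `nm` prints without `-C`.

```shell
#!/bin/sh
# Classify a symbol from `nm` output read on stdin.
# Prints "defined" if the symbol appears with any type other than U,
# "undefined" if it only appears with type U, and "missing" otherwise.
#   $1: symbol name exactly as nm prints it (last field of the line)
symbol_status() {
    awk -v sym="$1" '
        $NF == sym { if ($(NF-1) == "U") is_undef = 1; else is_def = 1 }
        END {
            if (is_def) print "defined"
            else if (is_undef) print "undefined"
            else print "missing"
        }'
}
```

For example, `nm -D libCling.so | symbol_status _ZN5cling7runtime6gClingE` would report how the mangled name from the error message appears in the dynamic symbol table.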

@SimeonEhrig

This is not going to be easy unless the compiler itself is a _jll package. If there were a way to set the include path to libstdc++, we would not need to have the compiler installed, would we?

We do have Clang compilers packaged up in JLLs (Clang_jll), but I don't think we ship libcxx (nor libstdc++) anywhere.

I can confirm that Clang_jll does not ship its own STL. It uses the system STL. In my case it is libcxx 14.

image

Cross-compiling related issues with root-cling fixed for linux with glibc. MacOS not yet supported.
@grasph grasph force-pushed the ROOT-2 branch 3 times, most recently from 262560e to 98d5191 Compare February 4, 2025 15:39
@grasph
Contributor

grasph commented Feb 4, 2025

@giordano, this PR should finally be ready to be merged.

It includes binaries for Linux x86_64 only; compilation for the x86_64/glibc target on x86_64/musl has already been challenging. Support for MacOS will need some extra work.

Philippe.

@peremato peremato changed the title [WIP] Adding ROOT_jll Adding ROOT_jll Feb 14, 2025
Member

Why do you need this? We can disable it in a different way than patching the compiler, but it'd be good to understand why it's needed.

Contributor

To fix a bug of libstdc++ that prevents compilation of the code. The bug is fixed with gcc 9.3.0, while GCCBootstrap@9 is at version 9.1.0.

Member

I'm missing how using -march=something (what something?) is related to the bug you linked. There's no reference about -march flags in that report.

Contributor

Sorry, I didn't look at the correct patch. I need to run the script to see when in the build process it is needed. I'll let you know once done.

Contributor

it's needed here

Member

How does that work? Does that produce a binary that can only be run on avx512 machines? That's a no for us.

Contributor

different versions of libraries are compiled, and the library with the supported intrinsics is loaded at runtime. See: https://github.com/root-project/root/blob/3280847501bfb354a3a9ff1e023c8fd3b74548f4/roofit/batchcompute/src/Initialisation.cxx#L44

Member

Alright, then you can just add lock_microarchitecture=false as keyword argument to build_tarballs(...) instead of patching this file.

grasph and others added 3 commits February 14, 2025 18:59
Co-authored-by: Mosè Giordano <[email protected]>
…ompat flag to previous version

Compilation error was:

```
from /opt/x86_64-linux-gnu/x86_64-linux-gnu/include/c++/9.1.0/stdexcept:39,
from /opt/x86_64-linux-gnu/x86_64-linux-gnu/include/c++/9.1.0/array:39,
from /opt/x86_64-linux-gnu/x86_64-linux-gnu/include/c++/9.1.0/tuple:39,
from /opt/x86_64-linux-gnu/x86_64-linux-gnu/include/c++/9.1.0/functional:54,
from /workspace/srcdir/root/core/foundation/inc/TError.h:37,
from /workspace/srcdir/root/core/cont/inc/ROOT/TSeq.hxx:15,
from /workspace/srcdir/root/core/base/inc/ROOT/TExecutorCRTP.hxx:15,
from /workspace/srcdir/root/core/imt/inc/ROOT/TThreadExecutor.hxx:25,
from /workspace/srcdir/root/core/imt/src/TThreadExecutor.cxx:4:
/opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/include/bits/waitstatus.h:66:7: note: candidate: ‘wait::wait()’
     66 | union wait
        |       ^~~~
/opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/include/bits/waitstatus.h:66:7: note:   candidate expects 0 arguments, 2 provided
/opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/include/bits/waitstatus.h:66:7: note: candidate: ‘constexpr wait::wait(const wait&)’
/opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/include/bits/waitstatus.h:66:7: note:   candidate expects 1 argument, 2 provided
/opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/include/bits/waitstatus.h:66:7: note: candidate: ‘constexpr wait::wait(wait&&)’
/opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/include/bits/waitstatus.h:66:7: note:   candidate expects 1 argument, 2 provided
```

# N llvm links. LLVM link command needs 15GB
njobs=$((2*`nproc`))
LLVM_PARALLEL_LINK_JOBS=`grep MemTot /proc/meminfo | awk '{a=int($2/15100000); if(a>'"$njobs"') a='"$njobs"'; if(a<1) a=1; print a;}'`
Member

I feel like you need to do all of this because you were setting njobs incorrectly.

Contributor

This follows the recommendations from https://llvm.org/docs/CMake.html#frequently-used-llvm-related-variables, section LLVM_PARALLEL_{COMPILE,LINK}_JOBS. Machines often have less than 15GB of RAM per core, so it is also needed without the factor of 2.

Beware that $nproc ignores the CPU affinity mask (used by the Slurm batch system to share the CPUs of a node).
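
The one-line awk computation in the recipe can be easier to follow when written out step by step. The sketch below uses the same figure of 15100000 kB per link job as the recipe line; the function name `llvm_link_jobs` and its two arguments are hypothetical.

```shell
#!/bin/sh
# Print the number of parallel LLVM link jobs for a machine with the given
# amount of RAM, capped at a maximum and never below 1.
#   $1: total memory in kB (the MemTotal value from /proc/meminfo)
#   $2: upper bound on the number of jobs
llvm_link_jobs() {
    mem_total_kb=$1
    max_jobs=$2
    # One link job per 15100000 kB of RAM (integer division).
    link_jobs=$((mem_total_kb / 15100000))
    if [ "$link_jobs" -gt "$max_jobs" ]; then
        link_jobs=$max_jobs
    fi
    if [ "$link_jobs" -lt 1 ]; then
        link_jobs=1
    fi
    echo "$link_jobs"
}
```

On Linux it could be called as `llvm_link_jobs "$(awk '/MemTotal/ {print $2}' /proc/meminfo)" "$njobs"`, with `njobs` computed as in the recipe.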

Co-authored-by: Mosè Giordano <[email protected]>
Adding util_linux_jll did not solve missing uuid_gen symbol error.
Remove it and switch off Davix build.
@grasph
Contributor

grasph commented Feb 17, 2025

delete ccache.

Wait, what, no! Don't do that for any reason!

OK. Is it shared with other _jll's?

@giordano
Member

OK. Is it shared with other _jll's?

There's a single shared cache for all the runners.

Ccache has its own mechanism to disable the cache on demand: set the environment variable CCACHE_DISABLE=1 (see https://ccache.dev/manual/4.8.2.html#_disabling_ccache).

export CCACHE_DISABLE=true

Or you can also set USE_CCACHE=false (this is BinaryBuilder specific), but trashing weeks worth of cache would be a disaster.

@grasph
Contributor

grasph commented Feb 17, 2025

It is fortunate that I announced my plans before proceeding ;).

@giordano
Member

But I don't understand what's the supposed problem with ccache: as I showed above the machine was not doing anything while the build process was stuck. Sounds like a build system issue to me.

@grasph
Contributor

grasph commented Feb 18, 2025

But I don't understand what's the supposed problem with ccache [...]

The only difference I could find between the build on my institute's facilities and the one on buildkite is that I was not using ccache. That's why I wanted to try. Anyway, CCACHE_RECACHE, which should ignore the ccache content while updating it, did not help.

@giordano
Member

is that I was not using ccache

We enable ccache automatically by default in BinaryBuilder, you'd have to opt out not to use it.

@grasph
Contributor

grasph commented Feb 18, 2025

We enable ccache automatically by default in BinaryBuilder, you'd have to opt out not to use it.

OK. I had understood from the doc [1] that BINARYBUILDER_USE_CCACHE environment variable needed to be set.

I've found where it gets stuck by disabling ninja job parallelization. It remains to understand why I cannot reproduce the issue when running the build myself. Another difference between buildkite and my environment may be the BinaryBuilder version. I've been using v0.6.3.

[1] https://docs.binarybuilder.org/stable/environment_variables/

@grasph grasph force-pushed the ROOT-2 branch 3 times, most recently from 71580c8 to 56ffe4e Compare February 18, 2025 21:08
@grasph
Contributor

grasph commented Feb 19, 2025

@giordano, the CI issue disappeared after removing the nlohman_jll dependency. It's not clear to me why including nlohman_jll introduces a deadlock, nor why I cannot reproduce it by running the build on my own.

Anyway, we can live without the dependency, as ROOT falls back to its built-in version in this case.

After reading the LAPACK/BLAS section of the BinaryBuilder documentation, I've fixed the BLAS dependency.

The PR should now hopefully be ready to be merged.

@giordano giordano merged commit 6717f80 into JuliaPackaging:master Feb 19, 2025
4 checks passed
@grasph
Contributor

grasph commented Feb 19, 2025

Thanks @giordano for the merge.


5 participants