enhance LLVM easyblock for compilation of clang/flang + other llvm-projects #3373
base: develop
Conversation
I haven't looked at this in detail, but I'm not a big fan of "forking" an existing easyblock and then updating it. There seems to be a lot of overlap still with the custom easyblock for Clang. I do understand that the other option (updating the existing custom easyblock for Clang to introduce updates/enhancements) is painful too though, partially since there's a lot of history there. @Crivella How difficult would it be to integrate the functionality you need into the existing easyblock for Clang (or LLVM, I'm not even sure which one is most relevant here, but I guess it's the Clang one)?
@boegel I will try to work on it as soon as I finish all the current tests (there is still something to iron out on non-shared builds, and understanding whether the test failures should be addressed or can be ignored). I do not know if it would still make sense to keep the Clang naming scheme, as this would also end up including Flang and other optimization tools usable with other frontends as well.
@boegel Isn't it a lot to ask to merge these easyblocks? The reality is that we already have two other forks for LLVM and AOMP, so the precedent already exists for taking a new/alternative path.
@boegel There is also the fact that for the
One thing I don't actually like is the name, |
As an update, I've also tested now with all tools enabled and
It should be noted that before, 34341 tests (28.84%) were flagged as unsupported, while now only 4588 (3.85%) are, which should be due to the
UPDATE Testing with OpenMP offloading on VEGA frontend node:
With:
I encountered extra errors related to
As a sanity check I also recompiled on my WS by allowing libomptarget to be compiled for x86 (before I had it enabled only for nvptx and amd) and I get a similar number of test errors (268). Besides that, I tried to compile a simple OpenMP application and the GPU offloading seems to be working correctly (verified the GPU is being used from nvtop).
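For reference, a check like the one described can be reproduced with a small standalone script. This is only a hedged sketch, not part of the easyblock: it assumes the freshly built clang (with NVPTX offload support) is on the PATH and uses the nvptx64-nvidia-cuda offload triple (amdgcn-amd-amdhsa would be the AMD equivalent).

# Minimal sketch of an OpenMP offload smoke test (assumes 'clang' on PATH was
# built with NVPTX offload support; not part of the easyblock itself).
import os
import subprocess
import tempfile
import textwrap

SRC = textwrap.dedent("""
    #include <omp.h>
    #include <stdio.h>
    int main(void) {
        int on_gpu = 0;
        #pragma omp target map(tofrom: on_gpu)
        { on_gpu = !omp_is_initial_device(); }
        printf("offloaded to GPU: %d\\n", on_gpu);
        return on_gpu ? 0 : 1;
    }
""")

with tempfile.TemporaryDirectory() as tmpdir:
    c_file = os.path.join(tmpdir, 'omp_offload.c')
    exe = os.path.join(tmpdir, 'omp_offload')
    with open(c_file, 'w') as fh:
        fh.write(SRC)
    # '-fopenmp-targets=nvptx64-nvidia-cuda' targets NVIDIA GPUs;
    # use 'amdgcn-amd-amdhsa' for AMD GPUs instead
    subprocess.run(['clang', '-fopenmp', '-fopenmp-targets=nvptx64-nvidia-cuda',
                    c_file, '-o', exe], check=True)
    subprocess.run([exe], check=True)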
I've tried to build LLVM using your easyblock and easyconfig PRs on a fresh … I haven't looked too deeply into which and how many tests failed, though. I built everything on my local workstation (Intel Core i7-12700, AMD Radeon RX7700XT).
Thanks for the report, will look into it.
Yes, exactly. When setting it to
I have an idea on what might be happening. Was the error during the …? In case it is the latter, it is very likely that the reason would be that I had another python available and capable of working on my WS during the build, which was causing the sanity check to pass even though python was only in the build dependencies instead of the normal (run-time) dependencies. Let me know if this is the case; in the meanwhile I will fix the EC files.
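For context, the distinction being referred to is between an easyconfig's builddependencies and its dependencies. A hedged illustration of the kind of EC fix being discussed, with placeholder names and versions:

# Illustrative easyconfig fragment (names/versions are placeholders).
# If Python is needed by the installed tools at run time, it belongs in
# 'dependencies'; leaving it only in 'builddependencies' can go unnoticed
# when the host happens to provide a usable python, as described above.
builddependencies = [
    ('CMake', '3.27.6'),
]
dependencies = [
    ('Python', '3.11.5'),
]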
UPDATE: after some discussion in the 2024/07/17 EasyBuild Conference Call, it was decided to replace the … To this end, the following has been added:
The following tests have been manually performed as a sanity check:
The PR title was changed from "LLVMcore easyblock for compilation of clang/flang + other llvm-projects" to "LLVM easyblock update for compilation of clang/flang + other llvm-projects".
The PR title was changed from "LLVM easyblock update for compilation of clang/flang + other llvm-projects" to "LLVM easyblock for compilation of clang/flang + other llvm-projects".
easybuild/easyblocks/l/llvm.py
Outdated
-        unknown_targets = [target for target in build_targets if target not in CLANG_TARGETS]
+        unknown_targets = set(build_targets) - set(ALL_TARGETS)

         if unknown_targets:
             raise EasyBuildError("Some of the chosen build targets (%s) are not in %s.",
-                                 ', '.join(unknown_targets), ', '.join(CLANG_TARGETS))
+                                 ', '.join(unknown_targets), ', '.join(ALL_TARGETS))
+        exp_targets = set(build_targets) & set(LLVM_EXPERIMENTAL_TARGETS)
+        if exp_targets:
+            self.log.warning("Experimental targets %s are being used.", ', '.join(exp_targets))

-        self.cfg.update('configopts', '-DLLVM_TARGETS_TO_BUILD="%s"' % ';'.join(build_targets))
+        self.build_targets = build_targets or []

-        if LooseVersion(self.version) >= LooseVersion('15.0'):
-            # make sure that CMake modules are available in build directory,
-            # by moving the extracted folder to the expected location
-            cmake_modules_path = os.path.join(self.builddir, 'cmake-%s.src' % self.version)
-            if os.path.exists(cmake_modules_path):
-                move_file(cmake_modules_path, os.path.join(self.builddir, 'cmake'))
+        self.nvptx_cond = 'NVPTX' in self.build_targets
+        self.amd_cond = 'AMDGPU' in self.build_targets
+        self.all_cond = 'all' in self.build_targets
+        self.cuda_cc = []
+        if self.nvptx_cond or self.all_cond:
+            # list of CUDA compute capabilities to use can be specifed in two ways (where (2) overrules (1)):
+            # (1) in the easyconfig file, via the custom cuda_compute_capabilities;
+            # (2) in the EasyBuild configuration, via --cuda-compute-capabilities configuration option;
+            ec_cuda_cc = self.cfg['cuda_compute_capabilities']
+            cfg_cuda_cc = build_option('cuda_compute_capabilities')
+            cuda_cc = cfg_cuda_cc or ec_cuda_cc or []
+            if not cuda_cc and self.nvptx_cond:
+                raise EasyBuildError(
+                    "Can't build Clang with CUDA support without specifying 'cuda-compute-capabilities'"
+                )
Please correct me if I'm wrong:
The current approach would just take the host architecture and not NVPTX / AMDGPU if the user doesn't explicitly set anything in their EasyConfig, right? Would it make sense to use the approach from the Clang EasyBlock or detect enabling them in another way (e.g. via cuda_compute_capabilities)?
if build_targets is None:
    deps = [dep['name'].lower() for dep in self.cfg.dependencies()]
    arch = get_cpu_architecture()
    try:
        default_targets = DEFAULT_TARGETS_MAP[arch][:]
        # If CUDA is included as a dep, add NVPTX as a target
        # There are (old) toolchains with CUDA as part of the toolchain
        cuda_toolchain = hasattr(self.toolchain, 'COMPILER_CUDA_FAMILY')
        if 'cuda' in deps or cuda_toolchain:
            default_targets += ['NVPTX']
        # For AMDGPU support we need ROCR-Runtime and
        # ROCT-Thunk-Interface, however, since ROCT is a dependency of
        # ROCR we only check for the ROCR-Runtime here
        # https://openmp.llvm.org/SupportAndFAQ.html#q-how-to-build-an-openmp-amdgpu-offload-capable-compiler
        if 'rocr-runtime' in deps:
            default_targets += ['AMDGPU']
        self.cfg['build_targets'] = build_targets = default_targets
        self.log.debug("Using %s as default build targets for CPU/GPU architecture %s.", default_targets, arch)
    except KeyError:
        raise EasyBuildError("No default build targets defined for CPU architecture %s.", arch)
Yes, currently only the host architecture is used by default.
What the Clang EB does is just check if cuda or rocr-runtime is in the dependencies.
I think at least for CUDA I had removed it, as I was able to get offload going without adding CUDA to the deps (will have to double check, as I did it a couple months ago and don't have access to the frontend where I tested this).
For using cuda_compute_capabilities I followed the same approach as the Clang EB of checking that it is defined when NVPTX is enabled as a target and setting the proper variables (or failing if it is not defined).
I guess it would make sense to automatically enable NVPTX/AMDGPU if the respective deps are specified in the EC or if cuda_compute_capabilities/amd_gfx_list are defined.
I am thinking the advantage of the automatic detection would be to be able to have NVPTX/AMDGPU enabled without having to also manually specify the CPU target.
I can try adding the original check and see how it goes.
I guess it would make sense to automatically enable NVPTX/AMDGPU if the respective deps are specified in the EC or if cuda_compute_capabilities/amd_gfx_list are defined
For our systems at JSC, I added this small snippet to our "custom" LLVM EasyBlock and used your second idea to check if NVPTX should be built:
cuda_cc_list = build_option('cuda_compute_capabilities') or self.cfg['cuda_compute_capabilities']
if cuda_cc_list:
    build_targets += ["NVPTX"]
    self.log.debug("Add NVPTX as additional build target architecture")
The code is not perfect, but it is likely sufficient for our use case. As far as I am aware, LLVM does not require CUDA / ROCm to build the respective NVPTX / AMDGPU backends, only when actually using them.
I am thinking the advantage of the automatic detection would be to be able to have NVPTX/AMDGPU enabled without having to also manually specify the CPU target.
Agreed, this was my thought as well. Especially if one has a "global" repository for multiple systems with different architectures, needing to specify the CPU target each time might get messy. Users are also still able to specify the targets manually, as the automatic detection is only done when build_targets is not defined.
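An analogous check could presumably cover the AMD side as well; a hedged sketch, assuming the amd_gfx_list custom easyconfig parameter mentioned in this thread is available:

# Hedged sketch (not from the PR): mirror the NVPTX detection above for AMDGPU,
# using the 'amd_gfx_list' custom easyconfig parameter discussed in this thread.
amd_gfx_list = self.cfg['amd_gfx_list']
if amd_gfx_list:
    build_targets += ["AMDGPU"]
    self.log.debug("Add AMDGPU as additional build target architecture")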
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've implemented the automatic detection based on having cuda_compute_capabilities or amd_gfx_list defined (cac8fe9).
For now this is being done only if no build_targets are manually specified.
It is also true that if someone is specifying either of the 2 variables they might want to have the target enabled, but I feel priority should still be given to build_targets (maybe a check could be added to print a warning, or outright raise an error, if cuda_cc/amd_gfx are defined and the corresponding targets are not).
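A minimal sketch of such a warning, assuming cuda_cc and amd_gfx hold the values of cuda_compute_capabilities and amd_gfx_list; whether to warn or raise is left open:

# Hedged sketch: only warn (or optionally raise) when the user specified GPU
# capabilities but left the corresponding backend out of build_targets.
if cuda_cc and 'NVPTX' not in build_targets:
    self.log.warning("cuda_compute_capabilities specified, but 'NVPTX' is not among the build targets")
if amd_gfx and 'AMDGPU' not in build_targets:
    self.log.warning("amd_gfx_list specified, but 'AMDGPU' is not among the build targets")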
I think your current approach is a good idea, only enabling the backends if the user has not defined which ones to build. Adding a warning (or at least a log entry) in case they might have missed one would also be sensible.
This EB is a modified version of the current Clang one (whether to keep them separate, have a simpler EB for Clang-only, or rename Clang to LLVMcore or vice versa is up for debate).
Requires testing of:
It enables building most of the main LLVM projects. What is still missing / has not been tested:
- libc
- libclc
- llvm-libgcc
- pstl
The current EB improves on the Clang one by:
- LLVM >= 18
Currently/WIP:
- The full_llvm build finishes successfully with 74 out of 119128 tests failing:
  - bolt
  - clang tools
  - mlir
  - lldb
  - ompd
  - lldb with a static build (55 out of 119090)
- The bootstrapped non full_llvm build finishes successfully with 46 out of 119897 tests failing:
  - bolt
  - clang tools
  - mlir
  - MemProfiler-x86_64-linux
  - lldb
  - ompd
- The non bootstrapped version builds successfully but does not build the test-suite (running make -j 1 check-all inside the builddir worked, leading to 52 failures + 2 timeouts out of 119900) (NOTE: the number of MemProfiler and timeout failures is not consistent)
- The last stage with all the required projects is always the longest, and it seems that clang is faster at building (at least within the scope of the LLVM project)
- make -j XXX (!=1) / -j 1 if the parallel build fails and the installation runs successfully:
  - bolt
  - clang tools
  - mlir
  - MemProfiler-x86_64-linux (this number is not consistent across builds)
  - lldb
  - ompd
- Improving sanity check depending on enabled options
Notes:
- iostream seems to work both with libc++ and libstdc++ (tested with the full_llvm build and the bootstrapped non full one)
- libomp and libgomp, but the one compiled with libgomp does not seem to recognize OMP_NUM_THREADS and runs with 1 thread. Tested with C (clang), C++ (clang++) and Fortran (flang-new); a small smoke-test sketch for the three frontends follows the TODO list below.
Possible TODOs:
- clang (Tested on VEGA front-end with an A100)
- flang-new
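As a side note, the C/C++/Fortran check mentioned in the notes above can be scripted along these lines; this is only a hedged sketch, assuming clang, clang++ and flang-new from the freshly built module are on the PATH:

# Hedged sketch of a frontend smoke test (assumes clang, clang++ and flang-new
# from the freshly built module are on the PATH).
import os
import subprocess
import tempfile

SOURCES = {
    'clang':     ('hello.c',   '#include <stdio.h>\nint main(void) { puts("hello from C"); return 0; }\n'),
    'clang++':   ('hello.cpp', '#include <iostream>\nint main() { std::cout << "hello from C++\\n"; return 0; }\n'),
    'flang-new': ('hello.f90', 'program hello\n  print *, "hello from Fortran"\nend program hello\n'),
}

with tempfile.TemporaryDirectory() as tmpdir:
    for compiler, (name, code) in SOURCES.items():
        src = os.path.join(tmpdir, name)
        exe = os.path.join(tmpdir, name + '.x')
        with open(src, 'w') as fh:
            fh.write(code)
        subprocess.run([compiler, src, '-o', exe], check=True)  # compile
        subprocess.run([exe], check=True)                       # run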