Fix an issue in simple-kernel-timer with mismatched linked symbols #199

Merged
crtrott merged 1 commit into kokkos:develop from fix_single_library_symbol_sharing
Aug 3, 2023

@crtrott (Member) commented Jul 31, 2023

A basic Kokkos Serial build and test failed for me on macOS. The symptom was that the simple kernel timer's "currentEntry" symbol was null during the end-of-parallel_for callback.
I tracked that down to the symbol having different addresses inside the increment_counter function and inside the end-of-parallel_for callback.

I believe making the increment_counter functions non-inline may help?

Or do we need accessor functions that are uniform?

At least this change fixed the issue for me.
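
A minimal sketch of the accessor idea raised above, not the actual patch. The state lives in one place and is only touched through out-of-line functions, so that within one linked library every caller reaches the same object. All names except increment_counter (KernelEntry, entries, current_entry, get_current_entry, end_kernel) are hypothetical, and in a real tool the definitions would sit in a single .cpp file with only the declarations in the header.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the timer's per-kernel record.
struct KernelEntry {
  std::string name;
  std::uint64_t calls = 0;
};

// --- would live in the header: declarations only, no inline bodies ---
KernelEntry* get_current_entry();
void increment_counter(const std::string& name);
void end_kernel();

// --- would live in exactly one .cpp file ---
namespace {
std::map<std::string, KernelEntry> entries;  // one record per kernel name
KernelEntry* current_entry = nullptr;        // the single shared pointer
}  // namespace

// Uniform accessor: callers never name the global directly.
KernelEntry* get_current_entry() { return current_entry; }

// Out-of-line, so the body (and the global it touches) is emitted once.
void increment_counter(const std::string& name) {
  KernelEntry& entry = entries[name];  // std::map keeps element addresses stable
  entry.name = name;
  ++entry.calls;
  current_entry = &entry;
}

// Stand-in for the end-of-kernel callback clearing the current record.
void end_kernel() { current_entry = nullptr; }
```

With this layout, a callback that calls get_current_entry() after increment_counter("initialize") sees a non-null record whose calls field is 1.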

@vlkale (Contributor) commented Aug 1, 2023

Confirming that this PR from @crtrott works for me on macOS with the latest Kokkos 4 from the develop branch of kokkos-core.

@vlkale (Contributor) commented Aug 1, 2023

Here is some sample output from trying this out on my laptop, using the Kokkos develop branch with a Serial build. This also seems to work with other tools such as space-time-stack (which is currently causing failing tests on PR #194, though fortuitously not on develop).

user@lap kto-dev-vk % ls
Build.md		Copyright.txt		README.md		common			kokkos.presets.json	vbuild
CMakeLists.txt		LICENSE			build-all.sh		debugging		profiling
CMakePresets.json	LICENSE_FILE_HEADER	cmake			example			tpls
user@lap kto-dev-vk % cmake -S . --preset=OpenMP -DKokkosTools_ENABLE_MPI="OFF"                                                  
Preset CMake variables:

  CMAKE_BUILD_TYPE="Release"
  CMAKE_CXX_STANDARD="17"
  KokkosTools_ENABLE_EXAMPLES="ON"
  KokkosTools_ENABLE_SINGLE="ON"

-- The CXX compiler identification is AppleClang 14.0.3.14030022
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- 
-- ConfiguringKokkos-Tools
-- 
-- Found Kokkos installation: /usr/local
		Devices: SERIAL
		Architecture: 
		TPLs: LIBDL
		Compiler: /Library/Developer/CommandLineTools/usr/bin/c++ (AppleClang)
		CMAKE_CXX_FLAGS: 
		Options: DEPRECATED_CODE_4;DEPRECATION_WARNINGS;COMPLEX_ALIGN
-- PAPI support disabled
-- MPI not available. MPI disabled.
CMake Warning at cmake/configure_variorum.cmake:23 (message):
  Variorum not found: set Variorum_ROOT CMake variable or VARIORUM_ROOT
  environment variable to build Variorum connector
Call Stack (most recent call first):
  CMakeLists.txt:95 (include)


CMake Warning at CMakeLists.txt:107 (message):
  Set VTUNE_HOME in environment or VTune_ROOT in build options to build VTune
  connectors


-- Apple OSX target detected.
-- Skipping memory-hwm-mpi (MPI disabled)
-- Building Monolithic KokkosTools library with profilers: kp_kernel_logger;kp_kernel_timer_json;kp_kernel_timer;kp_hwm;kp_memory_events;kp_memory_usage;kp_chrome_tracing;kp_space_time_stack;kp_perfetto_connector
-- Enabled Kokkos devices: SERIAL
-- Found installed Kokkos at /usr/local/lib/cmake/Kokkos
-- Configuring done (0.3s)
-- Generating done (0.1s)
-- Build files have been written to: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/ktools/kto-dev-vk/build-with-OpenMP
user@lap kto-dev-vk % cmake --build --preset=OpenMP  -DKokkosTools_ENABLE_MPI=OFF                                                    
Unknown argument -DKokkosTools_ENABLE_MPI=OFF
[  3%] Building CXX object common/kernel-filter/CMakeFiles/kp_kernel_filter.dir/kp_kernel_filter.cpp.o
[  6%] Linking CXX shared library libkp_kernel_filter.dylib
[  6%] Built target kp_kernel_filter
[  9%] Building CXX object common/kokkos-sampler/CMakeFiles/kp_kokkos_sampler.dir/kp_sampler_skip.cpp.o
[ 12%] Linking CXX shared library libkp_kokkos_sampler.dylib
[ 12%] Built target kp_kokkos_sampler
[ 15%] Building CXX object debugging/kernel-logger/CMakeFiles/kp_kernel_logger.dir/kp_kernel_logger.cpp.o
[ 18%] Linking CXX shared library libkp_kernel_logger.dylib
[ 18%] Built target kp_kernel_logger
[ 21%] Building CXX object profiling/simple-kernel-timer/CMakeFiles/kp_kernel_shared.dir/kp_shared.cpp.o
[ 24%] Linking CXX static library libkp_kernel_shared.a
[ 24%] Built target kp_kernel_shared
[ 27%] Building CXX object profiling/simple-kernel-timer/CMakeFiles/kp_kernel_timer_json.dir/kp_kernel_timer_json.cpp.o
[ 30%] Linking CXX shared library libkp_kernel_timer_json.dylib
[ 30%] Built target kp_kernel_timer_json
[ 33%] Building CXX object profiling/simple-kernel-timer/CMakeFiles/kp_kernel_timer.dir/kp_kernel_timer.cpp.o
[ 36%] Linking CXX shared library libkp_kernel_timer.dylib
[ 36%] Built target kp_kernel_timer
[ 39%] Building CXX object profiling/simple-kernel-timer/CMakeFiles/kp_reader.dir/kp_reader.cpp.o
[ 42%] Linking CXX executable kp_reader
[ 42%] Built target kp_reader
[ 45%] Building CXX object profiling/simple-kernel-timer/CMakeFiles/kp_json_writer.dir/kp_json_writer.cpp.o
[ 48%] Linking CXX executable kp_json_writer
[ 48%] Built target kp_json_writer
[ 51%] Building CXX object profiling/memory-hwm/CMakeFiles/kp_hwm.dir/kp_hwm.cpp.o
[ 54%] Linking CXX shared library libkp_hwm.dylib
[ 54%] Built target kp_hwm
[ 57%] Building CXX object profiling/memory-events/CMakeFiles/kp_memory_events.dir/kp_memory_events.cpp.o
[ 60%] Linking CXX shared library libkp_memory_events.dylib
[ 60%] Built target kp_memory_events
[ 63%] Building CXX object profiling/memory-usage/CMakeFiles/kp_memory_usage.dir/kp_memory_usage.cpp.o
[ 66%] Linking CXX shared library libkp_memory_usage.dylib
[ 66%] Built target kp_memory_usage
[ 69%] Building CXX object profiling/chrome-tracing/CMakeFiles/kp_chrome_tracing.dir/kp_chrome_tracing.cpp.o
[ 72%] Linking CXX shared library libkp_chrome_tracing.dylib
[ 72%] Built target kp_chrome_tracing
[ 75%] Building CXX object profiling/space-time-stack/CMakeFiles/kp_space_time_stack.dir/kp_space_time_stack.cpp.o
[ 78%] Linking CXX shared library libkp_space_time_stack.dylib
[ 78%] Built target kp_space_time_stack
[ 81%] Building CXX object profiling/perfetto-connector/CMakeFiles/kp_perfetto_connector.dir/libperfetto-connector.cpp.o
[ 84%] Building CXX object profiling/perfetto-connector/CMakeFiles/kp_perfetto_connector.dir/perfetto/perfetto.cc.o
[ 87%] Linking CXX shared library libkp_perfetto_connector.dylib
[ 87%] Built target kp_perfetto_connector
[ 90%] Building CXX object profiling/all/CMakeFiles/kokkostools.dir/kp_all.cpp.o
[ 93%] Linking CXX shared library libkokkostools.dylib
[ 93%] Built target kokkostools
[ 96%] Building CXX object example/CMakeFiles/kp_example.dir/main.cpp.o
[100%] Linking CXX executable kp_example
[100%] Built target kp_example
user@lap kto-dev-vk % sudo cmake --install build-with-OpenMP --prefix=/Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-dev/ksrlbld
-- Install configuration: "Release"
-- Installing: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-dev/ksrlbld/include/kp_config.hpp
-- Installing: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-dev/ksrlbld/include/kp_all.hpp
-- Installing: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-dev/ksrlbld/lib/libkokkostools.dylib
-- Installing: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-dev/ksrlbld/lib/libkp_kernel_logger.dylib
-- Installing: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-dev/ksrlbld/lib/libkp_kernel_shared.a
-- Installing: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-dev/ksrlbld/lib/libkp_kernel_timer_json.dylib
-- Installing: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-dev/ksrlbld/lib/libkp_kernel_timer.dylib
-- Installing: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-dev/ksrlbld/bin/kp_reader
-- Installing: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-dev/ksrlbld/bin/kp_json_writer
-- Installing: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-dev/ksrlbld/lib/libkp_hwm.dylib
-- Installing: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-dev/ksrlbld/lib/libkp_memory_events.dylib
-- Installing: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-dev/ksrlbld/lib/libkp_memory_usage.dylib
-- Installing: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-dev/ksrlbld/lib/libkp_chrome_tracing.dylib
-- Installing: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-dev/ksrlbld/lib/libkp_space_time_stack.dylib
-- Installing: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-dev/ksrlbld/lib/libkp_perfetto_connector.dylib
-- Installing: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-dev/ksrlbld/lib/cmake/KokkosToolsConfig.cmake
-- Installing: /Users/vlkale/Desktop/vlap/wk/code/softwareTech/kokkos-dev/ksrlbld/lib/cmake/KokkosToolsConfig-release.cmake
user@lap kto-dev-vk % ls  
Build.md		Copyright.txt		README.md		cmake			example			tpls
CMakeLists.txt		LICENSE			build-all.sh		common			kokkos.presets.json	vbuild
CMakePresets.json	LICENSE_FILE_HEADER	build-with-OpenMP	debugging		profiling
user@lap kto-dev-vk % cd build-with-OpenMP 
user@lap build-with-OpenMP % ls
CMakeCache.txt		CTestTestfile.cmake	cmake_install.cmake	debugging		install_manifest.txt
CMakeFiles		Makefile		common			example			profiling
user@lap build-with-OpenMP % cd example 
user@lap example % ls
CMakeFiles		CTestTestfile.cmake	Makefile		cmake_install.cmake	kp_example
user@lap example % ./kp_example "kernel-timer"     
KokkosP: Simple Kernel Timer Library Initialized (sequence is 0, version: 20211015)
  Kokkos Version: 4.1.0
Compiler:
  KOKKOS_COMPILER_APPLECC: 6000
Architecture:
  CPU architecture: none
  Default Device: N6Kokkos6SerialE
  GPU architecture: none
  platform: 64bit
Atomics:
Vectorization:
  KOKKOS_ENABLE_PRAGMA_IVDEP: no
  KOKKOS_ENABLE_PRAGMA_LOOPCOUNT: no
  KOKKOS_ENABLE_PRAGMA_UNROLL: no
  KOKKOS_ENABLE_PRAGMA_VECTOR: no
Memory:
  KOKKOS_ENABLE_HBWSPACE: no
  KOKKOS_ENABLE_INTEL_MM_ALLOC: no
Options:
  KOKKOS_ENABLE_ASM: no
  KOKKOS_ENABLE_CXX17: yes
  KOKKOS_ENABLE_CXX20: no
  KOKKOS_ENABLE_CXX23: no
  KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK: no
  KOKKOS_ENABLE_HWLOC: no
  KOKKOS_ENABLE_LIBDL: yes
  KOKKOS_ENABLE_LIBRT: no
Host Serial Execution Space:
  KOKKOS_ENABLE_SERIAL: yes
Kokkos atomics disabled

Serial Runtime Configuration:

Result OK: S(100000) = 705082704

KokkosP: Kernel timing written to /Users/vlkale/Desktop/vlap/wk/code/softwareTech/ktools/kto-dev-vk/build-with-OpenMP/example/s1088602ca-83018.dat 
user@lap example % ./kp_example "space-time-stack"
  Kokkos Version: 4.1.0
Compiler:
  KOKKOS_COMPILER_APPLECC: 6000
Architecture:
  CPU architecture: none
  Default Device: N6Kokkos6SerialE
  GPU architecture: none
  platform: 64bit
Atomics:
Vectorization:
  KOKKOS_ENABLE_PRAGMA_IVDEP: no
  KOKKOS_ENABLE_PRAGMA_LOOPCOUNT: no
  KOKKOS_ENABLE_PRAGMA_UNROLL: no
  KOKKOS_ENABLE_PRAGMA_VECTOR: no
Memory:
  KOKKOS_ENABLE_HBWSPACE: no
  KOKKOS_ENABLE_INTEL_MM_ALLOC: no
Options:
  KOKKOS_ENABLE_ASM: no
  KOKKOS_ENABLE_CXX17: yes
  KOKKOS_ENABLE_CXX20: no
  KOKKOS_ENABLE_CXX23: no
  KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK: no
  KOKKOS_ENABLE_HWLOC: no
  KOKKOS_ENABLE_LIBDL: yes
  KOKKOS_ENABLE_LIBRT: no
Host Serial Execution Space:
  KOKKOS_ENABLE_SERIAL: yes
Kokkos atomics disabled

Serial Runtime Configuration:

Result OK: S(100000) = 705082704


BEGIN KOKKOS PROFILING REPORT:
TOTAL TIME: 0.000495458 seconds
TOP-DOWN TIME TREE:
<average time> <percent of total time> <percent time in Kokkos> <percent MPI imbalance> <remainder> <kernels per second> <number of calls> <name> [type]
===================
|-> 1.88e-04 sec 38.0% 89.6% 0.0% 10.4% 1.06e+04 1 Computation [region]
    |-> 1.16e-04 sec 23.4% 100.0% 0.0% ------ 1 accumulate() [reduce]
    |-> 5.25e-05 sec 10.6% 100.0% 0.0% ------ 1 initialize() [for]

BOTTOM-UP TIME TREE:
<average time> <percent of total time> <percent time in Kokkos> <percent MPI imbalance> <number of calls> <name> [type]
===================
|-> 1.16e-04 sec 23.4% 100.0% 0.0% ------ 1 accumulate() [reduce]
|   |-> 1.16e-04 sec 23.4% 100.0% 0.0% 0.0% 0.00e+00 1 Computation [region]
|-> 5.25e-05 sec 10.6% 100.0% 0.0% ------ 1 initialize() [for]
|   |-> 5.25e-05 sec 10.6% 100.0% 0.0% 0.0% 0.00e+00 1 Computation [region]
|-> 1.96e-05 sec 4.0% 0.0% 0.0% 0.0% 0.00e+00 1 Computation [region]

KOKKOS HOST SPACE:
===================
MAX MEMORY ALLOCATED: 401.6 kB
ALLOCATIONS AT TIME OF HIGH WATER MARK:
  97.3% Computation/data
  2.7% Computation/accumulate()/Kokkos::Serial::scratch_mem

KOKKOS CUDA SPACE:
===================
MAX MEMORY ALLOCATED: 0.0 kB
ALLOCATIONS AT TIME OF HIGH WATER MARK:

KOKKOS HIP SPACE:
===================
MAX MEMORY ALLOCATED: 0.0 kB
ALLOCATIONS AT TIME OF HIGH WATER MARK:

KOKKOS SYCL SPACE:
===================
MAX MEMORY ALLOCATED: 0.0 kB
ALLOCATIONS AT TIME OF HIGH WATER MARK:

KOKKOS OpenMPTarget SPACE:
===================
MAX MEMORY ALLOCATED: 0.0 kB
ALLOCATIONS AT TIME OF HIGH WATER MARK:

Host process high water mark memory consumption: 2686976 kB

END KOKKOS PROFILING REPORT.
user@lap example % ./kp_example "memory-usage"    
  Kokkos Version: 4.1.0
Compiler:
  KOKKOS_COMPILER_APPLECC: 6000
Architecture:
  CPU architecture: none
  Default Device: N6Kokkos6SerialE
  GPU architecture: none
  platform: 64bit
Atomics:
Vectorization:
  KOKKOS_ENABLE_PRAGMA_IVDEP: no
  KOKKOS_ENABLE_PRAGMA_LOOPCOUNT: no
  KOKKOS_ENABLE_PRAGMA_UNROLL: no
  KOKKOS_ENABLE_PRAGMA_VECTOR: no
Memory:
  KOKKOS_ENABLE_HBWSPACE: no
  KOKKOS_ENABLE_INTEL_MM_ALLOC: no
Options:
  KOKKOS_ENABLE_ASM: no
  KOKKOS_ENABLE_CXX17: yes
  KOKKOS_ENABLE_CXX20: no
  KOKKOS_ENABLE_CXX23: no
  KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK: no
  KOKKOS_ENABLE_HWLOC: no
  KOKKOS_ENABLE_LIBDL: yes
  KOKKOS_ENABLE_LIBRT: no
Host Serial Execution Space:
  KOKKOS_ENABLE_SERIAL: yes
Kokkos atomics disabled

Serial Runtime Configuration:

Result OK: S(100000) = 705082704

user@lap example % ./kp_example "memory-events"
KokkosP: MemoryEvents loaded (sequence: 0, version: 20211015)
  Kokkos Version: 4.1.0
Compiler:
  KOKKOS_COMPILER_APPLECC: 6000
Architecture:
  CPU architecture: none
  Default Device: N6Kokkos6SerialE
  GPU architecture: none
  platform: 64bit
Atomics:
Vectorization:
  KOKKOS_ENABLE_PRAGMA_IVDEP: no
  KOKKOS_ENABLE_PRAGMA_LOOPCOUNT: no
  KOKKOS_ENABLE_PRAGMA_UNROLL: no
  KOKKOS_ENABLE_PRAGMA_VECTOR: no
Memory:
  KOKKOS_ENABLE_HBWSPACE: no
  KOKKOS_ENABLE_INTEL_MM_ALLOC: no
Options:
  KOKKOS_ENABLE_ASM: no
  KOKKOS_ENABLE_CXX17: yes
  KOKKOS_ENABLE_CXX20: no
  KOKKOS_ENABLE_CXX23: no
  KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK: no
  KOKKOS_ENABLE_HWLOC: no
  KOKKOS_ENABLE_LIBDL: yes
  KOKKOS_ENABLE_LIBRT: no
Host Serial Execution Space:
  KOKKOS_ENABLE_SERIAL: yes
Kokkos atomics disabled

Serial Runtime Configuration:

Result OK: S(100000) = 705082704

@vlkale (Contributor) left a comment

Reviewed the files, in which the increment_counter functions are moved from the .h file to the corresponding .cpp file.

It makes sense and looks good to me.

I have tested this as well, and it works with several different profilers (see the sample output in the PR).

@masterleinad (Contributor) commented

I'd like to understand the issue somewhat better. It's not quite clear to me whether and how the current code exposes an ODR violation.

@crtrott (Member, Author) commented Aug 1, 2023

It's not an ODR issue; I suspect that we got two different copies of the global variable via the two different shared libraries that are linked in.
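
The suspected failure mode can be imitated in a single file. This is only a simulation under stated assumptions: the namespaces lib_a and lib_b are hypothetical stand-ins for two shared libraries that each ended up with a private copy of the same global, and no real dynamic loading is involved. The helper names (storage, address_of_global, end_parallel_for_sees_entry, copies_share_address) are also invented for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the timer's per-kernel record.
struct KernelEntry {
  int calls = 0;
};

// Stand-in for the first library: it owns copy #1 of the global and
// is where the (formerly inlined) increment_counter body ended up.
namespace lib_a {
KernelEntry storage;
KernelEntry* currentEntry = nullptr;  // copy #1
void increment_counter() {
  ++storage.calls;
  currentEntry = &storage;  // only copy #1 is updated
}
const void* address_of_global() { return &currentEntry; }
}  // namespace lib_a

// Stand-in for the second library: its callback reads copy #2,
// which nobody ever wrote to.
namespace lib_b {
KernelEntry* currentEntry = nullptr;  // copy #2
bool end_parallel_for_sees_entry() { return currentEntry != nullptr; }
const void* address_of_global() { return &currentEntry; }
}  // namespace lib_b

// The diagnostic from the PR description: compare the addresses that
// the two call sites actually use for "the same" variable.
bool copies_share_address() {
  return lib_a::address_of_global() == lib_b::address_of_global();
}
```

After lib_a::increment_counter() runs, lib_a::currentEntry is set while lib_b::end_parallel_for_sees_entry() still returns false, which matches the null "currentEntry" symptom. Comparing or printing the address of the variable from both call sites, as in copies_share_address(), is a quick way to check for this situation.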

@crtrott crtrott merged commit ef39361 into kokkos:develop Aug 3, 2023
5 checks passed
@crtrott crtrott deleted the fix_single_library_symbol_sharing branch August 3, 2023 17:11