Dynamic expansion of thread data #294

Merged (32 commits) on Oct 16, 2023


jrmadsen (Collaborator)

  • support dynamic number of threads created by user application
  • thread-local data is allocated in blocks of OMNITRACE_MAX_THREADS (defined at compile-time)
    • omnitrace stores a lot of thread-specific data in arrays indexed by an integral value unique to that thread
      • this data persists even after the thread's static thread_local allocations are destroyed
    • previously, with OMNITRACE_MAX_THREADS=32, the user application could create at most ~31 additional threads before omnitrace aborted
    • now, omnitrace allocates thread data in blocks of OMNITRACE_MAX_THREADS entries to support an unlimited number of threads: e.g. with OMNITRACE_MAX_THREADS=32, once the 32nd additional thread is created, omnitrace resizes every thread_data instance to hold 32 more threads (from 32 slots to 64)
  • closes #220: Segmentation fault sampling without instrumentation python app.
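The block-wise growth described above can be sketched as follows. This is a minimal illustration, not omnitrace's implementation: `blocked_thread_data` and `max_threads_block` are hypothetical names, with `max_threads_block` standing in for OMNITRACE_MAX_THREADS. Each block is a separate heap allocation, so growing the container never relocates slots that other code may still reference (the PR's thread_data specializations use a stable_vector, presumably for similar reasons).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the compile-time OMNITRACE_MAX_THREADS value.
constexpr std::size_t max_threads_block = 32;

// Hypothetical per-thread storage indexed by a unique thread index.
// Capacity grows one whole block at a time; each block is a separate heap
// allocation, so growth never moves slots that were handed out earlier.
template <typename T, std::size_t BlockSize = max_threads_block>
class blocked_thread_data
{
    using block_type = std::array<T, BlockSize>;

public:
    // Start with a single block, i.e. BlockSize slots.
    blocked_thread_data() { m_blocks.emplace_back(std::make_unique<block_type>()); }

    // Return the slot for thread index `tid`, appending blocks until it fits.
    T& at(std::size_t tid)
    {
        std::lock_guard<std::mutex> lk{ m_mutex };
        while(tid >= capacity_unlocked())
            m_blocks.emplace_back(std::make_unique<block_type>());
        assert(tid / BlockSize < m_blocks.size());
        return (*m_blocks[tid / BlockSize])[tid % BlockSize];
    }

    // Current number of slots (always a multiple of BlockSize).
    std::size_t capacity() const
    {
        std::lock_guard<std::mutex> lk{ m_mutex };
        return capacity_unlocked();
    }

private:
    std::size_t capacity_unlocked() const { return m_blocks.size() * BlockSize; }

    mutable std::mutex                       m_mutex{};
    std::vector<std::unique_ptr<block_type>> m_blocks{};
};
```

Accessing index 32 on a fresh `blocked_thread_data<int>` grows the capacity from 32 to 64 while the address of slot 0 stays the same. Taking a mutex on every lookup keeps the sketch simple; a production version would likely avoid locking on the fast path.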

jrmadsen added the bug fix, timemory, libomnitrace, cmake, submodule, and libomnitrace-core labels on Jun 30, 2023
jrmadsen changed the title from "Dynamic thread data" to "Dynamic expansion of thread data" on Jun 30, 2023
jrmadsen force-pushed the thread-data-update branch 3 times, most recently from 77eeb9f to aac3903 on July 6, 2023 23:19
- tests that create more threads than the OMNITRACE_MAX_THREADS value
- include source files in the /tests/source directory
- fail if a timemory hash is not resolved to a name
- remove env disabling of critical-trace and process-sampling
- make_unique in concepts.hpp
- add OMNITRACE_USE_ROCM_SMI to "process_sampling" category
- remove forced disabling of critical-trace in sampling mode
- parentheses for OMNITRACE_PREFER
- use tim::get_hash_id instead of tim::get_combined_hash_id
- added aligned_static_vector.hpp
  - similar to static_vector.hpp but attempts to align to the cache-line size
- alignment template parameter for stable_vector
- added missing aliases in static_vector
  - consistent with aligned_static_vector aliases
- track the peak number of threads created
- thread_info::get_peak_num_threads() returns the peak number of threads
- generic thread_data inherits from base_thread_data
- thread_data reworked to support dynamic expansion
- base_thread_data updated to invoke private_instance() function
- thread_data<optional<T>> uses stable_vector aligned to cache line width
- thread_data<identity<T>> uses stable_vector aligned to cache line width
- thread_data for optional and identity provide a private private_instance() function and declare base_thread_data as a friend
- component_bundle_cache<T> is now thread_data<component_bundle_cache_impl<T>>
- thread_data<T>::instances -> thread_data<T>::instance(construct_on_thread{ ... })
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()
- tim::get_combined_hash_id -> tim::get_hash_id
- update progress_bundle usage to new thread_data API
- backtrace_metrics update
  - update to new thread_data API
  - add thread CPU time row in perfetto
  - fix potential bug when rusage categories are disabled
  - fix bug where operator-= did not subtract the CPU time of rhs
- backtrace update
  - skip all child call-stack entries below 'tim::openmp::' if sampling_keep_internal = false
- pthread_gotcha::shutdown() invokes pthread_create_gotcha::shutdown()
- minor tweak to {start,stop}_bundle functions: pass in thread id
- update to new thread_data API
- track native handles of internal threads
- implement a system that uses pthread_kill to stop dangling bundles
- invoke pthread_gotcha::shutdown before invoking OMPT finalize function
  - this prevents signals from being delivered to OpenMP threads
- replace get_timemory_hash_{ids,aliases} functions with copy_timemory_hash_ids function
- improvements to, and error checking in, the thread_init function
- move copying timemory hash id/aliases to tracing.cpp
- add -Wno-interference-size to suppress the warning about use of std::hardware_destructive_interference_size
- improve the scheme for waiting on child processes: use waitpid instead of wait
- support running main routine multiple times
- push/pop regions in child process
- allow users to specify miscellaneous values via -D <name>=<value>
  - OMNITRACE_CACHELINE_SIZE
  - OMNITRACE_CACHELINE_SIZE_MIN
  - OMNITRACE_ROCM_MAX_COUNTERS
- remove unused defines
  - OMNITRACE_ROCM_LOOK_AHEAD
  - OMNITRACE_MAX_ROCM_QUEUES
- OMNITRACE_MAX_ROCM_COUNTERS -> OMNITRACE_ROCM_MAX_COUNTERS
- set cacheline_align_v from max of OMNITRACE_CACHELINE_SIZE and OMNITRACE_CACHELINE_SIZE_MIN
- acquire locks for updating main hash ids/aliases
- only propagate ids/aliases when finalizing
- make sure hash for "start_thread" exists on main thread
- if OMNITRACE_BUILD_NUMBER is 1, set OMNITRACE_VERBOSE=0
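Several commits above (aligned_static_vector, the cache-line-aligned stable_vector in thread_data, cacheline_align_v, and looping over thread_info::get_peak_num_threads() instead of a compile-time maximum) fit one pattern, sketched below under stated assumptions. This is not omnitrace's API: `EXAMPLE_CACHELINE_SIZE`, `EXAMPLE_CACHELINE_SIZE_MIN`, `counter_slot`, `peak_thread_counter`, and `total` are hypothetical names, the 64-byte defaults are assumptions, and the peak counter assumes thread indices are assigned sequentially from 0.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for OMNITRACE_CACHELINE_SIZE and
// OMNITRACE_CACHELINE_SIZE_MIN; 64 bytes is an assumed default.
#ifndef EXAMPLE_CACHELINE_SIZE
#    define EXAMPLE_CACHELINE_SIZE 64
#endif
#ifndef EXAMPLE_CACHELINE_SIZE_MIN
#    define EXAMPLE_CACHELINE_SIZE_MIN 64
#endif

// Alignment is the larger of the two values, as described in the commit list.
constexpr std::size_t cacheline_align_v =
    std::max<std::size_t>(EXAMPLE_CACHELINE_SIZE, EXAMPLE_CACHELINE_SIZE_MIN);

// Each slot starts on its own cache line, so threads updating neighboring
// slots do not false-share.
struct alignas(cacheline_align_v) counter_slot
{
    std::uint64_t value = 0;
};

// Hypothetical tracker for the peak thread count (highest index + 1).
class peak_thread_counter
{
public:
    void record(std::size_t tid)
    {
        const std::size_t want = tid + 1;
        std::size_t       cur  = m_peak.load(std::memory_order_relaxed);
        // Raise m_peak to `want` unless another thread already raised it higher;
        // on failure, compare_exchange_weak reloads `cur`.
        while(cur < want &&
              !m_peak.compare_exchange_weak(cur, want, std::memory_order_relaxed))
        {}
        // Postcondition: the peak never decreases, so it now covers `tid`.
        assert(m_peak.load(std::memory_order_relaxed) >= want);
    }

    std::size_t get() const { return m_peak.load(std::memory_order_relaxed); }

private:
    std::atomic<std::size_t> m_peak{ 0 };
};

// Sum only the slots up to the peak thread count, not a compile-time maximum.
template <std::size_t N>
std::uint64_t total(const counter_slot (&slots)[N], const peak_thread_counter& peak)
{
    std::uint64_t     sum = 0;
    const std::size_t n   = std::min<std::size_t>(peak.get(), N);
    for(std::size_t i = 0; i < n; ++i)
        sum += slots[i].value;
    return sum;
}
```

Padding each slot to a full cache line trades memory for freedom from false sharing, and bounding the loop by the peak count skips slots that no thread ever claimed.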
jrmadsen merged commit 518c83e into ROCm:main on Oct 16, 2023
47 checks passed
jrmadsen deleted the thread-data-update branch on October 16, 2023 23:04