VecMem Update, main branch (2024.10.29.) #757
Merged
Updated the project to use vecmem-1.11.0.
Most importantly, this update includes acts-project/vecmem#299, which was made by @stephenswat to get the memory usage of our code under control. It also includes acts-project/vecmem#286, which is the reason for all the actual code changes in this PR...
Currently, `traccc_throughput_st_cuda` peaks at about 3.85 GB of memory while reconstructing 100 mu=140 events.
With this update included, the peak drops to about 2.77 GB.
To complete the picture, using `vecmem::pool_memory_resource` instead of `vecmem::binary_page_memory_resource` results in a peak of about 3.3 GB.
And finally, if I turn off memory caching completely, memory usage stays in the 200-700 MB range.
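
For reference, here is a minimal sketch of how these three setups differ on the vecmem side. The header paths and constructor signatures are taken from my understanding of the current vecmem API; the actual wiring in `traccc_throughput_st_cuda` goes through the example's own option handling, so treat this as an illustration rather than the code touched by this PR:

```cpp
// Illustration only: the vecmem names below come from its public headers,
// while the hand-off to the reconstruction chain is just a stand-in comment.
#include <vecmem/memory/binary_page_memory_resource.hpp>
#include <vecmem/memory/cuda/device_memory_resource.hpp>
#include <vecmem/memory/memory_resource.hpp>
#include <vecmem/memory/pool_memory_resource.hpp>

int main() {

    // Upstream resource performing the actual device allocations.
    vecmem::cuda::device_memory_resource device_mr;

    // (1) Default setup: cache device allocations with the binary page resource.
    vecmem::binary_page_memory_resource binary_page_mr{device_mr};

    // (2) Alternative caching: the CUB-inspired pool resource.
    vecmem::pool_memory_resource pool_mr{device_mr};

    // (3) No caching: hand the upstream resource to the algorithms directly.
    vecmem::memory_resource& uncached_mr = device_mr;

    // Whichever of these is selected would be passed to the reconstruction
    // algorithms through a vecmem::memory_resource& reference.
    (void)binary_page_mr;
    (void)pool_mr;
    (void)uncached_mr;
    return 0;
}
```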
In all these cases, on the RTX 2060 GPU of my desktop machine, the throughput of reconstructing mu=140 events in a single thread/stream remained within ±10%. Running without caching was the slowest, of course, and `vecmem::pool_memory_resource` (which is based on the design of CUB and `std::pmr::unsynchronized_pool_resource`) was the fastest, but only by a little. Apparently, with the full chain running, the caching does not dominate the performance at the moment. 🤔