
Better support for cluster environments #471

Open
alexanderbock opened this issue Sep 15, 2022 · 18 comments
Labels
enhancement New feature or request

Comments

@alexanderbock
Contributor

Totally understand if this is out of scope and a pretty niche use case.

Our institution has a planetarium that runs the same application instance 6 times in a networked environment. In the past I have used Tracy in this environment by starting the GUI 7 times and connecting remotely to all instances manually. It would be really neat to be able to connect to all of the clients from a single GUI, and possibly also to align the timelines from all of the instances and show the places where one of the instances takes longer to execute a function, for example.

Just to be clear, this would be N instances of the same executable that should always go through the same function calls; where they disagree is where the interesting stuff happens.

@wolfpld wolfpld added the enhancement New feature or request label Nov 3, 2022
@GCCFeli

GCCFeli commented Nov 8, 2022

I'm in a slightly different situation. I'm running several game services in a cluster, and each service has different functionality. It would be great if all services (clients) could connect to a single GUI with their timelines aligned.

@PeterTh

PeterTh commented Nov 15, 2022

Cluster tooling is an entirely different can of worms, but FWIW we'd also be very interested in (even just basic) support for this use case. (Where "basic support" would probably mean ingesting data from several processes and aligning the times)

@GCCFeli

GCCFeli commented Nov 17, 2022

> Cluster tooling is an entirely different can of worms, but FWIW we'd also be very interested in (even just basic) support for this use case. (Where "basic support" would probably mean ingesting data from several processes and aligning the times)

Ingesting data from several processes and aligning the times is enough for my case. For now I'm hacking this with a simple proxy that the cluster processes connect to, and which acts as the only client to Tracy.

@wolfpld
Owner

wolfpld commented Nov 17, 2022

Making a proxy that muxes multiple clients would be the preferred solution here. To properly handle thread identifiers, which may be duplicated across different processes, you can use the already existing encoding:

// Encode a pair of "real pid, real tid" from a trace into a
// pseudo thread ID living in the single namespace of Tracy threads.
struct PidTidEncoder
{
    uint64_t tid;
    uint64_t pid;
    uint64_t pseudo_tid;  // fake thread id, unique within Tracy
};

std::vector<PidTidEncoder> tid_encoders;
std::vector<tracy::Worker::ImportEventTimeline> timeline;
std::vector<tracy::Worker::ImportEventMessages> messages;
std::vector<tracy::Worker::ImportEventPlots> plots;
std::unordered_map<uint64_t, std::string> threadNames;

const auto getPseudoTid = [&]( json& val ) -> uint64_t {
    const auto real_tid = val["tid"].get<uint64_t>();
    if( val.contains( "pid" ) )
    {
        // There might be multiple processes, so we allocate a pseudo-tid
        // for each pair (pid, real_tid).
        const auto pid = val["pid"].get<uint64_t>();
        for( auto& pair : tid_encoders )
        {
            if( pair.pid == pid && pair.tid == real_tid ) return pair.pseudo_tid;
        }
        assert( pid <= std::numeric_limits<uint32_t>::max() );
        assert( real_tid <= std::numeric_limits<uint32_t>::max() );
        const auto pseudo_tid = ( real_tid & 0xFFFFFFFF ) | ( pid << 32 );
        tid_encoders.emplace_back( PidTidEncoder { real_tid, pid, pseudo_tid } );
        return pseudo_tid;
    }
    else
    {
        return real_tid;
    }
};

You can see how this works in #213 (comment).

In 0.9 there were many changes in how the timeline items are handled, which is not really visible to users right now. Each track displayed on the timeline is now an instance of https://github.com/wolfpld/tracy/blob/master/server/TracyTimelineItem.hpp and the management of these items is now well defined in https://github.com/wolfpld/tracy/blob/master/server/TracyTimelineController.hpp, instead of the mess it was before. The takeaway here is that it should now be relatively easy to rearrange the threads so that threads originating from the same process are next to each other, or to add different colorings to thread backgrounds, etc.

@jamesfmilne

Just wanted to add that we at our company are looking into integrating Tracy into our development environment, and we also need to merge Tracy data from multiple sources. At least two, but perhaps more, that are either on the same machine or distributed across multiple machines.

Sounds like we have a very similar problem to everyone else in this thread. A mux is a good idea, especially if we can also use that mux to record a trace for later analysis.

Great work on Tracy!

@asymingt

asymingt commented Feb 6, 2023

I tried my hand at writing a mux, and I thought I might add some color to this conversation based on my experience over the last few days. My application involves remote introspection of a target system comprising many tracy-instrumented processes running at the same time. It should be much more convenient to run a mux/proxy on the target side, which aggregates streams from all processes into one point, shoving them all onto one unified timeline. The idea is that the mux would then present the aggregated data stream on tcp/w.x.y.z:8085, which would be easy to open up on a firewall and push over the internet to the profiling user interface running on some remote host (i.e. with ./Tracy-release -a w.x.y.z -p 8085). As a side note, I had hoped to make it even easier and avoid a remote-side binary, and instead offer a web-based (wasm) profiler, with the web server running on the target (I'm OK with it stealing resources). However, I haven't managed to get Emscripten to compile libcapstone into a sysroot where it will successfully link against the wasm code (a guide on that would be greatly appreciated to help with development). So I'm sticking with the legacy/X11-based unix version of the profiler, because the wayland version doesn't work on Ubuntu 22.04 with NVidia 525.85.12.

Towards writing this mux, I was able to fairly easily scan for the UDP broadcast packets sent out on port 8086 by tracy clients. Decoding them was fairly straightforward, and I was able to extract the TCP listenPort, which all tracy-instrumented processes negotiate to be unique on start-up (it looks like the first one gets 8086, the next one gets 8087, etc., up to a hard-coded maximum of 20). This is where things fell apart. I had intended to spin up a thread to start a worker to bind to all TCP streams, collect, and forward. However, I can't seem to work out how the handshake / lz4 encoding works for the TCP stream, and how the on-demand and regular implementations of the TCP protocol differ from each other! I can probably work it out by following the code (which I think is all captured in the TracyWorker.{hpp, cpp} source files), I just need time :)

Here are some hacky implementations of UDP listeners (the first version using the network protocol API in tracy, and another version using Boost.asio) for anybody who wants a starting point.

Here's a CMakeLists.txt to build the UI and muxers all at once. I do this all in a Docker context, but the basic Ubuntu 22.04 prerequisites are apt install libboost-all-dev libdbus-1-dev libcapstone-dev libglfw3-dev libfreetype-dev before trying anything below.

cmake_minimum_required(VERSION 3.5)
project(tracy_mux)
add_definitions(-DTRACY_ENABLE)

## TRACY CODE ###########################################

# Fetch the core interface library and make available to the next steps
include(FetchContent)
FetchContent_Declare(
  tracy
  GIT_REPOSITORY https://github.com/wolfpld/tracy.git
  GIT_TAG master
  GIT_SHALLOW TRUE
  GIT_PROGRESS TRUE)
FetchContent_MakeAvailable(tracy)
FetchContent_GetProperties(tracy)
message(STATUS "tracy: ${tracy_SOURCE_DIR} ${tracy_BINARY_DIR}")

## TRACY PROFILER UI ####################################### 

# Build the tracy profiler (server and UI)
include(ExternalProject)
ExternalProject_Add(tracy_profiler
  SOURCE_DIR ${tracy_SOURCE_DIR}/profiler/build/unix
  CONFIGURE_COMMAND ""
  BUILD_COMMAND ${CMAKE_COMMAND} -E env LEGACY=1 make -j all
  INSTALL_COMMAND cp ${tracy_SOURCE_DIR}/profiler/build/unix/Tracy-release ${CMAKE_CURRENT_BINARY_DIR}/tracy
  BUILD_IN_SOURCE TRUE)

## TRACY MUXER ########################################### 

find_package(Boost REQUIRED COMPONENTS thread)

add_executable(tracy_muxer_native tracy_muxer_native.cpp)
target_link_libraries(tracy_muxer_native TracyClient)

add_executable(tracy_muxer_boost tracy_muxer_boost.cpp)
target_link_libraries(tracy_muxer_boost TracyClient ${Boost_LIBRARIES})

One of the strange things about the native version of the UDP listener is that it finds itself! In other words, when you run it, you see something like this...

ubuntu@mars:~/ros2_ws/src/libtracy_ros2/src/build$ ./tracy_muxer_native 
Starting listener...
Adding client with procName tracy_muxer_native  # <--- weird!

Also, don't be a numpty like me and forget to sudo ufw allow 8086/udp before trying anything above.

@john-plate
Contributor

My company is also using Tracy, great work!

We could also benefit a lot from the requested enhancement to support merging the traces of multiple clients in one GUI, especially to profile network latencies.

@topolarity
Contributor

Such a feature would also be very useful for, e.g., profiling applications that spawn child processes.

Build systems are one example where it'd be quite nice to have an end-to-end view of the performance timeline across all processes.

@Arpafaucon
Contributor

Hi all! I am joining the team of people that would be interested in a way to collect multiple process traces into the same GUI window.
Ideally, it would be even better to have all processes' data in the same capture file, the proxy being an acceptable solution for that.

My company is willing to let me do some work on open-source projects of importance to us, and I'd be happy to contribute here. If you feel like you'd accept a contribution on that topic, I could help. (To be honest, I will surely need some help/guidance on this part to make it happen.)

@wolfpld
Owner

wolfpld commented Feb 14, 2024

> If you feel like you'd accept a contribution on that topic, I could help.

Sure.

@Arpafaucon
Contributor

Nice! Can I suggest the following plan?

  • I take some time to read more about the code, better understand what such a change would impact, and assess whether I understand enough to do it cleanly. I should be good next week
  • then, would you be OK taking some time to help me figure out a good way to carry out the change? (we can definitely do that through this issue, or a dedicated one)
  • on my side I'll have to check with management that they're OK
  • and then coding time for me ^^

@cipharius

cipharius commented Mar 5, 2024

I just wanted to warn you that I am almost done with an initial multiplexer prototype, so that you don't end up doing duplicate work.

I have a few bugs to iron out, but I am at a stage where broadcasting clients are automatically adopted, all client events are woven into a single event stream by splitting at ThreadContext boundaries, server queries are broadcast to all clients, and the single most appropriate response is picked.

Edit:

My current progress on the prototype can be found here: https://github.com/cipharius/tracy/blob/feature/multiplex/multiplex/src/multiplex.cpp

And little preview of how it's looking right now:
screenshot

I have conveniently hidden the tracy thread zones in that screenshot, because those currently get messed up when new clients connect; I still need to figure that out. On Linux I'm not seeing any thread ID conflicts, so I didn't bother creating pseudo IDs yet.

@Arpafaucon
Contributor

> I just wanted to warn you that I am almost done with an initial multiplexer prototype, so that you don't end up doing duplicate work.

Very kind of you to warn :) I had started digging into the existing code to get a sense of how things worked, but that's not lost time at all anyway.

I can confirm my company is giving me time to work on this (roughly half a day per week). @cipharius, would you accept help on your branch to make this happen? The minor caveat is that I am on holidays from mid-April to early May, so if you go too fast you might well be finished before I get back and try to help ^^

@cipharius

> I just wanted to warn you that I am almost done with initial multiplexer prototype, so that you don't end up doing duplicate work.
>
> Very kind of you to warn :) I had started digging into the existing code to get a sense of how things worked, but that's not lost time at all anyways
>
> I can confirm my company is giving me time to work on this (roughly half a day per week). @cipharius would you accept help on your branch to make this happen ? The minor caveat is that I am on holidays from mid-april to early may, so if you go too fast you might well be finished before I get back and try to help^^

Sure, I can try to help; I'll have to update the branch with local changes first.

Though, with the code being very prototypical and changing a lot, it might be tough to collaborate on it effectively.

The most helpful feedback right now would be testing it out. Right now I'm trying to figure out the last crucial bit: normalising the time between clients so that the timeline is displayed correctly. You can try figuring out how time is represented in Tracy, but by that time I might have figured out what's going wrong with my current attempts.

The most neutral help would be improving and testing the build scripts, since I only tested on Linux and didn't pay too much attention to customising them. It would be good to see if the project builds on Windows, for example.

@wolfpld
Owner

wolfpld commented Apr 13, 2024

Anyone interested in this feature should have a look at #766.

@Arpafaucon
Contributor

Hi @cipharius :)
With a lot of delay on my end, I finally managed to secure a one-week slot to work on this without interruptions, starting tomorrow (sorry for the short notice).

As you suggested, I'll start by testing your branch!

> Sure, can try to help, I'll have to update the branch with local changes first.

Friendly reminder, in case you did not carry out the update (I see your last push dates from April 4th, does that look like a good date to you? cf https://github.com/cipharius/tracy/commits/feature/multiplex/multiplex/src/multiplex.cpp)

> The most neutral help would be improving and testing the build scripts, since I only tested on linux and didn't pay too much attention to customising the build scripts. So would be good to see if it builds on windows for example.

I don't have a working Windows setup right now (I'm on Ubuntu Linux 22.04 / Xorg), but if I get stuck at some point I might try this in a VM to check that the build works.

@Arpafaucon
Contributor

@wolfpld , @cipharius I will be documenting my work in #822.

If you have a few minutes to spare to read/review my messages as they go along, that would be precious help for me. Otherwise, I am aware you probably have your own work to get on with, so I will carry on on my own :) (I specifically created a new issue to limit noise on this one.)

@Arpafaucon
Contributor

Heya: I worked on an alternative, simpler approach (but more limited, at least in this first version) in #825.
I'd be glad if some people could take a look and check whether it would help their workflow :)
