
Better support for cluster environments #471

Open
alexanderbock opened this issue Sep 15, 2022 · 18 comments
Labels
enhancement New feature or request

Comments

@alexanderbock
Contributor

Totally understand if this is out of scope and a pretty niche use case.

Our institution has a planetarium that runs the same application instance 6 times in a networked environment. In the past I have used Tracy in this environment by starting the GUI 7 times and connecting remotely to all instances manually. It would be really neat to be able to connect to all of the clients from a single GUI, and possibly also to align the timelines from all of the instances and show the places where one of the instances takes longer to execute a function, for example.

Just to be clear, this would be N instances of the same executable that should always go through the same function calls; where they disagree is where the interesting stuff happens.

@wolfpld wolfpld added the enhancement New feature or request label Nov 3, 2022
@GCCFeli

GCCFeli commented Nov 8, 2022

I'm in a slightly different situation. I'm running several game services in a cluster, and each service has different functionality. It would be great if all services (clients) could connect to a single GUI with their timelines aligned.

@PeterTh

PeterTh commented Nov 15, 2022

Cluster tooling is an entirely different can of worms, but FWIW we'd also be very interested in (even just basic) support for this use case. (Where "basic support" would probably mean ingesting data from several processes and aligning the times)

@GCCFeli

GCCFeli commented Nov 17, 2022

> Cluster tooling is an entirely different can of worms, but FWIW we'd also be very interested in (even just basic) support for this use case. (Where "basic support" would probably mean ingesting data from several processes and aligning the times)

Ingesting data from several processes and aligning the times is enough for my case. For now I'm hacking this with a simple proxy that the cluster processes connect to, and which acts as the only client to Tracy.

@wolfpld
Owner

wolfpld commented Nov 17, 2022

Making a proxy that muxes multiple clients would be the preferred solution here. To properly handle thread identifiers, which may be duplicated across different processes, you can use the already existing encoding:

// Encode a pair of "real pid, real tid" from a trace into a
// pseudo thread ID living in the single namespace of Tracy threads.
struct PidTidEncoder
{
    uint64_t tid;
    uint64_t pid;
    uint64_t pseudo_tid;  // fake thread id, unique within Tracy
};

std::vector<PidTidEncoder> tid_encoders;
std::vector<tracy::Worker::ImportEventTimeline> timeline;
std::vector<tracy::Worker::ImportEventMessages> messages;
std::vector<tracy::Worker::ImportEventPlots> plots;
std::unordered_map<uint64_t, std::string> threadNames;

const auto getPseudoTid = [&]( json& val ) -> uint64_t {
    const auto real_tid = val["tid"].get<uint64_t>();
    if( val.contains( "pid" ) )
    {
        // There might be multiple processes, so we allocate a pseudo-tid
        // for each pair (pid, real_tid).
        const auto pid = val["pid"].get<uint64_t>();
        for( auto& pair : tid_encoders )
        {
            if( pair.pid == pid && pair.tid == real_tid ) return pair.pseudo_tid;
        }
        assert( pid <= std::numeric_limits<uint32_t>::max() );
        assert( real_tid <= std::numeric_limits<uint32_t>::max() );
        const auto pseudo_tid = ( real_tid & 0xFFFFFFFF ) | ( pid << 32 );
        tid_encoders.emplace_back( PidTidEncoder { real_tid, pid, pseudo_tid } );
        return pseudo_tid;
    }
    else
    {
        return real_tid;
    }
};

You can see how this works in #213 (comment).

In 0.9 there were many changes in how the timeline items are handled, which is not really visible to users right now. Each track displayed on the timeline is now an instance of https://github.com/wolfpld/tracy/blob/master/server/TracyTimelineItem.hpp and the management of these items is now well defined in https://github.com/wolfpld/tracy/blob/master/server/TracyTimelineController.hpp, instead of the mess it was before. The takeaway here is that it should now be relatively easy to rearrange the threads so that threads originating from the same process are next to each other, or to add different colorings to thread backgrounds, etc.

@jamesfmilne

Just wanted to add that we at our company are looking into integrating Tracy into our development environment, and we also need to merge Tracy data from multiple sources. At least two, but perhaps more, that are either on the same machine or distributed across multiple machines.

Sounds like we have a very similar problem to everyone else in this thread. A mux is a good idea, especially if we can also use that mux to record a trace for later analysis.

Great work on Tracy!

@asymingt

asymingt commented Feb 6, 2023

I tried my hand at writing a mux, and I thought I might add some color to this conversation based on my experience over the last few days. My application involves remote introspection of a target system comprising many tracy-instrumented processes running at the same time. It should be much more convenient to run a mux/proxy on the target side, which aggregates streams from all processes into one point, shoving them all onto one unified timeline. The idea is that the mux would then present the aggregated data stream on tcp/w.x.y.z:8085, which would be easy to open up on a firewall and push over the internet to the profiling user interface running on some remote host (i.e. with ./Tracy-release -a w.x.y.z -p 8085). As a side note, I had hoped to make it even easier and avoid a remote-side binary, and instead offer a web-based (wasm) profiler, with the web server running on the target (I'm OK with it stealing resources). However, I haven't managed to get Emscripten to compile libcapstone into a sysroot where it will successfully link against the wasm code (a guide on that would be greatly appreciated to help with development). So I'm sticking with the legacy/X11-based unix version of the profiler, because the wayland version doesn't work on Ubuntu 22.04 with NVidia 525.85.12.

Towards writing this mux, I was able to fairly easily scan for the UDP broadcast packets sent out on port 8086 by tracy clients. Decoding them was fairly straightforward, and I was able to extract the TCP listenPort, which all tracy-instrumented processes negotiate to be unique on start-up (it looks like the first one gets 8086, the next one gets 8087, etc., up to a hard-coded maximum of 20). This is where things fell apart. I had intended to spin up a thread to start a worker to bind to all TCP streams, collect, and forward. However, I can't seem to work out how the handshake / lz4 encoding works for the TCP stream, and how the on-demand and regular implementations of the TCP protocol differ from each other! I can probably work it out by following the code (which I think is all captured in the TracyWorker.{hpp, cpp} source files), I just need time :)

Here are some hacky implementations of UDP listeners (the first version using the network protocol API in tracy, and another version using Boost.asio) for anybody who wants a starting point.

Here's a CMakeLists.txt to build the UI and muxers all at once. I do this all in a Docker context, but the basic Ubuntu 22.04 prerequisites are apt install libboost-all-dev libdbus-1-dev libcapstone-dev libglfw3-dev libfreetype-dev before trying anything below.

cmake_minimum_required(VERSION 3.5)
project(tracy_mux)
add_definitions(-DTRACY_ENABLE)

## TRACY CODE ###########################################

# Fetch the core interface library and make available to the next steps
include(FetchContent)
FetchContent_Declare(
  tracy
  GIT_REPOSITORY https://github.com/wolfpld/tracy.git
  GIT_TAG master
  GIT_SHALLOW TRUE
  GIT_PROGRESS TRUE)
FetchContent_MakeAvailable(tracy)
FetchContent_GetProperties(tracy)
message(STATUS "tracy: ${tracy_SOURCE_DIR} ${tracy_BINARY_DIR}")

## TRACY PROFILER UI ####################################### 

# Build the tracy profiler (server and UI)
include(ExternalProject)
ExternalProject_Add(tracy_profiler
  SOURCE_DIR ${tracy_SOURCE_DIR}/profiler/build/unix
  CONFIGURE_COMMAND ""
  BUILD_COMMAND ${CMAKE_COMMAND} -E env LEGACY=1 make -j all
  INSTALL_COMMAND cp ${tracy_SOURCE_DIR}/profiler/build/unix/Tracy-release ${CMAKE_CURRENT_BINARY_DIR}/tracy
  BUILD_IN_SOURCE TRUE)

## TRACY MUXER ########################################### 

find_package(Boost REQUIRED COMPONENTS thread)

add_executable(tracy_muxer_native tracy_muxer_native.cpp)
target_link_libraries(tracy_muxer_native TracyClient)

add_executable(tracy_muxer_boost tracy_muxer_boost.cpp)
target_link_libraries(tracy_muxer_boost TracyClient ${Boost_LIBRARIES})

One of the strange things about the native version of the UDP listener is that it finds itself! In other words, when you run it, you see something like this...

ubuntu@mars:~/ros2_ws/src/libtracy_ros2/src/build$ ./tracy_muxer_native 
Starting listener...
Adding client with procName tracy_muxer_native  # <--- weird!

Also, don't be a numpty like me and forget to sudo ufw allow 8086/udp before trying anything above.

@john-plate
Contributor

My company is also using Tracy, great work!

We could also benefit a lot from the requested enhancement to support merging the traces of multiple clients in one GUI, especially to profile network latencies.

@topolarity
Contributor

Such a feature would also be very useful for, e.g., profiling applications that spawn child processes.

Build systems are one example where it'd be quite nice to have an end-to-end view of the performance timeline across all processes.

@Arpafaucon
Contributor

Hi all! I am joining the team of people that would be interested in a way to collect multiple process traces into the same GUI window.
Ideally, it would be even better to have all processes' data in the same capture file, the proxy being an acceptable solution for that.

My company is willing to let me do some work on open-source projects of importance to us, and I'd be happy to contribute here. If you feel like you'd accept a contribution on that topic, I could help. (To be honest, I will surely need some help/guidance on this part to make it happen.)

@wolfpld
Owner

wolfpld commented Feb 14, 2024

> If you feel like you'd accept a contribution on that topic, I could help.

Sure.

@Arpafaucon
Contributor

Nice! Can I suggest the following plan?

  • I take some time to read more about the code, better understand what such a change would impact, and assess whether I understand enough to do it cleanly. I should be good next week
  • then, would you be OK taking some time to help me figure out a good way to carry out the change? (we can definitely do that through this issue, or a dedicated one)
  • on my side I'll have to check with management that they're OK
  • and then coding time for me ^^

@cipharius

cipharius commented Mar 5, 2024

I just wanted to warn you that I am almost done with an initial multiplexer prototype, so that you don't end up doing duplicate work.

I have a few bugs to iron out, but I am at a stage where broadcasting clients are automatically adopted, all client events are woven into a single event stream by splitting at ThreadContext boundaries, server queries are broadcast to all clients, and the single most appropriate response is picked.

Edit:

My current progress on the prototype can be found here: https://github.com/cipharius/tracy/blob/feature/multiplex/multiplex/src/multiplex.cpp

And little preview of how it's looking right now:
screenshot

I have conveniently hidden the tracy thread zones in that screenshot, because those currently get messed up when new clients connect; I still need to figure that out. On Linux I'm not seeing any thread ID conflicts, so I didn't bother creating pseudo IDs yet.

@Arpafaucon
Contributor

> I just wanted to warn you that I am almost done with an initial multiplexer prototype, so that you don't end up doing duplicate work.

Very kind of you to warn :) I had started digging into the existing code to get a sense of how things worked, but that's not lost time at all anyway.

I can confirm my company is giving me time to work on this (roughly half a day per week). @cipharius, would you accept help on your branch to make this happen? The minor caveat is that I am on holidays from mid-April to early May, so if you go too fast you might well be finished before I get back and try to help ^^

@cipharius

> I just wanted to warn you that I am almost done with initial multiplexer prototype, so that you don't end up doing duplicate work.
>
> Very kind of you to warn :) I had started digging into the existing code to get a sense of how things worked, but that's not lost time at all anyways
>
> I can confirm my company is giving me time to work on this (roughly half a day per week). @cipharius would you accept help on your branch to make this happen ? The minor caveat is that I am on holidays from mid-april to early may, so if you go too fast you might well be finished before I get back and try to help^^

Sure, I can try to help; I'll have to update the branch with local changes first.

Though, with the code being very prototypical and changing a lot, it might be tough to collaborate on it effectively.

The most helpful feedback right now would be testing it out. Right now I'm trying to figure out the last crucial bit: normalising the time between clients so that the timeline is displayed correctly. You can try figuring out how time is represented in Tracy, but by that time I might have figured out what's going wrong with my current attempts.

The most neutral help would be improving and testing the build scripts, since I only tested on Linux and didn't pay too much attention to customising them. It would be good to see if the project builds on Windows, for example.

@wolfpld
Owner

wolfpld commented Apr 13, 2024

Anyone interested in this feature should have a look at #766.

@Arpafaucon
Contributor

Hi @cipharius :)
With a lot of delay on my end, I finally managed to secure a one-week slot to work on this without interruptions, starting tomorrow (sorry for the short notice).

As you suggested, I'll start by testing your branch!

> Sure, can try to help, I'll have to update the branch with local changes first.

Friendly reminder, in case you did not carry out the update (I see your last push dates from April 4th, does that look like a good date to you? cf https://github.com/cipharius/tracy/commits/feature/multiplex/multiplex/src/multiplex.cpp)

> The most neutral help would be improving and testing the build scripts, since I only tested on linux and didn't pay too much attention to customising the build scripts. So would be good to see if it builds on windows for example.

I don't have a working Windows setup right now (I'm on Ubuntu Linux 22.04 / Xorg), but if I get stuck at some point I might try this in a VM to check that the build works.

@Arpafaucon
Contributor

@wolfpld , @cipharius I will be documenting my work in #822.

If you have a few minutes to spare to read/review my messages as they go along, that would be precious help for me. Otherwise, I am aware you probably have your own work to get on with, so I will carry on on my own :) (I specifically created a new issue to limit noise on this one.)

@Arpafaucon
Contributor

Heya: I worked on an alternative, simpler approach (but more limited, at least in this first version) in #825.
I'd be glad if some people could take a look and check whether it would help their workflow :)
