Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Added replay aware tracing filter #30

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

h7kanna
Copy link
Contributor

@h7kanna h7kanna commented Oct 30, 2024

Hi @slinkydeveloper,

I am trying to port the tracing filter I have in my SDK. I am not sure if this is a feature that you are looking for.
But what I want to achieve is to skip tracing within the service/workflow while they replay/suspend. It helps while developing.

Though I am not able to make it work because of the CoreVM debug instrumentation creating a debug span.

This needs the SDK to be updated with the latest 'sdk-share-core' as it needs 'is_processing' method on the CoreVM like 'is_processing' method

How it works

  1. User enables the tracing-subscriber feature
  2. User adds replay aware filter
  3. Filter only outputs events which have 'replaying = false' flag field
  4. Handle future toggles this field in the span between [replaying and processing]
  5. Tracing subscriber filters the events based on the value of the 'replaying' field in the span at the moment.

Note
Due to the limitation of how the tracing-subscriber works, 'replaying' field is displayed twice when the state machine switches between replaying and processing with the default formatter.
Json formatter does not have this issue though.

Example:

cargo run --features tracing-subscriber --example tracing
restate dp add --yes --force http://localhost:9080
curl -v localhost:8080/Greeter/greet -H 'content-type: application/json' -d '"60"'
h7kanna@Harshas-MBP sdk-rust % RUST_LOG=info cargo run --features tracing-subscriber --example tracing
   Compiling restate-sdk v0.3.0 (/Users/h7kanna/Repos/restatedev/sdk-rust)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 4.33s
     Running `target/debug/examples/tracing`
2024-10-30T17:01:18.703423Z  INFO restate_sdk::http_server: Starting listening on 0.0.0.0:9080
2024-10-30T17:01:27.433139Z  INFO handle{rpc.system="restate" rpc.service="Greeter" rpc.method="greet"}: tracing: This will be logged on replay
2024-10-30T17:01:28.443789Z  INFO handle{rpc.system="restate" rpc.service="Delayer" rpc.method="delay"}: tracing: Delayed for 1 seconds
2024-10-30T17:01:28.447667Z  INFO handle{rpc.system="restate" rpc.service="Greeter" rpc.method="greet"}: tracing: This will not be logged on replay
2024-10-30T17:02:28.442877Z  WARN handle{rpc.system="restate" rpc.service="Greeter" rpc.method="greet"}: restate_sdk::endpoint::futures::handler_state_aware: Error while processing handler Suspended rpc.system="restate" rpc.service=Greeter rpc.method=greet
2024-10-30T17:02:28.442961Z  WARN restate_sdk::hyper: Handler failure: Error(Suspended)
2024-10-30T17:02:28.447207Z  WARN handle{rpc.system="restate" rpc.service="Delayer" rpc.method="delay"}: restate_sdk::endpoint::futures::handler_state_aware: Error while processing handler Suspended rpc.system="restate" rpc.service=Delayer rpc.method=delay
2024-10-30T17:02:28.447275Z  WARN restate_sdk::hyper: Handler failure: Error(Suspended)
2024-10-30T17:02:28.449098Z  INFO handle{rpc.system="restate" rpc.service="Delayer" rpc.method="delay"}: tracing: Delayed for 60 seconds
2024-10-30T17:02:28.451720Z  INFO handle{rpc.system="restate" rpc.service="Greeter" rpc.method="greet"}: tracing: This will be logged on replay
2024-10-30T17:02:28.451854Z  INFO handle{rpc.system="restate" rpc.service="Greeter" rpc.method="greet" replaying=true replaying=false}: tracing: This will be logged on processing after suspension

Copy link
Collaborator

@slinkydeveloper slinkydeveloper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I generally like the idea here of providing people a tracing-subscriber filter, but I think what's going on with the replaying field in the handle span (which is what you do in the shared-core PR) is incorrect.

I think replaying should be a field of the event, captured somehow from calling a method on the CoreVM to get that information every time a log event is created. Perhaps tracing has somewhat a feature alike thread local contexts where it can take this info, and automatically inject it in every event? Right now I don't really have much time to look into this, but perhaps I can look into this later.

Also, I would not push any tracing specific stuff in the shared core, I would just add CoreVM::is_replaying (alike CoreVM::is_processing, which is BTW what the javascript SDK uses to implement this feature: https://github.com/restatedev/sdk-typescript/blob/443630750266be4402215263a971ef8eed313f3c/packages/restate-sdk/src/endpoint/handlers/generic.ts#L286).

"rpc.system" = "restate",
"rpc.service" = self.svc_name,
"rpc.method" = self.handler_name,
"replaying" = field::Empty,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how and if this works.

replaying IMO should be an aspect of the individual events, not of the trace. Also what you did here https://github.com/restatedev/sdk-rust/pull/30/files#diff-6658812715089c8241f20814aaf257dbd94a5448e01cf0010a0a2da54406f844R70 i suspect it doesn't really work, because you make "replaying" a field of the whole invocation attempt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I was looking for, But I did not realize restate-sdk-shared-core = "0.1.0" still in the SDK.

Yes, 'replaying' field is part of the events and it is injected on state changes. This is working in my SDK.

I will make necessary change and add a test. Not urgent. I am anyways using my fork.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I did not realize restate-sdk-shared-core = "0.1.0" still in the SDK.

For testing purposes, you can change the version to a git reference: https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, there are other changes I need to do to update to the latest shared sdk

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's working now though

Copy link
Contributor Author

@h7kanna h7kanna Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h7kanna@Harshas-MBP sdk-rust % cargo run --features tracing-subscriber --example tracing
   Compiling restate-sdk v0.3.0 (/Users/h7kanna/Repos/restatedev/sdk-rust)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 4.33s
     Running `target/debug/examples/tracing`
2024-10-30T17:01:18.703423Z  INFO restate_sdk::http_server: Starting listening on 0.0.0.0:9080
2024-10-30T17:01:27.433139Z  INFO handle{rpc.system="restate" rpc.service="Greeter" rpc.method="greet"}: tracing: This will be logged on replay
2024-10-30T17:01:28.443789Z  INFO handle{rpc.system="restate" rpc.service="Delayer" rpc.method="delay"}: tracing: Delayed for 1 seconds
2024-10-30T17:01:28.447667Z  INFO handle{rpc.system="restate" rpc.service="Greeter" rpc.method="greet"}: tracing: This will not be logged on replay
2024-10-30T17:02:28.442877Z  WARN handle{rpc.system="restate" rpc.service="Greeter" rpc.method="greet"}: restate_sdk::endpoint::futures::handler_state_aware: Error while processing handler Suspended rpc.system="restate" rpc.service=Greeter rpc.method=greet
2024-10-30T17:02:28.442961Z  WARN restate_sdk::hyper: Handler failure: Error(Suspended)
2024-10-30T17:02:28.447207Z  WARN handle{rpc.system="restate" rpc.service="Delayer" rpc.method="delay"}: restate_sdk::endpoint::futures::handler_state_aware: Error while processing handler Suspended rpc.system="restate" rpc.service=Delayer rpc.method=delay
2024-10-30T17:02:28.447275Z  WARN restate_sdk::hyper: Handler failure: Error(Suspended)
2024-10-30T17:02:28.449098Z  INFO handle{rpc.system="restate" rpc.service="Delayer" rpc.method="delay"}: tracing: Delayed for 60 seconds
2024-10-30T17:02:28.451720Z  INFO handle{rpc.system="restate" rpc.service="Greeter" rpc.method="greet"}: tracing: This will be logged on replay
2024-10-30T17:02:28.451854Z  INFO handle{rpc.system="restate" rpc.service="Greeter" rpc.method="greet" replaying=true replaying=false}: tracing: This will be logged on processing after suspension

This needs 'is_processing' method

@h7kanna
Copy link
Contributor Author

h7kanna commented Oct 30, 2024

Hi,

I made changes to move the tracing replay tracking logic to the Context. No changes needed in the sdk-shared-core. But this will work only when updated to the latest sdk-shared-core.

I will wait for the 'sdk-shared-core' to be updated. But the feature works(updated description above) and I will keep using the fork for now.

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants