Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tracing logs for Nexus HTTP request retries #7186

Merged
merged 9 commits into from
Feb 6, 2025

Conversation

pdoerner
Copy link
Contributor

@pdoerner pdoerner commented Jan 30, 2025

What changed?

Added additional HTTP tracing logs for the first 3 Nexus request retries.
Both minimum and maximum attempts to add additional tracing info to are configurable.

Why?

To aid in debugging.

How did you test it?

Potential risks

Documentation

Is hotfix candidate?

Comment on lines 869 to 879
NexusHTTPTraceMinAttempt = NewNamespaceIntSetting(
"system.nexusHTTPTraceMinAttempt",
2,
`The minimum attempt of a Nexus request which will log additional HTTP request tracing information.
WARNING: setting to 0 or 1 will log additional tracing information for ALL Nexus HTTP requests and may be expensive.`,
)
NexusHTTPTraceMaxAttempt = NewNamespaceIntSetting(
"system.nexusHTTPTraceMaxAttempt",
5,
`The maximum attempt of a Nexus request which will log additional HTTP request tracing information.
Set to 0 to disable tracing.`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be disabled by default. Generally, I don't think the attempt number matters here, it's more about what to trace IMHO. E.g. whether TLSHandshakeStart should be traced or not. Ideally the config would be for a set of methods, but It'd be hard to express in dynamic config as a set without adding a lot of overhead. We could probably provide a config per method we support intercepting though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think attempt count is useful for limiting the overhead and the overall log spam. Without a notion of attempt count do you imagine we would log trace info for every retry?

Instead of adding a dynamic config per-hook, we could just use fx to provide the httptrace.ClientTrace. Then anyone who wants to change the behavior could do so with fx.replace

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline, agree that adding the counts is nice, also requested that we add the dynamic config and make it global so it can be efficiently cached (a limitation of the current DC implementation).

@pdoerner pdoerner marked this pull request as ready for review February 5, 2025 16:30
@pdoerner pdoerner requested a review from a team as a code owner February 5, 2025 16:30
@pdoerner pdoerner requested a review from bergundy February 5, 2025 16:32
Copy link
Member

@bergundy bergundy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with a comment.

NewTrace(attempt int32, logger log.Logger) *httptrace.ClientTrace
}

var HTTPTraceConfig = dynamicconfig.NewGlobalTypedSetting(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this cached please (you'll want to wrap it with the following):

func NewGlobalCachedTypedValue[T, I any](c *Collection, setting GlobalTypedSetting[I], convert func(I) (T, error)) *GlobalCachedTypedValue[T] {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synced with @dnr offline and changed the conversion functions to the best we can do at the moment. We will want to revisit this setting after merging #7052

@pdoerner pdoerner merged commit 2640635 into temporalio:main Feb 6, 2025
49 checks passed
@pdoerner pdoerner deleted the nexus-http-tracing branch February 6, 2025 23:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants