Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

changefeedccl: add timer for inner batching sink client flush #132572

Merged
merged 1 commit into from
Oct 22, 2024

Conversation

asg0451
Copy link
Contributor

@asg0451 asg0451 commented Oct 14, 2024

Add a timer for the inner flushes inside sink
clients. Where possible (batching_sink users,
webhook v1, pubsub v1, cloud storage), this is
distinct from changefeed.batch_hist_nanos in that
in only captures time spent flushing data to the
downstream endpoint, not time spent buffering
data.

This also fixes a bug where timers were not
correctly registered with the metric system.

Part of: #127784

Release note (enterprise change): Added a timer
for inner sink client flushes. Fixed a bug where
timers were not correctly registered with the
metric system.

@asg0451 asg0451 requested a review from a team as a code owner October 14, 2024 14:56
@asg0451 asg0451 requested review from andyyang890 and removed request for a team October 14, 2024 14:56
Copy link

blathers-crl bot commented Oct 14, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@asg0451 asg0451 added backport-23.2.x Flags PRs that need to be backported to 23.2. backport-24.1.x Flags PRs that need to be backported to 24.1. backport-24.2.x Flags PRs that need to be backported to 24.2 labels Oct 14, 2024
@asg0451 asg0451 requested review from rharding6373 and wenyihu6 and removed request for andyyang890 October 14, 2024 16:27
@asg0451 asg0451 force-pushed the sink-inner-timings branch 2 times, most recently from 327a7cc to 8d92e6b Compare October 14, 2024 19:29
@wenyihu6
Copy link
Contributor

Do we want to do the same thing across other sinks as well and rename this metric to SinkClientFlush? Perhaps in a follow up PR.

Copy link
Collaborator

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asg0451 and @wenyihu6)


pkg/ccl/changefeedccl/timers/timers.go line 48 at r2 (raw file):

		Encode:                    b.Histogram(histogramOptsFor("changefeed.stage.encode.latency", "Latency of the changefeed stage: encoding data")),
		EmitRow:                   b.Histogram(histogramOptsFor("changefeed.stage.emit_row.latency", "Latency of the changefeed stage: emitting row to sink")),
		BatchingSinkClientFlush:   b.Histogram(histogramOptsFor("changefeed.stage.batching_sink_client_flush.latency", "Latency of the changefeed stage: flushing the inner sink client, for sinks using batching_sink")),

I agree with Wenyi that it would be great if this metric were agnostic to the batching sink, so we could extend it to all sinks in the future. Sink client latency is applicable to all cases.

Copy link
Contributor Author

@asg0451 asg0451 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I added this to batching sink only is that in the other cases BatchHistNanos is sufficient for this. it's only in batching sink-wrapped sinks that it is counting an additional queueing point

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rharding6373 and @wenyihu6)


pkg/ccl/changefeedccl/timers/timers.go line 48 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

I agree with Wenyi that it would be great if this metric were agnostic to the batching sink, so we could extend it to all sinks in the future. Sink client latency is applicable to all cases.

The reason I added this to batching sink only is that in the other cases BatchHistNanos is sufficient for this. it's only in batching sink-wrapped sinks that it is counting an additional queueing point

Copy link
Contributor

@wenyihu6 wenyihu6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the case, how do you feel about if we just rename BatchHistNanos to SinkClientFlush and keep SinkClientFlush consistent across different sink types - as in they should always measure client sink flush? And we can add something else as a separate metrics for BatchingSink.BatchHistNanos to track the time between we add something to the batch for the first time until the batch is flushed.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rharding6373)

Copy link
Contributor Author

@asg0451 asg0451 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm open to something like that, let's discuss later today if there are no other topics

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rharding6373)

@asg0451 asg0451 force-pushed the sink-inner-timings branch 3 times, most recently from b12c90c to 1918a0d Compare October 22, 2024 17:59
@asg0451 asg0451 requested a review from wenyihu6 October 22, 2024 18:03
Copy link
Collaborator

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: I only have a nit about the name of the metric itself as exposed to users. Thanks for going above and beyond the batching sinks in this PR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asg0451 and @wenyihu6)


pkg/ccl/changefeedccl/timers/timers.go line 48 at r2 (raw file):

Previously, asg0451 (Miles Frankel) wrote…

The reason I added this to batching sink only is that in the other cases BatchHistNanos is sufficient for this. it's only in batching sink-wrapped sinks that it is counting an additional queueing point

minor nit: would sink_client be descriptive enough to understand the metric, or do we need sink_client_send? sink_client_send reads awkwardly imo (I'm indifferent about DownstreamClientSend, since it's internal)

Copy link
Contributor Author

@asg0451 asg0451 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rharding6373 and @wenyihu6)


pkg/ccl/changefeedccl/timers/timers.go line 48 at r2 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

minor nit: would sink_client be descriptive enough to understand the metric, or do we need sink_client_send? sink_client_send reads awkwardly imo (I'm indifferent about DownstreamClientSend, since it's internal)

i'm not 100% on this name -- i'm trying to avoid ambiguity with the batching sink's SinkClient, as this metric is not specific to that. hence "send".... if you have a better idea that's not ambiguous on that axis lmk

Add a timer for the inner flushes inside sink
clients. Where possible (batching_sink users,
webhook v1, pubsub v1, cloud storage), this is
distinct from changefeed.batch_hist_nanos in that
in only captures time spent flushing data to the
downstream endpoint, not time spent buffering
data.

This also fixes a bug where timers were not
correctly registered with the metric system.

Part of: cockroachdb#127784

Release note (enterprise change): Added a timer
for inner sink client flushes. Fixed a bug where
timers were not correctly registered with the
metric system.
@asg0451
Copy link
Contributor Author

asg0451 commented Oct 22, 2024

bors r=rharding6373

@craig craig bot merged commit 9e49a0b into cockroachdb:master Oct 22, 2024
23 checks passed
Copy link

blathers-crl bot commented Oct 22, 2024

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from d7ccd77 to blathers/backport-release-23.2-132572: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.2.x Flags PRs that need to be backported to 23.2. backport-24.1.x Flags PRs that need to be backported to 24.1. backport-24.2.x Flags PRs that need to be backported to 24.2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants