Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add metrics measuring distributed execution #135236

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

michae2
Copy link
Collaborator

@michae2 michae2 commented Nov 15, 2024

sql: add sql.distsql.select.distributed.count metrics

Add a new planFlag and counter to EngineMetrics. Note that the planFlag
is set if any part of a query uses distributed execution, including
pre- or post-queries. This seemed both simpler and more likely to be
useful than only counting main queries that use distributed execution.

Release note (ops change): Add two new metrics,
sql.distsql.select.distributed.count and
sql.distsql.select.distributed.count.internal. These metrics count the
number of SELECT statements that actually execute with full or partial
distribution. These metrics differ from sql.distsql.select.count and
sql.distsql.select.count.internal in that the latter count the number
of SELECT statements that are planned with full or partial
distribution, but might not necessarily execute with full or partial
distribution, depending on the location of data.


sql: add sql.distsql.distributed.count metric

Release note (ops change): Add new metric
sql.distsql.distributed.count which counts the number of invocations
of the DistSQL engine with full or partial distribution. (This is in
contrast to sql.distsql.queries.total which counts the total number of
invocations of the DistSQL engine.)


sql: clarify scope of a few sql.distsql metrics

Release note (ops change): Add some clarification that the following
metrics count invocations of the DistSQL engine and not SQL queries
(which could each result in multiple invocations of the DistSQL engine):

  • sql.distsql.queries.active
  • sql.distsql.queries.total
  • sql.distsql.distributed.count

Add a new planFlag and counter to EngineMetrics. Note that the planFlag
is set if *any* part of a query uses distributed execution, including
pre- or post-queries. This seemed both simpler and more likely to be
useful than only counting main queries that use distributed execution.

Release note (ops change): Add two new metrics,
`sql.distsql.select.distributed.count` and
`sql.distsql.select.distributed.count.internal`. These metrics count the
number of SELECT statements that actually execute with full or partial
distribution. These metrics differ from `sql.distsql.select.count` and
`sql.distsql.select.count.internal` in that the latter count the number
of SELECT statements that are *planned* with full or partial
distribution, but might not necessarily execute with full or partial
distribution, depending on the location of data.
Release note (ops change): Add new metric
`sql.distsql.distributed.count` which counts the number of invocations
of the DistSQL engine with full or partial distribution. (This is in
contrast to `sql.distsql.queries.total` which counts the total number of
invocations of the DistSQL engine.)
Release note (ops change): Add some clarification that the following
metrics count invocations of the DistSQL engine and not SQL queries
(which could each result in multiple invocations of the DistSQL engine):

- `sql.distsql.queries.active`
- `sql.distsql.queries.total`
- `sql.distsql.distributed.count`
@michae2 michae2 requested review from yuzefovich, mw5h and a team November 15, 2024 00:45
@michae2 michae2 requested review from a team as code owners November 15, 2024 00:45
@michae2 michae2 requested review from srosenberg, nameisbhaskar and kyle-a-wong and removed request for a team November 15, 2024 00:45
Copy link

blathers-crl bot commented Nov 15, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r1, 4 of 4 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2, @mw5h, and @yuzefovich)


-- commits line 11 at r1:
just bikeshedding on the name: would sql.distsql.select.exec.count be better? Maybe that's more confusing...


-- commits line 25 at r2:
Doesn't this count invocations when distsql is disabled too? So it effectively counts all invocations of the execution engine, right?


docs/generated/metrics/metrics.html line 1647 at r3 (raw file):

<tr><td>APPLICATION</td><td>sql.distsql.dist_query_rerun_locally.count</td><td>Total number of cases when distributed query error resulted in a local rerun</td><td>Queries</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>APPLICATION</td><td>sql.distsql.dist_query_rerun_locally.failure_count</td><td>Total number of cases when the local rerun of a distributed query resulted in an error</td><td>Queries</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
<tr><td>APPLICATION</td><td>sql.distsql.distributed.count</td><td>Number of invocations of the DistSQL engine executed with full or partial distribution</td><td>DistSQL runs</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>

Unless I'm mistaken, there is only the "DistSQL engine" and no alternative. Qualifying this as the "DistSQL engine" makes it sound like there is an alternative, and a user might expect distsql=off to use that alternative, which may be confusing. Should we say just "invocation of the execution engine ... multiple of which may occur for a single SQL statement ... and can also occur for internal processes ...".

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r1, 4 of 4 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @mw5h)


-- commits line 11 at r1:

Previously, mgartner (Marcus Gartner) wrote…

just bikeshedding on the name: would sql.distsql.select.exec.count be better? Maybe that's more confusing...

I like having word distributed in the metric name. sql.distsql prefix is confusing, but it's an existing one, and it'll be annoying to change. Perhaps I would go even more verbose and do sql.distsql.select.actually_distributed.count or something like that.


-- commits line 25 at r2:

Previously, mgartner (Marcus Gartner) wrote…

Doesn't this count invocations when distsql is disabled too? So it effectively counts all invocations of the execution engine, right?

Sounds right.


docs/generated/metrics/metrics.html line 1647 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Unless I'm mistaken, there is only the "DistSQL engine" and no alternative. Qualifying this as the "DistSQL engine" makes it sound like there is an alternative, and a user might expect distsql=off to use that alternative, which may be confusing. Should we say just "invocation of the execution engine ... multiple of which may occur for a single SQL statement ... and can also occur for internal processes ...".

I like this suggestion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants