-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: add metrics measuring distributed execution #135236
base: master
Are you sure you want to change the base?
Conversation
Add a new planFlag and counter to EngineMetrics. Note that the planFlag is set if *any* part of a query uses distributed execution, including pre- or post-queries. This seemed both simpler and more likely to be useful than only counting main queries that use distributed execution. Release note (ops change): Add two new metrics, `sql.distsql.select.distributed.count` and `sql.distsql.select.distributed.count.internal`. These metrics count the number of SELECT statements that actually execute with full or partial distribution. These metrics differ from `sql.distsql.select.count` and `sql.distsql.select.count.internal` in that the latter count the number of SELECT statements that are *planned* with full or partial distribution, but might not necessarily execute with full or partial distribution, depending on the location of data.
Release note (ops change): Add new metric `sql.distsql.distributed.count` which counts the number of invocations of the DistSQL engine with full or partial distribution. (This is in contrast to `sql.distsql.queries.total` which counts the total number of invocations of the DistSQL engine.)
Release note (ops change): Add some clarification that the following metrics count invocations of the DistSQL engine and not SQL queries (which could each result in multiple invocations of the DistSQL engine): - `sql.distsql.queries.active` - `sql.distsql.queries.total` - `sql.distsql.distributed.count`
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 7 files at r1, 4 of 4 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2, @mw5h, and @yuzefovich)
-- commits
line 11 at r1:
just bikeshedding on the name: would sql.distsql.select.exec.count
be better? Maybe that's more confusing...
-- commits
line 25 at r2:
Doesn't this count invocations when distsql is disabled too? So it effectively counts all invocations of the execution engine, right?
docs/generated/metrics/metrics.html
line 1647 at r3 (raw file):
<tr><td>APPLICATION</td><td>sql.distsql.dist_query_rerun_locally.count</td><td>Total number of cases when distributed query error resulted in a local rerun</td><td>Queries</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr> <tr><td>APPLICATION</td><td>sql.distsql.dist_query_rerun_locally.failure_count</td><td>Total number of cases when the local rerun of a distributed query resulted in an error</td><td>Queries</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr> <tr><td>APPLICATION</td><td>sql.distsql.distributed.count</td><td>Number of invocations of the DistSQL engine executed with full or partial distribution</td><td>DistSQL runs</td><td>COUNTER</td><td>COUNT</td><td>AVG</td><td>NON_NEGATIVE_DERIVATIVE</td></tr>
Unless I'm mistaken, there is only the "DistSQL engine" and no alternative. Qualifying this as the "DistSQL engine" makes it sound like there is an alternative, and a user might expect distsql=off
to use that alternative, which may be confusing. Should we say just "invocation of the execution engine ... multiple of which may occur for a single SQL statement ... and can also occur for internal processes ...".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 7 files at r1, 4 of 4 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @mw5h)
Previously, mgartner (Marcus Gartner) wrote…
just bikeshedding on the name: would
sql.distsql.select.exec.count
be better? Maybe that's more confusing...
I like having word distributed
in the metric name. sql.distsql
prefix is confusing, but it's an existing one, and it'll be annoying to change. Perhaps I would go even more verbose and do sql.distsql.select.actually_distributed.count
or something like that.
Previously, mgartner (Marcus Gartner) wrote…
Doesn't this count invocations when distsql is disabled too? So it effectively counts all invocations of the execution engine, right?
Sounds right.
docs/generated/metrics/metrics.html
line 1647 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Unless I'm mistaken, there is only the "DistSQL engine" and no alternative. Qualifying this as the "DistSQL engine" makes it sound like there is an alternative, and a user might expect
distsql=off
to use that alternative, which may be confusing. Should we say just "invocation of the execution engine ... multiple of which may occur for a single SQL statement ... and can also occur for internal processes ...".
I like this suggestion.
sql: add sql.distsql.select.distributed.count metrics
Add a new planFlag and counter to EngineMetrics. Note that the planFlag
is set if any part of a query uses distributed execution, including
pre- or post-queries. This seemed both simpler and more likely to be
useful than only counting main queries that use distributed execution.
Release note (ops change): Add two new metrics,
sql.distsql.select.distributed.count
andsql.distsql.select.distributed.count.internal
. These metrics count thenumber of SELECT statements that actually execute with full or partial
distribution. These metrics differ from
sql.distsql.select.count
andsql.distsql.select.count.internal
in that the latter count the numberof SELECT statements that are planned with full or partial
distribution, but might not necessarily execute with full or partial
distribution, depending on the location of data.
sql: add sql.distsql.distributed.count metric
Release note (ops change): Add new metric
sql.distsql.distributed.count
which counts the number of invocationsof the DistSQL engine with full or partial distribution. (This is in
contrast to
sql.distsql.queries.total
which counts the total number ofinvocations of the DistSQL engine.)
sql: clarify scope of a few sql.distsql metrics
Release note (ops change): Add some clarification that the following
metrics count invocations of the DistSQL engine and not SQL queries
(which could each result in multiple invocations of the DistSQL engine):
sql.distsql.queries.active
sql.distsql.queries.total
sql.distsql.distributed.count