Internal queries should use paging (ControlConnection) #331

piodul · 2024-06-13T07:05:44Z

Non-paged reads are an anti-pattern and we recommend that users avoid them. Scylla has metrics that count them, and scylla-monitoring has a dashboard that displays them.

While investigating scylladb/scylladb#5983 we observed that the python driver issues non-paged reads to fetch the schema after noticing a schema change. When lots of clients are connected at once, this can lead to a large increase in the non-paged reads metric, which can be confusing for the user and both confusing and concerning to the core developers: in the aforementioned issue we thought that Scylla itself was doing non-paged reads.

Internal queries should be changed to use paging in order to reduce the confusion.

mykaul · 2024-06-13T08:22:11Z

Internal queries should be changed not to use paging in order to reduce the confusion.
@piodul - use or not use paging?

piodul · 2024-06-13T08:27:12Z

They should use paging. Sorry for the confusion.

mykaul · 2024-06-18T08:20:36Z

@roydahan - looks important to me. Can we assess complexity/risk?

fruch · 2024-07-02T18:35:04Z

@piodul do we have information on which driver was used? i.e. which version of our fork?

cause I'm quite sure the driver does use pagination for the internal queries, since:
#140

so missing some information in this report

piodul · 2024-07-02T19:24:08Z

I just tried to reproduce this with cqlsh that I had installed on my machine (cqlsh 6.0.21), I picked up a recent master build of Scylla but the closest approximation by a released version would be 6.0.1.

For example, when cqlsh connects to the node, I see that the scylla_cql_unpaged_select_queries_per_ks{ks="system",shard="0"} metric gets bumped by 2.
I'm using the following filter in wireshark:

cql.opcode == "QUERY" && cql.query.flags.page_size == 0

...and I can see that two unpaged queries pop up: SELECT * FROM system.peers and SELECT * FROM system.local WHERE key='local'.
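For context, the filter above matches QUERY frames whose flags do not have the page-size bit set. Below is a minimal sketch of how that bit appears in the query parameters, assuming the native protocol v4 layout (`<query><consistency><flags>[<result_page_size>]`). The helper names `encode_query_body` and `has_page_size` are made up for illustration; this is not the driver's actual encoder.

```python
import struct

# Bit in the <flags> byte of a v4 QUERY body that announces <result_page_size>.
FLAG_PAGE_SIZE = 0x04
CONSISTENCY_ONE = 0x0001


def encode_query_body(query, page_size=None, consistency=CONSISTENCY_ONE):
    """Encode <query><consistency><flags>[<result_page_size>] (hypothetical helper)."""
    raw = query.encode("utf-8")
    flags = 0
    tail = b""
    if page_size is not None:
        # A paged query sets the flag and appends the page size as an [int].
        flags |= FLAG_PAGE_SIZE
        tail = struct.pack(">i", page_size)
    # [long string] = 4-byte length + bytes, then [short] consistency, [byte] flags.
    return struct.pack(">i", len(raw)) + raw + struct.pack(">HB", consistency, flags) + tail


def has_page_size(body):
    """Return True if the encoded body has the page-size flag set."""
    (length,) = struct.unpack_from(">i", body, 0)
    flags = body[4 + length + 2]
    return bool(flags & FLAG_PAGE_SIZE)


unpaged = encode_query_body("SELECT * FROM system.peers")
paged = encode_query_body("SELECT * FROM system.peers", page_size=100)
print(has_page_size(unpaged), has_page_size(paged))
```

The first body is the kind of request the filter catches; the second carries a page size and would not be counted as unpaged.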

However, I do see that this metric starts with a non-zero value (~121) right after booting up the node. Moreover, this metric grows by itself every 10 seconds. I either have some unexplained source of queries, or internal queries can increase this metric after all. It looks like the fault lies on both sides and we might have closed the Scylla issue prematurely...
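One way to tell driver-originated bumps from the background growth is to scrape the metric twice and compare. The sketch below parses Prometheus text output and computes the per-keyspace delta. The sample values are made up, and it assumes the metrics come from the node's Prometheus endpoint (typically port 9180, worth verifying for your setup).

```python
import re

METRIC = "scylla_cql_unpaged_select_queries_per_ks"
LINE_RE = re.compile(r'^(?P<name>\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')


def unpaged_by_keyspace(metrics_text):
    """Sum the unpaged-select counter over all shards, keyed by keyspace."""
    totals = {}
    for line in metrics_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m or m.group("name") != METRIC:
            continue  # skips comments, blank lines and other metrics
        labels = dict(re.findall(r'(\w+)="([^"]*)"', m.group("labels")))
        ks = labels.get("ks", "")
        totals[ks] = totals.get(ks, 0.0) + float(m.group("value"))
    return totals


def delta(before, after):
    """Per-keyspace increase between two scrapes."""
    return {ks: after[ks] - before.get(ks, 0.0) for ks in after}


# Made-up scrape samples, e.g. taken before and after cqlsh connects.
BEFORE = """
# TYPE scylla_cql_unpaged_select_queries_per_ks counter
scylla_cql_unpaged_select_queries_per_ks{ks="system",shard="0"} 121
"""
AFTER = """
# TYPE scylla_cql_unpaged_select_queries_per_ks counter
scylla_cql_unpaged_select_queries_per_ks{ks="system",shard="0"} 123
"""

print(delta(unpaged_by_keyspace(BEFORE), unpaged_by_keyspace(AFTER)))
```

Taking one pair of scrapes with no client connected and another pair around a client connect would separate the two sources.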

fruch · 2024-07-02T19:57:33Z

I just tried to reproduce this with cqlsh that I had installed on my machine (cqlsh 6.0.21), I picked up a recent master build of Scylla but the closest approximation by a released version would be 6.0.1.

For example, when cqlsh connects to the node, I see that the scylla_cql_unpaged_select_queries_per_ks{ks="system",shard="0"} metric gets bumped by 2. I'm using the following filter in wireshark:
cql.opcode == "QUERY" && cql.query.flags.page_size == 0
...and I can see that two unpaged queries pop up: SELECT * FROM system.peers and SELECT * FROM system.local WHERE key='local'.

However, I do see that this metric starts with a non-zero value (~121) right after booting up the node. Moreover, this metric grows by itself every 10 seconds. I either have some unexplained source of queries, or internal queries can increase this metric after all. It looks like the fault lies on both sides and we might have closed the Scylla issue prematurely...

now that I took another look at the title of the PR... it's "Metadata/Schema paginated queries"

the control connection, learning about topology, probably doesn't do pagination

the issue we had back then was with setups that have lots of keyspaces and tables (more than 1000), which slowed down or could even fail the initial connections.

so a setup with hundreds of nodes might be a bit problematic with pagination

mykaul · 2024-07-03T08:26:42Z

A different path we pursue in the Java driver is scylladb/java-driver#312 - we add USING TIMEOUT to the schema fetch so that pulling the schema, which may be large, is more patient than the potentially low default client or server timeouts.
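As a rough illustration of that idea in Python, the sketch below appends Scylla's `USING TIMEOUT` clause to a schema query string. The helper name `with_timeout` and the `120s` value are assumptions for the example, not what the Java driver change actually uses.

```python
def with_timeout(select_cql, timeout="120s"):
    """Append a USING TIMEOUT clause to a SELECT statement (hypothetical helper)."""
    # Drop surrounding whitespace and a trailing semicolon before appending.
    stripped = select_cql.strip().rstrip(";").rstrip()
    return f"{stripped} USING TIMEOUT {timeout}"


print(with_timeout("SELECT * FROM system_schema.tables;"))
```

This only handles simple statements; `USING TIMEOUT` is a Scylla extension, so a driver would need to apply it only when talking to Scylla.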

piodul mentioned this issue Jun 13, 2024

View building wrongly counted as "Non-paged CQL reads" scylladb/scylladb#5983

Closed

roydahan assigned Lorak-mmk Jun 13, 2024

Lorak-mmk added the enhancement and upstream-issue labels Jun 24, 2024

fruch changed the title from "Internal queries should use paging" to "Internal queries should use paging (ControlConnection)" Jul 2, 2024
