feat: add 'system' subcommand to display system tables #25912

Merged
merged 1 commit into from
Jan 29, 2025

waynr
Contributor

@waynr waynr commented Jan 25, 2025

Addresses #25844

This PR is more or less ready for review, but I suspect there may be one or two changes left to make after getting feedback from the rest of the team. I'll discuss those possible changes below in the demo for each subcommand.

influxdb3 system get

zsh/5 13611  (git)-[feat/system-command]-% ./target/debug/influxdb3 system --database whatever get --limit 10 --select query_text,end2end_duration queries
SELECT query_text,end2end_duration FROM system.queries
ORDER BY end2end_duration
LIMIT 10
+---------------------------------------------------------------------------------------------------------------------------------------+------------------+
| query_text                                                                                                                            | end2end_duration |
+---------------------------------------------------------------------------------------------------------------------------------------+------------------+
| SELECT array_to_string(SHOW TABLES, ',');                                                                                             | PT0.001349372S   |
| SELECT array_to_string(SELECT table_name from information_schema.tables), ',');                                                       | PT0.001455799S   |
| WITH tables (table_name) AS ( SELECT table_name FROM information_schema.tables WHERE table_schema = 'system')                         | PT0.001621811S   |
| WITH cols (column_name)  AS (SELECT column_name FROM information_schema.columns WHERE table_name = tables.table_name)                 |                  |
| SELECT tables.table_name, array_agg(cols.column_name) FROM tables, cols;                                                              |                  |
| SELECT * FROM distinct_caches                                                                                                         | PT0.001677234S   |
| SELECT * FROM 'distinct_caches'                                                                                                       | PT0.001703585S   |
|                                                                                                                                       | PT0.001790812S   |
| SELECT * FROM system.queries                                                                                                          | PT0.001864755S   |
| ORDER_BY end2end_duration                                                                                                             |                  |
| LIMIT 100                                                                                                                             |                  |
| SELECT * FROM information_schema.columns WHERE table_name = parquet_files                                                             | PT0.00189948S    |
| SHOW COLUMNS FROM last_cache                                                                                                          | PT0.002133479S   |
| SELECT table.table_name, array_to_string(SELECT column_name FROM information_schema.columns WHERE table.table_name = table_name, ',') | PT0.00237514S    |
| FROM information_schema.tables AS table                                                                                               |                  |
| WHERE table.table_schema = 'system'                                                                                                   |                  |
+---------------------------------------------------------------------------------------------------------------------------------------+------------------+

I consider this subcommand more or less good to go. It has the following options:

  • --limit - limit the number of entries shown (demoed above)
  • --order-by - a comma-separated list of fields passed to an ORDER BY clause
  • --select - the set of fields selected in the query (demoed above)
  • --format - the format of the query result output
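
As a rough illustration of how these options could map onto the generated SQL echoed in the demo above, here is a minimal Python sketch. The function name `build_system_query` and its parameters are hypothetical and are not taken from this PR (which is implemented in Rust); only the shape of the resulting query follows the demo output.

```python
from typing import Optional, Sequence


def build_system_query(
    table: str,
    select: Optional[Sequence[str]] = None,
    order_by: Optional[Sequence[str]] = None,
    limit: Optional[int] = None,
) -> str:
    """Assemble a query against system.<table> from CLI-style options.

    Hypothetical helper: the name, parameters, and defaults are assumptions
    made for illustration, not the PR's actual code.
    """
    # Fall back to all columns when no --select list is given.
    columns = ",".join(select) if select else "*"
    lines = [f"SELECT {columns} FROM system.{table}"]
    # --order-by is a comma-separated list passed through to ORDER BY.
    if order_by:
        lines.append("ORDER BY " + ",".join(order_by))
    # --limit caps the number of rows returned.
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        build_system_query(
            "queries",
            select=["query_text", "end2end_duration"],
            order_by=["end2end_duration"],
            limit=10,
        )
    )
```

The sketch interpolates identifiers directly into the query text, so a real implementation would need to validate or quote them.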

influxdb3 system summary

zsh/5 13612  (git)-[feat/system-command]-% ./target/debug/influxdb3 system --database whatever summary
distinct_caches summary:
+-------+------+------------+--------------+-----------------+-----------------+
| table | name | column_ids | column_names | max_cardinality | max_age_seconds |
+-------+------+------------+--------------+-----------------+-----------------+
+-------+------+------------+--------------+-----------------+-----------------+
last_caches summary:
+-------+------+----------------+------------------+------------------+--------------------+-------+-----+
| table | name | key_column_ids | key_column_names | value_column_ids | value_column_names | count | ttl |
+-------+------+----------------+------------------+------------------+--------------------+-------+-----+
+-------+------+----------------+------------------+------------------+--------------------+-------+-----+
parquet_files summary:
+------------+------+------------+-----------+----------+----------+
| table_name | path | size_bytes | row_count | min_time | max_time |
+------------+------+------------+-----------+----------+----------+
+------------+------+------------+-----------+----------+----------+
processing_engine_plugins summary:
++
++
processing_engine_triggers summary:
++
++
queries summary:
+--------------------------------------+-------+-------------------------------+------------+---------------------------------------------------------------------------------------------------------------------------------------+------------+---------------+----------------+-----------------+------------------+------------------+------------------+------------+---------+---------+-----------+----------+
| id                                   | phase | issue_time                    | query_type | query_text                                                                                                                            | partitions | parquet_files | plan_duration  | permit_duration | execute_duration | end2end_duration | compute_duration | max_memory | success | running | cancelled | trace_id |
+--------------------------------------+-------+-------------------------------+------------+---------------------------------------------------------------------------------------------------------------------------------------+------------+---------------+----------------+-----------------+------------------+------------------+------------------+------------+---------+---------+-----------+----------+
| 99e46156-bb53-4a5e-b98a-294ba03ea7f2 | fail  | 2025-01-25T02:16:24.679992907 | sql        | SELECT array_to_string(SHOW TABLES, ',');                                                                                             |            |               | PT0.001340376S |                 |                  | PT0.001349372S   |                  |            | false   | false   | false     |          |
| f5579d59-db4b-4761-ab2a-86c8003ed8ab | fail  | 2025-01-25T02:16:44.705374328 | sql        | SELECT array_to_string(SELECT table_name from information_schema.tables), ',');                                                       |            |               | PT0.001448241S |                 |                  | PT0.001455799S   |                  |            | false   | false   | false     |          |
| 8b470e59-dc7c-496d-aa5f-4477954a55da | fail  | 2025-01-25T02:29:33.046920659 | sql        | WITH tables (table_name) AS ( SELECT table_name FROM information_schema.tables WHERE table_schema = 'system')                         |            |               | PT0.001611764S |                 |                  | PT0.001621811S   |                  |            | false   | false   | false     |          |
|                                      |       |                               |            | WITH cols (column_name)  AS (SELECT column_name FROM information_schema.columns WHERE table_name = tables.table_name)                 |            |               |                |                 |                  |                  |                  |            |         |         |           |          |
|                                      |       |                               |            | SELECT tables.table_name, array_agg(cols.column_name) FROM tables, cols;                                                              |            |               |                |                 |                  |                  |                  |            |         |         |           |          |
| 0fa7a425-53ff-4808-9dab-d810b8f45951 | fail  | 2025-01-25T03:17:22.317768809 | sql        | SELECT * FROM distinct_caches                                                                                                         |            |               | PT0.001669792S |                 |                  | PT0.001677234S   |                  |            | false   | false   | false     |          |
| a76921d8-cb13-4f17-a4e1-d5db64f01c49 | fail  | 2025-01-25T03:18:25.041650872 | sql        | SELECT * FROM 'distinct_caches'                                                                                                       |            |               | PT0.001693496S |                 |                  | PT0.001703585S   |                  |            | false   | false   | false     |          |
| ec94712b-33b9-4b3b-a201-48c77e3786f3 | fail  | 2025-01-24T23:26:57.949831034 | sql        |                                                                                                                                       |            |               | PT0.001782268S |                 |                  | PT0.001790812S   |                  |            | false   | false   | false     |          |
| f28577ea-0a79-4624-a612-1732da1665e2 | fail  | 2025-01-25T03:41:13.064308031 | sql        | SELECT * FROM system.queries                                                                                                          |            |               | PT0.001851264S |                 |                  | PT0.001864755S   |                  |            | false   | false   | false     |          |
|                                      |       |                               |            | ORDER_BY end2end_duration                                                                                                             |            |               |                |                 |                  |                  |                  |            |         |         |           |          |
|                                      |       |                               |            | LIMIT 100                                                                                                                             |            |               |                |                 |                  |                  |                  |            |         |         |           |          |
| a806f79c-9aa6-42b2-80db-4cc1c0c881f9 | fail  | 2025-01-25T01:03:23.176335465 | sql        | SELECT * FROM information_schema.columns WHERE table_name = parquet_files                                                             |            |               | PT0.00188766S  |                 |                  | PT0.00189948S    |                  |            | false   | false   | false     |          |
| 6039cd83-fc8d-4d9a-bc92-38db4cb9b9b5 | fail  | 2025-01-25T01:00:34.321586109 | sql        | SHOW COLUMNS FROM last_cache                                                                                                          |            |               | PT0.002123681S |                 |                  | PT0.002133479S   |                  |            | false   | false   | false     |          |
| 9bcc78ea-f665-436c-86e1-798cd14545c9 | fail  | 2025-01-25T02:12:09.785267216 | sql        | SELECT table.table_name, array_to_string(SELECT column_name FROM information_schema.columns WHERE table.table_name = table_name, ',') |            |               | PT0.0023638S   |                 |                  | PT0.00237514S    |                  |            | false   | false   | false     |          |
|                                      |       |                               |            | FROM information_schema.tables AS table                                                                                               |            |               |                |                 |                  |                  |                  |            |         |         |           |          |
|                                      |       |                               |            | WHERE table.table_schema = 'system'                                                                                                   |            |               |                |                 |                  |                  |                  |            |         |         |           |          |
+--------------------------------------+-------+-------------------------------+------------+---------------------------------------------------------------------------------------------------------------------------------------+------------+---------------+----------------+-----------------+------------------+------------------+------------------+------------+---------+---------+-----------+----------+

The system summary first queries for a list of system tables, then iterates over them, selecting all fields from each before displaying the results in the chosen output format (--format). It also respects the --limit flag, as the get subcommand does, but applies it to each system table individually.

The thing I'm not certain about here is whether we should try to avoid printing results for empty tables.
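
The summary flow, together with the open question about empty tables, can be sketched in Python as follows. Everything here is an assumption for illustration: `summarize_system_tables`, the `run_query` callback, the `skip_empty` switch, the table-discovery query, and the default limit of 10 (a figure mentioned later in this thread) are not taken from the PR's Rust code.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Assumed discovery query; the PR may look up system tables differently.
TABLES_SQL = (
    "SELECT table_name FROM information_schema.tables "
    "WHERE table_schema = 'system'"
)


def summarize_system_tables(
    run_query: Callable[[str], List[Row]],
    limit: int = 10,
    skip_empty: bool = False,
) -> Dict[str, List[Row]]:
    """Query every system table, applying the limit to each one separately."""
    summary: Dict[str, List[Row]] = {}
    for table_row in run_query(TABLES_SQL):
        name = str(table_row["table_name"])
        rows = run_query(f"SELECT * FROM system.{name} LIMIT {limit}")
        # Hypothetical switch: drop tables that returned no rows.
        if skip_empty and not rows:
            continue
        summary[name] = rows
    return summary


def fake_run_query(sql: str) -> List[Row]:
    """Stand-in for a real client so the sketch runs without a server."""
    if "information_schema.tables" in sql:
        return [{"table_name": "queries"}, {"table_name": "last_caches"}]
    if "system.queries" in sql:
        return [{"id": "q1", "success": False}]
    return []


if __name__ == "__main__":
    for table, rows in summarize_system_tables(fake_run_query).items():
        print(f"{table} summary: {rows}")
```

With `skip_empty=True`, the `last_caches` entry would be omitted in this example, which is one way to avoid printing header-only tables.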

influxdb3 system list

zsh/5 13624  (git)-[feat/system-command]-% ./target/debug/influxdb3 system --database whatever list
distinct_caches
  table
  name
  column_ids
  column_names
  max_cardinality
  max_age_seconds
last_caches
  table
  name
  key_column_ids
  key_column_names
  value_column_ids
  value_column_names
  count
  ttl
parquet_files
  table_name
  path
  size_bytes
  row_count
  min_time
  max_time
processing_engine_plugins
  plugin_name
  file_name
  plugin_type
processing_engine_triggers
  trigger_name
  plugin_name
  trigger_specification
  disabled
queries
  id
  phase
  issue_time
  query_type
  query_text
  partitions
  parquet_files
  plan_duration
  permit_duration
  execute_duration
  end2end_duration
  compute_duration
  max_memory
  success
  running
  cancelled
  trace_id

This subcommand currently displays each system table name followed by that table's column names, indented with two spaces. It works by first querying for all table names and then, for each table name, querying its column names (this could be reduced to a single query with a join if we want to keep this output formatting).
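
For reference, the indented layout shown above amounts to something like the following Python sketch. The helper name `format_table_list` is hypothetical and the PR's actual Rust formatting code may differ.

```python
from typing import Dict, List


def format_table_list(columns_by_table: Dict[str, List[str]]) -> str:
    """Render each table name followed by its columns indented two spaces."""
    lines: List[str] = []
    for table, columns in columns_by_table.items():
        lines.append(table)
        lines.extend(f"  {column}" for column in columns)
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        format_table_list(
            {"processing_engine_plugins": ["plugin_name", "file_name", "plugin_type"]}
        )
    )
```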

I am considering rewriting the implementation of this to use the following single query so that we can stick with the same output formatting options that come with the summary and get subcommands:

WITH cols (table_name, column_name)  AS (SELECT table_name, column_name FROM information_schema.columns)
SELECT table_name, array_agg(column_name) AS columns
FROM cols
GROUP BY table_name

The output of this query looks as follows:

zsh/5 13625  (git)-[feat/system-command]-% ./target/debug/influxdb3 query --format pretty --database whatever "$(cat whatever.sql)"
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| table_name                 | columns                                                                                                                                                                                                             |
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| cpu                        | [application, host, region, status, time, usage_percent, val]                                                                                                                                                       |
| distinct_caches            | [table, name, column_ids, column_names, max_cardinality, max_age_seconds]                                                                                                                                           |
| processing_engine_triggers | [trigger_name, plugin_name, trigger_specification, disabled]                                                                                                                                                        |
| processing_engine_plugins  | [plugin_name, file_name, plugin_type]                                                                                                                                                                               |
| last_caches                | [table, name, key_column_ids, key_column_names, value_column_ids, value_column_names, count, ttl]                                                                                                                   |
| parquet_files              | [table_name, path, size_bytes, row_count, min_time, max_time]                                                                                                                                                       |
| queries                    | [id, phase, issue_time, query_type, query_text, partitions, parquet_files, plan_duration, permit_duration, execute_duration, end2end_duration, compute_duration, max_memory, success, running, cancelled, trace_id] |
+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I'm pretty sure I prefer this approach since it lets us use the same --format options as the other subcommands, but I thought I would get the team's input also.
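
To make the grouping step concrete, here is a small Python sketch of what the `array_agg` / `GROUP BY` query computes. The function `group_columns`, the row layout, and the schema label `other` are assumptions for illustration. Note that the query as proposed has no `table_schema` predicate, which appears to be why the `cpu` table shows up in the output above; the optional `schema` argument shows one way such a filter could behave.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# (table_schema, table_name, column_name), loosely modelled on
# information_schema.columns.
ColumnRow = Tuple[str, str, str]


def group_columns(
    rows: Iterable[ColumnRow],
    schema: Optional[str] = None,
) -> Dict[str, List[str]]:
    """Collect column names per table, optionally keeping only one schema."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for table_schema, table_name, column_name in rows:
        # Hypothetical filter, analogous to WHERE table_schema = '<schema>'.
        if schema is not None and table_schema != schema:
            continue
        grouped[table_name].append(column_name)
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        ("other", "cpu", "host"),  # "other" is a placeholder schema name
        ("system", "parquet_files", "table_name"),
        ("system", "parquet_files", "path"),
    ]
    print(group_columns(sample))
    print(group_columns(sample, schema="system"))
```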

@waynr
Contributor Author

waynr commented Jan 25, 2025

Once I've got some feedback with respect to the influxdb3 system list and influxdb3 system summary issues I mentioned in the description, I'll make any necessary changes and look into adding test cases to validate the expected output(s) for the new subcommands.

@waynr waynr requested a review from a team January 25, 2025 04:04
@mgattozzi
Contributor

Hey @waynr, we should keep the CLI commands consistent with the Verb -> Noun ordering we already use, i.e. add table rather than table add.

I think we could do something like

  • show system table <FOO> to show the actual contents
  • list system tables to list what tables are available
  • summarize system tables to show a summary. We might want this for normal tables eventually as well

Which names would be best is up for debate. I just think we should keep things consistent so we don't end up like git, which mixes verb -> noun and noun -> verb commands.

@waynr
Contributor Author

waynr commented Jan 28, 2025

I think we could do something like

@mgattozzi I can see where you're coming from with the desire to retain <verb> <noun> CLI semantics, but I'm not fond of the specific suggestion, which would split the system table subcommands across three different top-level subcommands. What do you think about something like this to retain the <verb> primacy:

  • influxdb3 show system
    • provides a summary across all available system tables showing 10 entries for each by default
  • influxdb3 show system --table <table-name>
    • displays the first 100 entries for the specified table
  • influxdb3 show system --list-tables
    • displays the available table names and their columns

The use of "show system" gives us the desired <verb> <noun> ordering, while the use of flags lets us zero in on which aspects of the system tables we are inspecting. I would propose the verb "inspect", but it looks like we already have the "show databases" subcommand.

@waynr
Contributor Author

waynr commented Jan 28, 2025

Hmm, we could even make this more generic with influxdb3 show tables, then require a --system flag (or --schema system) to explicitly select system tables 🤔.

@waynr waynr force-pushed the feat/system-command branch from 3597db8 to c24ebee Compare January 29, 2025 03:44
@waynr
Contributor Author

waynr commented Jan 29, 2025

Okay I ended up going with this subcommand structure as of the latest set of commits:

  • influxdb3 show system summary
    • provides a summary across all available system tables showing 10 entries for each by default
  • influxdb3 show system table <table-name>
    • displays the first 100 entries for the specified table
  • influxdb3 show system table-list
    • displays the available table names and their columns

This preserves the <verb> <noun> ordering using the existing show verb for consistency's sake. I did spend a bit of time trying to use arg groups to achieve the structure proposed earlier, but that started to get a little messy and made less sense to me the more I thought about it. The <verb> <noun> <sub-noun> structure ended up being a much simpler refactor anyway.
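
The <verb> <noun> <sub-noun> nesting can be illustrated with a short Python `argparse` sketch. This is not the PR's Rust argument parsing; the placement of `--database` and the `--limit` flags are assumptions carried over from the earlier demos, and the defaults of 10 and 100 come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Nested subcommands: influxdb3 show system {summary,table,table-list}."""
    parser = argparse.ArgumentParser(prog="influxdb3")
    verbs = parser.add_subparsers(dest="verb", required=True)

    # <verb>
    show = verbs.add_parser("show")
    nouns = show.add_subparsers(dest="noun", required=True)

    # <noun>
    system = nouns.add_parser("system")
    system.add_argument("--database", required=True)  # assumed placement
    sub_nouns = system.add_subparsers(dest="sub_noun", required=True)

    # <sub-noun>: summary across all system tables, 10 entries each by default
    summary = sub_nouns.add_parser("summary")
    summary.add_argument("--limit", type=int, default=10)

    # <sub-noun>: first 100 entries of a single table by default
    table = sub_nouns.add_parser("table")
    table.add_argument("table_name")
    table.add_argument("--limit", type=int, default=100)

    # <sub-noun>: available table names and their columns
    sub_nouns.add_parser("table-list")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["show", "system", "--database", "whatever", "table", "queries"]
    )
    print(args.verb, args.noun, args.sub_noun, args.table_name, args.limit)
```

Each sub-noun owns its own defaults here, which avoids the mutually exclusive flag handling that an arg-group approach would need.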

I've also added tests so this is ready for review @mgattozzi @pauldix

Member

@pauldix pauldix left a comment

I like where you ended up with the influxdb3 show ... semantics.

@waynr waynr force-pushed the feat/system-command branch from 4e1f6e8 to 5987a2e Compare January 29, 2025 17:35
test: add 'show system' subcommand tests
@waynr
Contributor Author

waynr commented Jan 29, 2025

Okay I've squashed all my fixes/changes after adding a couple more unit tests and fixing a bug in the queries table that is supposed to filter out the system tables from results. Will merge once CI is green.

@waynr waynr merged commit 99c9d02 into main Jan 29, 2025
13 checks passed