feat: adds information_schema cluster_info table #3832

killme2008 · 2024-04-29T02:09:35Z

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

Close #1768 and close #3141

What's changed and what's your intention?

First, rename the greptime_region_peers table to region_peers, which is a more appropriate name.

Second, add a cluster_info table to information_schema. It provides information about the current topology of the cluster.

It depends on GreptimeTeam/greptime-proto#160

mysql> DESC TABLE CLUSTER_INFO;
+-------------+----------------------+-----+------+---------+---------------+
| Column      | Type                 | Key | Null | Default | Semantic Type |
+-------------+----------------------+-----+------+---------+---------------+
| peer_id     | Int64                |     | NO   |         | FIELD         |
| peer_type   | String               |     | NO   |         | FIELD         |
| peer_addr   | String               |     | YES  |         | FIELD         |
| version     | String               |     | NO   |         | FIELD         |
| git_commit  | String               |     | NO   |         | FIELD         |
| start_time  | TimestampMillisecond |     | YES  |         | FIELD         |
| uptime      | String               |     | YES  |         | FIELD         |
| active_time | String               |     | YES  |         | FIELD         |
+-------------+----------------------+-----+------+---------+---------------+

peer_id: the peer server id.
peer_type: the peer type, such as datanode, frontend, metasrv etc.
peer_addr: the peer gRPC address.
version: the build package version of the peer.
git_commit: the build git commit hash of the peer.
start_time: the starting time of the peer.
uptime: the uptime of the peer.
active_time: the time since the last activity of the peer.
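
As a rough illustration of the schema above, one row could be modeled as follows. This is a minimal sketch, not the code in this PR: the `ClusterInfoRow` type and the `standalone_row` helper are hypothetical names, and representing the "empty" standalone columns as `None` is an assumption based on the `Null = YES` markers in the `DESC` output.

```rust
/// Hypothetical row type mirroring the `DESC TABLE CLUSTER_INFO` output.
/// Columns listed with `Null = YES` are modeled as `Option`.
#[derive(Debug, Clone, PartialEq)]
pub struct ClusterInfoRow {
    pub peer_id: i64,
    pub peer_type: String,
    pub peer_addr: Option<String>,
    pub version: String,
    pub git_commit: String,
    /// Milliseconds since the Unix epoch (TimestampMillisecond).
    pub start_time: Option<i64>,
    pub uptime: Option<String>,
    pub active_time: Option<String>,
}

/// Builds a standalone-mode row: peer_id is 0 and there is no peer address.
/// Treating the empty address and active_time as `None` is an assumption.
pub fn standalone_row(
    version: &str,
    git_commit: &str,
    start_time_ms: i64,
    uptime: &str,
) -> ClusterInfoRow {
    ClusterInfoRow {
        peer_id: 0,
        peer_type: "STANDALONE".to_string(),
        peer_addr: None,
        version: version.to_string(),
        git_commit: git_commit.to_string(),
        start_time: Some(start_time_ms),
        uptime: Some(uptime.to_string()),
        active_time: None,
    }
}
```

The non-nullable columns stay plain values so that a row cannot be built without a peer type, version, and commit hash.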

For example

In standalone mode:

mysql> USE INFORMATION_SCHEMA;

mysql> SELECT * FROM CLUSTER_INFO;
+---------+------------+-----------+---------+------------+-------------------------+--------+-------------+
| peer_id | peer_type  | peer_addr | version | git_commit | start_time              | uptime | active_time |
+---------+------------+-----------+---------+------------+-------------------------+--------+-------------+
| 0       | STANDALONE |           | 0.7.2   | 86ab3d9    | 2024-04-30T06:40:02.074 | 18ms   |             |
+---------+------------+-----------+---------+------------+-------------------------+--------+-------------+

In standalone mode, the peer_addr is always empty and peer_id is always 0.

In distributed mode:

+---------+-----------+----------------+---------+------------+-------------------------+----------+-------------+
| peer_id | peer_type | peer_addr      | version | git_commit | start_time              | uptime   | active_time |
+---------+-----------+----------------+---------+------------+-------------------------+----------+-------------+
| 1       | DATANODE  | 127.0.0.1:4101 | 0.7.2   | 86ab3d9    | 2024-04-30T06:40:04.791 | 4s 478ms | 1s 467ms    |
| 2       | DATANODE  | 127.0.0.1:4102 | 0.7.2   | 86ab3d9    | 2024-04-30T06:40:06.098 | 3s 171ms | 162ms       |
| 3       | DATANODE  | 127.0.0.1:4103 | 0.7.2   | 86ab3d9    | 2024-04-30T06:40:07.425 | 1s 844ms | 1s 839ms    |
| -1      | FRONTEND  | 127.0.0.1:4001 | 0.7.2   | 86ab3d9    | 2024-04-30T06:40:08.815 | 454ms    | 47ms        |
| 0       | METASRV   | 127.0.0.1:3002 | unknown | unknown    |                         |          |             |
+---------+-----------+----------------+---------+------------+-------------------------+----------+-------------+

It lists the info of all the nodes in the cluster. The peer_id of frontends is always -1.
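
The peer_id conventions described in the two examples can be summarized in a small sketch. The `PeerType` enum and `displayed_peer_id` function are hypothetical names for illustration. Passing the node id through unchanged for metasrv is an assumption: the example only shows a single metasrv with id 0.

```rust
/// Hypothetical peer roles, matching the peer_type values in the examples.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PeerType {
    Standalone,
    Datanode,
    Frontend,
    Metasrv,
}

/// Returns the peer_id that would be shown for a node.
/// - Standalone: always 0.
/// - Frontend: always -1, because frontends have no peer-id concept.
/// - Datanode: the node's own id.
/// - Metasrv: passed through unchanged (assumption, see the lead-in).
pub fn displayed_peer_id(peer_type: PeerType, node_id: i64) -> i64 {
    match peer_type {
        PeerType::Standalone => 0,
        PeerType::Frontend => -1,
        PeerType::Datanode | PeerType::Metasrv => node_id,
    }
}
```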

Checklist

I have written the necessary rustdoc comments.
I have added the necessary unit tests and integration tests.
This PR does not require documentation updates.

Cargo.toml

src/meta-client/src/client.rs

codecov · 2024-04-29T13:14:32Z

Codecov Report

Attention: Patch coverage is 33.82789% with 223 lines in your changes are missing coverage. Please review.

Project coverage is 85.29%. Comparing base (f6e2039) to head (6c6d1b6).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3832      +/-   ##
==========================================
- Coverage   85.70%   85.29%   -0.42%     
==========================================
  Files         954      955       +1     
  Lines      162947   163262     +315     
==========================================
- Hits       139656   139250     -406     
- Misses      23291    24012     +721

sunng87

There is no peer-id concept in frontend (and maybe metasrv). Introducing a unique id will add operational and tooling complexity, which we would like to avoid. We can keep them all 0, which requires an explanation in the docs.

Another idea is to change peer_addr to hostname:

The port number in peer_addr is implemented as the gRPC service port. However, it makes little sense to list the gRPC ports of frontend/datanode/metasrv together, because they serve different purposes.
In Kubernetes and some other modern environments, hostnames offer better readability than IP addresses. Also, the IP address may change after a pod rebuild.

And I wonder if we have sufficient information to include a new field like state or health.

src/common/meta/src/cluster.rs

src/frontend/src/heartbeat.rs

killme2008 · 2024-04-30T01:02:20Z

There is no peer-id concept in frontend (and maybe metasrv). Introducing a unique id will add operational and tooling complexity, which we would like to avoid. We can keep them all 0, which requires an explanation in the docs.

Another idea is to change peer_addr to hostname:

The port number in peer_addr is implemented as the gRPC service port. However, it makes little sense to list the gRPC ports of frontend/datanode/metasrv together, because they serve different purposes.

In Kubernetes and some other modern environments, hostnames offer better readability than IP addresses. Also, the IP address may change after a pod rebuild.

And I wonder if we have sufficient information to include a new field like state or health.

Agree, frontends don't need the peer_id at all, but for datanodes it makes sense, so I'd like to set the peer_id of all frontends to -1.
Of course, we have a last_active_ts in NodeInfo, and we can use it to determine if a peer is alive, but it looks like we don't have it for Metasrv @fengjiachun
Disagree. In some cases (non-k8s environments), users may deploy several nodes in the same pod or host, and the hostname can't distinguish those peers.
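
To make the last point concrete, here is a small sketch using the addresses from the distributed-mode example in the description. The `host_of` and `distinct_count` helpers are hypothetical and exist only for this illustration.

```rust
use std::collections::HashSet;

/// Returns the host part of a `host:port` address
/// (everything before the last ':'), or the input if there is no port.
pub fn host_of(addr: &str) -> &str {
    match addr.rfind(':') {
        Some(idx) => &addr[..idx],
        None => addr,
    }
}

/// Counts how many distinct keys a set of peers produces.
pub fn distinct_count<'a, I>(keys: I) -> usize
where
    I: IntoIterator<Item = &'a str>,
{
    keys.into_iter().collect::<HashSet<&str>>().len()
}
```

With the three datanodes at `127.0.0.1:4101`, `127.0.0.1:4102`, and `127.0.0.1:4103`, the full address yields three distinct keys, while the host part alone yields one, so a hostname-only column could not tell these peers apart.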

fengjiachun · 2024-04-30T03:38:02Z

Of course, we have a last_active_ts in NodeInfo, and we can use it to determine if a peer is alive, but it looks like we don't have it for Metasrv @fengjiachun

Metasrv cannot have a last_active_ts since there is no heartbeat from followers to the leader.
However, due to the internal implementation of metasrv, if a metasrv node is disconnected for more than a certain period of time, it is automatically removed and no longer appears in the cluster info list.
That is to say: if you can see it, it is healthy.
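
The liveness reasoning in this thread could be sketched as below. This is only an illustration under stated assumptions: the `is_alive` function and its `timeout_ms` threshold are hypothetical, and this PR does not add a state or health column. A missing `last_active_ts` (the metasrv case) is treated as healthy because a disconnected metasrv would no longer be listed at all.

```rust
/// Decides whether a listed peer should be considered alive.
/// - `last_active_ms`: the peer's last_active_ts in milliseconds, if it has one.
/// - `now_ms`: the current time in milliseconds.
/// - `timeout_ms`: hypothetical threshold after which a silent peer is considered dead.
pub fn is_alive(last_active_ms: Option<i64>, now_ms: i64, timeout_ms: i64) -> bool {
    match last_active_ms {
        // Datanodes and frontends report activity through heartbeats.
        Some(ts) => now_ms.saturating_sub(ts) <= timeout_ms,
        // Metasrv has no last_active_ts; being listed implies it is healthy.
        None => true,
    }
}
```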

killme2008 · 2024-04-30T06:09:49Z

@fengjiachun @sunng87 @MichaelScofield Please take a look, thank you.

killme2008 · 2024-04-30T06:32:44Z

Of course, we have a last_active_ts in NodeInfo, and we can use it to determine if a peer is alive, but it looks like we don't have it for Metasrv @fengjiachun

Metasrv cannot have a last_active_ts since there is no heartbeat from followers to the leader. However, due to the internal implementation of metasrv, if a metasrv node is disconnected for more than a certain period of time, it is automatically removed and no longer appears in the cluster info list. That is to say: if you can see it, it is healthy.

I added an active_time column to represent the time since the last activity of the peer.

cc @sunng87
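
For readers wondering how values such as `4s 478ms` relate to `last_active_ts`, here is a minimal sketch. It assumes active_time is the current time minus the last activity timestamp, rendered in a compact human-readable form. The `format_elapsed` and `active_time` functions are hypothetical stand-ins, not the formatter used in this PR, and rendering zero as `0ms` is a choice made for this sketch.

```rust
/// Formats a millisecond duration as space-separated units, e.g. "4s 478ms".
/// Units with a zero count are skipped.
pub fn format_elapsed(ms: u64) -> String {
    const UNITS: [(&str, u64); 5] = [
        ("d", 86_400_000),
        ("h", 3_600_000),
        ("m", 60_000),
        ("s", 1_000),
        ("ms", 1),
    ];
    if ms == 0 {
        return "0ms".to_string();
    }
    let mut rest = ms;
    let mut parts = Vec::new();
    for &(suffix, size) in UNITS.iter() {
        let count = rest / size;
        if count > 0 {
            parts.push(format!("{}{}", count, suffix));
            rest %= size;
        }
    }
    parts.join(" ")
}

/// Time since the last activity of a peer, given both timestamps in milliseconds.
/// Saturates at zero if the clock appears to have gone backwards.
pub fn active_time(now_ms: u64, last_active_ms: u64) -> String {
    format_elapsed(now_ms.saturating_sub(last_active_ms))
}
```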

src/catalog/src/information_schema/runtime_metrics.rs

src/catalog/src/information_schema/utils.rs

src/datanode/src/heartbeat.rs

src/meta-srv/src/service/cluster.rs

Co-authored-by: Jeremyhi <[email protected]>

fengjiachun

Almost LGTM

tisonkun

Thank you!

killme2008 changed the title ~~feat: adds nformation_schema cluster_info table~~ feat: adds information_schema cluster_info table Apr 29, 2024

github-actions bot added the docs-not-required This change does not impact docs. label Apr 29, 2024

killme2008 mentioned this pull request Apr 29, 2024

feat: adds node info to heartbeat request GreptimeTeam/greptime-proto#160

Merged

2 tasks

killme2008 force-pushed the feature/cluster-info branch from 4855bf5 to 2c909a5 April 29, 2024 12:47

killme2008 commented Apr 29, 2024

Cargo.toml (outdated)

killme2008 commented Apr 29, 2024

src/meta-client/src/client.rs

github-actions bot added docs-required This change requires docs update. and removed docs-not-required This change does not impact docs. labels Apr 29, 2024

killme2008 marked this pull request as ready for review April 29, 2024 13:00

killme2008 requested review from MichaelScofield and a team as code owners April 29, 2024 13:00

killme2008 requested review from fengjiachun and tisonkun April 29, 2024 13:01

killme2008 mentioned this pull request Apr 29, 2024

information_schema improvements #2931

Open

37 tasks

sunng87 reviewed Apr 29, 2024

src/common/meta/src/cluster.rs

sunng87 reviewed Apr 29, 2024

src/frontend/src/heartbeat.rs (outdated)

killme2008 added 11 commits April 30, 2024 09:04

feat: adds server running mode to KvBackendCatalogManager

74c5907

feat: adds MetaClient to KvBackendCatalogManager

76fb99a

feat: impl information_schema.cluster_info table

17ab11e

fix: forgot files

083ccf0

test: update information_schema result

e2942e2

feat: adds start_time and uptime to cluster_info

97babb0

chore: tweak cargo and comment

c1b82c3

feat: rename greptime_region_peers to region_peers

04287c2

fix: cluster_info result

ba3a39f

chore: simplify sqlness commands

e7b5609

chore: set peer_id to -1 for frontends

6e254f5

fix: move cluster_info to greptime catalog

876bcee

killme2008 force-pushed the feature/cluster-info branch from 746e951 to 876bcee April 30, 2024 01:21

chore: use official proto

6153104

feat: adds active_time

86ab3d9

fengjiachun reviewed Apr 30, 2024

src/catalog/src/information_schema/runtime_metrics.rs (outdated)

src/catalog/src/information_schema/utils.rs

src/datanode/src/heartbeat.rs

src/meta-srv/src/service/cluster.rs (outdated)

fengjiachun mentioned this pull request Apr 30, 2024

Retrieve the metasrv node info in meta-client #3843

Closed

killme2008 and others added 2 commits April 30, 2024 17:24

chore: apply suggestion

7b9922c

Co-authored-by: Jeremyhi <[email protected]>

chore: STANDALONE for runtime_metrics

d15e80f

fengjiachun approved these changes Apr 30, 2024

killme2008 requested a review from waynexia May 1, 2024 14:35

Merge branch 'main' into feature/cluster-info

6c6d1b6

tisonkun approved these changes May 2, 2024

tisonkun added this pull request to the merge queue May 2, 2024

Merged via the queue into GreptimeTeam:main with commit 65d47ba May 2, 2024
23 checks passed

killme2008 deleted the feature/cluster-info branch May 6, 2024 06:46

killme2008 mentioned this pull request May 6, 2024

docs: cluster_info table GreptimeTeam/docs#937

Merged

2 tasks

waynexia mentioned this pull request May 21, 2024

improve observability for procedure #3999

Closed

3 tasks

WenyXu mentioned this pull request Jun 15, 2024

fix(sqlness): catch different format timestamp #4149

Merged

3 tasks
