feat: adds information_schema cluster_info table #3832

Merged: 17 commits into GreptimeTeam:main on May 2, 2024

killme2008
Contributor

@killme2008 killme2008 commented Apr 29, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

#2931

Close #1768 and close #3141

What's changed and what's your intention?

First, rename greptime_region_peers to region_peers, which is a more appropriate name.

Second, add a cluster_info table to information_schema. It provides information about the current topology of the cluster.

It depends on GreptimeTeam/greptime-proto#160

mysql> DESC TABLE CLUSTER_INFO;
+-------------+----------------------+-----+------+---------+---------------+
| Column      | Type                 | Key | Null | Default | Semantic Type |
+-------------+----------------------+-----+------+---------+---------------+
| peer_id     | Int64                |     | NO   |         | FIELD         |
| peer_type   | String               |     | NO   |         | FIELD         |
| peer_addr   | String               |     | YES  |         | FIELD         |
| version     | String               |     | NO   |         | FIELD         |
| git_commit  | String               |     | NO   |         | FIELD         |
| start_time  | TimestampMillisecond |     | YES  |         | FIELD         |
| uptime      | String               |     | YES  |         | FIELD         |
| active_time | String               |     | YES  |         | FIELD         |
+-------------+----------------------+-----+------+---------+---------------+
  • peer_id: the peer server id.
  • peer_type: the peer type, such as datanode, frontend, metasrv etc.
  • peer_addr: the peer gRPC address.
  • version: the build package version of the peer.
  • git_commit: the build git commit hash of the peer.
  • start_time: the starting time of the peer.
  • uptime: the uptime of the peer.
  • active_time: the time since the last activity of the peer.
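
For client-side tooling, the schema above could be mirrored as a small record type. The following is only a sketch and not part of this PR: the class name `ClusterInfoRow` is hypothetical, and mapping the columns marked `Null = YES` to `Optional` fields is an assumption drawn from the DESC output.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ClusterInfoRow:
    """Hypothetical client-side mirror of one cluster_info row."""

    peer_id: int        # Int64, NOT NULL
    peer_type: str      # String, NOT NULL (e.g. DATANODE, FRONTEND, METASRV)
    version: str        # String, NOT NULL
    git_commit: str     # String, NOT NULL
    peer_addr: Optional[str] = None         # nullable gRPC address
    start_time: Optional[datetime] = None   # nullable TimestampMillisecond
    uptime: Optional[str] = None            # nullable human-readable duration
    active_time: Optional[str] = None       # nullable time since last activity


# Values copied from the first datanode row of the distributed-mode example.
example = ClusterInfoRow(
    peer_id=1,
    peer_type="DATANODE",
    version="0.7.2",
    git_commit="86ab3d9",
    peer_addr="127.0.0.1:4101",
    start_time=datetime.fromisoformat("2024-04-30T06:40:04.791"),
    uptime="4s 478ms",
    active_time="1s 467ms",
)
```

Keeping the nullable columns optional means a row with missing optional values can be built from the four required fields alone.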

For example

In standalone mode:

mysql> USE INFORMATION_SCHEMA;

mysql> SELECT * FROM CLUSTER_INFO;
+---------+------------+-----------+---------+------------+-------------------------+--------+-------------+
| peer_id | peer_type  | peer_addr | version | git_commit | start_time              | uptime | active_time |
+---------+------------+-----------+---------+------------+-------------------------+--------+-------------+
| 0       | STANDALONE |           | 0.7.2   | 86ab3d9    | 2024-04-30T06:40:02.074 | 18ms   |             |
+---------+------------+-----------+---------+------------+-------------------------+--------+-------------+

In standalone mode, the peer_addr is always empty and peer_id is always 0.

In distributed mode:

+---------+-----------+----------------+---------+------------+-------------------------+----------+-------------+
| peer_id | peer_type | peer_addr      | version | git_commit | start_time              | uptime   | active_time |
+---------+-----------+----------------+---------+------------+-------------------------+----------+-------------+
| 1       | DATANODE  | 127.0.0.1:4101 | 0.7.2   | 86ab3d9    | 2024-04-30T06:40:04.791 | 4s 478ms | 1s 467ms    |
| 2       | DATANODE  | 127.0.0.1:4102 | 0.7.2   | 86ab3d9    | 2024-04-30T06:40:06.098 | 3s 171ms | 162ms       |
| 3       | DATANODE  | 127.0.0.1:4103 | 0.7.2   | 86ab3d9    | 2024-04-30T06:40:07.425 | 1s 844ms | 1s 839ms    |
| -1      | FRONTEND  | 127.0.0.1:4001 | 0.7.2   | 86ab3d9    | 2024-04-30T06:40:08.815 | 454ms    | 47ms        |
| 0       | METASRV   | 127.0.0.1:3002 | unknown | unknown    |                         |          |             |
+---------+-----------+----------------+---------+------------+-------------------------+----------+-------------+

This lists information for every node in the cluster. The peer_id of a frontend is always -1.
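
To check these peer_id conventions from a script, the bordered output could be parsed on the client side. The sketch below is illustrative only: `parse_ascii_table` is a hypothetical helper, and the sample is an abbreviated copy of the distributed-mode output (first three columns).

```python
from typing import Dict, List


def parse_ascii_table(text: str) -> List[Dict[str, str]]:
    """Parse bordered mysql-client output into one dict per data row."""
    header = None
    rows = []
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # skip the +-----+ border lines
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if header is None:
            header = cells  # the first |...| line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Abbreviated copy of the distributed-mode output (first three columns only).
SAMPLE = """
+---------+-----------+----------------+
| peer_id | peer_type | peer_addr      |
+---------+-----------+----------------+
| 1       | DATANODE  | 127.0.0.1:4101 |
| 2       | DATANODE  | 127.0.0.1:4102 |
| 3       | DATANODE  | 127.0.0.1:4103 |
| -1      | FRONTEND  | 127.0.0.1:4001 |
| 0       | METASRV   | 127.0.0.1:3002 |
+---------+-----------+----------------+
"""

rows = parse_ascii_table(SAMPLE)
datanode_ids = sorted(int(r["peer_id"]) for r in rows if r["peer_type"] == "DATANODE")
frontend_ids = {int(r["peer_id"]) for r in rows if r["peer_type"] == "FRONTEND"}
```

All cell values stay strings after parsing, so the ids are converted with `int` before comparison.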

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR does not require documentation updates.

@killme2008 killme2008 changed the title from "feat: adds nformation_schema cluster_info table" to "feat: adds information_schema cluster_info table" Apr 29, 2024
@github-actions github-actions bot added the docs-not-required This change does not impact docs. label Apr 29, 2024
Cargo.toml (review thread: outdated, resolved)
@github-actions github-actions bot added docs-required This change requires docs update. and removed docs-not-required This change does not impact docs. labels Apr 29, 2024
@killme2008 killme2008 marked this pull request as ready for review April 29, 2024 13:00
@killme2008 killme2008 requested review from MichaelScofield and a team as code owners April 29, 2024 13:00
@killme2008 killme2008 mentioned this pull request Apr 29, 2024
37 tasks

codecov bot commented Apr 29, 2024

Codecov Report

Attention: Patch coverage is 33.82789%, with 223 lines in your changes missing coverage. Please review.

Project coverage is 85.29%. Comparing base (f6e2039) to head (6c6d1b6).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3832      +/-   ##
==========================================
- Coverage   85.70%   85.29%   -0.42%     
==========================================
  Files         954      955       +1     
  Lines      162947   163262     +315     
==========================================
- Hits       139656   139250     -406     
- Misses      23291    24012     +721     

Member

@sunng87 sunng87 left a comment

There is no peer-id concept in the frontend (and maybe not in metasrv). Introducing a unique id would add operational and tooling complexity, which we would like to avoid. We can keep them all 0, which would need an explanation in the docs.

Another idea is to change peer_addr to hostname:

  • The port number in peer_addr is the gRPC service port. However, it makes little sense to list the gRPC ports of frontend, datanode, and metasrv together, because they serve different purposes.
  • In Kubernetes and some other modern environments, a hostname is more readable than an IP address. An IP address may also change after a pod is rebuilt.

And I wonder if we have sufficient information to include a new field like state or health.

@killme2008
Contributor Author

killme2008 commented Apr 30, 2024

> There is no peer-id concept in the frontend (and maybe not in metasrv). Introducing a unique id would add operational and tooling complexity, which we would like to avoid. We can keep them all 0, which would need an explanation in the docs.
>
> Another idea is to change peer_addr to hostname:
>
>   • The port number in peer_addr is the gRPC service port. However, it makes little sense to list the gRPC ports of frontend, datanode, and metasrv together, because they serve different purposes.
>   • In Kubernetes and some other modern environments, a hostname is more readable than an IP address. An IP address may also change after a pod is rebuilt.
>
> And I wonder if we have sufficient information to include a new field like state or health.

  1. Agreed. Frontends don't need a peer_id at all, but it makes sense for datanodes, so I'd like to set the peer_id of all frontends to -1.
  2. Of course, we have a last_active_ts in NodeInfo, and we can use it to determine if a peer is alive, but it looks like we don't have it for Metasrv @fengjiachun
  3. Disagree. In some cases (outside k8s), users may deploy several nodes in the same pod or on the same host, and then the hostname can't distinguish the peers.
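
Point 3 can be illustrated with the gRPC addresses from the distributed-mode example in the PR description, where every node runs on the same host. This is a minimal sketch; `host_of` is a hypothetical helper that only handles plain `host:port` strings.

```python
from collections import Counter

# gRPC addresses from the distributed-mode example in the PR description.
peer_addrs = [
    "127.0.0.1:4101",
    "127.0.0.1:4102",
    "127.0.0.1:4103",
    "127.0.0.1:4001",
    "127.0.0.1:3002",
]


def host_of(addr: str) -> str:
    """Return the host part of a plain host:port address."""
    return addr.rsplit(":", 1)[0]


# Count how many peers share each host; any count above 1 is ambiguous.
hosts = Counter(host_of(addr) for addr in peer_addrs)
ambiguous_hosts = [host for host, count in hosts.items() if count > 1]
unique_addr_count = len(set(peer_addrs))
```

Here the full addresses are all distinct, while the host part alone collapses the five peers into a single value.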

@fengjiachun
Collaborator

> Of course, we have a last_active_ts in NodeInfo, and we can use it to determine if a peer is alive, but it looks like we don't have it for Metasrv @fengjiachun

Metasrv cannot have a last_active_ts because followers do not send heartbeats to the leader.
However, because of how metasrv is implemented internally, a metasrv node that stays disconnected for longer than a certain period is removed automatically and no longer appears in the cluster info list.
In other words, if you can see it, it is healthy.

@killme2008
Contributor Author

@fengjiachun @sunng87 @MichaelScofield Please take a look, thank you.

@killme2008
Contributor Author

killme2008 commented Apr 30, 2024

> > Of course, we have a last_active_ts in NodeInfo, and we can use it to determine if a peer is alive, but it looks like we don't have it for Metasrv @fengjiachun
>
> Metasrv cannot have a last_active_ts because followers do not send heartbeats to the leader. However, because of how metasrv is implemented internally, a metasrv node that stays disconnected for longer than a certain period is removed automatically and no longer appears in the cluster info list. In other words, if you can see it, it is healthy.

I added an active_time column to represent the time since the last activity of the peer.
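
A client could combine active_time with the metasrv behaviour described above into a rough liveness check for distributed mode. The sketch below rests on several assumptions: only the `s` and `ms` units appear in the sample output, so the `m` and `h` units are guesses; the function names and the 10-second threshold are hypothetical and not defined by this PR.

```python
import re
from typing import Optional

# Assumed unit suffixes; only "s" and "ms" appear in the PR's sample output.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}
_TOKEN = re.compile(r"(\d+)(ms|s|m|h)")


def duration_to_ms(text: str) -> Optional[int]:
    """Convert a string such as '1s 467ms' to milliseconds; empty input gives None."""
    text = text.strip()
    if not text:
        return None
    total = 0
    for token in text.split():
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognised duration token: {token!r}")
        total += int(match.group(1)) * _UNIT_MS[match.group(2)]
    return total


def looks_alive(peer_type: str, active_time: str, threshold_ms: int = 10_000) -> bool:
    """Hypothetical rule: a listed metasrv counts as healthy; other peers need recent activity."""
    if peer_type == "METASRV":
        return True  # per the discussion, a disconnected metasrv drops out of the list
    elapsed = duration_to_ms(active_time)
    return elapsed is not None and elapsed <= threshold_ms
```

Under this rule a peer with an empty active_time is reported as not alive unless it is a metasrv, so standalone mode, where active_time is empty, would need separate handling.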

cc @sunng87

src/catalog/src/information_schema/runtime_metrics.rs (review thread: outdated, resolved)
src/catalog/src/information_schema/utils.rs (review thread: resolved)
src/datanode/src/heartbeat.rs (review thread: resolved)
src/meta-srv/src/service/cluster.rs (review thread: outdated, resolved)
Collaborator

@fengjiachun fengjiachun left a comment

Almost LGTM

@killme2008 killme2008 requested a review from waynexia May 1, 2024 14:35
Collaborator

@tisonkun tisonkun left a comment

Thank you!

@tisonkun tisonkun added this pull request to the merge queue May 2, 2024
Merged via the queue into GreptimeTeam:main with commit 65d47ba May 2, 2024
23 checks passed
@killme2008 killme2008 deleted the feature/cluster-info branch May 6, 2024 06:46
Labels
docs-required This change requires docs update.
Development

Successfully merging this pull request may close these issues.

  • Retrieves cluster metadata via GreptimeDB Cli
  • Cluster management interface in Dashboard
4 participants