Add Get Node List in the Cluster API (#13015)
wu-sheng authored Feb 1, 2025
1 parent 302b365 commit 44844dd
Showing 7 changed files with 126 additions and 0 deletions.
2 changes: 2 additions & 0 deletions docs/en/changes/changes.md
@@ -66,6 +66,7 @@
* OAP self observability: Add watermark circuit break/recover metrics.
* Add Baseline module for support alarm module query baseline data.
* BaseLine: Support query baseline metrics names.
* Add `Get Node List in the Cluster` API.

#### UI

@@ -102,6 +103,7 @@
* Add Status APIs docs.
* Simplified the release process with removing maven central publish relative processes.
* Add Circuit Breaking mechanism doc.
* Add `Get Node List in the Cluster` API doc.


All issues and pull requests are [here](https://github.com/apache/skywalking/milestone/224?closed=1)
7 changes: 7 additions & 0 deletions docs/en/setup/backend/backend-cluster.md
@@ -25,6 +25,13 @@ There are various ways to manage the cluster in the backend. Choose the one that
In the `application.yml` file, there are default configurations for the aforementioned coordinators under the
section `cluster`. You can specify any of them in the `selector` property to enable it.

___
**NOTICE**: Before you set up the cluster, read the [Query Cluster Nodes](../../status/query_cluster_nodes.md) API doc to learn how to
verify the cluster node list. If the nodes don't match your expectation, the cluster is not working properly, and many
features may be affected, e.g. the metrics could be inaccurate and the alarms may not be triggered correctly.
___
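
One way to act on this notice is to collect the node list reported by each OAP node and compare it with the list you expect. The following is only a sketch: the class `ClusterViewComparator`, its method, and the `host:port` string form are hypothetical names chosen for this illustration and are not part of SkyWalking. It assumes the responses have already been fetched and reduced to sets of addresses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of the OAP code base.
final class ClusterViewComparator {

    // Returns the names of the queried nodes whose reported node list
    // differs from the expected one, sorted by name for stable output.
    static List<String> inconsistentNodes(Map<String, Set<String>> viewsByNode,
                                          Set<String> expected) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> view : new TreeMap<>(viewsByNode).entrySet()) {
            if (!view.getValue().equals(expected)) {
                result.add(view.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> expected = Set.of("10.0.12.23:11800", "10.0.12.25:11800");
        // Assumed sample data: the second node only sees itself.
        Map<String, Set<String>> views = Map.of(
            "10.0.12.23", expected,
            "10.0.12.25", Set.of("10.0.12.25:11800"));
        System.out.println(inconsistentNodes(views, expected));
    }
}
```

An empty result means every queried node reported the expected membership; any name in the result points at a node whose coordinator view should be investigated.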

# Cloud Native
## Kubernetes

39 changes: 39 additions & 0 deletions docs/en/status/query_cluster_nodes.md
@@ -0,0 +1,39 @@
# Get Node List in the Cluster

The OAP cluster is a set of OAP servers that work together to provide a scalable and reliable service. The OAP cluster
supports [various cluster coordinators](../setup/backend/backend-cluster.md) to manage the cluster membership and the
communication.
This API provides the capability to query the node list in the cluster from the perspective of each OAP node. If the cluster
coordinator doesn't work properly, the node list may be incomplete or incorrect. So, we recommend that you check the
node list when setting up a cluster.

Use the following endpoint to get the node list as seen by the queried OAP node.

- URL, `http://{core restHost}:{core restPort}/status/cluster/nodes`
- HTTP GET method.

```json
{
  "nodes": [
    {
      "host": "10.0.12.23",
      "port": 11800,
      "isSelf": true
    },
    {
      "host": "10.0.12.25",
      "port": 11800,
      "isSelf": false
    },
    {
      "host": "10.0.12.37",
      "port": 11800,
      "isSelf": false
    }
  ]
}
```

The `nodes` field lists all the nodes in the cluster. The size of the list should be exactly the same as your cluster setup.
The `host` and `port` are the address of the OAP node, which is used by OAP nodes to communicate with each other. The
`isSelf` flag indicates whether the node is the current node; all other nodes are remote nodes.
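
As a rough illustration of the check described above, the sketch below counts the entries in a response body and verifies the list size against the expected cluster size. The class `ClusterNodeCheck`, its methods, and the `expectedSize` parameter are hypothetical names made up for this example. It also assumes that exactly one entry is flagged `isSelf` in a healthy response, and it counts entries with a regular expression only to stay dependency-free; a real client would use a JSON parser such as Gson.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the OAP code base.
final class ClusterNodeCheck {

    // Each node object in the response carries exactly one "isSelf" property,
    // so counting its occurrences counts the nodes.
    private static final Pattern IS_SELF =
        Pattern.compile("\"isSelf\"\\s*:\\s*(true|false)");

    // Number of node entries found in the response body.
    static int countNodes(String json) {
        Matcher matcher = IS_SELF.matcher(json);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // Number of entries flagged as the responding node itself.
    static int countSelf(String json) {
        Matcher matcher = IS_SELF.matcher(json);
        int count = 0;
        while (matcher.find()) {
            if ("true".equals(matcher.group(1))) {
                count++;
            }
        }
        return count;
    }

    // Assumption: a healthy response lists expectedSize nodes, one of them self.
    static boolean looksHealthy(String json, int expectedSize) {
        return countNodes(json) == expectedSize && countSelf(json) == 1;
    }

    public static void main(String[] args) {
        String body = "{\"nodes\":["
            + "{\"host\":\"10.0.12.23\",\"port\":11800,\"isSelf\":true},"
            + "{\"host\":\"10.0.12.25\",\"port\":11800,\"isSelf\":false}]}";
        System.out.println(looksHealthy(body, 2));
    }
}
```

Running the same check against every OAP node, each with the same `expectedSize`, covers the case where only some nodes have an incomplete view.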
1 change: 1 addition & 0 deletions docs/en/status/status_apis.md
@@ -10,6 +10,7 @@ logs and self-observability solutions.
- [Dump Effective Initial Configurations API](../debugging/config_dump.md)
- [Tracing Query Execution APIs](../debugging/query-tracing.md)
- [Get Effective TTL Configurations API](query_ttl_setup.md)
- [Query Cluster Nodes API](query_cluster_nodes.md)

If you have a proposal about new status API, please don't hesitate
to [create a discussion](https://github.com/apache/skywalking/discussions/new?category=ideas).
2 changes: 2 additions & 0 deletions docs/menu.yml
@@ -344,6 +344,8 @@ catalog:
path: "/en/debugging/query-tracing"
- name: "Get Effective TTL Configurations"
path: "/en/status/query_ttl_setup"
- name: "Get Node List in the Cluster"
path: "/en/status/query_cluster_nodes"
- name: "Customization"
catalog:
- name: "Overview"
@@ -0,0 +1,71 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
*/

package org.apache.skywalking.oap.query.debug;

import com.google.gson.JsonArray;
import com.google.gson.JsonObject;
import com.linecorp.armeria.common.HttpRequest;
import com.linecorp.armeria.common.HttpResponse;
import com.linecorp.armeria.common.MediaType;
import com.linecorp.armeria.server.annotation.ExceptionHandler;
import com.linecorp.armeria.server.annotation.Get;
import lombok.extern.slf4j.Slf4j;
import org.apache.skywalking.oap.server.core.CoreModule;
import org.apache.skywalking.oap.server.core.remote.client.Address;
import org.apache.skywalking.oap.server.core.remote.client.RemoteClientManager;
import org.apache.skywalking.oap.server.library.module.ModuleManager;

@Slf4j
@ExceptionHandler(StatusQueryExceptionHandler.class)
public class ClusterStatusQueryHandler {
    private final ModuleManager moduleManager;
    private RemoteClientManager remoteClientManager;

    public ClusterStatusQueryHandler(final ModuleManager manager) {
        this.moduleManager = manager;
    }

    // Resolve the RemoteClientManager lazily, on the first request.
    private RemoteClientManager getRemoteClientManager() {
        if (remoteClientManager == null) {
            remoteClientManager = moduleManager.find(CoreModule.NAME)
                                               .provider()
                                               .getService(RemoteClientManager.class);
        }
        return remoteClientManager;
    }

    @Get("/status/cluster/nodes")
    public HttpResponse buildClusterNodeList(HttpRequest request) {
        JsonObject clusterInfo = new JsonObject();

        JsonArray nodeList = new JsonArray();
        clusterInfo.add("nodes", nodeList);
        // Convert every known remote client address into one JSON node entry.
        getRemoteClientManager().getRemoteClient().stream().map(c -> {
            final Address address = c.getAddress();
            JsonObject node = new JsonObject();
            node.addProperty("host", address.getHost());
            node.addProperty("port", address.getPort());
            node.addProperty("isSelf", address.isSelf());
            return node;
        }).forEach(nodeList::add);

        return HttpResponse.of(MediaType.JSON_UTF_8, clusterInfo.toString());
    }

}
@@ -71,6 +71,10 @@ public void start() throws ServiceNotProvidedException {
            new TTLConfigQueryHandler(getManager()),
            Collections.singletonList(HttpMethod.GET)
        );
        service.addHandler(
            new ClusterStatusQueryHandler(getManager()),
            Collections.singletonList(HttpMethod.GET)
        );
    }

    public void notifyAfterCompleted() throws ServiceNotProvidedException, ModuleStartException {