diff --git a/404.html b/404.html
index 50b754d0314..45b3d609530 100644
--- a/404.html
+++ b/404.html
@@ -4,7 +4,7 @@
Code : 400 Bad Request
(URL query is invalid)
Code : 404 Not Found
(Partition or Queue not found)
Code : 500 Internal Server Error
Fetches all Applications for the given Partition/Queue/State combination and displays general information about the applications, such as used resources, queue name, submission time and allocations.
+The state parameter must be set to "active", which is not an actual application state but a virtual state used for this API call. This fake state represents the following application states: New, Accepted, Running, Completing, Failing, and Resuming. You can further narrow down the results using the optional status query parameter to filter for specific real states.
+URL : /ws/v1/partition/:partition/queue/:queue/applications/:state
Method : GET
Auth required : NO
+URL query parameters :
+status
(optional) : Filters active applications by their specific real state (New, Accepted, Running, Completing, Failing, Resuming)
Example requests:
+/ws/v1/partition/default/queue/root/applications/active
/ws/v1/partition/default/queue/root/applications/active?status=running
Note: If the queue name contains any special characters, it needs to be URL escaped to avoid issues.
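As a minimal sketch of the escaping note above, the request path can be assembled with Python's standard library. The helper name `active_apps_path` and the queue name `root.team a` are illustrative assumptions, not part of the API.

```python
from urllib.parse import quote, urlencode


def active_apps_path(partition, queue, status=None):
    """Build the request path for the active-applications endpoint.

    Hypothetical helper: only the URL layout comes from the documentation.
    """
    # safe="" makes quote() percent-encode every reserved character,
    # including "/", so special characters in a queue name cannot
    # change the structure of the path.
    path = "/ws/v1/partition/{}/queue/{}/applications/active".format(
        quote(partition, safe=""), quote(queue, safe="")
    )
    if status is not None:
        # Optional filter on one of the real application states.
        path += "?" + urlencode({"status": status})
    return path


print(active_apps_path("default", "root"))
print(active_apps_path("default", "root", status="running"))
print(active_apps_path("default", "root.team a"))  # made-up queue name
```

The first two calls reproduce the example requests listed above; the third shows a space in the queue name being escaped as `%20`.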
+Code : 200 OK
Content examples
+The content of the application object is the same as Queue Applications. See Queue Applications for details.
+Code : 400 Bad Request
(URL query is invalid)
Code : 404 Not Found
(Partition or Queue not found)
Code : 500 Internal Server Error
Fetches an Application given a Partition, Queue (optional) and Application ID, and displays general information about the application, such as used resources, queue name, submission time and allocations. If the queue name contains any special characters, it needs to be URL escaped to avoid issues.
URL : /ws/v1/partition/{partitionName}/application/{appId}
or /ws/v1/partition/{partitionName}/queue/{queueName}/application/{appId}
Method : GET
Auth required : NO
-Code : 200 OK
Deprecated:
Field uuid has been deprecated and will be removed from the response below in the YuniKorn 1.7.0 release. AllocationID has replaced uuid. Both fields carry the same value, except that AllocationID has an extra suffix containing a hyphen and a counter (-0, -1 and so on) at the end.
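For clients that still key on the old uuid value, the relationship described in the deprecation note can be sketched as follows. The helper name `split_allocation_id` is an assumption for illustration, and the sketch relies on the suffix always being a hyphen followed by decimal digits.

```python
def split_allocation_id(allocation_id):
    """Split an ID shaped like '<uuid>-<counter>' into its two parts.

    Hypothetical helper, assuming the counter suffix is purely numeric.
    """
    base, _, counter = allocation_id.rpartition("-")
    if not base or not counter.isdigit():
        raise ValueError("unexpected allocationID format: " + allocation_id)
    return base, int(counter)


uuid_part, counter = split_allocation_id("9af35d44-2d6f-40d1-b51d-758859e6b8a8-0")
print(uuid_part, counter)
```

The digit check is only a heuristic: a bare uuid whose last group happens to contain no letters would be split incorrectly, so the helper should only be fed values taken from the allocationID field.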
Content example
{
"applicationID": "application-0001",
"usedResource": {
"memory": 4000000000,
"vcore": 4000
},
"maxUsedResource": {
"memory": 4000000000,
"vcore": 4000
},
"pendingResource": {
"memory": 4000000000,
"vcore": 4000
},
"partition": "default",
"queueName": "root.default",
"submissionTime": 1648754032076020293,
"requests": [
{
"allocationKey": "f137fab6-3cfa-4536-93f7-bfff92689382",
"allocationTags": {
"kubernetes.io/label/app": "sleep",
"kubernetes.io/label/applicationId": "application-0001",
"kubernetes.io/label/queue": "root.default",
"kubernetes.io/meta/namespace": "default",
"kubernetes.io/meta/podName": "task2"
},
"requestTime": 16487540320812345678,
"resource": {
"memory": 4000000000,
"vcore": 4000
},
"pendingCount": 1,
"priority": "0",
"requiredNodeId": "",
"applicationId": "application-0001",
"partition": "default",
"placeholder": false,
"placeholderTimeout": 0,
"taskGroupName": "",
"allocationLog": [
{
"message": "node(s) didn't match Pod's node affinity, node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate",
"lastOccurrence": 16487540320812346001,
"count": 81
},
{
"message": "node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, node(s) didn't match Pod's node affinity",
"lastOccurrence": 16487540320812346002,
"count": 504
},
{
"message": "node(s) didn't match Pod's node affinity",
"lastOccurrence": 16487540320812346003,
"count": 1170
}
]
}
],
"allocations": [
{
"allocationKey": "deb12221-6b56-4fe9-87db-ebfadce9aa20",
"allocationTags": {
"kubernetes.io/label/app": "sleep",
"kubernetes.io/label/applicationId": "application-0001",
"kubernetes.io/label/queue": "root.default",
"kubernetes.io/meta/namespace": "default",
"kubernetes.io/meta/podName": "task0"
},
"requestTime": 1648754034098912461,
"allocationTime": 1648754035973982920,
"allocationDelay": 1875070459,
"uuid": "9af35d44-2d6f-40d1-b51d-758859e6b8a8",
"allocationID": "9af35d44-2d6f-40d1-b51d-758859e6b8a8-0",
"resource": {
"memory": 4000000000,
"vcore": 4000
},
"priority": "0",
"nodeId": "node-0001",
"applicationId": "application-0001",
"partition": "default",
"placeholder": false,
"placeholderUsed": true
}
],
"applicationState": "Running",
"user": "system:serviceaccount:kube-system:deployment-controller",
"groups": [
"system:serviceaccounts",
"system:serviceaccounts:kube-system",
"system:authenticated"
],
"rejectedMessage": "",
"stateLog": [
{
"time": 1648741409145224000,
"applicationState": "Accepted"
},
{
"time": 1648741409147432100,
"applicationState": "Running"
}
],
"placeholderData": [
{
"taskGroupName": "task-group-example",
"count": 2,
"minResource": {
"memory": 1000000000,
"vcore": 100
},
"replaced": 1,
"timedout": 1
}
],
"hasReserved": false,
"reservations": []
}
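The time fields in the example allocation can be read as follows. This is a sketch that assumes the values are nanoseconds since the Unix epoch, which is consistent with the example: there, allocationDelay equals allocationTime minus requestTime. The helper name `ns_to_utc` is illustrative.

```python
from datetime import datetime, timezone


def ns_to_utc(ns):
    """Convert a nanosecond timestamp to (UTC datetime, leftover nanoseconds).

    Integer arithmetic is used on purpose: converting a 19-digit value
    to float first would silently drop the low-order digits.
    """
    seconds, nanos = divmod(ns, 1_000_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc), nanos


# Values copied from the allocation in the content example.
allocation = {
    "requestTime": 1648754034098912461,
    "allocationTime": 1648754035973982920,
    "allocationDelay": 1875070459,
}

delay = allocation["allocationTime"] - allocation["requestTime"]
print(delay == allocation["allocationDelay"])
print(ns_to_utc(allocation["requestTime"])[0].isoformat())
```

Clients written in languages whose JSON parser maps numbers to 64-bit floats would need a big-integer aware parser to keep these values exact.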
Code : 400 Bad Request
(URL query is invalid)
Code : 404 Not Found
(Partition or Application not found)
Code : 500 Internal Server Error
Code : 200 OK
Content example
[
{
"userName": "user1",
"groups": {
"app2": "tester"
},
"queues":
{
"queuePath": "root",
"resourceUsage": {
"memory": 12000000000,
"vcore": 12000
},
"runningApplications": ["app1", "app2"],
"children": [
{
"queuePath": "root.default",
"resourceUsage": {
"memory": 6000000000,
"vcore": 6000
},
"runningApplications": ["app1"],
"children": []
},
{
"queuePath": "root.test",
"resourceUsage": {
"memory": 6000000000,
"vcore": 6000
},
"runningApplications": [
"app2"
],
"children": []
}]
}
},
{
"userName": "user2",
"groups": {
"app1": "tester"
},
"queues":
{
"queuePath": "root",
"resourceUsage": {
"memory": 11000000000,
"vcore": 10000
},
"runningApplications": ["app1", "app2", "app3"],
"children": [
{
"queuePath": "root.default",
"resourceUsage": {
"memory": 5000000000,
"vcore": 5000
},
"runningApplications": ["app1"],
"children": []
},
{
"queuePath": "root.test",
"resourceUsage": {
"memory": 4000000000,
"vcore": 4000
},
"runningApplications": [
"app3"
],
"children": []
}]
}
}
]
Code : 500 Internal Server Error
Code : 200 OK
Content example
{
"userName": "user1",
"groups": {
"app1": "tester"
},
"queues":
{
"queuePath": "root",
"resourceUsage": {
"memory": 12000000000,
"vcore": 12000
},
"runningApplications": ["app1", "app2"],
"children": [
{
"queuePath": "root.default",
"resourceUsage": {
"memory": 6000000000,
"vcore": 6000
},
"runningApplications": ["app1"],
"children": []
},
{
"queuePath": "root.test",
"resourceUsage": {
"memory": 6000000000,
"vcore": 6000
},
"runningApplications": [
"app2"
],
"children": []
}]
}
}
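The queues field in these responses is a tree rooted at the root queue. A minimal sketch of walking it, assuming the shape shown in the example (the function name `flatten_queues` is illustrative):

```python
def flatten_queues(node):
    """Yield (queuePath, resourceUsage, runningApplications) for a queue
    and, depth first, for every queue below it."""
    yield (
        node["queuePath"],
        node.get("resourceUsage", {}),
        node.get("runningApplications", []),
    )
    for child in node.get("children") or []:
        yield from flatten_queues(child)


# Trimmed copy of the single-user example response.
tracker = {
    "userName": "user1",
    "queues": {
        "queuePath": "root",
        "resourceUsage": {"memory": 12000000000, "vcore": 12000},
        "runningApplications": ["app1", "app2"],
        "children": [
            {
                "queuePath": "root.default",
                "resourceUsage": {"memory": 6000000000, "vcore": 6000},
                "runningApplications": ["app1"],
                "children": [],
            },
            {
                "queuePath": "root.test",
                "resourceUsage": {"memory": 6000000000, "vcore": 6000},
                "runningApplications": ["app2"],
                "children": [],
            },
        ],
    },
}

for path, usage, apps in flatten_queues(tracker["queues"]):
    print(path, usage.get("memory"), apps)
```

The same traversal applies to the group responses, which use the same queues structure.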
Code : 400 Bad Request
(URL query is invalid)
Code : 404 Not Found
(User not found)
Code : 500 Internal Server Error
Code : 200 OK
Content example
[
{
"groupName": "group1",
"applications": ["app1", "app2"],
"queues":
{
"queuePath": "root",
"resourceUsage": {
"memory": 12000000000,
"vcore": 12000
},
"runningApplications": ["app1", "app2"],
"children": [
{
"queuePath": "root.default",
"resourceUsage": {
"memory": 6000000000,
"vcore": 6000
},
"runningApplications": ["app1"],
"children": []
},
{
"queuePath": "root.test",
"resourceUsage": {
"memory": 6000000000,
"vcore": 6000
},
"runningApplications": [
"app2"
],
"children": []
}]
}
},
{
"groupName": "group2",
"applications": ["app1", "app2", "app3"],
"queues":
{
"queuePath": "root",
"resourceUsage": {
"memory": 11000000000,
"vcore": 10000
},
"runningApplications": ["app1", "app2", "app3"],
"children": [
{
"queuePath": "root.default",
"resourceUsage": {
"memory": 5000000000,
"vcore": 5000
},
"runningApplications": ["app1"],
"children": []
},
{
"queuePath": "root.test",
"resourceUsage": {
"memory": 4000000000,
"vcore": 4000
},
"runningApplications": [
"app3"
],
"children": []
}]
}
}
]
Code : 500 Internal Server Error
Code : 200 OK
Content example
{
"groupName": "group1",
"applications": ["app1", "app2"],
"queues":
{
"queuePath": "root",
"resourceUsage": {
"memory": 12000000000,
"vcore": 12000
},
"runningApplications": ["app1", "app2"],
"children": [
{
"queuePath": "root.default",
"resourceUsage": {
"memory": 6000000000,
"vcore": 6000
},
"runningApplications": ["app1"],
"children": []
},
{
"queuePath": "root.test",
"resourceUsage": {
"memory": 6000000000,
"vcore": 6000
},
"runningApplications": [
"app2"
],
"children": []
}]
}
}
Code : 400 Bad Request
(URL query is invalid)
Code : 404 Not Found
(Group not found)
Code : 500 Internal Server Error
URL : /ws/v1/partition/{partitionName}/nodes
Method : GET
Auth required : NO
-Code : 200 OK
Content examples
Here you can see an example response from a 2-node cluster having 3 allocations.
[
{
"nodeID": "node-0001",
"hostName": "",
"rackName": "",
"attributes": {
"beta.kubernetes.io/arch": "amd64",
"beta.kubernetes.io/os": "linux",
"kubernetes.io/arch": "amd64",
"kubernetes.io/hostname": "node-0001",
"kubernetes.io/os": "linux",
"node-role.kubernetes.io/control-plane": "",
"node-role.kubernetes.io/master": "",
"node.kubernetes.io/exclude-from-external-load-balancers": "",
"ready": "true",
"si.io/hostname": "node-0001",
"si.io/rackname": "/rack-default",
"si/instance-type": "",
"si/node-partition": "[mycluster]default"
},
"capacity": {
"ephemeral-storage": 75850798569,
"hugepages-1Gi": 0,
"hugepages-2Mi": 0,
"memory": 14577000000,
"pods": 110,
"vcore": 10000
},
"allocated": {
"memory": 6000000000,
"vcore": 6000
},
"occupied": {
"memory": 154000000,
"vcore": 750
},
"available": {
"ephemeral-storage": 75850798569,
"hugepages-1Gi": 0,
"hugepages-2Mi": 0,
"memory": 6423000000,
"pods": 110,
"vcore": 1250
},
"utilized": {
"memory": 3,
"vcore": 13
},
"allocations": [
{
"allocationKey": "54e5d77b-f4c3-4607-8038-03c9499dd99d",
"allocationTags": {
"kubernetes.io/label/app": "sleep",
"kubernetes.io/label/applicationId": "application-0001",
"kubernetes.io/label/queue": "root.default",
"kubernetes.io/meta/namespace": "default",
"kubernetes.io/meta/podName": "task0"
},
"requestTime": 1648754034098912461,
"allocationTime": 1648754035973982920,
"allocationDelay": 1875070459,
"uuid": "08033f9a-4699-403c-9204-6333856b41bd",
"allocationID": "08033f9a-4699-403c-9204-6333856b41bd-0",
"resource": {
"memory": 2000000000,
"vcore": 2000
},
"priority": "0",
"nodeId": "node-0001",
"applicationId": "application-0001",
"partition": "default",
"placeholder": false,
"placeholderUsed": false
},
{
"allocationKey": "deb12221-6b56-4fe9-87db-ebfadce9aa20",
"allocationTags": {
"kubernetes.io/label/app": "sleep",
"kubernetes.io/label/applicationId": "application-0002",
"kubernetes.io/label/queue": "root.default",
"kubernetes.io/meta/namespace": "default",
"kubernetes.io/meta/podName": "task0"
},
"requestTime": 1648754034098912461,
"allocationTime": 1648754035973982920,
"allocationDelay": 1875070459,
"uuid": "9af35d44-2d6f-40d1-b51d-758859e6b8a8",
"allocationID": "9af35d44-2d6f-40d1-b51d-758859e6b8a8-0",
"resource": {
"memory": 4000000000,
"vcore": 4000
},
"priority": "0",
"nodeId": "node-0001",
"applicationId": "application-0002",
"partition": "default",
"placeholder": false,
"placeholderUsed": false
}
],
"schedulable": true
},
{
"nodeID": "node-0002",
"hostName": "",
"rackName": "",
"attributes": {
"beta.kubernetes.io/arch": "amd64",
"beta.kubernetes.io/os": "linux",
"kubernetes.io/arch": "amd64",
"kubernetes.io/hostname": "node-0002",
"kubernetes.io/os": "linux",
"ready": "false",
"si.io/hostname": "node-0002",
"si.io/rackname": "/rack-default",
"si/instance-type": "",
"si/node-partition": "[mycluster]default"
},
"capacity": {
"ephemeral-storage": 75850798569,
"hugepages-1Gi": 0,
"hugepages-2Mi": 0,
"memory": 14577000000,
"pods": 110,
"vcore": 10000
},
"allocated": {
"memory": 2000000000,
"vcore": 2000
},
"occupied": {
"memory": 154000000,
"vcore": 750
},
"available": {
"ephemeral-storage": 75850798569,
"hugepages-1Gi": 0,
"hugepages-2Mi": 0,
"memory": 6423000000,
"pods": 110,
"vcore": 1250
},
"utilized": {
"memory": 8,
"vcore": 38
},
"allocations": [
{
"allocationKey": "af3bd2f3-31c5-42dd-8f3f-c2298ebdec81",
"allocationTags": {
"kubernetes.io/label/app": "sleep",
"kubernetes.io/label/applicationId": "application-0001",
"kubernetes.io/label/queue": "root.default",
"kubernetes.io/meta/namespace": "default",
"kubernetes.io/meta/podName": "task1"
},
"requestTime": 1648754034098912461,
"allocationTime": 1648754035973982920,
"allocationDelay": 1875070459,
"uuid": "96beeb45-5ed2-4c19-9a83-2ac807637b3b",
"allocationID": "96beeb45-5ed2-4c19-9a83-2ac807637b3b-0",
"resource": {
"memory": 2000000000,
"vcore": 2000
},
"priority": "0",
"nodeId": "node-0002",
"applicationId": "application-0001",
"partition": "default",
"placeholder": false,
"placeholderUsed": false
}
],
"schedulable": true,
"isReserved": false,
"reservations": []
}
]
Code : 400 Bad Request
(URL query is invalid)
Code : 404 Not Found
(Partition not found)
Code : 500 Internal Server Error
URL : /ws/v1/partition/{partitionName}/node/{nodeId}
Method : GET
Auth required : NO
-Code : 200 OK
Content examples
{
"nodeID":"node-0001",
"hostName":"",
"rackName":"",
"capacity":{
"ephemeral-storage":75850798569,
"hugepages-1Gi":0,
"hugepages-2Mi":0,
"memory":14577000000,
"pods":110,
"vcore":10000
},
"allocated":{
"memory":6000000000,
"vcore":6000
},
"occupied":{
"memory":154000000,
"vcore":750
},
"available":{
"ephemeral-storage":75850798569,
"hugepages-1Gi":0,
"hugepages-2Mi":0,
"memory":6423000000,
"pods":110,
"vcore":1250
},
"utilized":{
"memory":3,
"vcore":13
},
"allocations":[
{
"allocationKey":"54e5d77b-f4c3-4607-8038-03c9499dd99d",
"allocationTags":{
"kubernetes.io/label/app":"sleep",
"kubernetes.io/label/applicationId":"application-0001",
"kubernetes.io/label/queue":"root.default",
"kubernetes.io/meta/namespace":"default",
"kubernetes.io/meta/podName":"task0"
},
"requestTime":1648754034098912461,
"allocationTime":1648754035973982920,
"allocationDelay":1875070459,
"uuid":"08033f9a-4699-403c-9204-6333856b41bd",
"allocationID":"08033f9a-4699-403c-9204-6333856b41bd-0",
"resource":{
"memory":2000000000,
"vcore":2000
},
"priority":"0",
"nodeId":"node-0001",
"applicationId":"application-0001",
"partition":"default",
"placeholder":false,
"placeholderUsed":false
},
{
"allocationKey":"deb12221-6b56-4fe9-87db-ebfadce9aa20",
"allocationTags":{
"kubernetes.io/label/app":"sleep",
"kubernetes.io/label/applicationId":"application-0002",
"kubernetes.io/label/queue":"root.default",
"kubernetes.io/meta/namespace":"default",
"kubernetes.io/meta/podName":"task0"
},
"requestTime":1648754034098912461,
"allocationTime":1648754035973982920,
"allocationDelay":1875070459,
"uuid":"9af35d44-2d6f-40d1-b51d-758859e6b8a8",
"allocationID":"9af35d44-2d6f-40d1-b51d-758859e6b8a8-0",
"resource":{
"memory":4000000000,
"vcore":4000
},
"priority":"0",
"nodeId":"node-0001",
"applicationId":"application-0002",
"partition":"default",
"placeholder":false,
"placeholderUsed":false
}
],
"schedulable":true
}
Code : 400 Bad Request
(URL query is invalid)
Code : 404 Not Found
(Partition or Node not found)
Code : 500 Internal Server Error
URL : /ws/v1/scheduler/node-utilization
Method : GET
Auth required : NO
-Code : 200 OK
Content examples
{
"type": "vcore",
"utilization": [
{
"bucketName": "0-10%",
"numOfNodes": 1,
"nodeNames": [
"aethergpu"
]
},
{
"bucketName": "10-20%",
"numOfNodes": 2,
"nodeNames": [
"primary-node",
"second-node"
]
},
...
]
}
Code : 500 Internal Server Error
Shows the node utilization for the different resource types in a cluster.
URL : /ws/v1/scheduler/node-utilizations
Method : GET
Auth required : NO
-Code : 200 OK
Content examples
[
{
"clusterId": "mycluster",
"partition": "default",
"utilizations": [
{
"type": "pods",
"utilization": [
{
"bucketName": "0-10%",
"numOfNodes": 2,
"nodeNames": [
"primary-node",
"second-node"
]
},
{
"bucketName": "10-20%"
},
...
]
},
{
"type": "vcores",
"utilization": [
{
"bucketName": "0-10%",
"numOfNodes": 2,
"nodeNames": [
"primary-node",
"second-node"
]
},
{
"bucketName": "10-20%"
},
...
]
},
...
]
}
]
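A short sketch of summarising this response per resource type. It assumes, based on the example only, that buckets without nodes may omit numOfNodes and nodeNames; the function name `nodes_per_bucket` and the trimmed sample are illustrative.

```python
def nodes_per_bucket(partition_entry):
    """Map each resource type to {bucketName: number of nodes}."""
    summary = {}
    for resource in partition_entry.get("utilizations", []):
        buckets = {}
        for bucket in resource.get("utilization", []):
            # Empty buckets in the example carry only a bucketName,
            # so a missing count is treated as zero.
            buckets[bucket["bucketName"]] = bucket.get("numOfNodes", 0)
        summary[resource["type"]] = buckets
    return summary


# Trimmed copy of the example response (elided parts left out).
response = [
    {
        "clusterId": "mycluster",
        "partition": "default",
        "utilizations": [
            {
                "type": "pods",
                "utilization": [
                    {
                        "bucketName": "0-10%",
                        "numOfNodes": 2,
                        "nodeNames": ["primary-node", "second-node"],
                    },
                    {"bucketName": "10-20%"},
                ],
            }
        ],
    }
]

for entry in response:
    print(entry["partition"], nodes_per_bucket(entry))
```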
Code : 500 Internal Server Error
Dumps the stack traces of the currently running goroutines.
URL : /ws/v1/stack
Method : GET
Auth required : NO
-Code : 200 OK
Content examples
goroutine 356 [running]:
github.com/apache/yunikorn-core/pkg/webservice.getStackInfo.func1(0x30a0060, 0xc003e900e0, 0x2)
/yunikorn/go/pkg/mod/github.com/apache/yunikorn-core@v0.0.0-20200717041747-f3e1c760c714/pkg/webservice/handlers.go:41 +0xab
github.com/apache/yunikorn-core/pkg/webservice.getStackInfo(0x30a0060, 0xc003e900e0, 0xc00029ba00)
/yunikorn/go/pkg/mod/github.com/apache/yunikorn-core@v0.0.0-20200717041747-f3e1c760c714/pkg/webservice/handlers.go:48 +0x71
net/http.HandlerFunc.ServeHTTP(0x2df0e10, 0x30a0060, 0xc003e900e0, 0xc00029ba00)
/usr/local/go/src/net/http/server.go:1995 +0x52
github.com/apache/yunikorn-core/pkg/webservice.Logger.func1(0x30a0060, 0xc003e900e0, 0xc00029ba00)
/yunikorn/go/pkg/mod/github.com/apache/yunikorn-core@v0.0.0-20200717041747-f3e1c760c714/pkg/webservice/webservice.go:65 +0xd4
net/http.HandlerFunc.ServeHTTP(0xc00003a570, 0x30a0060, 0xc003e900e0, 0xc00029ba00)
/usr/local/go/src/net/http/server.go:1995 +0x52
github.com/gorilla/mux.(*Router).ServeHTTP(0xc00029cb40, 0x30a0060, 0xc003e900e0, 0xc0063fee00)
/yunikorn/go/pkg/mod/github.com/gorilla/mux@v1.7.3/mux.go:212 +0x140
net/http.serverHandler.ServeHTTP(0xc0000df520, 0x30a0060, 0xc003e900e0, 0xc0063fee00)
/usr/local/go/src/net/http/server.go:2774 +0xcf
net/http.(*conn).serve(0xc0000eab40, 0x30a61a0, 0xc003b74000)
/usr/local/go/src/net/http/server.go:1878 +0x812
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:2884 +0x4c5

goroutine 1 [chan receive, 26 minutes]:
main.main()
/yunikorn/pkg/shim/main.go:52 +0x67a

goroutine 19 [syscall, 26 minutes]:
os/signal.signal_recv(0x1096f91)
/usr/local/go/src/runtime/sigqueue.go:139 +0x9f
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:23 +0x30
created by os/signal.init.0
/usr/local/go/src/os/signal/signal_unix.go:29 +0x4f
...
Code : 500 Internal Server Error
Endpoint to retrieve metrics from the Prometheus server.
@@ -248,7 +274,7 @@
Code : 200 OK
Content examples
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 2.567e-05
go_gc_duration_seconds{quantile="0.25"} 3.5727e-05
go_gc_duration_seconds{quantile="0.5"} 4.5144e-05
go_gc_duration_seconds{quantile="0.75"} 6.0024e-05
go_gc_duration_seconds{quantile="1"} 0.00022528
go_gc_duration_seconds_sum 0.021561648
go_gc_duration_seconds_count 436
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 82
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.12.17"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 9.6866248e+07
...
# HELP yunikorn_scheduler_vcore_nodes_usage Nodes resource usage, by resource name.
# TYPE yunikorn_scheduler_vcore_nodes_usage gauge
yunikorn_scheduler_vcore_nodes_usage{range="(10%, 20%]"} 0
yunikorn_scheduler_vcore_nodes_usage{range="(20%,30%]"} 0
yunikorn_scheduler_vcore_nodes_usage{range="(30%,40%]"} 0
yunikorn_scheduler_vcore_nodes_usage{range="(40%,50%]"} 0
yunikorn_scheduler_vcore_nodes_usage{range="(50%,60%]"} 0
yunikorn_scheduler_vcore_nodes_usage{range="(60%,70%]"} 0
yunikorn_scheduler_vcore_nodes_usage{range="(70%,80%]"} 1
yunikorn_scheduler_vcore_nodes_usage{range="(80%,90%]"} 0
yunikorn_scheduler_vcore_nodes_usage{range="(90%,100%]"} 0
yunikorn_scheduler_vcore_nodes_usage{range="[0,10%]"} 0
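The output above can be read with a small line parser. This is a deliberately simplified sketch, not a complete parser for the Prometheus text format: it assumes no trailing timestamps and no escaped quotes inside label values, and the names `parse_samples`, `SAMPLE_RE` and `LABEL_RE` are illustrative.

```python
import re

# name, optional {labels}, then a single value at the end of the line
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric name, labels dict, float value)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# A few lines copied from the example output.
METRICS = """\
# TYPE go_goroutines gauge
go_goroutines 82
yunikorn_scheduler_vcore_nodes_usage{range="(60%,70%]"} 0
yunikorn_scheduler_vcore_nodes_usage{range="(70%,80%]"} 1
"""

for name, labels, value in parse_samples(METRICS):
    if name == "yunikorn_scheduler_vcore_nodes_usage" and value > 0:
        print(labels["range"], int(value))
```

For anything beyond quick inspection, a maintained Prometheus client library is the safer choice.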
URL : /ws/v1/validate-conf
Method : POST
Auth required : NO
-Regardless of whether the configuration is allowed or not, if the server was able to process the request, it will yield a 200 HTTP status code.
Code : 200 OK
URL : /ws/v1/config
Method : GET
Auth required : NO
-Code : 200 OK
Content example (with Accept: application/json header)
{
"Partitions": [
{
"Name": "default",
"Queues": [
{
"Name": "root",
"Parent": true,
"Resources": {},
"SubmitACL": "*",
"ChildTemplate": {
"Resources": {}
}
}
],
"PlacementRules": [
{
"Name": "tag",
"Create": true,
"Filter": {
"Type": ""
},
"Value": "namespace"
}
],
"Preemption": {
"Enabled": false
},
"NodeSortPolicy": {
"Type": ""
}
}
],
"Checksum": "FD5D3726DF0F02416E02F3919D78F61B15D14425A34142D93B24C137ED056946",
"Extra": {
"event.trackingEnabled": "false",
"log.core.scheduler.level": "info",
"log.core.security.level": "info",
"log.level": "debug"
}
}
URL : /ws/v1/history/apps
Method : GET
Auth required : NO
-Code : 200 OK
Content examples
[
{
"timestamp": 1595939966153460000,
"totalApplications": "1"
},
{
"timestamp": 1595940026152892000,
"totalApplications": "1"
},
{
"timestamp": 1595940086153799000,
"totalApplications": "2"
},
{
"timestamp": 1595940146154497000,
"totalApplications": "2"
},
{
"timestamp": 1595940206155187000,
"totalApplications": "2"
}
]
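Note that the counts in this response are JSON strings while the timestamps are numbers. A minimal sketch of turning the entries into typed data points, assuming the timestamps are nanoseconds since the Unix epoch (the function name `parse_history` is illustrative):

```python
from datetime import datetime, timezone


def parse_history(entries, count_field):
    """Return a list of (UTC datetime, int count) pairs."""
    points = []
    for entry in entries:
        # Whole seconds are enough here; integer division avoids float rounding.
        seconds = entry["timestamp"] // 1_000_000_000
        when = datetime.fromtimestamp(seconds, tz=timezone.utc)
        points.append((when, int(entry[count_field])))
    return points


# First three entries of the example response.
history = [
    {"timestamp": 1595939966153460000, "totalApplications": "1"},
    {"timestamp": 1595940026152892000, "totalApplications": "1"},
    {"timestamp": 1595940086153799000, "totalApplications": "2"},
]

for when, total in parse_history(history, "totalApplications"):
    print(when.isoformat(), total)
```

Passing `"totalContainers"` as the second argument would handle the container history response, which has the same layout.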
Code : 500 Internal Server Error
Endpoint to retrieve historical data about the number of total containers by timestamp.
URL : /ws/v1/history/containers
Method : GET
Auth required : NO
-Code : 200 OK
Content examples
[
{
"timestamp": 1595939966153460000,
"totalContainers": "1"
},
{
"timestamp": 1595940026152892000,
"totalContainers": "1"
},
{
"timestamp": 1595940086153799000,
"totalContainers": "3"
},
{
"timestamp": 1595940146154497000,
"totalContainers": "3"
},
{
"timestamp": 1595940206155187000,
"totalContainers": "3"
}
]
Code : 500 Internal Server Error
Endpoint to retrieve historical data about critical logs, negative resource on node/cluster/app, ...
URL : /ws/v1/scheduler/healthcheck
Method : GET
Auth required : NO
-Code : 200 OK
Content examples
{
"Healthy": true,
"HealthChecks": [
{
"Name": "Scheduling errors",
"Succeeded": true,
"Description": "Check for scheduling error entries in metrics",
"DiagnosisMessage": "There were 0 scheduling errors logged in the metrics"
},
{
"Name": "Failed nodes",
"Succeeded": true,
"Description": "Check for failed nodes entries in metrics",
"DiagnosisMessage": "There were 0 failed nodes logged in the metrics"
},
{
"Name": "Negative resources",
"Succeeded": true,
"Description": "Check for negative resources in the partitions",
"DiagnosisMessage": "Partitions with negative resources: []"
},
{
"Name": "Negative resources",
"Succeeded": true,
"Description": "Check for negative resources in the nodes",
"DiagnosisMessage": "Nodes with negative resources: []"
},
{
"Name": "Consistency of data",
"Succeeded": true,
"Description": "Check if a node's allocated resource <= total resource of the node",
"DiagnosisMessage": "Nodes with inconsistent data: []"
},
{
"Name": "Consistency of data",
"Succeeded": true,
"Description": "Check if total partition resource == sum of the node resources from the partition",
"DiagnosisMessage": "Partitions with inconsistent data: []"
},
{
"Name": "Consistency of data",
"Succeeded": true,
"Description": "Check if node total resource = allocated resource + occupied resource + available resource",
"DiagnosisMessage": "Nodes with inconsistent data: []"
},
{
"Name": "Consistency of data",
"Succeeded": true,
"Description": "Check if node capacity >= allocated resources on the node",
"DiagnosisMessage": "Nodes with inconsistent data: []"
},
{
"Name": "Reservation check",
"Succeeded": true,
"Description": "Check the reservation nr compared to the number of nodes",
"DiagnosisMessage": "Reservation/node nr ratio: [0.000000]"
}
]
}
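A client that only cares about problems can filter the report down to the checks that did not succeed. A minimal sketch; the function name `failed_checks` and the failing sample entry are made up for illustration and do not come from a real scheduler.

```python
def failed_checks(report):
    """Return (Name, DiagnosisMessage) for every check that did not succeed."""
    return [
        (check["Name"], check.get("DiagnosisMessage", ""))
        for check in report.get("HealthChecks", [])
        if not check.get("Succeeded", False)
    ]


# Hypothetical report: the second entry is an invented failure.
report = {
    "Healthy": False,
    "HealthChecks": [
        {
            "Name": "Scheduling errors",
            "Succeeded": True,
            "Description": "Check for scheduling error entries in metrics",
            "DiagnosisMessage": "There were 0 scheduling errors logged in the metrics",
        },
        {
            "Name": "Failed nodes",
            "Succeeded": False,
            "Description": "Check for failed nodes entries in metrics",
            "DiagnosisMessage": "There were 2 failed nodes logged in the metrics",
        },
    ],
}

for name, message in failed_checks(report):
    print(name + ": " + message)
```

Because several checks share the same Name in the example (for instance "Consistency of data"), the diagnosis message is returned alongside the name to tell them apart.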
URL : /ws/v1/fullstatedump
Method : GET
Auth required : NO
-Code : 200 OK
Content examples
The output of this REST query can be rather large. It combines the outputs of the endpoints already demonstrated above.
@@ -356,11 +382,11 @@
start
(optional) : Specifies the starting ID for retrieving events. If the specified ID is outside the ring buffer (too low or too high), the response will include the lowest and highest ID values with EventRecords being empty.
-Code: 200 OK
Content examples
{
"InstanceUUID": "400046c6-2180-41a2-9be1-1c251ab2c498",
"LowestID": 0,
"HighestID": 7,
"EventRecords": [
{
"type": 3,
"objectID": "yk8s-worker",
"message": "schedulable: true",
"timestampNano": 1701347180239597300,
"eventChangeType": 1,
"eventChangeDetail": 302,
"resource": {}
},
{
"type": 3,
"objectID": "yk8s-worker",
"message": "Node added to the scheduler",
"timestampNano": 1701347180239650600,
"eventChangeType": 2,
"resource": {
"resources": {
"ephemeral-storage": {
"value": 502921060352
},
"hugepages-1Gi": {},
"hugepages-2Mi": {},
"memory": {
"value": 33424998400
},
"pods": {
"value": 110
},
"vcore": {
"value": 8000
}
}
}
}
]
}
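The start parameter suggests a simple polling loop. The sketch below picks the next start value from a batch response; it assumes that event IDs are consecutive and that the returned records begin at the requested ID, neither of which is stated in this document. The function name `next_start` is illustrative.

```python
def next_start(response, requested_start):
    """Choose the start ID for the next batch request.

    Assumptions (not confirmed by the documentation): IDs are
    consecutive and EventRecords begins at requested_start.
    """
    records = response.get("EventRecords") or []
    if records:
        return requested_start + len(records)
    lowest, highest = response["LowestID"], response["HighestID"]
    if requested_start < lowest:
        # Older events have left the ring buffer; resume at the oldest one kept.
        return lowest
    if requested_start > highest:
        # Nothing newer exists yet; ask for the ID right after the newest event.
        return highest + 1
    return requested_start


print(next_start({"LowestID": 0, "HighestID": 7, "EventRecords": [{}, {}]}, 0))
print(next_start({"LowestID": 5, "HighestID": 7, "EventRecords": []}, 1))
print(next_start({"LowestID": 0, "HighestID": 7, "EventRecords": []}, 20))
```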
Code : 500 Internal Server Error
Creates a persistent HTTP connection for event streaming. New events are sent to the clients immediately, so unlike the batch interface, there is no need for polling.
@@ -372,14 +398,14 @@
count
(optional) : Specifies the number of past events (those which have been generated before the connection establishment) to include in the response. Default value is 0.
Code: 200 OK
Content examples
{"type":2,"objectID":"app-1","timestampNano":1708465452903045265,"eventChangeType":1,"eventChangeDetail":204,"resource":{}}
{"type":2,"objectID":"app-1","timestampNano":1708465452903192898,"eventChangeType":2,"eventChangeDetail":201,"referenceID":"alloc-1","resource":{"resources":{"memory":{"value":10000000},"vcore":{"value":1000}}}}
{"type":3,"objectID":"node-1:1234","timestampNano":1708465452903312146,"eventChangeType":2,"eventChangeDetail":303,"referenceID":"alloc-1","resource":{"resources":{"memory":{"value":10000000},"vcore":{"value":1000}}}}
{"type":2,"objectID":"app-1","timestampNano":1708465452903474210,"eventChangeType":1,"eventChangeDetail":205,"resource":{}}
{"type":5,"objectID":"testuser","timestampNano":1708465452903506166,"eventChangeType":2,"eventChangeDetail":603,"referenceID":"root.singleleaf","resource":{"resources":{"memory":{"value":10000000},"vcore":{"value":1000}}}}
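The stream in the example carries one JSON object per line. A minimal sketch of decoding such a stream, assuming that framing holds in general; the function name `iter_events` is illustrative and the input can be any iterable of text lines, such as a streamed HTTP response body.

```python
import json


def iter_events(lines):
    """Decode a line-delimited stream into event dictionaries."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank keep-alive lines, if any are sent
        yield json.loads(line)


# Two lines copied from the example stream.
sample = [
    '{"type":2,"objectID":"app-1","timestampNano":1708465452903045265,"eventChangeType":1,"eventChangeDetail":204,"resource":{}}',
    '{"type":3,"objectID":"node-1:1234","timestampNano":1708465452903312146,"eventChangeType":2,"eventChangeDetail":303,"referenceID":"alloc-1","resource":{"resources":{"memory":{"value":10000000},"vcore":{"value":1000}}}}',
]

for event in iter_events(sample):
    print(event["objectID"], event["eventChangeDetail"], event.get("referenceID"))
```

Fields such as referenceID are absent from some events in the example, so optional fields are read with `get`.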
Code : 400 Bad Request
(URL query is invalid)
Code : 503 Service Unavailable
(Too many active streaming connections)
Code : 500 Internal Server Error