🐞 Bug report: one worker in Rainbond open source 5.17.3 shows abnormal resource usage #2144

Open
3 of 4 tasks
Hanxc-erds opened this issue Feb 10, 2025 · 7 comments
Labels
Bug BUG Feedback

Comments

@Hanxc-erds

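Please confirm the following first:

Problem description

The rainbond-worker component currently has three nodes in the cluster. One of them uses far more CPU and memory than the other two, and its usage keeps fluctuating, peaking at more than 4000m of CPU and more than 8000Mi of memory.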
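One way to spot such an outlier is to compare each worker's usage against the median of all workers. The following is a minimal sketch, assuming usage is available as text in the column layout printed by `kubectl top pod` (name, CPU, memory). The pod names and the two low readings in the sample are made up for illustration, and the threshold of three times the median is an arbitrary assumption.

```python
from statistics import median


def parse_cpu_millicores(value):
    """Convert a CPU quantity such as '4012m' or '2' to millicores."""
    if value.endswith("m"):
        return float(value[:-1])
    return float(value) * 1000


def parse_memory_mib(value):
    """Convert a memory quantity such as '8123Mi' or '2Gi' to MiB."""
    units = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024}
    for suffix, factor in units.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    raise ValueError(f"unsupported memory unit: {value}")


def find_outliers(top_output, ratio=3.0):
    """Return pod names whose CPU or memory exceeds ratio x the median."""
    # Skip the header row, then split each remaining row into three columns.
    rows = [line.split() for line in top_output.strip().splitlines()[1:]]
    pods = [
        (name, parse_cpu_millicores(cpu), parse_memory_mib(mem))
        for name, cpu, mem in rows
    ]
    cpu_median = median(p[1] for p in pods)
    mem_median = median(p[2] for p in pods)
    return [
        name
        for name, cpu, mem in pods
        if cpu > ratio * cpu_median or mem > ratio * mem_median
    ]


# Hypothetical sample: names and the two low readings are invented.
sample = """
NAME       CPU(cores)   MEMORY(bytes)
worker-a   4012m        8123Mi
worker-b   35m          310Mi
worker-c   41m          298Mi
"""

print(find_outliers(sample))
```

Using the median rather than the mean keeps one extreme reading from hiding itself by raising the baseline.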

When the anomaly occurred, the worker log showed abnormal third-party components:
time="2025-02-07T09:07:54+08:00" level=info msg="list component service success, size:0" thirdcomponent=zhwl/07bd767ec95e778a45951c497bf19efd
time="2025-02-07T09:07:54+08:00" level=warning msg="component service is empty" thirdcomponent=zhwl/07bd767ec95e778a45951c497bf19efd
time="2025-02-07T09:07:54+08:00" level=info msg="list component service success, size:0" thirdcomponent=zhwl/8cc05fccd3ead1bb762c14554cf16099
time="2025-02-07T09:07:54+08:00" level=warning msg="component service is empty" thirdcomponent=zhwl/8cc05fccd3ead1bb762c14554cf16099
time="2025-02-07T09:07:54+08:00" level=info msg="list component service success, size:0" thirdcomponent=zhwl/98fde89419a0a16480bb59872e4c018b
time="2025-02-07T09:07:54+08:00" level=warning msg="component service is empty" thirdcomponent=zhwl/98fde89419a0a16480bb59872e4c018b
time="2025-02-07T09:07:54+08:00" level=info msg="list component service success, size:0" thirdcomponent=zhwl/8e76c323149b0dc92bf2f4e7ce35f1d6
time="2025-02-07T09:07:54+08:00" level=warning msg="component service is empty" thirdcomponent=zhwl/8e76c323149b0dc92bf2f4e7ce35f1d6
time="2025-02-07T09:07:54+08:00" level=info msg="list component service success, size:0" thirdcomponent=zhwl/2cebcb4bccba5fcb32862ae6b4877578
time="2025-02-07T09:07:54+08:00" level=warning msg="component service is empty" thirdcomponent=zhwl/2cebcb4bccba5fcb32862ae6b4877578
time="2025-02-07T09:07:58+08:00" level=info msg="list component service success, size:0" thirdcomponent=zhwl/17a71385734f8ef5293c93df4557fd13
time="2025-02-07T09:07:58+08:00" level=warning msg="component service is empty" thirdcomponent=zhwl/17a71385734f8ef5293c93df4557fd13
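The warnings above repeat once per third-party component. A minimal sketch for tallying them across a longer log, assuming the key=value text layout shown in these lines; the function names are illustrative and not part of Rainbond:

```python
import re
from collections import Counter

# Matches key=value pairs where the value is either "quoted" or a bare token.
FIELD_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Return the key=value fields of one log line as a dict."""
    return {key: value.strip('"') for key, value in FIELD_RE.findall(line)}


def count_empty_service_warnings(lines):
    """Count 'component service is empty' warnings per thirdcomponent."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if (
            fields.get("level") == "warning"
            and fields.get("msg") == "component service is empty"
        ):
            counts[fields.get("thirdcomponent", "<unknown>")] += 1
    return counts


# Two lines copied from the log excerpt above.
sample = [
    'time="2025-02-07T09:07:54+08:00" level=info msg="list component service success, size:0" thirdcomponent=zhwl/07bd767ec95e778a45951c497bf19efd',
    'time="2025-02-07T09:07:54+08:00" level=warning msg="component service is empty" thirdcomponent=zhwl/07bd767ec95e778a45951c497bf19efd',
]

for component, count in count_empty_service_warnings(sample).items():
    print(component, count)
```

A component that keeps appearing in this tally after cleanup would indicate that its resource was not fully removed.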

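After the abnormal third-party components were cleaned up and the worker node was restarted, its resource usage was still abnormal. Checking the log again showed the following:

Image

The current worker log is as follows: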
time="2025-02-10T11:20:28+08:00" level=info msg="start get all service disk size"
time="2025-02-10T11:20:47+08:00" level=info msg="success collect worker master metric"
time="2025-02-10T11:20:47+08:00" level=info msg="success collect worker master metric"
time="2025-02-10T11:22:11+08:00" level=info msg="update componentdefinition core-thirdcomponent"
time="2025-02-10T11:22:53+08:00" level=info msg="end get all service disk size,time consum 145 s"
time="2025-02-10T11:23:47+08:00" level=info msg="success collect worker master metric"
time="2025-02-10T11:23:47+08:00" level=info msg="success collect worker master metric"
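The start and end messages of the disk-size scan can be paired to see how long each scan takes. A minimal sketch, assuming the timestamp and message layout shown above; `scan_durations` is an illustrative name, not a Rainbond function:

```python
import re
from datetime import datetime

TIME_RE = re.compile(r'time="([^"]+)"')
MSG_RE = re.compile(r'msg="([^"]*)"')


def scan_durations(lines):
    """Return the duration in seconds of each completed disk-size scan."""
    durations = []
    started_at = None
    for line in lines:
        time_match = TIME_RE.search(line)
        msg_match = MSG_RE.search(line)
        if not time_match or not msg_match:
            continue  # Skip lines that do not follow the expected layout.
        timestamp = datetime.fromisoformat(time_match.group(1))
        message = msg_match.group(1)
        if message.startswith("start get all service disk size"):
            started_at = timestamp
        elif message.startswith("end get all service disk size") and started_at:
            durations.append((timestamp - started_at).total_seconds())
            started_at = None
    return durations


# Lines copied from the log excerpt above.
sample = [
    'time="2025-02-10T11:20:28+08:00" level=info msg="start get all service disk size"',
    'time="2025-02-10T11:20:47+08:00" level=info msg="success collect worker master metric"',
    'time="2025-02-10T11:22:53+08:00" level=info msg="end get all service disk size,time consum 145 s"',
]

print(scan_durations(sample))
```

For the excerpt above this yields a single scan of 145 seconds, which matches the `time consum 145 s` value the worker itself reports.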

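Can this issue be reproduced reliably?

Not reproducible

Steps to reproduce

It cannot necessarily be reproduced. According to the worker log from the time of the anomaly, abnormal third-party components existed then, but after they were deleted the worker's resource usage remained abnormal.

Screenshots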

Image
Image
Image

Image

Image

Logs

The logs are the same as those quoted in the problem description: the third-party component warnings from 2025-02-07, the screenshot of the log taken when the worker node was restarted after the cleanup, and the current worker log from 2025-02-10.

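Expected result

The worker node with abnormal resource usage should return to roughly the same usage as the other two nodes.

Solution (optional)

No response

Operating system && Rainbond version

CentOS 7.9; v5.17.3-release-1e9f3b73c-2024-07-31-03

Are you willing to submit a PR to fix this issue?

  • I am willing to submit a PR to fix this issue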
@Hanxc-erds Hanxc-erds added the Bug BUG Feedback label Feb 10, 2025
@Hanxc-erds
Author

The Rainbond version is not the latest; it is 5.17.3. The form cannot be submitted unless that checkbox is ticked.

@zzzhangqi zzzhangqi assigned ZhangSetSail and unassigned zzzhangqi Feb 10, 2025
@ZhangSetSail
Collaborator

Does the node running the high-usage worker hold a large amount of storage? The worker also collects storage usage statistics for the node, which can consume a lot of resources.

@Hanxc-erds
Author

Does the node running the high-usage worker hold a large amount of storage? The worker also collects storage usage statistics for the node, which can consume a lot of resources.

No. The abnormal worker node is not tied to one server. If I use delete to kill the currently abnormal worker node, the re-created worker node does not show the anomaly, but one of the other worker nodes that are still running starts to consume resources abnormally. Storage usage on all servers is currently below 70%.
