[fix](cloud) Fix cloud decommission leading to FE failing to start (#46783)
Fix issue with SQL node decommissioning process

The SQL node decommissioning process does not wait for transactions at
the watermark level to complete before setting the backend's
isDecommissioned status to true.

As a result, show backends reports isDecommissioned as true immediately,
even while transactions initiated via SQL are still running.

If a user then calls drop be on the only backend in the cluster, the
drop backend operation is written to the edit log and the cluster
information is removed from memory.

After dropping the backend, the previous transaction watermark process
completes its tasks and attempts to modify the backend status, which
requires accessing the cluster information. However, since the cluster
information has already been deleted, this results in a null pointer
exception (NPE) during the lookup in the FE memory map, causing the FE
to crash.

In addition, the edit log records these operations in a fixed order, so the failure recurs when the log is replayed:

Edit log logs drop backend
Edit log modifies backend
FE fails to start up


```
2025-01-10 05:46:15,070 ERROR (replayer|15) [EditLog.loadJournal():1251] replay Operation Type 91, log id: 10578
java.lang.NullPointerException: Cannot invoke "org.apache.doris.system.Backend.getCloudClusterName()" because "memBe" is null
        at org.apache.doris.cloud.system.CloudSystemInfoService.replayModifyBackend(CloudSystemInfoService.java:461) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.persist.EditLog.loadJournal(EditLog.java:432) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.Env.replayJournal(Env.java:2999) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.Env$4.runOneCycle(Env.java:2761) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.common.util.Daemon.run(Daemon.java:119) ~[doris-fe.jar:1.2-SNAPSHOT]
```
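
The replay order described above can be reproduced in a small, self-contained model. This is only a sketch under stated assumptions: `ReplaySketch`, `Be`, and the `replay*` methods here are hypothetical stand-ins, not the actual `CloudSystemInfoService` or `Backend` classes, and the in-memory map is a plain `HashMap`. It shows why replaying a modify entry after a drop entry dereferences a missing backend, and how a null check turns that replay into a no-op.

```java
import java.util.HashMap;
import java.util.Map;

public class ReplaySketch {
    // Hypothetical stand-in for a backend record; not the Doris Backend class.
    static final class Be {
        final long id;
        final String clusterName;
        boolean decommissioned;

        Be(long id, String clusterName) {
            this.id = id;
            this.clusterName = clusterName;
        }
    }

    // Simplified model of the FE in-memory backend map.
    private final Map<Long, Be> idToBackend = new HashMap<>();

    void replayAdd(Be be) {
        idToBackend.put(be.id, be);
    }

    void replayDrop(long id) {
        idToBackend.remove(id);
    }

    // Unguarded variant: assumes the backend is still in memory.
    // Throws NullPointerException if a drop entry was replayed first.
    void replayModifyUnguarded(Be logged) {
        Be memBe = idToBackend.get(logged.id);
        memBe.decommissioned = logged.decommissioned;
    }

    // Guarded variant: skips the entry when the backend is already gone.
    // Returns true if the modification was applied, false if it was skipped.
    boolean replayModifyGuarded(Be logged) {
        Be memBe = idToBackend.get(logged.id);
        if (memBe == null) {
            return false;
        }
        memBe.decommissioned = logged.decommissioned;
        return true;
    }

    public static void main(String[] args) {
        ReplaySketch service = new ReplaySketch();
        service.replayAdd(new Be(1L, "cluster0"));

        // Journal order from the commit message: drop first, then modify.
        service.replayDrop(1L);
        Be modifyEntry = new Be(1L, "cluster0");
        modifyEntry.decommissioned = true;

        try {
            service.replayModifyUnguarded(modifyEntry);
        } catch (NullPointerException e) {
            System.out.println("unguarded replay failed with NullPointerException");
        }
        System.out.println("guarded replay applied: " + service.replayModifyGuarded(modifyEntry));
    }
}
```

In this model the guarded variant reports whether it applied the entry, which is a choice made for the sketch so the skip is observable; the actual fix simply returns early.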
deardeng authored Jan 13, 2025
1 parent 87858e5 commit d9eb14a
Showing 1 changed file with 3 additions and 0 deletions.
@@ -457,6 +457,9 @@ public void replayDropBackend(Backend backend) {
     @Override
     public void replayModifyBackend(Backend backend) {
         Backend memBe = getBackend(backend.getId());
+        if (memBe == null) {
+            return;
+        }
         // for rename cluster
         String originalClusterName = memBe.getCloudClusterName();
         String originalClusterId = memBe.getCloudClusterId();