v7.5.0 After the "pd remove" operation, remnants persist. #8051
The IP address 10.25.248.131:2380 (VMS584328) previously belonged to the tikv-oversea cluster and was scaled in on 2024/04/08 at 10:19:26. `tiup cluster display tikv-oversea` showed that 10.25.248.131:2380 had been removed, and the server VMS584328 was then taken offline. However, pd.log shows that the tikv-oversea cluster kept trying to connect to 10.25.248.131:2380 and reporting connection errors, which persisted until 2024/04/10.

On 2024/04/10, a new server VMS602679 was brought online and the IP address 10.25.248.131 was reused. At 13:47 the same day, 10.25.248.131:2380 (VMS602679) was scaled out into the tikv-dal-test cluster, turning tikv-dal-test into a 3+1 configuration. During this time, the six PD nodes of the tikv-oversea cluster also reconnected to 10.25.248.131:2380, forming a 6+1 configuration. Through this shared node (3+1+6), all ten PD nodes became connected, creating a single ten-node PD cluster. At this point, data confusion occurred.

(Attached: `tiup cluster display` output for tikv-oversea and tikv-dal-test, and the tikv-oversea pd log; not reproduced here.)
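For anyone checking for the same symptom: one way to see whether a scaled-in member still lingers is to compare what tiup reports against what PD and its embedded etcd still track. A minimal sketch, assuming `pd-ctl` and `etcdctl` are on PATH and a surviving PD member listens on 10.25.248.132:2379 (hypothetical address):

```shell
# What tiup believes the topology is:
tiup cluster display tikv-oversea

# What PD itself reports as its members:
pd-ctl -u http://10.25.248.132:2379 member

# What the embedded etcd still tracks (PD serves etcd's client API
# on its client URL), in case a removed peer lingers there:
etcdctl --endpoints=http://10.25.248.132:2379 member list
```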
Subsequently, I tried to find a way to stop the cluster from reconnecting to the removed PD node. Changing the PD leader did not work; only reloading PD eliminated the error. (reload pd log attached; not reproduced here.)
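For reference, the reload that cleared the errors would look roughly like this with tiup (cluster name taken from the report; the `-R` flag restricts the operation to the PD role):

```shell
# Rolling-reload only the PD components of the cluster:
tiup cluster reload tikv-oversea -R pd
```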
Why do you need that many PDs?
I reproduced this. What's the solution so far? (Reproduction steps were attached; a rough sketch of the scenario follows.)
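The attached steps are not reproduced above; purely as an illustration of the shape of the problem described in this thread, a reproduction might look like the following, assuming a local three-member cluster with hypothetical names pd1/pd2/pd3:

```shell
# 1. Start a three-member PD cluster (pd1, pd2, pd3).
# 2. Remove one member through pd-ctl:
pd-ctl -u http://pd1:2379 member delete name pd3
# 3. Stop pd3, then later start a brand-new PD on pd3's old IP:port
#    as part of a different cluster.
# 4. Watch pd1/pd2 logs: per this issue, they may still attempt to
#    reconnect to the old peer address.
```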
As I understand it, these two nodes have already been scaled in, and there should be no attempt to reconnect to a scaled-in node.
Are pd2 and pd3 still in the output of …?
Neither pd2 nor pd3 is shown in …
Which tool do you use to manage the cluster, tiup or tidb-operator?
Does your start script still contain those deleted members?
Instead of using tiup to deploy, I just deployed manually using docker. |
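For context on the start-script question: with a manual docker deployment, the PD start command typically pins the initial membership. A hypothetical example (names, IPs, and image tag are illustrative; `--initial-cluster` only matters at first bootstrap, but a stale list can cause trouble if a node is re-initialized from an empty data dir):

```shell
# Hypothetical manual start of one PD member under docker.
docker run -d --name pd1 pingcap/pd:v7.5.0 \
  --name=pd1 \
  --data-dir=/data/pd1 \
  --client-urls=http://0.0.0.0:2379 \
  --peer-urls=http://0.0.0.0:2380 \
  --initial-cluster="pd1=http://10.0.1.1:2380,pd2=http://10.0.1.2:2380,pd3=http://10.0.1.3:2380"
```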
We have a TiKV cluster with a 2:2:2 architecture. The PD group lost its leader. Here are the relevant logs from the PD leader; the probable cause appears to be that metadata could not be written to etcd. Is there any other way to troubleshoot the issue?
- 12:04:40: the network experienced a temporary outage.
- 13:48:55: writing metadata failed.
- TSO gap.
- Except for the leader, the other nodes logged very little, mostly "region sync with leader meet error".
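On the question of other ways to troubleshoot: beyond the leader's log, it may help to check PD and embedded-etcd health and leadership directly. A minimal sketch, assuming PD endpoints at 10.0.1.1:2379 and 10.0.1.2:2379 (hypothetical addresses):

```shell
# Health of every PD member as PD sees it:
pd-ctl -u http://10.0.1.1:2379 health

# Current PD leader:
pd-ctl -u http://10.0.1.1:2379 member leader show

# Status of the embedded etcd endpoints (raft term, leader, db size):
etcdctl --endpoints=http://10.0.1.1:2379,http://10.0.1.2:2379 endpoint status -w table
```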