out of memory Using WAN federation or cluster peer-to-peer #22051

Open
yeshl opened this issue Jan 4, 2025 · 0 comments

Overview of the Issue

When using WAN federation or cluster peering, if the network to the peer is interrupted and the peer becomes unreachable,
Consul's memory usage grows continuously until the process is killed by the OOM killer.


Reproduction Steps

  1. dc01: consul peering generate-token -name dc02
  2. dc02: consul peering establish -name dc01 -peering-token
  3. dc01: stop the consul agent
  4. dc02: observe the consul agent's memory usage:
    CPUS MEMS% VIRT RES PID USER Command('k' to kill)
    11 93.0 59.6G 58.4G 266365 zwsx consul agent config-dir=consul.d/
Jan 03 17:51:03 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.
Jan 03 18:58:33 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.
Jan 03 20:05:46 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.
Jan 03 21:14:41 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.
Jan 03 22:22:46 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.
Jan 03 23:29:40 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.
Jan 04 00:36:38 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.
Jan 04 01:43:35 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.
Jan 04 02:50:38 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.
Jan 04 03:57:35 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.
Jan 04 05:04:33 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.
Jan 04 06:31:28 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.
Jan 04 07:38:22 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.
Jan 04 08:45:24 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.
Jan 04 09:52:22 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.

Jan 04 07:38:22 node92 consul[2049947]: 2025-01-04T07:38:22.442+0800 [ERROR] agent.server.peering-syncer: error managing peering stream: peer_id=0cdd8131-59ae-2206-1b19-717471ea320c peer_name=dc00-fz error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 218.66.5.250:8443: connect: connection refused\""
Jan 04 07:38:22 node92 systemd[1553348]: consul.service: A process of this unit has been killed by the OOM killer.
Jan 04 07:38:22 node92 systemd[1553348]: consul.service: Main process exited, code=killed, status=9/KILL
Jan 04 07:38:22 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.
...
Jan 04 08:45:24 node92 consul[2261961]: 2025-01-04T08:45:24.436+0800 [ERROR] agent.server.peering-syncer: error managing peering stream: peer_id=0cdd8131-59ae-2206-1b19-717471ea320c peer_name=dc00-fz error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 218.66.5.250:8443: connect: connection refused\""
Jan 04 08:45:24 node92 systemd[1553348]: consul.service: A process of this unit has been killed by the OOM killer.
Jan 04 08:45:24 node92 systemd[1553348]: consul.service: Main process exited, code=killed, status=9/KILL
Jan 04 08:45:24 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.
...
Jan 04 09:52:22 node92 consul[2474359]: 2025-01-04T09:52:22.655+0800 [ERROR] agent.server.peering-syncer: error managing peering stream: peer_id=0cdd8131-59ae-2206-1b19-717471ea320c peer_name=dc00-fz error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 218.66.5.250:8443: connect: connection refused\""
Jan 04 09:52:22 node92 systemd[1553348]: consul.service: A process of this unit has been killed by the OOM killer.
Jan 04 09:52:22 node92 systemd[1553348]: consul.service: Main process exited, code=killed, status=9/KILL
Jan 04 09:52:22 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.
...
...
Jan 03 13:17:23 node92 consul[2755953]: 2025-01-03T13:17:23.202+0800 [WARN]  agent: [core][Channel #21313 SubChannel #21314] grpc: addrConn.createTransport failed to connect to {Addr: "218.66*.*:85>
Jan 03 13:17:23 node92 consul[2755953]: 2025-01-03T13:17:23.202+0800 [ERROR] agent.server.peering-syncer: error managing peering stream: peer_id=0cdd8131-59ae-2206-1b19-717471ea320c peer_name=dc00-fz >
Jan 03 13:17:23 node92 consul[2755953]: 2025-01-03T13:17:23.221+0800 [WARN]  agent: [core][Channel #21315 SubChannel #21316] grpc: addrConn.createTransport failed to connect to {Addr: "218.66*.*:85>
Jan 03 13:17:23 node92 consul[2755953]: 2025-01-03T13:17:23.221+0800 [ERROR] agent.server.peering-syncer: error managing peering stream: peer_id=0cdd8131-59ae-2206-1b19-717471ea320c peer_name=dc00-fz >
Jan 03 13:17:23 node92 consul[2755953]: 2025-01-03T13:17:23.231+0800 [WARN]  agent: [core][Channel #21317 SubChannel #21318] grpc: addrConn.createTransport failed to connect to {Addr: "218.66*.*:85>

...
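
The spacing of the OOM kills in the journal output can be measured rather than eyeballed. The following is a minimal sketch, not part of the original report: it assumes the systemd lines keep the format shown above and that the year is 2025 (taken from the Consul log timestamps). The helper names `oom_kill_times` and `intervals_minutes` are hypothetical.

```python
import re
from datetime import datetime

# Month tokens accepted in the journal prefix; "1月" is how a zh_CN locale
# prints January. Extend this map for other months (assumption: only January
# appears in this report).
MONTHS = {"Jan": 1, "1月": 1}

OOM_RE = re.compile(
    r"^(?P<mon>\S+) (?P<day>\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) \S+ systemd\[\d+\]: "
    r"consul\.service: Failed with result 'oom-kill'\.$"
)


def oom_kill_times(lines, year=2025):
    """Return the timestamp of every 'oom-kill' result line, in input order."""
    times = []
    for line in lines:
        m = OOM_RE.match(line.strip())
        if not m or m["mon"] not in MONTHS:
            continue  # not an OOM result line, or an unmapped month token
        hour, minute, second = map(int, m["time"].split(":"))
        times.append(
            datetime(year, MONTHS[m["mon"]], int(m["day"]), hour, minute, second)
        )
    return times


def intervals_minutes(times):
    """Minutes elapsed between consecutive OOM kills."""
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


# Three of the OOM-kill lines from the journal output in this report.
SAMPLE = [
    "Jan 03 22:22:46 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.",
    "Jan 03 23:29:40 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.",
    "Jan 04 00:36:38 node92 systemd[1553348]: consul.service: Failed with result 'oom-kill'.",
]

if __name__ == "__main__":
    for gap in intervals_minutes(oom_kill_times(SAMPLE)):
        print(f"{gap:.1f} min between OOM kills")
```

Feeding the full journal (for example, saved to a file and read line by line) into `oom_kill_times` gives one interval per restart, which makes it easier to see whether the time to exhaust memory is stable across restarts.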

Server info
agent:
	check_monitors = 0
	check_ttls = 0
	checks = 1
	services = 1
build:
	prerelease = 
	revision = 920cc7c6
	version = 1.20.1
	version_metadata = 
consul:
	acl = enabled
	bootstrap = true
	known_datacenters = 1
	leader = true
	leader_addr = 10.0.3.92:8300
	server = true
raft:
	applied_index = 13933
	commit_index = 13933
	fsm_pending = 0
	last_contact = 0
	last_log_index = 13933
	last_log_term = 43
	last_snapshot_index = 0
	last_snapshot_term = 0
	latest_configuration = [{Suffrage:Voter ID:38801fb7-175f-a6a6-e23f-d54d03f52108 Address:10.0.3.92:8300}]
	latest_configuration_index = 0
	num_peers = 0
	protocol_version = 3
	protocol_version_max = 3
	protocol_version_min = 0
	snapshot_version_max = 1
	snapshot_version_min = 0
	state = Leader
	term = 43
runtime:
	arch = amd64
	cpu_count = 72
	goroutines = 207
	max_procs = 72
	os = linux
	version = go1.22.7
serf_lan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 1
	event_time = 43
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1
serf_wan:
	coordinate_resets = 0
	encrypted = true
	event_queue = 0
	event_time = 1
	failed = 0
	health_score = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1

Server agent HCL config

Operating system and Environment details

PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Log Fragments

Many error log lines like the following:
Jan 03 13:17:23 node92 consul[2755953]: 2025-01-03T13:17:23.202+0800 [WARN] agent: [core][Channel #21313 SubChannel #21314] grpc: addrConn.createTransport failed to connect to {Addr: "218.66*.*:85>
Jan 03 13:17:23 node92 consul[2755953]: 2025-01-03T13:17:23.202+0800 [ERROR] agent.server.peering-syncer: error managing peering stream: peer_id=0cdd8131-59ae-2206-1b19-717471ea320c peer_name=dc01 >
Jan 03 13:17:23 node92 consul[2755953]: 2025-01-03T13:17:23.221+0800 [WARN] agent: [core][Channel #21315 SubChannel #21316] grpc: addrConn.createTransport failed to connect to {Addr: "218.66*.*:85>
Jan 03 13:17:23 node92 consul[2755953]: 2025-01-03T13:17:23.221+0800 [ERROR] agent.server.peering-syncer: error managing peering stream: peer_id=0cdd8131-59ae-2206-1b19-717471ea320c peer_name=dc01 >
Jan 03 13:17:23 node92 consul[2755953]: 2025-01-03T13:17:23.231+0800 [WARN] agent: [core][Channel #21317 SubChannel #21318] grpc: addrConn.createTransport failed to connect to {Addr: "218.66*.*:85>
...
...
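
In the WARN lines above, the gRPC channel number rises with each failed dial and the attempts are only milliseconds apart. The following is a rough sketch for quantifying that retry rate from the log. It is not part of the original report, the helper names `dial_failures` and `failures_per_second` are hypothetical, and the sample lines are shortened copies of the WARN lines. It only measures how often the dial fails; it does not by itself establish the cause of the memory growth.

```python
import re
from datetime import datetime

# Matches the Consul-side timestamp and the gRPC channel number of a failed dial.
WARN_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}) \[WARN\]\s+agent: "
    r"\[core\]\[Channel #(?P<chan>\d+) SubChannel #\d+\] "
    r"grpc: addrConn\.createTransport failed"
)


def dial_failures(lines):
    """Return (timestamp, channel_number) for every failed-dial WARN line."""
    events = []
    for line in lines:
        m = WARN_RE.search(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f%z")
            events.append((ts, int(m["chan"])))
    return events


def failures_per_second(events):
    """Average failed dials per second over the span covered by the events."""
    if len(events) < 2:
        return None
    span = (events[-1][0] - events[0][0]).total_seconds()
    return (len(events) - 1) / span if span > 0 else None


# Shortened copies of the three WARN lines in this report.
SAMPLE = [
    "2025-01-03T13:17:23.202+0800 [WARN] agent: [core][Channel #21313 SubChannel #21314] grpc: addrConn.createTransport failed to connect",
    "2025-01-03T13:17:23.221+0800 [WARN] agent: [core][Channel #21315 SubChannel #21316] grpc: addrConn.createTransport failed to connect",
    "2025-01-03T13:17:23.231+0800 [WARN] agent: [core][Channel #21317 SubChannel #21318] grpc: addrConn.createTransport failed to connect",
]

if __name__ == "__main__":
    events = dial_failures(SAMPLE)
    channels = [chan for _, chan in events]
    print("channel numbers:", channels)
    print("failed dials per second:", failures_per_second(events))
```

Running the same functions over a longer capture would show whether the retry interval ever backs off while the peer stays unreachable.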

yeshl changed the title from "out ofmemory" to "out of memory Using WAN federation or cluster peer-to-peer" on Jan 4, 2025