OOM kill #2576

Closed
zhy827827 opened this issue Jul 18, 2023 · 21 comments

Comments

@zhy827827

zhy827827 commented Jul 18, 2023

I am running an Acala archive node, but it cannot finish syncing. The process keeps getting killed by the kernel OOM killer.

💻 CPU: Intel(R) Xeon(R) E-2236 CPU @ 3.40GHz
💻 CPU cores: 6
💻 Memory: 64091MB
💻 Kernel: 5.4.0-135-generic
💻 Linux distribution: Ubuntu 20.04.5 LTS

 kernel: [1640602.256461] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-3265.scope,task=acala,pid=1324744,uid=1000
 kernel: [1640602.256533] Out of memory: Killed process 1324744 (acala) total-vm:736942668kB, anon-rss:64497768kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:167084kB oom_score_adj:0

The startup parameter configuration is:
acala --base-path /opt/data/acala --chain=acala --execution wasm --wasm-execution compiled --db-cache 16000 --rpc-cors all --rpc-external --ws-external --pruning archive --no-prometheus -- --chain=polkadot

How can I fix this issue?
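
For a sense of scale, the sizes in the kernel log above can be converted to GiB and compared with the reported RAM. This is only a minimal sketch: the helper names are illustrative, and it assumes the `Memory: 64091MB` figure is counted in units of 1024 kB, which the log does not confirm.

```python
import re

# OOM-killer line copied from the kernel log in this report.
OOM_LINE = (
    "Out of memory: Killed process 1324744 (acala) total-vm:736942668kB, "
    "anon-rss:64497768kB, file-rss:0kB, shmem-rss:0kB, UID:1000 "
    "pgtables:167084kB oom_score_adj:0"
)


def parse_oom_fields(line):
    """Return every 'name:<number>kB' field of an OOM-killer line as ints (kB)."""
    return {name: int(value) for name, value in re.findall(r"([\w-]+):(\d+)kB", line)}


def kb_to_gib(kb):
    """Convert kB (units of 1024 bytes, as the kernel reports them) to GiB."""
    return kb / (1024 * 1024)


fields = parse_oom_fields(OOM_LINE)

# Assumption: "Memory: 64091MB" from the report is in units of 1024 kB.
reported_ram_mb = 64091
anon_rss_share = (fields["anon-rss"] / 1024) / reported_ram_mb

print(f"anon-rss: {kb_to_gib(fields['anon-rss']):.1f} GiB")
print(f"total-vm: {kb_to_gib(fields['total-vm']):.1f} GiB")
print(f"anon-rss as a share of reported RAM: {anon_rss_share:.0%}")
```

Under that assumption, the anonymous resident memory of the `acala` process alone is close to the machine's entire RAM at the moment it was killed, while `total-vm` only reflects reserved address space.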

@xlc
Member

xlc commented Jul 18, 2023

What version of Acala are you running? Is this a new node? How big is the db folder? Can you try removing --db-cache 16000?

@zhy827827
Author

zhy827827 commented Jul 18, 2023

Yes, it is a new node.

acala version: 2.19.0-unknown

db folder:
258G acala

@zhy827827
Author

zhy827827 commented Jul 19, 2023

2023-07-19 01:26:25 [Parachain] ✨ Imported #4042160 (0xe44d…2647)
2023-07-19 01:26:27 [Relaychain] ⚙️  Syncing  0.0 bps, target=#16460836 (40 peers), best: #11916843 (0xd713…e2b6), finalized #11912746 (0xb9c8…87d4), ⬇ 135.3kiB/s ⬆ 128.5kiB/s
2023-07-19 01:26:27 [Parachain] 💤 Idle (8 peers), best: #4042152 (0x010f…8ac4), finalized #1284039 (0x6213…271e), ⬇ 19.1kiB/s ⬆ 1.1kiB/s
start_acala_archive.sh: line 3: 1337295 Killed                  acala --base-path /opt/data/acala --chain=acala --execution wasm --wasm-execution compiled --rpc-cors all --rpc-external --ws-external --pruning archive --no-prometheus -- --chain=polkadot

@xlc 

@xlc
Member

xlc commented Jul 19, 2023

How long does it take to crash? Would you be able to plot a graph of the memory usage since the node starts? Would you be able to reproduce this on a different machine?
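
One way to collect the data for such a graph is to sample the node's resident memory from `/proc` once per second and write it to a CSV file that any plotting tool can read. The following is a minimal sketch, not an Acala tool: the function names and the CSV layout are illustrative, and it assumes a Linux host where `/proc/<pid>/status` is readable.

```python
import csv
import sys
import time


def parse_vmrss_kb(status_text):
    """Extract VmRSS (resident set size, in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # kernel threads and zombies have no VmRSS line


def log_rss(pid, out_path, interval_s=1.0):
    """Write (unix_time, rss_kb) rows until the process disappears."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["unix_time", "rss_kb"])
        while True:
            try:
                with open(f"/proc/{pid}/status") as status:
                    rss_kb = parse_vmrss_kb(status.read())
            except (FileNotFoundError, ProcessLookupError):
                break  # process exited or was killed
            if rss_kb is None:
                break
            writer.writerow([int(time.time()), rss_kb])
            f.flush()  # keep samples on disk even if the host locks up
            time.sleep(interval_s)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file name): python rss_log.py <pid> rss.csv
    log_rss(int(sys.argv[1]), sys.argv[2])
```

Started right after the node, with the node's PID as the first argument, this records the memory curve up to the moment the process is killed.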

@zhy827827
Author

zhy827827 commented Jul 19, 2023

After startup, the OOM kill is triggered within 30 seconds to 1 minute of syncing. The same thing happens when I switch to a full node.

@zhy827827
Author

2023-07-19 01:50:59 Acala Node
2023-07-19 01:50:59 ✌️ version 2.19.0-unknown
2023-07-19 01:50:59 ❤️ by Acala Developers, 2019-2023
2023-07-19 01:50:59 📋 Chain specification: Acala
2023-07-19 01:50:59 🏷 Node name: hospitable-action-6119
2023-07-19 01:50:59 👤 Role: FULL
2023-07-19 01:50:59 💾 Database: RocksDb at /opt/rockx/acala1/chains/acala/db/full
2023-07-19 01:50:59 ⛓ Native runtime: acala-2190 (acala-0.tx3.au1)
2023-07-19 01:50:59 Parachain id: Id(2000)
2023-07-19 01:50:59 Is collating: no

@zhy827827
Author

2023-07-19 01:55:59 [Relaychain] 💻 Operating system: linux
2023-07-19 01:55:59 [Relaychain] 💻 CPU architecture: x86_64
2023-07-19 01:55:59 [Relaychain] 💻 Target environment: gnu
2023-07-19 01:55:59 [Relaychain] 💻 CPU: Intel(R) Xeon(R) E-2236 CPU @ 3.40GHz
2023-07-19 01:55:59 [Relaychain] 💻 CPU cores: 6
2023-07-19 01:55:59 [Relaychain] 💻 Memory: 64091MB
2023-07-19 01:55:59 [Relaychain] 💻 Kernel: 5.4.0-135-generic
2023-07-19 01:55:59 [Relaychain] 💻 Linux distribution: Ubuntu 20.04.5 LTS
2023-07-19 01:55:59 [Relaychain] 💻 Virtual machine: no
2023-07-19 01:55:59 [Relaychain] 📦 Highest known block at #9851392
2023-07-19 01:55:59 [Relaychain] 〽️ Prometheus exporter started at 127.0.0.1:9616
2023-07-19 01:55:59 [Relaychain] Running JSON-RPC HTTP server: addr=127.0.0.1:9934, allowed origins=["http://localhost:*", "http://127.0.0.1:*", "https://localhost:*", "https://127.0.0.1:*", "https://polkadot.js.org"]
2023-07-19 01:55:59 [Relaychain] Running JSON-RPC WS server: addr=127.0.0.1:9945, allowed origins=["http://localhost:*", "http://127.0.0.1:*", "https://localhost:*", "https://127.0.0.1:*", "https://polkadot.js.org"]
2023-07-19 01:55:59 [Relaychain] Starting with an empty approval vote DB.
2023-07-19 01:55:59 [Parachain] 🏷  Local node identity is: 12D3KooWMvocyLGXV2D6bTrsMfLv46yd9NwMAtU4Ka9HjqWfXjiV
2023-07-19 01:55:59 [Parachain] 💻 Operating system: linux
2023-07-19 01:55:59 [Parachain] 💻 CPU architecture: x86_64
2023-07-19 01:55:59 [Parachain] 💻 Target environment: gnu
2023-07-19 01:55:59 [Parachain] 💻 CPU: Intel(R) Xeon(R) E-2236 CPU @ 3.40GHz
2023-07-19 01:55:59 [Parachain] 💻 CPU cores: 6
2023-07-19 01:55:59 [Parachain] 💻 Memory: 64091MB
2023-07-19 01:55:59 [Parachain] 💻 Kernel: 5.4.0-135-generic
2023-07-19 01:55:59 [Parachain] 💻 Linux distribution: Ubuntu 20.04.5 LTS
2023-07-19 01:55:59 [Parachain] 💻 Virtual machine: no
2023-07-19 01:55:59 [Parachain] 📦 Highest known block at #4042252
2023-07-19 01:55:59 [Parachain] Running JSON-RPC HTTP server: addr=0.0.0.0:9933, allowed origins=["*"]
2023-07-19 01:55:59 [Parachain] Running JSON-RPC WS server: addr=0.0.0.0:9944, allowed origins=["*"]
2023-07-19 01:55:59 [Relaychain] discovered: 12D3KooWMvocyLGXV2D6bTrsMfLv46yd9NwMAtU4Ka9HjqWfXjiV /ip4/51.79.229.85/tcp/30333/ws
2023-07-19 01:55:59 [Parachain] discovered: 12D3KooWSkND3gUceMquDEH97oH8LU67PKSRCwwb6jMvrifdw8Ru /ip4/51.79.229.85/tcp/30334/ws
2023-07-19 01:56:00 [Relaychain] 🔍 Discovered new external address for our node: /ip4/51.79.229.85/tcp/30334/ws/p2p/12D3KooWSkND3gUceMquDEH97oH8LU67PKSRCwwb6jMvrifdw8Ru
2023-07-19 01:56:00 [Parachain] 🔍 Discovered new external address for our node: /ip4/51.79.229.85/tcp/30333/ws/p2p/12D3KooWMvocyLGXV2D6bTrsMfLv46yd9NwMAtU4Ka9HjqWfXjiV
2023-07-19 01:56:01 [Parachain] ✨ Imported #4042264 (0x42ee…5dc2)
2023-07-19 01:56:01 [Parachain] ✨ Imported #4042307 (0x7374…1696)
2023-07-19 01:56:04 [Relaychain] ⚙️  Syncing, target=#16461134 (14 peers), best: #9851456 (0xac83…8fa6), finalized #9850368 (0xd905…6b2c), ⬇ 3.4MiB/s ⬆ 76.9kiB/s
2023-07-19 01:56:04 [Parachain] 💤 Idle (8 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 154.7kiB/s ⬆ 13.0kiB/s
2023-07-19 01:56:09 [Relaychain] ⚙️  Syncing 62.0 bps, target=#16461134 (35 peers), best: #9851766 (0x0a53…5437), finalized #9850368 (0xd905…6b2c), ⬇ 5.4MiB/s ⬆ 1014.3kiB/s
2023-07-19 01:56:09 [Parachain] 💤 Idle (10 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 1.6kiB/s ⬆ 1.8kiB/s
2023-07-19 01:56:13 [Parachain] ✨ Imported #4042308 (0x0ee3…f857)
2023-07-19 01:56:14 [Relaychain] ⚙️  Syncing 27.6 bps, target=#16461135 (40 peers), best: #9851904 (0xeda5…d977), finalized #9850368 (0xd905…6b2c), ⬇ 277.0kiB/s ⬆ 307.6kiB/s
2023-07-19 01:56:14 [Parachain] 💤 Idle (11 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 27.0kiB/s ⬆ 98.4kiB/s
2023-07-19 01:56:19 [Relaychain] ⚙️  Syncing  0.0 bps, target=#16461135 (40 peers), best: #9851904 (0xeda5…d977), finalized #9850368 (0xd905…6b2c), ⬇ 29.3kiB/s ⬆ 33.5kiB/s
2023-07-19 01:56:19 [Parachain] 💤 Idle (12 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 1.1kiB/s ⬆ 90.3kiB/s
2023-07-19 01:56:24 [Relaychain] ⚙️  Syncing  0.0 bps, target=#16461135 (40 peers), best: #9851904 (0xeda5…d977), finalized #9850368 (0xd905…6b2c), ⬇ 36.4kiB/s ⬆ 1.1MiB/s
2023-07-19 01:56:24 [Parachain] 💤 Idle (13 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 1.6kiB/s ⬆ 5.4kiB/s
2023-07-19 01:56:25 [Parachain] ✨ Imported #4042309 (0x6e3d…7125)
2023-07-19 01:56:29 [Relaychain] ⚙️  Syncing  0.0 bps, target=#16461135 (40 peers), best: #9851904 (0xeda5…d977), finalized #9850368 (0xd905…6b2c), ⬇ 56.6kiB/s ⬆ 887.0kiB/s
2023-07-19 01:56:29 [Parachain] 💤 Idle (13 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 21.2kiB/s ⬆ 41.5kiB/s
2023-07-19 01:56:34 [Relaychain] ⚙️  Syncing  0.0 bps, target=#16461135 (40 peers), best: #9851904 (0xeda5…d977), finalized #9850368 (0xd905…6b2c), ⬇ 79.9kiB/s ⬆ 71.2kiB/s
2023-07-19 01:56:34 [Parachain] 💤 Idle (13 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 0.7kiB/s ⬆ 39.1kiB/s
2023-07-19 01:56:37 [Parachain] ✨ Imported #4042310 (0x8063…e45c)
2023-07-19 01:56:39 [Relaychain] ⚙️  Syncing  0.0 bps, target=#16461139 (40 peers), best: #9851904 (0xeda5…d977), finalized #9850368 (0xd905…6b2c), ⬇ 190.3kiB/s ⬆ 161.5kiB/s
2023-07-19 01:56:39 [Parachain] 💤 Idle (13 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 24.3kiB/s ⬆ 6.8kiB/s
2023-07-19 01:56:44 [Relaychain] ⚙️  Syncing  0.0 bps, target=#16461140 (40 peers), best: #9851904 (0xeda5…d977), finalized #9850368 (0xd905…6b2c), ⬇ 293.9kiB/s ⬆ 250.7kiB/s
2023-07-19 01:56:44 [Parachain] 💤 Idle (13 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 0.3kiB/s ⬆ 0.3kiB/s
2023-07-19 01:56:49 [Relaychain] ⚙️  Syncing  0.0 bps, target=#16461140 (40 peers), best: #9851904 (0xeda5…d977), finalized #9850368 (0xd905…6b2c), ⬇ 374.0kiB/s ⬆ 276.6kiB/s
2023-07-19 01:56:49 [Parachain] 💤 Idle (13 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 0.7kiB/s ⬆ 0.7kiB/s
2023-07-19 01:56:50 [Parachain] ✨ Imported #4042311 (0x51e0…18a2)
2023-07-19 01:56:54 [Relaychain] ⚙️  Syncing  0.0 bps, target=#16461140 (40 peers), best: #9851904 (0xeda5…d977), finalized #9850368 (0xd905…6b2c), ⬇ 237.5kiB/s ⬆ 186.4kiB/s
2023-07-19 01:56:54 [Parachain] 💤 Idle (12 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 29.0kiB/s ⬆ 0.7kiB/s
2023-07-19 01:56:59 [Relaychain] ⚙️  Syncing  0.0 bps, target=#16461140 (40 peers), best: #9851904 (0xeda5…d977), finalized #9850368 (0xd905…6b2c), ⬇ 264.2kiB/s ⬆ 272.6kiB/s
2023-07-19 01:56:59 [Parachain] 💤 Idle (13 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 1.0kiB/s ⬆ 1.7kiB/s
2023-07-19 01:57:01 [Parachain] ✨ Imported #4042312 (0xb381…a50a)
2023-07-19 01:57:04 [Relaychain] ⚙️  Syncing  0.0 bps, target=#16461143 (40 peers), best: #9851904 (0xeda5…d977), finalized #9850368 (0xd905…6b2c), ⬇ 188.7kiB/s ⬆ 989.1kiB/s
2023-07-19 01:57:04 [Parachain] 💤 Idle (14 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 24.8kiB/s ⬆ 7.0kiB/s
2023-07-19 01:57:09 [Relaychain] ⚙️  Syncing  0.0 bps, target=#16461143 (40 peers), best: #9851904 (0xeda5…d977), finalized #9850368 (0xd905…6b2c), ⬇ 152.3kiB/s ⬆ 155.6kiB/s
2023-07-19 01:57:09 [Parachain] 💤 Idle (14 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 0.2kiB/s ⬆ 3.1kiB/s
2023-07-19 01:57:13 [Parachain] ✨ Imported #4042313 (0x77a6…0cc5)
2023-07-19 01:57:14 [Relaychain] ⚙️  Syncing  0.0 bps, target=#16461144 (40 peers), best: #9851904 (0xeda5…d977), finalized #9850368 (0xd905…6b2c), ⬇ 145.1kiB/s ⬆ 150.5kiB/s
2023-07-19 01:57:14 [Parachain] 💤 Idle (14 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 24.0kiB/s ⬆ 8.7kiB/s
2023-07-19 01:57:19 [Relaychain] ⚙️  Syncing  0.0 bps, target=#16461145 (40 peers), best: #9851904 (0xeda5…d977), finalized #9850368 (0xd905…6b2c), ⬇ 115.1kiB/s ⬆ 118.6kiB/s
2023-07-19 01:57:19 [Parachain] 💤 Idle (14 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 1.1kiB/s ⬆ 3.1kiB/s
2023-07-19 01:57:24 [Relaychain] ⚙️  Syncing  0.0 bps, target=#16461145 (40 peers), best: #9851904 (0xeda5…d977), finalized #9850368 (0xd905…6b2c), ⬇ 263.3kiB/s ⬆ 197.3kiB/s
2023-07-19 01:57:24 [Parachain] 💤 Idle (14 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 0.5kiB/s ⬆ 0.8kiB/s
2023-07-19 01:57:25 [Parachain] ✨ Imported #4042314 (0x0db7…5ebf)
2023-07-19 01:57:29 [Relaychain] ⚙️  Syncing  0.0 bps, target=#16461146 (40 peers), best: #9851904 (0xeda5…d977), finalized #9850368 (0xd905…6b2c), ⬇ 138.4kiB/s ⬆ 141.9kiB/s
2023-07-19 01:57:29 [Parachain] 💤 Idle (14 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 27.2kiB/s ⬆ 9.3kiB/s
2023-07-19 01:57:34 [Relaychain] ⚙️  Syncing  0.0 bps, target=#16461148 (40 peers), best: #9851904 (0xeda5…d977), finalized #9850368 (0xd905…6b2c), ⬇ 142.0kiB/s ⬆ 141.0kiB/s
2023-07-19 01:57:34 [Parachain] 💤 Idle (15 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 1.3kiB/s ⬆ 53.5kiB/s
2023-07-19 01:57:39 [Relaychain] ⚙️  Syncing  0.0 bps, target=#16461148 (40 peers), best: #9851904 (0xeda5…d977), finalized #9850368 (0xd905…6b2c), ⬇ 124.5kiB/s ⬆ 127.3kiB/s
2023-07-19 01:57:39 [Parachain] 💤 Idle (15 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 0.4kiB/s ⬆ 86.9kiB/s
2023-07-19 01:57:43 [Parachain] ✨ Imported #4042315 (0xe060…3a74)
2023-07-19 01:57:44 [Relaychain] ⚙️  Syncing  0.0 bps, target=#16461149 (40 peers), best: #9851904 (0xeda5…d977), finalized #9850368 (0xd905…6b2c), ⬇ 154.9kiB/s ⬆ 199.4kiB/s
2023-07-19 01:57:44 [Parachain] 💤 Idle (16 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 26.8kiB/s ⬆ 80.9kiB/s
2023-07-19 01:57:49 [Relaychain] ⚙️  Syncing  0.0 bps, target=#16461150 (40 peers), best: #9851904 (0xeda5…d977), finalized #9850368 (0xd905…6b2c), ⬇ 129.7kiB/s ⬆ 306.1kiB/s
2023-07-19 01:57:49 [Parachain] 💤 Idle (16 peers), best: #4042306 (0x0161…8574), finalized #186239 (0x5877…3c30), ⬇ 0.7kiB/s ⬆ 43.6kiB/s
start_acala.sh: line 4: 1338979 Killed                  acala --base-path /opt//acala1 --chain=acala --execution wasm --rpc-cors all --rpc-external --ws-external --state-pruning 1000 --blocks-pruning 1000 --no-prometheus -- --chain=polkadot --pruning 1000

@xlc
Member

xlc commented Jul 19, 2023

What's the OS and kernel version?
Can you try an older Acala release such as 2.18.0? https://github.com/AcalaNetwork/Acala/releases/tag/2.18.0

@zhy827827
Author

Linux distribution: Ubuntu 20.04.5 LTS, kernel: 5.4.0-135-generic

@xlc
Member

xlc commented Jul 19, 2023

I suspect this is due to a kernel bug. The version you are using looks old. Is it possible for you to upgrade to a recent version, or to try on a machine with a recent OS?

@zhy827827
Author

I am deleting the data now and restarting with:
acala --base-path /opt/rockx/acala1 --chain=acala --database paritydb --execution wasm --rpc-cors all --rpc-external --ws-external --state-pruning 1000 --blocks-pruning 1000 --no-prometheus -- --chain=polkadot

I will try this database backend (--database paritydb).

@zhy827827
Author

I'll try upgrading the kernel

@zhy827827
Author

zhy827827 commented Jul 20, 2023

Jul 20 01:58:51 ns5005750 kernel: [69017.831990] Out of memory: Killed process 3204 (acala) total-vm:935370056kB, anon-rss:63319692kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:178232kB oom_score_adj:0
Jul 20 01:58:51 ns5005750 systemd[1]: session-1.scope: A process of this unit has been killed by the OOM killer

start_acala.sh: line 3: 3204 Killed acala --base-path /opt/rockx/acala --chain=acala --execution wasm --rpc-cors all --rpc-external --ws-external --pruning archive --no-prometheus -- --chain=polkadot

System info: Ubuntu 22.04 LTS, kernel: 5.15.0-76-generic
Acala version 2.18.0 is also OOM-killed.

@xlc

@xlc
Member

xlc commented Jul 20, 2023

Can you try this? https://askubuntu.com/a/1405588

@zhy827827
Author

zhy827827 commented Jul 20, 2023

With that change the system hangs: I can no longer log in to the server and can only restart it.

@xlc
Member

xlc commented Jul 20, 2023

I don't really think this is an issue in the Acala node. Could you try other chains like Polkadot or other parachains? And let me know if this happens on other machines.

@zhy827827
Author

Statemint, Polkadex, and Moonbeam: there are no problems with these chains.

@zhy827827
Author

Why don't others have this problem?

@xlc
Member

xlc commented Jul 21, 2023

This is the only report of an OOM issue we have received, and I cannot reproduce it. Can you try to reproduce this on a different machine?

@zhy827827
Author

I will test on another server.

@zhy827827
Author

It really was a server problem. I switched to another server with the same configuration, and there is no OOM kill.
