Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modified interesting event of RDMA socket poll #13

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

vaalgit
Copy link

@vaalgit vaalgit commented Aug 26, 2024

In a High Availability (HA) environment with RDMA connections, after forcibly powering off one of the nodes, fio is unable to continue performing I/O operations.

Description:

Environment Details:

We are using Pacemaker to set up a High Availability (HA) environment with properly arranged resource startup orders. The systems are configured as follows:

  • Storage Nodes:
    • beegfs-oss1 with RHEL 9p1, kernel 5.14.0-162.6.1.el9_1.x86_64 (IP: 10.10.2.231 for graidstor1, floating VIP 10.10.2.233 for graidstor2)
    • beegfs-oss2 with RHEL 9p1, kernel 5.14.0-162.6.1.el9_1.x86_64
  • Client Node:
    • rocky9p1 with Rocky Linux 9p1, kernel 5.14.0-162.23.1.el9_1.x86_64
  • Management/Meta Node:
    • beegfs-mgt with RHEL 9p1, kernel 5.14.0-162.6.1.el9_1.x86_64

Issue Reproduction Steps:

  1. graidstor1 (10.10.2.231) is running on beegfs-oss1.

  2. graidstor2 (10.10.2.233) is running on beegfs-oss2.

  3. On the client node, run the following fio command:

    fio --filename=/mnt/beegfs/123 --direct=1 --rw=randwrite --bs=1M --ioengine=libaio --iodepth=16 --runtime=300 --numjobs=1 --time_based --group_reporting --name=iops-test-job --eta-newline=1 --size=40G
  4. On beegfs-oss2, execute: echo b > /proc/sysrq-trigger to simulate Power Failure.

  5. At this point, the fio process on the client temporarily becomes unresponsive

  6. graidstor2 (10.10.2.233) is migrated by Pacemaker to beegfs-oss1 and activated there.

  7. However, even after the migration, fio on the client remains unresponsive.

Debug Information:

  • After enabling BEEGFS_DEBUG and checking the dmesg log, it appears that the process enters an infinite loop in IBVSocket_poll and cannot exit. The relevant dmesg log and code snippet are provided below.

  • /opt/beegfs/src/client/client_module_7/source/common/toolkit/SocketTk.c

    // for each sock: ask for available data and register waitqueue
    list_for_each_entry(socket, &state->list, poll._list)
    {
       if(Socket_getSockType(socket) == NICADDRTYPE_RDMA)
       { // RDMA socket
          struct RDMASocket* currentRDMASock = (RDMASocket*)socket;
          bool finishPoll = (numSocksWithREvents || !__timeout);
    
          unsigned long mask = RDMASocket_poll(
             currentRDMASock, socket->poll._events, finishPoll);
    
          if(mask) // HERE !!
          { // interesting event occurred
             socket->poll.revents = mask; // save event mask as revents
             numSocksWithREvents++;
          }
       }
    // Modified
    unsigned long mask = RDMASocket_poll(
         currentRDMASock, socket->poll._events, finishPoll);
    
    if(mask & socket->poll._events) // here !!
    { // interesting event occurred
       socket->poll.revents = mask; // save event mask as revents
       numSocksWithRE
         
     .....
     

    We discovered that when handling RDMA socket polling, the "interesting event occurred" behavior is not the same as with standard sockets. RDMA sockets will consider all poll events as interesting. This seems to cause the pollerr state to continue looping after a migration, preventing the socket from reconnecting to the new node.

    To address this, we tested modifying the conditions for interesting events to be more similar to those for standard sockets. However, we kept only the poll._events as interesting, allowing pollerr to exit the infinite loop and re-establish the connection.

    We would like to ask if you have any better suggestions or alternative modifications for this issue?

  • infinite loop issue, dmesg log

    Aug 22 16:00:34 rocky9p1 kernel: beegfs: fio(37074): NodeConnPool_rdmaNicPriority:284: min=00000000da47691e cur=00000000da47691e skipped=0 cur.avail=1 cur.established=1 cur.used=179898653418364
    Aug 22 16:00:34 rocky9p1 kernel: beegfs: fio(37074): NodeConnPool_rdmaNicPriority:293: return 00000000da47691e
    Aug 22 16:00:34 rocky9p1 kernel: beegfs: fio(37074): NodeConnPool_rdmaNicPriority:284: min=00000000f2aa895a cur=00000000f2aa895a skipped=0 cur.avail=1 cur.established=1 cur.used=179898661989217
    Aug 22 16:00:34 rocky9p1 kernel: beegfs: fio(37074): NodeConnPool_rdmaNicPriority:293: return 00000000f2aa895a
    Aug 22 16:00:34 rocky9p1 kernel: beegfs: fio(37074): NodeConnPool_rdmaNicPriority:284: min=00000000da47691e cur=00000000da47691e skipped=0 cur.avail=1 cur.established=1 cur.used=179899276147069
    Aug 22 16:00:34 rocky9p1 kernel: beegfs: fio(37074): NodeConnPool_rdmaNicPriority:293: return 00000000da47691e
    Aug 22 16:00:34 rocky9p1 kernel: beegfs: fio(37074): NodeConnPool_rdmaNicPriority:284: min=00000000f2aa895a cur=00000000f2aa895a skipped=0 cur.avail=1 cur.established=1 cur.used=179899283893049
    Aug 22 16:00:34 rocky9p1 kernel: beegfs: fio(37074): NodeConnPool_rdmaNicPriority:293: return 00000000f2aa895a
    Aug 22 16:00:34 rocky9p1 kernel: beegfs: fio(37074): NodeConnPool_rdmaNicPriority:284: min=00000000da47691e cur=00000000da47691e skipped=0 cur.avail=1 cur.established=1 cur.used=179899334450783
    Aug 22 16:00:34 rocky9p1 kernel: beegfs: fio(37074): NodeConnPool_rdmaNicPriority:293: return 00000000da47691e
    Aug 22 16:00:34 rocky9p1 kernel: beegfs: fio(37074): NodeConnPool_rdmaNicPriority:284: min=00000000f2aa895a cur=00000000f2aa895a skipped=0 cur.avail=1 cur.established=1 cur.used=179899343116056
    Aug 22 16:00:34 rocky9p1 kernel: beegfs: fio(37074): NodeConnPool_rdmaNicPriority:293: return 00000000f2aa895a
    Aug 22 16:00:34 rocky9p1 kernel: beegfs: fio(37074): NodeConnPool_rdmaNicPriority:284: min=00000000da47691e cur=00000000da47691e skipped=0 cur.avail=1 cur.established=1 cur.used=179899397114198
    Aug 22 16:00:34 rocky9p1 kernel: beegfs: fio(37074): NodeConnPool_rdmaNicPriority:293: return 00000000da47691e
    Aug 22 16:00:34 rocky9p1 kernel: beegfs: fio(37074): NodeConnPool_rdmaNicPriority:284: min=00000000f2aa895a cur=00000000f2aa895a skipped=0 cur.avail=1 cur.established=1 cur.used=179899405673051
    Aug 22 16:00:34 rocky9p1 kernel: beegfs: fio(37074): NodeConnPool_rdmaNicPriority:293: return 00000000f2aa895a
    Aug 22 16:00:42 rocky9p1 kernel: beegfs: kworker/u32:2(26302): __IBVSocket_cmaHandler:1742: rdma event: 7, status: -110
    Aug 22 16:00:49 rocky9p1 kernel: beegfs: fio(37074): IBVSocket_poll:1612: called for an erroneous connection. ErrCode: -1
    Aug 22 16:00:49 rocky9p1 kernel: beegfs: fio(37074): IBVSocket_poll:1612: called for an erroneous connection. ErrCode: -1
    Aug 22 16:00:49 rocky9p1 kernel: beegfs: fio(37074): IBVSocket_poll:1612: called for an erroneous connection. ErrCode: -1
    Aug 22 16:00:49 rocky9p1 kernel: beegfs: fio(37074): IBVSocket_poll:1612: called for an erroneous connection. ErrCode: -1
    Aug 22 16:00:49 rocky9p1 kernel: beegfs: fio(37074): IBVSocket_poll:1612: called for an erroneous connection. ErrCode: -1
    Aug 22 16:00:49 rocky9p1 kernel: beegfs: fio(37074): IBVSocket_poll:1612: called for an erroneous connection. ErrCode: -1
    Aug 22 16:00:49 rocky9p1 kernel: beegfs: fio(37074): IBVSocket_poll:1612: called for an erroneous connection. ErrCode: -1
    Aug 22 16:00:49 rocky9p1 kernel: beegfs: fio(37074): IBVSocket_poll:1612: called for an erroneous connection. ErrCode: -1
    Aug 22 16:00:49 rocky9p1 kernel: beegfs: fio(37074): IBVSocket_poll:1612: called for an erroneous connection. ErrCode: -1
    Aug 22 16:00:49 rocky9p1 kernel: beegfs: fio(37074): IBVSocket_poll:1612: called for an erroneous connection. ErrCode: -1
  • infinite loop issue, beegfs-client.log

(4) Aug22 16:00:32 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(5) Aug22 16:00:33 *fio(37074) [write_common] >> called. Path: /123; EntryID: 0-66C6F013-3; (offset: %lld; size: %lld)(offset: 8672772096; size: 1048576)
(4) Aug22 16:00:33 *fio(37074) [FhgfsOpsRemoting_writefileVec] >> size: 1048576; offset: 8672772096; fileHandleID: E9EC1E1#0-66C6F013-3
(4) Aug22 16:00:33 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(4) Aug22 16:00:33 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(5) Aug22 16:00:33 *fio(37074) [write_common] >> called. Path: /123; EntryID: 0-66C6F013-3; (offset: %lld; size: %lld)(offset: 24818745344; size: 1048576)
(4) Aug22 16:00:33 *fio(37074) [FhgfsOpsRemoting_writefileVec] >> size: 1048576; offset: 24818745344; fileHandleID: E9EC1E1#0-66C6F013-3
(4) Aug22 16:00:33 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(4) Aug22 16:00:33 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(4) Aug22 16:00:33 *beegfs_XNodeSyn(37048) [Update states and mirror groups] >> Metadata node states synced.
(4) Aug22 16:00:33 *beegfs_XNodeSyn(37048) [Update states and mirror groups] >> Storage target states synced.
(4) Aug22 16:00:33 *beegfs_DGramLis(37047) [RefreshTargetStatesMsg incoming] >> Received a RefreshTargetStatesMsg from: 10.10.2.123
(4) Aug22 16:00:33 *beegfs_DGramLis(37047) [RefreshTargetStatesMsg incoming] >> Received a RefreshTargetStatesMsg from: 10.10.2.123
(4) Aug22 16:00:33 *beegfs_DGramLis(37047) [RefreshTargetStatesMsg incoming] >> Received a RefreshTargetStatesMsg from: 10.10.2.123
(5) Aug22 16:00:33 *fio(37074) [write_common] >> called. Path: /123; EntryID: 0-66C6F013-3; (offset: %lld; size: %lld)(offset: 16885219328; size: 1048576)
(4) Aug22 16:00:33 *fio(37074) [FhgfsOpsRemoting_writefileVec] >> size: 1048576; offset: 16885219328; fileHandleID: E9EC1E1#0-66C6F013-3
(4) Aug22 16:00:33 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(4) Aug22 16:00:33 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(5) Aug22 16:00:33 *fio(37074) [write_common] >> called. Path: /123; EntryID: 0-66C6F013-3; (offset: %lld; size: %lld)(offset: 1317011456; size: 1048576)
(4) Aug22 16:00:33 *fio(37074) [FhgfsOpsRemoting_writefileVec] >> size: 1048576; offset: 1317011456; fileHandleID: E9EC1E1#0-66C6F013-3
(4) Aug22 16:00:33 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(4) Aug22 16:00:33 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(5) Aug22 16:00:33 *fio(37074) [write_common] >> called. Path: /123; EntryID: 0-66C6F013-3; (offset: %lld; size: %lld)(offset: 23117955072; size: 1048576)
(4) Aug22 16:00:33 *fio(37074) [FhgfsOpsRemoting_writefileVec] >> size: 1048576; offset: 23117955072; fileHandleID: E9EC1E1#0-66C6F013-3
(4) Aug22 16:00:33 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(4) Aug22 16:00:33 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(5) Aug22 16:00:33 *fio(37074) [write_common] >> called. Path: /123; EntryID: 0-66C6F013-3; (offset: %lld; size: %lld)(offset: 11723079680; size: 1048576)
(4) Aug22 16:00:33 *fio(37074) [FhgfsOpsRemoting_writefileVec] >> size: 1048576; offset: 11723079680; fileHandleID: E9EC1E1#0-66C6F013-3
(4) Aug22 16:00:33 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(4) Aug22 16:00:33 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(5) Aug22 16:00:34 *fio(37074) [write_common] >> called. Path: /123; EntryID: 0-66C6F013-3; (offset: %lld; size: %lld)(offset: 31132221440; size: 1048576)
(4) Aug22 16:00:34 *fio(37074) [FhgfsOpsRemoting_writefileVec] >> size: 1048576; offset: 31132221440; fileHandleID: E9EC1E1#0-66C6F013-3
(4) Aug22 16:00:34 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(4) Aug22 16:00:34 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(5) Aug22 16:00:34 *fio(37074) [write_common] >> called. Path: /123; EntryID: 0-66C6F013-3; (offset: %lld; size: %lld)(offset: 9007267840; size: 1048576)
(4) Aug22 16:00:34 *fio(37074) [FhgfsOpsRemoting_writefileVec] >> size: 1048576; offset: 9007267840; fileHandleID: E9EC1E1#0-66C6F013-3
(4) Aug22 16:00:34 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(4) Aug22 16:00:34 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(5) Aug22 16:00:34 *fio(37074) [write_common] >> called. Path: /123; EntryID: 0-66C6F013-3; (offset: %lld; size: %lld)(offset: 37290508288; size: 1048576)
(4) Aug22 16:00:34 *fio(37074) [FhgfsOpsRemoting_writefileVec] >> size: 1048576; offset: 37290508288; fileHandleID: E9EC1E1#0-66C6F013-3
(4) Aug22 16:00:34 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(4) Aug22 16:00:34 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(5) Aug22 16:00:34 *fio(37074) [write_common] >> called. Path: /123; EntryID: 0-66C6F013-3; (offset: %lld; size: %lld)(offset: 31635537920; size: 1048576)
(4) Aug22 16:00:34 *fio(37074) [FhgfsOpsRemoting_writefileVec] >> size: 1048576; offset: 31635537920; fileHandleID: E9EC1E1#0-66C6F013-3
(4) Aug22 16:00:34 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(4) Aug22 16:00:34 *fio(37074) [NodeConn (acquire stream)] >> Preferred IP addr is 0xda020a0a
(4) Aug22 16:00:38 *beegfs_XNodeSyn(37048) [Update states and mirror groups] >> Metadata node states synced.
(4) Aug22 16:00:38 *beegfs_XNodeSyn(37048) [Update states and mirror groups] >> Storage target states synced.
(4) Aug22 16:00:48 *beegfs_DGramLis(37047) [RefreshTargetStatesMsg incoming] >> Received a RefreshTargetStatesMsg from: 10.10.2.123
(4) Aug22 16:00:48 *beegfs_XNodeSyn(37048) [Update states and mirror groups] >> Metadata node states synced.
(4) Aug22 16:00:48 *beegfs_XNodeSyn(37048) [Update states and mirror groups] >> Storage target states synced.
(4) Aug22 16:00:53 *beegfs_DGramLis(37047) [RefreshTargetStatesMsg incoming] >> Received a RefreshTargetStatesMsg from: 10.10.2
  • Modified interesting event , dmesg log
[Thu Aug 22 16:55:36 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:293: return 00000000de7cbf70
[Thu Aug 22 16:55:36 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:284: min=0000000092b45c33 cur=0000000092b45c33 skipped=0 cur.avail=1 cur.established=1 cur.used=183201055856107
[Thu Aug 22 16:55:36 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:293: return 0000000092b45c33
[Thu Aug 22 16:55:36 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:284: min=00000000de7cbf70 cur=00000000de7cbf70 skipped=0 cur.avail=1 cur.established=1 cur.used=183201092765881
[Thu Aug 22 16:55:36 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:293: return 00000000de7cbf70
[Thu Aug 22 16:55:36 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:284: min=0000000092b45c33 cur=0000000092b45c33 skipped=0 cur.avail=1 cur.established=1 cur.used=183201102361240
[Thu Aug 22 16:55:36 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:293: return 0000000092b45c33
[Thu Aug 22 16:55:36 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:284: min=00000000de7cbf70 cur=00000000de7cbf70 skipped=0 cur.avail=1 cur.established=1 cur.used=183201139951501
[Thu Aug 22 16:55:36 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:293: return 00000000de7cbf70
[Thu Aug 22 16:55:36 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:284: min=0000000092b45c33 cur=0000000092b45c33 skipped=0 cur.avail=1 cur.established=1 cur.used=183201149832528
[Thu Aug 22 16:55:36 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:293: return 0000000092b45c33
[Thu Aug 22 16:55:43 2024] beegfs: kworker/u32:0(36803): __IBVSocket_cmaHandler:1742: rdma event: 7, status: -110
[Thu Aug 22 16:55:51 2024] beegfs: fio(41774): IBVSocket_poll:1612: called for an erroneous connection. ErrCode: -1
[Thu Aug 22 16:55:51 2024] beegfs: fio(41774): IBVSocket_poll:1612: called for an erroneous connection. ErrCode: -1
[Thu Aug 22 16:55:51 2024] beegfs: fio(41774): __IBVSocket_cleanupCommContext:813: Free CommContext @ 000000003545b741
[Thu Aug 22 16:55:51 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:284: min=0000000092b45c33 cur=0000000092b45c33 skipped=0 cur.avail=0 cur.established=0 cur.used=183201198856308
[Thu Aug 22 16:55:51 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:293: return 0000000092b45c33
[Thu Aug 22 16:55:51 2024] beegfs: fio(41774): IBVSocket_setTimeouts:574: connectMS=5000 completionMS=300000 flowSendMS=180000 flowRecvMS=180000 pollMS=10000
[Thu Aug 22 16:55:51 2024] beegfs: kworker/u32:0(36803): __IBVSocket_cmaHandler:1742: rdma event: 0, status: 0
[Thu Aug 22 16:55:51 2024] beegfs: kworker/u32:0(36803): __IBVSocket_cmaHandler:1742: rdma event: 2, status: 0
[Thu Aug 22 16:55:51 2024] beegfs: fio(41774): __IBVSocket_createCommContext:610: Alloc CommContext @ 0000000029873012
[Thu Aug 22 16:55:51 2024] beegfs: enabling unsafe global rkey
[Thu Aug 22 16:55:51 2024] beegfs: fio(41774): __IBVSocket_createCommContext:671: __IBVSocket_createCommContext: checkConnRkey = 132642
[Thu Aug 22 16:55:51 2024] beegfs: fio(41774): __IBVSocket_initCommDest:890: __IBVSocket_initCommDest: rkey=132642 vaddr=5393383016
[Thu Aug 22 16:55:51 2024] beegfs: kworker/9:2(26034): __IBVSocket_cmaHandler:1742: rdma event: 9, status: 0
[Thu Aug 22 16:55:51 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:284: min=00000000de7cbf70 cur=00000000de7cbf70 skipped=0 cur.avail=1 cur.established=1 cur.used=183201189215859
[Thu Aug 22 16:55:51 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:293: return 00000000de7cbf70
[Thu Aug 22 16:55:51 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:284: min=0000000092b45c33 cur=0000000092b45c33 skipped=0 cur.avail=1 cur.established=1 cur.used=183216667024713
[Thu Aug 22 16:55:51 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:293: return 0000000092b45c33
[Thu Aug 22 16:55:51 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:284: min=00000000de7cbf70 cur=00000000de7cbf70 skipped=0 cur.avail=1 cur.established=1 cur.used=183216768237191
[Thu Aug 22 16:55:51 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:293: return 00000000de7cbf70
[Thu Aug 22 16:55:51 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:284: min=0000000092b45c33 cur=0000000092b45c33 skipped=0 cur.avail=1 cur.established=1 cur.used=183216775048148
[Thu Aug 22 16:55:51 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:293: return 0000000092b45c33
[Thu Aug 22 16:55:51 2024] beegfs: fio(41774): NodeConnPool_rdmaNicPriority:284: min=00000000de7cbf70 cur=00000000de7cbf70 skipped=0 cur.avail=1 cur.established=1 cur.used=183216812234402

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant