router: replica's prio is decreased on every replication flack #512

Serpentian · 2025-02-13T11:25:14Z

Currently the router's failover checks the replication state:

Lines 1233 to 1240 in 85f9b66

    
    local status = upstream.status
    if not status or status == 'stopped' or status == 'disconnected' then
        -- All other states mean either that everything is ok ('follow')
        -- or that replica is connecting. In all these cases replica
        -- is considered healthy.
        local msg = string.format('Upstream to master has status "%s"', status)
        return consts.STATUS.RED, msg
    end

This check ignores idle and lag when the upstream is broken. That is incorrect: the replica's priority is changed on every disconnect, whereas it should be decreased only after some timeout has passed. To fix this, it is proposed to introduce failover_replica_lag_limit, which is compared against either idle or lag, depending on upstream.status. Initial proposal:

follow, sync => lag
all others => idle
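
A minimal sketch of how the proposed mapping could look, not the actual vshard implementation. The function name check_upstream, the local STATUS table (standing in for consts.STATUS) and the handling of a missing metric are assumptions made for illustration; only upstream.status, idle, lag and failover_replica_lag_limit come from the issue text.

```lua
-- Hypothetical stand-in for consts.STATUS used by the router.
local STATUS = {GREEN = 'green', RED = 'red'}

-- Hypothetical helper: decides replica health from the upstream state.
-- `upstream` is expected to look like box.info.replication[i].upstream,
-- `lag_limit` plays the role of the proposed failover_replica_lag_limit.
local function check_upstream(upstream, lag_limit)
    if upstream == nil then
        return STATUS.RED, 'Upstream to master is missing'
    end
    local status = upstream.status
    -- follow, sync => lag; all others => idle.
    local metric = 'idle'
    if status == 'follow' or status == 'sync' then
        metric = 'lag'
    end
    local value = upstream[metric]
    -- Assumption: a missing metric is treated as exceeding the limit.
    if value == nil or value > lag_limit then
        local msg = string.format(
            'Upstream to master has status "%s" and %s %s, limit is %s',
            tostring(status), metric, tostring(value), tostring(lag_limit))
        return STATUS.RED, msg
    end
    return STATUS.GREEN
end
```

With this shape, a short disconnect keeps the replica healthy while idle stays below the limit, so its priority would be decreased only once the timeout is exceeded.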


Serpentian · 2025-02-13T11:26:20Z

Needed for #505

Serpentian self-assigned this Feb 13, 2025

Serpentian added the bug and router labels Feb 13, 2025
