router: replica's prio is decreased on every replication flack #512

Open
Serpentian opened this issue Feb 13, 2025 · 1 comment
Assignees: Serpentian
Labels: bug, router

Comments

Serpentian commented Feb 13, 2025

Currently the router's failover checks the replication state:

vshard/vshard/router/init.lua

Lines 1233 to 1240 in 85f9b66

local status = upstream.status
if not status or status == 'stopped' or status == 'disconnected' then
    -- All other states mean either that everything is ok ('follow')
    -- or that replica is connecting. In all these cases replica
    -- is considered healthy.
    local msg = string.format('Upstream to master has status "%s"', status)
    return consts.STATUS.RED, msg
end

It ignores idle and lag when the upstream is broken. This is incorrect: the replica's priority is changed on every disconnect, whereas it should be decreased only after some timeout has passed. To fix this, it is proposed to introduce failover_replica_lag_limit, which is compared against either idle or lag, depending on upstream.status. Initial proposal:

follow, sync => lag
all others => idle
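
The mapping above could be sketched roughly as follows. This is only an illustration of the proposal, not the vshard implementation: the function name check_upstream, the local STATUS table (a stand-in for consts.STATUS), the message wording, and the choice to treat a missing metric as healthy are all assumptions. The upstream argument is assumed to look like an upstream entry from box.info.replication, and lag_limit stands for the proposed failover_replica_lag_limit in seconds.

```lua
-- Hypothetical sketch of the proposed check; names are illustrative.
local STATUS = {GREEN = 'green', RED = 'red'} -- stand-in for consts.STATUS

-- upstream:  table with status, idle and lag fields (assumed shape).
-- lag_limit: proposed failover_replica_lag_limit, in seconds.
local function check_upstream(upstream, lag_limit)
    if upstream == nil then
        return STATUS.RED, 'Missing upstream from master'
    end
    local status = upstream.status
    -- follow, sync => lag; all others => idle.
    local metric = 'idle'
    if status == 'follow' or status == 'sync' then
        metric = 'lag'
    end
    local value = upstream[metric]
    -- Assumption: a missing metric is not treated as a failure here.
    if value ~= nil and value > lag_limit then
        local msg = string.format(
            'Upstream to master has status "%s" and %s %.3f > %.3f',
            tostring(status), metric, value, lag_limit)
        return STATUS.RED, msg
    end
    return STATUS.GREEN
end
```

With this shape, a short disconnect (small idle) leaves the replica healthy, and the status turns red only once idle or lag exceeds the limit.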

Serpentian self-assigned this Feb 13, 2025
Serpentian commented

Needed for #505

Serpentian added the bug and router labels Feb 13, 2025