Using the last replayed LSN as the value of LSN location if the last received LSN is at the starting point of the WAL segment #227

Open
wants to merge 2 commits into
base: master

@gavinThinking gavinThinking commented Apr 7, 2024

Problem Statement

Issue: #228

Root cause analysis

If a standby instance has no primary instance to synchronize with, pg_last_wal_receive_lsn() keeps returning its initial value, which is the starting point of the last segment in the local pg_wal folder.

Solution

In most cases, pg_last_wal_receive_lsn() accurately reports the last received LSN location.

We choose pg_last_wal_receive_lsn() because a standby can lag in replaying WAL, for example because of its read-only activity. That means a standby that has received more data from the primary than the others might have replayed less of it at the time of the monitor or promote action.

Therefore, we still use pg_last_wal_receive_lsn() to obtain the LSN location in most situations.
In the scenario where the entire cluster has just restarted:
If the last three bytes (six hexadecimal digits) of the last received LSN are zeros, the LSN is the starting point of the last WAL segment in the local pg_wal folder (assuming the default 16 MB WAL segment size), so the value is not accurate.
In this case, we query the last replayed LSN and compare it with the last received LSN. If the last replayed LSN is greater than the last received LSN, we use the last replayed LSN as the LSN location.
Note: The changes only affect scenarios where the cluster is restarting.
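
The selection rule described above can be sketched roughly as follows. This is an illustration in Python, not the code from this pull request: the helper names `parse_lsn` and `choose_lsn_location` are hypothetical, the 16 MB segment size is an assumption (the PostgreSQL default), and a NULL result from pg_last_wal_receive_lsn() is not handled.

```python
# Assumption: default WAL segment size of 16 MB, so a segment boundary
# is any LSN whose low six hexadecimal digits are zero.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text):
    """Convert an LSN such as '1/862B9CC0' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def choose_lsn_location(receive_lsn, replay_lsn):
    """Return the LSN string to report for a standby (hypothetical helper).

    receive_lsn: text output of pg_last_wal_receive_lsn()
    replay_lsn:  text output of pg_last_wal_replay_lsn()
    """
    received = parse_lsn(receive_lsn)
    replayed = parse_lsn(replay_lsn)
    at_segment_start = received % WAL_SEGMENT_SIZE == 0
    # Fall back to the replayed LSN only when the received LSN sits exactly
    # on a segment boundary and replay has progressed past it.
    if at_segment_start and replayed > received:
        return replay_lsn
    return receive_lsn
```

In every other case the received LSN is returned unchanged, which matches the intent of limiting the change to the restart scenario.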

When the standby is restarted, it must replay the transaction log to bring the database tables back to their correct state.
So in this scenario, the last replayed LSN is accurate.

| pg_is_in_recovery() | pg_is_wal_replay_paused() | pg_last_wal_receive_lsn() | pg_last_wal_replay_lsn() |
| --- | --- | --- | --- |
| t | f | 1/86000000 | 1/862B9CC0 |
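
As a quick check of the sample values above, the following Python sketch converts both LSNs to integers and measures how far replay is ahead of the reported receive position. The helper name `lsn_to_int` is hypothetical, and the `0xFFFFFF` mask assumes the default 16 MB WAL segment size.

```python
def lsn_to_int(text):
    """Convert an LSN such as '1/862B9CC0' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


received = lsn_to_int("1/86000000")
replayed = lsn_to_int("1/862B9CC0")

# Low 24 bits are the offset inside a 16 MB segment (assumed default size).
received_offset = received & 0xFFFFFF
gap_bytes = replayed - received

print(received_offset)  # 0: the received LSN is exactly at a segment start
print(gap_bytes)        # 2858176 bytes (0x2B9CC0) replayed past that point
```

The received LSN sits at offset 0 of its segment while replay is about 2.7 MiB further on, which is the situation the fallback to the replayed LSN is meant to cover.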
