Replies: 2 comments 1 reply
-
@sampok That is really interesting! You are correct that my idea for the use cases now is just having change events, and not worrying if some are missed. I definitely think we could explore expanding the current functionality to allow defining the name of a permanent replication slot to connect to, and the LSN to start from. I think the code in subscribe.js is fairly straightforward, and you should be able to take that almost verbatim into your own setup to explore defining a permanent replication slot and LSN. I don't have the time to dig into that right now, but I would love to answer any questions you might have to help you along the way.
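For reference, hooking into the current functionality looks roughly like this, if I read the docs right (a minimal sketch - the publication name, table name, and connection details are just placeholders):

```js
import postgres from 'postgres'

// Assumes a publication exists, e.g. CREATE PUBLICATION alltables FOR ALL TABLES;
const sql = postgres('postgres://user:pass@localhost:5432/db', {
  publications: 'alltables'
})

// Receive a stream of change events for inserts on a hypothetical "events" table.
const { unsubscribe } = await sql.subscribe(
  'insert:events',
  (row, { command, relation, key, old }) => {
    // row      - the changed row
    // command  - insert | update | delete
    // relation - schema/table info for the change
  }
)
```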
-
Hey @sampok ... Sorry for the late response.

Yes, the documentation for logical replication and the protocol is a bit hidden - don't know if you found this and its subsections.

The LSN (Log Sequence Number) is a number representing a byte position in the WAL (Write Ahead Log). The LSNs supplied in the Logical Replication Protocol represent various boundaries in the WAL like BEGIN, COMMIT etc. The one Postgres.js uses is the last.

I hope the docs pointers above can help you get further :)
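To make the byte-position point concrete: an LSN is printed as two 32-bit hex halves, e.g. `16/B374D848`, so comparing two of them is just a matter of treating them as one 64-bit number. A small illustrative helper (not part of Postgres.js):

```js
// An LSN like '16/B374D848' is a 64-bit byte position in the WAL,
// written as two 32-bit hex halves separated by '/'.
function lsnToBigInt(lsn) {
  const [hi, lo] = lsn.split('/')
  return (BigInt('0x' + hi) << 32n) | BigInt('0x' + lo)
}

lsnToBigInt('16/B374D848') > lsnToBigInt('16/B374D000') // => true
```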
-
Being able to reliably sync every PostgreSQL table insert, update, and delete into another system (another datastore, Elasticsearch, etc.) is a useful pattern for many services. Apparently the most reliable approach currently is to use Debezium and a Kafka cluster, which is great but overkill for some services. It's not that hard to develop a custom sync using a logical replication slot, wal2json, and SQL queries with the functions pg_logical_slot_peek_changes & pg_replication_slot_advance. However, large transactions will cause out-of-memory errors unless you use the right wal2json settings and cursor-based consumption of slot changes. The LSN advance logic is also complicated unless you really learn all the concepts (such as what happens if you advance to an LSN in the middle of a transaction). And processing in batches is not as efficient as streaming.
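For context, the custom sync I'm describing boils down to a loop like this (the slot name, batch size, and wal2json options are illustrative, and the slot is assumed to already exist with wal2json as its output plugin):

```js
import postgres from 'postgres'

const sql = postgres('postgres://user:pass@localhost:5432/db') // placeholder connection

// Peek at a bounded batch of pending changes without consuming them.
// 'sync_slot' is an illustrative slot created earlier with wal2json.
const changes = await sql`
  select lsn, data
  from pg_logical_slot_peek_changes(
    'sync_slot', null, 1000,
    'format-version', '2'
  )
`

// ...apply the batch to the other system here...

// Only after the downstream write has succeeded, discard the consumed WAL.
// Caveat from above: only advance to a transaction boundary (a COMMIT),
// never to an LSN in the middle of a transaction.
const lastLsn = changes[changes.length - 1].lsn
await sql`select pg_replication_slot_advance('sync_slot', ${lastLsn}::pg_lsn)`
```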
The postgres library now supports subscribe, which seems great as it produces a stream of change events. However, for a reliable sync process it seems it would need a way to manually acknowledge processed entries, so that the consumer can crash and restart safely without missing any updates. As it stands it seems great for realtime updates if you can afford to miss some. Is this a correct understanding of the current functionality?
Would it be feasible to improve the realtime subscribe to support reliable processing? Or does the subscription protocol effectively make this hard to accomplish?
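As far as I understand the streaming replication protocol, a consumer acknowledges progress by reporting a "flushed" LSN back to the server in a Standby Status Update message, so in principle reliable processing would mean holding that value back until the change has actually been handled downstream. A rough sketch of what such a message looks like (this is the wire format from the protocol docs, not Postgres.js's internal API):

```js
// Standby Status Update ('r') message from the streaming replication protocol.
// The server may discard WAL up to the reported "flushed" LSN, so delaying
// this value until downstream processing succeeds is what an "ack" would be.
function standbyStatusUpdate(flushedLsn /* BigInt byte position */) {
  const buf = Buffer.alloc(34)
  buf.write('r', 0)                                  // message type
  buf.writeBigUInt64BE(flushedLsn, 1)                // last WAL byte + 1 received/written
  buf.writeBigUInt64BE(flushedLsn, 9)                // last WAL byte + 1 flushed (the ack)
  buf.writeBigUInt64BE(flushedLsn, 17)               // last WAL byte + 1 applied
  const micros = BigInt(Date.now() - Date.UTC(2000, 0, 1)) * 1000n
  buf.writeBigUInt64BE(micros, 25)                   // client clock, µs since 2000-01-01
  buf.writeUInt8(0, 33)                              // 1 = request an immediate server reply
  return buf
}
```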