Replies: 2 comments 1 reply
-
@sampok That is really interesting! You are correct that my idea for the use cases now is just having change events, and not worrying if some are missed. I definitely think we could explore expanding the current functionality to allow defining the name of a permanent replication slot to connect to, and the LSN to start from. I think the code in subscribe.js is fairly straightforward, and you should be able to take that almost verbatim into your own setup to explore defining a permanent replication slot and LSN. I don't have the time to dig into that right now, but I would love to answer any questions you might have to help you along the way.
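For reference, hooking into the current functionality looks roughly like this, if I read the docs right (a minimal sketch - the publication name, table name, and connection details are just placeholders):

```js
import postgres from 'postgres'

// Assumes a publication exists, e.g. CREATE PUBLICATION alltables FOR ALL TABLES;
const sql = postgres('postgres://user:pass@localhost:5432/db', {
  publications: 'alltables'
})

// Receive a stream of change events for inserts on a hypothetical "events" table.
const { unsubscribe } = await sql.subscribe(
  'insert:events',
  (row, { command, relation, key, old }) => {
    // row      - the changed row
    // command  - insert | update | delete
    // relation - schema/table info for the change
  }
)
```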
-
Hey @sampok ... Sorry for the late response.

Yes, the documentation for logical replication and the protocol is a bit hidden - don't know if you found this and its subsections.

The LSN (Log Sequence Number) is a number representing a byte position in the WAL (Write Ahead Log). The LSNs supplied in the Logical Replication Protocol represent various boundaries in the WAL like BEGIN, COMMIT etc. The one Postgres.js uses is the last.

I hope the docs pointers above can help you get further :)
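To make the byte-position point concrete: an LSN is printed as two 32-bit hex halves, e.g. `16/B374D848`, so comparing two of them is just a matter of treating them as one 64-bit number. A small illustrative helper (not part of Postgres.js):

```js
// An LSN like '16/B374D848' is a 64-bit byte position in the WAL,
// written as two 32-bit hex halves separated by '/'.
function lsnToBigInt(lsn) {
  const [hi, lo] = lsn.split('/')
  return (BigInt('0x' + hi) << 32n) | BigInt('0x' + lo)
}

lsnToBigInt('16/B374D848') > lsnToBigInt('16/B374D000') // => true
```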
-
Being able to reliably sync every PostgreSQL table insert, update, and delete into another system (another datastore, Elasticsearch, etc.) is a useful pattern for many services. Apparently the most reliable approach currently is to use Debezium and a Kafka cluster, which is great but overkill for some services. It's not that hard to develop a custom sync using a logical replication slot, wal2json, and SQL queries with the functions pg_logical_slot_peek_changes & pg_replication_slot_advance. However, large transactions will cause out-of-memory errors unless you use the right wal2json settings and cursor-based consumption of slot changes. The LSN advance logic is also complicated unless you really learn all the concepts (such as what happens if you advance to an LSN in the middle of a transaction). And processing in batches is not as efficient as streaming.
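For context, the custom sync I'm describing boils down to a loop like this (the slot name, batch size, and wal2json options are illustrative, and the slot is assumed to already exist with wal2json as its output plugin):

```js
import postgres from 'postgres'

const sql = postgres('postgres://user:pass@localhost:5432/db') // placeholder connection

// Peek at a bounded batch of pending changes without consuming them.
// 'sync_slot' is an illustrative slot created earlier with wal2json.
const changes = await sql`
  select lsn, data
  from pg_logical_slot_peek_changes(
    'sync_slot', null, 1000,
    'format-version', '2'
  )
`

// ...apply the batch to the other system here...

// Only after the downstream write has succeeded, discard the consumed WAL.
// Caveat from above: only advance to a transaction boundary (a COMMIT),
// never to an LSN in the middle of a transaction.
const lastLsn = changes[changes.length - 1].lsn
await sql`select pg_replication_slot_advance('sync_slot', ${lastLsn}::pg_lsn)`
```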
The postgres library now supports subscribe, which seems great as it produces a stream of change events. However, for a reliable sync process it seems it would need a way to manually acknowledge processed entries, so that the consumer can crash and restart safely without missing any updates. As it stands it seems great for realtime updates if you can afford to miss some. Is this a correct understanding of the current functionality?
Would it be feasible to improve the realtime subscribe to support reliable processing? Or does the subscription protocol effectively make this hard to accomplish?
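As far as I understand the streaming replication protocol, a consumer acknowledges progress by reporting a "flushed" LSN back to the server in a Standby Status Update message, so in principle reliable processing would mean holding that value back until the change has actually been handled downstream. A rough sketch of what such a message looks like (this is the wire format from the protocol docs, not Postgres.js's internal API):

```js
// Standby Status Update ('r') message from the streaming replication protocol.
// The server may discard WAL up to the reported "flushed" LSN, so delaying
// this value until downstream processing succeeds is what an "ack" would be.
function standbyStatusUpdate(flushedLsn /* BigInt byte position */) {
  const buf = Buffer.alloc(34)
  buf.write('r', 0)                                  // message type
  buf.writeBigUInt64BE(flushedLsn, 1)                // last WAL byte + 1 received/written
  buf.writeBigUInt64BE(flushedLsn, 9)                // last WAL byte + 1 flushed (the ack)
  buf.writeBigUInt64BE(flushedLsn, 17)               // last WAL byte + 1 applied
  const micros = BigInt(Date.now() - Date.UTC(2000, 0, 1)) * 1000n
  buf.writeBigUInt64BE(micros, 25)                   // client clock, µs since 2000-01-01
  buf.writeUInt8(0, 33)                              // 1 = request an immediate server reply
  return buf
}
```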