3 - CDC Proposal #4

sh-soltanpour · 2023-12-24T23:31:50Z

The proposed "CDC" plugin aims to facilitate a seamless migration process, allowing users to transition from one database instance to another with minimal disruption to production environments.

Signed-off-by: Shahryar Soltanpour <[email protected]>

sh-soltanpour · 2023-12-24T23:33:01Z

Hi @mostafa ,

I have just written an initial version of the proposal for #3. Please let me know if you have any comments.

Thank you!

mostafa

@sh-soltanpour Thanks for the proposal! 👍

I read the article and the other resources you linked. I have mixed feelings about using Debezium: the setup could be complicated for end users who choose to deploy it together with GatewayD (judging by the sheer size of its PostgreSQL docs), but that's nothing that can't be solved with well-written docs and a tutorial for the GatewayD plugin. At the same time, it is a proven piece of software that works the way it should.

On the other hand, we can capture the change log ourselves using available libraries/frameworks (wal2json, wal-g or similar) and manage the entire logical replication in the plugin, a method your proposal does not touch upon. This method replicates the behavior of Debezium in a simpler way (no reliance on the JVM or a large set of dependency libraries, smaller configuration files, and so on).
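To make the "capture the change log ourselves" idea more concrete, here is a minimal sketch of turning one wal2json message into parameterized SQL for the target database. It assumes wal2json's format-version 1 JSON layout (`change`, `kind`, `columnnames`, `columnvalues`, `oldkeys`); the function name `changes_to_sql` and the `%s` placeholder style are illustrative choices, not part of the proposal.

```python
import json


def changes_to_sql(payload):
    """Convert one wal2json (format-version 1) message into (sql, params) pairs.

    Assumption: every updated or deleted table has a primary key (or replica
    identity), so wal2json includes "oldkeys" for those changes.
    """
    statements = []
    for change in json.loads(payload).get("change", []):
        table = '"{}"."{}"'.format(change["schema"], change["table"])
        kind = change["kind"]
        if kind == "insert":
            cols = change["columnnames"]
            col_list = ", ".join('"{}"'.format(c) for c in cols)
            marks = ", ".join(["%s"] * len(cols))
            sql = "INSERT INTO {} ({}) VALUES ({})".format(table, col_list, marks)
            statements.append((sql, list(change["columnvalues"])))
        elif kind == "update":
            keys = change["oldkeys"]
            sets = ", ".join('"{}" = %s'.format(c) for c in change["columnnames"])
            where = " AND ".join('"{}" = %s'.format(k) for k in keys["keynames"])
            sql = "UPDATE {} SET {} WHERE {}".format(table, sets, where)
            params = list(change["columnvalues"]) + list(keys["keyvalues"])
            statements.append((sql, params))
        elif kind == "delete":
            keys = change["oldkeys"]
            where = " AND ".join('"{}" = %s'.format(k) for k in keys["keynames"])
            sql = "DELETE FROM {} WHERE {}".format(table, where)
            statements.append((sql, list(keys["keyvalues"])))
    return statements
```

A real plugin would also have to track the replication slot position and handle transactions, truncates, and schema changes, none of which this sketch covers.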

We need to balance seamlessness against complexity (and control over the entire process) when choosing among the following methods:

1. Integration with Debezium using a wrapper plugin. (your chosen approach)
2. Implementation of logical replication in a plugin. (my suggestion - similar to Debezium Server/Engine - needs to be discussed - see example)
3. Implementation of logical replication from incoming queries and database responses. (dismissed approach)

I can think of what the second and third approaches can do and how we can implement them, yet I cannot wrap my head around what the wrapper plugin in the first approach does (and entails). Regarding the deployment and distribution of Debezium, one thing I can think of is to use GraalVM native-image to produce a single binary with all the dependencies that can be controlled with the wrapper plugin (a downside of this approach is that the binary size would be very large, and there may be other caveats).

Can you please elaborate on the first approach? And WDYT about my take on it? I'd also be happy to know your opinion about the second approach.

sh-soltanpour · 2023-12-27T18:37:09Z

@mostafa Thanks for your comment!

About the wrapper approach: I was thinking that our plugin connects to GatewayD, then configures and automates running Debezium. Basically, it gets the necessary information about the databases and message brokers from the user (or GatewayD) and calls the appropriate Debezium functions in the core to monitor changes in the WAL. For the consumer part, we can still write our own consumer or use open-source tools. The core job of the plugin, monitoring and publishing logs, is delegated to Debezium, but it should be replaceable with other logic in the future if needed.
Besides this, the plugin can add stats and reports on the process. So the plugin only uses Debezium as the component that monitors the WAL and publishes the records to Kafka.

But I completely agree with what you said about it being complex to configure and run it inside our plugin.
The reason I described this approach and did not talk about writing the code ourselves is that I wanted to avoid reinventing the wheel; Debezium also gives us the ability to extend the plugin to various pairs of databases.

Yeah, I agree with the second approach. It would probably take some time to test and debug it and make sure it works, but in the end we would have a lightweight binary without huge dependencies. What do you think about using a message broker in this approach? Do you think we still need it? I think we should still use something like Kafka to make sure that messages can be replayed and are not lost. In the paid version, we can run our own Kafka and handle this in the background, so the user just specifies the source and the target database (similar to Google Cloud).
Let me know if I understand this approach correctly (mostly the need for a pub-sub configuration), and I will add it to the proposal as well.
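The property being asked of the broker here (records are not lost and can be read again until the consumer confirms them) can be illustrated with a tiny in-memory stand-in. This is only a sketch of the offset-commit idea that Kafka-style brokers provide; the `ChangeLog` class and its methods are hypothetical and are not a Kafka client API.

```python
class ChangeLog:
    """In-memory append-only log with per-consumer committed offsets."""

    def __init__(self):
        self._records = []
        self._next_offset = {}  # consumer name -> next offset to read

    def publish(self, record):
        """Append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Return (offset, record) pairs starting at the consumer's committed position."""
        start = self._next_offset.get(consumer, 0)
        batch = self._records[start:start + max_records]
        return list(enumerate(batch, start=start))

    def commit(self, consumer, offset):
        """Mark everything up to and including `offset` as applied."""
        current = self._next_offset.get(consumer, 0)
        self._next_offset[consumer] = max(current, offset + 1)


log = ChangeLog()
for change in ("insert-1", "update-1", "delete-1"):
    log.publish(change)

first = log.poll("applier", max_records=2)
log.commit("applier", first[0][0])   # only the first record was applied
second = log.poll("applier", max_records=2)
```

Because only offset 0 was committed, the second poll returns offset 1 again, which is what lets a consumer that crashed mid-batch resume without losing changes (at the cost of having to tolerate re-delivery).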

To conclude, I think the second approach is quite reasonable as well, and I don't think it should be hard to implement. So if you think my current take on it is correct, I can add it to the proposal as well.

mostafa · 2023-12-27T21:00:13Z

@sh-soltanpour

Using a message broker/queue is a good way to prevent overload of the systems and to make the solution scalable, whether we use Kafka or others.

I suppose we should create a wrapper plugin either way (using either Debezium or wal-something), so we'd better create a standard interface to integrate with both, if possible. Either way, we need to configure and run a binary (or a set of binaries) that reads the config and acts on it, so the wrapper should be a runner that runs a workflow, just like a CI runner, something like taskctl. We can provide a set of predefined tasks for running the needed commands. Another way is to implement this ticket in GatewayD, so that it can run a container as a plugin; we can then embed everything else inside it (Debezium, wal-something, etc.) and configure and run them as the plugin. (We can also do it another way.) WDYT?
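One way to picture the "standard interface plus runner" idea is the sketch below. All names (`ChangeSource`, `ListSource`, `replicate`, `run_workflow`) are hypothetical; a Debezium-backed or wal2json-backed source would be another subclass, and the runner only mimics the CI-style "run tasks in order, stop on failure" behavior rather than taskctl itself.

```python
from abc import ABC, abstractmethod


class ChangeSource(ABC):
    """Common interface a Debezium-based or WAL-based backend could implement."""

    @abstractmethod
    def start(self):
        """Begin capturing changes."""

    @abstractmethod
    def changes(self):
        """Yield captured changes one by one."""

    @abstractmethod
    def stop(self):
        """Release resources held by the backend."""


class ListSource(ChangeSource):
    """Stand-in backend that replays a fixed list, for illustration only."""

    def __init__(self, items):
        self.items = list(items)
        self.running = False

    def start(self):
        self.running = True

    def changes(self):
        for item in self.items:
            yield item

    def stop(self):
        self.running = False


def replicate(source, apply):
    """Drain a source into `apply`, always stopping the source afterwards."""
    source.start()
    count = 0
    try:
        for change in source.changes():
            apply(change)
            count += 1
    finally:
        source.stop()
    return count


def run_workflow(tasks):
    """Run (name, callable) tasks in order; stop at the first failure."""
    completed = []
    for name, task in tasks:
        try:
            task()
        except Exception as exc:
            return completed, "{}: {}".format(name, exc)
        completed.append(name)
    return completed, None
```

With this shape, the workflow steps ("start backend", "replicate", "switch over") become predefined tasks, and swapping Debezium for another backend only changes which `ChangeSource` is constructed.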

Please add the second approach to the proposal, so we can work on it.

sh-soltanpour · 2023-12-28T03:02:02Z

I have just added a third approach to the proposal with the steps.

About running it as a plugin in a container: yes, I think we can attach it as a plugin to GatewayD, and whenever the user sends a signal (I'm not sure whether GatewayD has anything implemented for signals), GatewayD puts the plugin in the circuit and the plugin goes through the steps specified in the proposal.

If we want to run it with taskctl, we still need to be able to connect to GatewayD and maybe reject or hold some queries (as specified in step 5 of the new approach).

The plugin should also be able to send a signal to GatewayD to put the source database in read-only mode, or even to deprecate it and use the target database.

So I'd say it's important to be able to manipulate the GatewayD configuration as well, which is probably best done through a specified protocol and by plugins.
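The read-only and switch-over behavior described above can be sketched as a small routing decision. The phase names, the `route` function, and the keyword-based write detection are all assumptions made for illustration; a real implementation would need a proper SQL parser, since a leading keyword misses cases such as writes inside a CTE.

```python
from enum import Enum


class Phase(Enum):
    NORMAL = "normal"        # all traffic goes to the source database
    READ_ONLY = "read_only"  # source still serves reads; writes are rejected
    SWITCHED = "switched"    # all traffic goes to the target database


# Naive classification by leading keyword (assumption, not a full SQL parser).
WRITE_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "TRUNCATE", "CREATE", "ALTER", "DROP")


def is_write(query):
    """Return True if the query starts with a keyword that modifies data or schema."""
    words = query.lstrip().split(None, 1)
    return bool(words) and words[0].upper() in WRITE_KEYWORDS


def route(query, phase):
    """Decide where a query goes: "source", "target", or "reject"."""
    if phase is Phase.SWITCHED:
        return "target"
    if phase is Phase.READ_ONLY and is_write(query):
        return "reject"
    return "source"
```

For example, `route("UPDATE t SET x = 1", Phase.READ_ONLY)` yields `"reject"`, while the same query in `Phase.SWITCHED` is sent to `"target"`.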

Would be happy to know your opinion on these.

mostafa

I agree with your points, and some changes are needed in GatewayD to handle the signaling and the switching of databases. Currently, calling the plugin hooks is a one-way street, and the returned results can contain a single "signal", called terminate, which is returned by the traffic hooks. We can extend this to support more signals, so as to switch databases and take other interesting actions.
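As a rough sketch of what "more signals" could look like on the receiving side, the function below reads the existing `terminate` flag and, in addition, a hypothetical `signals` list from a hook result. Only `terminate` comes from the discussion; the `signals` field and the names `set_read_only` and `switch_database` are invented here for illustration and do not describe GatewayD's actual result format.

```python
# Hypothetical signal names; only "terminate" is mentioned in the discussion.
KNOWN_SIGNALS = ("terminate", "set_read_only", "switch_database")


def extract_signals(result):
    """Collect recognized signals from a hook result dict, without duplicates.

    The existing boolean "terminate" field is honored first; the extra
    "signals" list is an assumed extension. Unknown names are ignored so that
    older hosts and newer plugins can coexist.
    """
    signals = []
    if result.get("terminate"):
        signals.append("terminate")
    for name in result.get("signals", []):
        if name in KNOWN_SIGNALS and name not in signals:
            signals.append(name)
    return signals
```

Keeping the boolean `terminate` alongside the list means plugins written against the current behavior keep working unchanged.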

001-cdc/001---cdc.md

sh-soltanpour · 2023-12-28T19:59:59Z

Yeah that sounds good to me.
I think we can start with implementing an initial version of the third approach using the mentioned tools, and then probably extend it to be compatible with other tools as well. What do you think?

mostafa · 2023-12-28T20:48:13Z

@sh-soltanpour Sounds good! Let's do it!

github-actions · 2023-12-28T22:39:57Z

⚠️ This PR contains unsigned commits. To get your PR merged, please sign those commits (git commit -S --amend --no-edit) and force push them to this branch (git push --force-with-lease).

If you're new to commit signing, there are different ways to set it up:

Sign commits with gpg

Follow the steps below to set up commit signing with gpg:

Sign commits with ssh-agent

Follow the steps below to set up commit signing with ssh-agent:

Sign commits with 1Password

You can also sign commits using 1Password, which lets you sign commits with biometrics without the signing key leaving the local 1Password process.

Learn how to use 1Password to sign your commits.

mostafa · 2023-12-28T22:42:22Z

@sh-soltanpour Sign your commits and I'll merge this.

github-actions · 2023-12-28T22:44:08Z

⚠️ This PR contains unsigned commits. To get your PR merged, please sign those commits (git commit -S --amend --no-edit) and force push them to this branch (git push --force-with-lease).

sh-soltanpour · 2023-12-28T22:47:16Z

@mostafa done!

mostafa

LGTM! 🚀

sh-soltanpour added 2 commits December 24, 2023 16:29

Add the initial version of CDC proposal

56dd856

Signed-off-by: Shahryar Soltanpour <[email protected]>

Update identation and add more links

c56be51

Signed-off-by: Shahryar Soltanpour <[email protected]>

sh-soltanpour requested a review from mostafa December 24, 2023 23:32

sh-soltanpour linked an issue Dec 24, 2023 that may be closed by this pull request

Keep two database instances sync #3

Closed

sh-soltanpour added 2 commits December 24, 2023 17:01

Polish the proposal and add image

c5d5c86

Fix the referece link

b09953e

mostafa reviewed Dec 26, 2023

sh-soltanpour force-pushed the 3-keep-instance-sync branch from 2427573 to 323a8aa Compare December 28, 2023 03:03

sh-soltanpour force-pushed the 3-keep-instance-sync branch from 323a8aa to 72403ad Compare December 28, 2023 03:05

mostafa reviewed Dec 28, 2023

sh-soltanpour force-pushed the 3-keep-instance-sync branch from 105db96 to 499b6df Compare December 28, 2023 22:39

sh-soltanpour added 3 commits December 28, 2023 15:46

Add new approach

c5d7f84

Signed-off-by: Shahryar Soltanpour <[email protected]> Signed-off-by: Shahryar Soltanpour <[email protected]>

Add selected approach

dce4407

Signed-off-by: Shahryar Soltanpour <[email protected]> Signed-off-by: Shahryar Soltanpour <[email protected]>

Apply suggestions

dfb884b

Signed-off-by: Shahryar Soltanpour <[email protected]>

sh-soltanpour force-pushed the 3-keep-instance-sync branch from b2b28db to dfb884b Compare December 28, 2023 22:46

sh-soltanpour requested a review from mostafa December 28, 2023 22:57

mostafa approved these changes Dec 29, 2023

mostafa merged commit c3f5a5b into main Dec 29, 2023
1 check passed

mostafa deleted the 3-keep-instance-sync branch December 29, 2023 00:13
