3 - CDC Proposal #4
Signed-off-by: Shahryar Soltanpour <[email protected]>
Signed-off-by: Shahryar Soltanpour <[email protected]>
@sh-soltanpour Thanks for the proposal! 👍
I read the article and the other resources you linked. I have mixed feelings about using Debezium, considering how complicated the setup would be for end users (judging by the sheer size of its PostgreSQL docs) if they choose to deploy it together with GatewayD, but that's nothing that can't be solved with well-written docs and a tutorial (for the GatewayD plugin). At the same time, it is a proven piece of software that works the way it should.
On the other hand, we can capture the change log ourselves using available libraries/frameworks (wal2json, wal-g or similar) and manage the entire logical replication in the plugin, which is a method your proposal does not touch upon. This method replicates the behavior of Debezium in a simpler way (as in, not relying on the JVM and too many dependency libraries, having smaller configuration files and such).
We need to strike a balance between seamlessness and complexity (and how much control we keep over the entire process) when choosing among the following methods:
- Integration with Debezium using a wrapper plugin. (your chosen approach)
- Implementation of logical replication in a plugin. (my suggestion - similar to Debezium Server/Engine - needs to be discussed - see example)
- Implementation of logical replication from incoming queries and database responses. (dismissed approach)
I can think of what the second and third approaches can do and how we can implement them, yet I cannot wrap my head around what the wrapper plugin in the first approach does (and entails). Regarding the deployment and distribution of Debezium, one thing I can think of is to use GraalVM native-image to produce a single binary with all the dependencies that can be controlled with the wrapper plugin (a downside of this approach is that the binary size would be very large, and there may be other caveats).
Can you please elaborate on the first approach? And WDYT about my take on it? I'd also be happy to know your opinion about the second approach.
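To make the second approach a bit more concrete, here is a minimal sketch of the decoding half only, assuming the plugin receives wal2json's format-version 1 JSON (one document per transaction, with a `change` array). The `Change`, `Transaction`, `ParseTransaction` and `Row` names are made up for illustration and are not part of GatewayD or wal2json; opening the replication slot and streaming from PostgreSQL are deliberately left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Change mirrors one entry of the "change" array in wal2json's
// format-version 1 output. Only the fields used by inserts and updates
// are modeled here; deletes carry their key under "oldkeys", which this
// sketch ignores.
type Change struct {
	Kind         string        `json:"kind"`
	Schema       string        `json:"schema"`
	Table        string        `json:"table"`
	ColumnNames  []string      `json:"columnnames"`
	ColumnValues []interface{} `json:"columnvalues"`
}

// Transaction is the top-level wal2json document for one transaction.
type Transaction struct {
	Change []Change `json:"change"`
}

// ParseTransaction decodes a single wal2json payload (hypothetical helper).
func ParseTransaction(payload []byte) (Transaction, error) {
	var tx Transaction
	err := json.Unmarshal(payload, &tx)
	return tx, err
}

// Row pairs column names with their values for easier downstream handling.
func (c Change) Row() map[string]interface{} {
	row := make(map[string]interface{}, len(c.ColumnNames))
	for i, name := range c.ColumnNames {
		if i < len(c.ColumnValues) {
			row[name] = c.ColumnValues[i]
		}
	}
	return row
}

func main() {
	// Sample payload written by hand for this sketch.
	payload := []byte(`{"change":[{"kind":"insert","schema":"public","table":"users",` +
		`"columnnames":["id","name"],"columntypes":["integer","text"],"columnvalues":[1,"alice"]}]}`)

	tx, err := ParseTransaction(payload)
	if err != nil {
		panic(err)
	}
	for _, c := range tx.Change {
		fmt.Println(c.Kind, c.Schema+"."+c.Table, c.Row())
	}
}
```

Note that `encoding/json` decodes JSON numbers into `float64` when the target is `interface{}`, so a real implementation would probably map values using the `columntypes` array instead.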
@mostafa Thanks for your comment! About the wrapper approach, I was thinking that our plugin connects to GatewayD, then configures and automates running Debezium. So basically it gets the necessary information about the databases and message brokers from the user (or GatewayD) and calls the appropriate Debezium functions in the core to monitor changes in the WAL. For the consumer part, we can still have our own consumer or use open-source tools. The core part of the plugin, which monitors and publishes the logs, is delegated to Debezium, but it should be replaceable with any other logic in the future if needed. That said, I completely agree with what you said about it being complex to configure and run inside our plugin. I also agree with the second approach: it would probably take some time to test and debug it and make sure it works, but in the end we would have a lightweight binary without huge dependencies. What do you think about using a message broker in this approach? Do you think we still need it? I think we should still use something like Kafka to make sure that messages are replayable and not lost. In the paid version, we can run our own Kafka and handle this in the background, so the user just specifies the source and the target database (similar to Google Cloud). To conclude, I think the second approach is quite reasonable as well, and I don't think it should be hard to implement. So if you think my understanding of it is correct, I can add it to the proposal as well.
Using a message broker/queue is a good way to prevent overloading the systems and to make the solution scalable, whether we use Kafka or something else. I suppose we should create a wrapper plugin either way (using either Debezium or wal-something), so we'd better create a standard interface to integrate with both (if possible). Either way we need to somehow configure and run a binary (or a set of binaries) that reads the config and acts on it, so the wrapper should be a runner that runs a workflow, just like a CI runner, something like taskctl. We can provide a set of predefined tasks that run the needed commands. Another way is to implement this ticket in GatewayD, so it can run a container as a plugin; we can then embed everything else inside it (Debezium, wal-sth, etc.) and configure and run them as the plugin. (We can also do it another way.) WDYT? Please add the second approach to the proposal, so we can work on it.
I have just added a third approach to the proposal, with the steps. About running it as a plugin and in a container: yes, I think we can attach it to GatewayD as a plugin, and whenever the user sends a signal (I'm not sure we have anything for signals implemented in GatewayD), GatewayD puts the plugin in the circuit and the plugin goes through the steps specified in the proposal. If we want to run it as a taskctl task, we still need to be able to connect to GatewayD and maybe reject or hold some queries (as specified in step 5 of the new approach). The plugin should also be able to send a signal to GatewayD to put the source database in read-only mode, or even deprecate it and use the target database. So I'd say it's important to be able to manipulate the GatewayD configuration as well, which is probably better done through a specified protocol and by plugins. I would be happy to know your opinion on these.
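As an illustration of the "reject some queries while the source is read-only" idea, here is a minimal sketch. The `Gate` type and its methods are hypothetical and not an existing GatewayD API, and the write detection is a naive first-keyword check, not a SQL parser (for example, it would miss a data-modifying CTE that starts with `WITH`).

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Mode describes whether the source database accepts writes.
type Mode int

const (
	ReadWrite Mode = iota
	ReadOnly
)

// Gate is a hypothetical guard a plugin could consult before a query is
// forwarded to the source database during migration.
type Gate struct {
	mu   sync.RWMutex
	mode Mode
}

// SetMode switches the gate between read-write and read-only.
func (g *Gate) SetMode(m Mode) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.mode = m
}

// Allow reports whether the query may be forwarded in the current mode.
func (g *Gate) Allow(query string) bool {
	g.mu.RLock()
	defer g.mu.RUnlock()
	if g.mode == ReadWrite {
		return true
	}
	return !isWrite(query)
}

// isWrite is a deliberately naive check on the first keyword of the query.
func isWrite(query string) bool {
	fields := strings.Fields(query)
	if len(fields) == 0 {
		return false
	}
	switch strings.ToUpper(fields[0]) {
	case "INSERT", "UPDATE", "DELETE", "TRUNCATE", "CREATE", "ALTER", "DROP":
		return true
	}
	return false
}

func main() {
	g := &Gate{}
	g.SetMode(ReadOnly)
	for _, q := range []string{"SELECT 1", "insert into t values (1)"} {
		fmt.Printf("%q allowed: %v\n", q, g.Allow(q))
	}
}
```

Holding queries instead of rejecting them would need a queue and a release step once the switch to the target database is done, which this sketch does not cover.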
2427573
to
323a8aa
Compare
323a8aa
to
72403ad
Compare
I agree with your points, and there need to be some changes in GatewayD to handle the signaling and the switching of the database. Currently, calling the plugin hooks is a one-way street, and the returned results can contain a single "signal", called terminate, which is returned by the traffic hooks. We can extend this to support more signals, so as to switch databases and take other interesting actions.
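A possible shape for the extended signals, purely as a sketch for discussion: only the "terminate" name comes from the current behavior described above, while `SignalReadOnly`, `SignalSwitchDatabase`, `HookResult`, `State` and `Apply` are hypothetical names and do not reflect how GatewayD actually encodes hook results.

```go
package main

import "fmt"

// Signal is a hypothetical named instruction returned by a plugin hook.
type Signal string

const (
	SignalTerminate      Signal = "terminate"       // exists today, per the discussion
	SignalReadOnly       Signal = "readonly"        // hypothetical
	SignalSwitchDatabase Signal = "switch_database" // hypothetical
)

// HookResult is a simplified stand-in for what a traffic hook returns.
type HookResult struct {
	Signals []Signal
}

// State is a simplified stand-in for the proxy state the signals act on.
type State struct {
	Terminated bool
	ReadOnly   bool
	ActiveDB   string
}

// Apply folds the signals from a hook result into the state, in order.
// Unknown signals are reported as errors instead of being ignored.
func Apply(s State, r HookResult, target string) (State, error) {
	for _, sig := range r.Signals {
		switch sig {
		case SignalTerminate:
			s.Terminated = true
		case SignalReadOnly:
			s.ReadOnly = true
		case SignalSwitchDatabase:
			s.ActiveDB = target
			s.ReadOnly = false
		default:
			return s, fmt.Errorf("unknown signal %q", sig)
		}
	}
	return s, nil
}

func main() {
	state := State{ActiveDB: "source"}
	result := HookResult{Signals: []Signal{SignalReadOnly, SignalSwitchDatabase}}
	state, err := Apply(state, result, "target")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", state)
}
```

Keeping the signals as a list rather than a single value would let a plugin request several actions from one hook call, for example freezing writes and then switching databases.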
Yeah, that sounds good to me.
@sh-soltanpour Sounds good! Let's do it!
105db96
to
499b6df
Compare
If you're new to commit signing, there are different ways to set it up: Sign commits with
|
@sh-soltanpour Sign your commits and I'll merge this.
Signed-off-by: Shahryar Soltanpour <[email protected]> Signed-off-by: Shahryar Soltanpour <[email protected]>
Signed-off-by: Shahryar Soltanpour <[email protected]> Signed-off-by: Shahryar Soltanpour <[email protected]>
Signed-off-by: Shahryar Soltanpour <[email protected]>
b2b28db
to
dfb884b
Compare
@mostafa done!
LGTM! 🚀
The proposed "CDC" plugin aims to facilitate a seamless migration process, allowing users to transition from one database instance to another with minimal disruption to production environments.