Use Kafka as a buffer #3322
@JonahCalvo, This is an exciting feature. Thanks for putting this proposal together. What is the …
@JonahCalvo, We can also simplify the buffer name. Use …
This feature is completed by quite a few PRs.
Hi everyone, can somebody give us a working sample configuration of the Kafka buffer to test? I tried the example in the first post, but it is not working :-( From my tests and reading the logs:
Thanks for your help!
Answering myself, the correct way to configure the Kafka buffer is this one:
Regards!
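For anyone else looking for a starting point, a minimal sketch of a pipeline using a Kafka buffer might look like the following. This is an assumption-based illustration, not a verified configuration: the option names (bootstrap_servers, topics, name, group_id) are guessed from the Kafka source/sink options this proposal says the buffer mirrors, and should be checked against the Data Prepper documentation for your version.

```yaml
# Hypothetical pipeline with a Kafka buffer between source and sink.
# Option names are assumptions modeled on the Kafka source/sink config.
my-pipeline:
  source:
    http:
  buffer:
    kafka:
      bootstrap_servers: ["localhost:9092"]
      topics:
        - name: data-prepper-buffer   # single topic, per the proposal
          group_id: data-prepper
  sink:
    - stdout:
```

Note that, per the proposal, only one topic is configured and no serde_format is set, since the buffer reads and writes raw bytes.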
Use-case
Currently, the only buffer available with Data Prepper is the bounded_blocking buffer, which stores events in memory. This can lead to data loss if a pipeline crashes or the buffer overflows, so a disk-based buffer is required to prevent this data loss. This proposal is to implement a Kafka buffer. Kafka offers robust buffering capabilities by persistently storing data on disk across multiple nodes, ensuring high availability and fault tolerance.
Basic Configuration
The buffer will:
Sample configuration
The configuration will be similar to that of the Kafka source and sink. Notably, only one topic will be provided, and serde_format will not be configurable, as the buffer will read and write bytes. Attributes that were previously set for each topic, such as workers, will become attributes of the plugin rather than of the topic.
Detailed Process
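As a rough illustration of the RawByteHandler callback discussed in this section, the interface might look something like the sketch below. This is a guess at the shape of the API, not the actual Data Prepper code: the interface name and deserializeBytes() come from the proposal text, while everything else (the method signature, the lambda handler, the example payload) is invented for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the RawByteHandler interface from the proposal.
// The signature is an assumption; only the names come from the issue text.
interface RawByteHandler {
    // Invoked by the Kafka buffer when raw bytes are read from the topic,
    // letting the sink decide how to deserialize them.
    void deserializeBytes(byte[] bytes);
}

public class RawByteHandlerSketch {
    static String lastPayload;

    public static void main(String[] args) {
        // A sink-like handler that simply decodes the bytes as UTF-8.
        RawByteHandler handler =
                bytes -> lastPayload = new String(bytes, StandardCharsets.UTF_8);

        // The buffer would call back into the handler when draining records.
        handler.deserializeBytes(
                "{\"message\":\"hello\"}".getBytes(StandardCharsets.UTF_8));
        System.out.println(lastPayload);
    }
}
```

The point of such an interface is that the buffer itself stays byte-oriented (no serde_format), while each sink keeps control over how its bytes are interpreted.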
RawByteHandler interface. This interface will include a deserializeBytes() function, which the Kafka buffer will call back to when reading data.
Encryption
The Kafka buffer will offer optional encryption via KMS:
The GenerateDataKeyPair API will be invoked to obtain a data key pair.
The Encrypt API will then encrypt the private key, which will be sent to Kafka alongside the encrypted data.
The Decrypt API will decrypt the private key, which will then decrypt the data.
Metrics
The Kafka buffer will incorporate the standard buffer metrics, as well as the metrics reported by Kafka Source/Sink:
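The envelope-encryption flow described under Encryption can be simulated locally to see the mechanics. In the sketch below, a locally generated RSA pair stands in for the KMS GenerateDataKeyPair call, and the KMS Encrypt/Decrypt wrapping of the private key is only indicated in comments; none of this is the actual Data Prepper implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import javax.crypto.Cipher;

// Local stand-in for the KMS-based flow: public key encrypts on write,
// private key decrypts on read.
public class EnvelopeSketch {
    static byte[] roundTrip(byte[] plaintext) throws Exception {
        // Stand-in for the KMS GenerateDataKeyPair call.
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair pair = gen.generateKeyPair();

        // Writer side: encrypt the record before it reaches Kafka.
        Cipher enc = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
        enc.init(Cipher.ENCRYPT_MODE, pair.getPublic());
        byte[] ciphertext = enc.doFinal(plaintext);

        // In the real flow, KMS Encrypt would wrap the private key and the
        // wrapped key would travel alongside the ciphertext; KMS Decrypt
        // would unwrap it on the reader side. Here the key is reused directly.
        Cipher dec = Cipher.getInstance("RSA/ECB/OAEPWithSHA-256AndMGF1Padding");
        dec.init(Cipher.DECRYPT_MODE, pair.getPrivate());
        return dec.doFinal(ciphertext);
    }

    public static void main(String[] args) throws Exception {
        byte[] out = roundTrip("buffered event".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(out, StandardCharsets.UTF_8));
    }
}
```

One consequence of this design worth noting: because the wrapped key travels with the data, a reader only needs KMS Decrypt permission on the key, not access to any shared key store.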