
S3 buffer using pipeline transformations #4809

Open
dlvenable opened this issue Aug 2, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@dlvenable (Member)

Is your feature request related to a problem? Please describe.

For smaller workloads that require durability, using S3 as a buffer can be a good solution.

Describe the solution you'd like

Data Prepper already has a few things that we can combine to create an S3 buffer.

  1. An S3 source
  2. An S3 sink
  3. Pipeline transformations

I propose that we add a new buffer, pipeline_s3, which is implemented entirely as a pipeline transformation.

```yaml
my-pipeline:
  source:
    http:
  buffer:
    pipeline_s3:
      bucket: mybucket
  sink:
    - opensearch:
```
This would transform into:

```yaml
my-pipeline-source:
  source:
    http:
  buffer:
    bounded_blocking:
  sink:
    - s3:
        bucket: mybucket
```

```yaml
my-pipeline-sink:
  source:
    s3:
      scan:
        buckets:
          - bucket:
              name: mybucket
  buffer:
    bounded_blocking:
  sink:
    - opensearch:
```

Describe alternatives you've considered (Optional)

We could implement an S3 buffer similar to the Kafka buffer that does not require splitting the pipeline. However, the approach proposed here would be considerably faster to build.

Also, I think we should leave room for a possible S3 buffer that is implemented directly, without pipeline splitting. My proposal is to give this buffer a name distinct from a plain S3 buffer, and also to avoid confusion with other buffers such as Kafka. Thus, I called this pipeline_s3.

One alternative to changing the name is to use a flag instead - split_pipeline: true or asynchronous_buffer: true.
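With the flag approach, the configuration might look like the following sketch. This is illustrative only: the buffer keeps the name s3 and the flag name (split_pipeline here) is an assumption, not a settled design.

```yaml
# Hypothetical flag-based alternative: the flag, rather than the
# buffer name, opts into the pipeline-splitting transformation.
my-pipeline:
  source:
    http:
  buffer:
    s3:
      bucket: mybucket
      split_pipeline: true
  sink:
    - opensearch:
```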

Additional context

N/A

@kkondaka
Copy link
Collaborator

kkondaka commented Aug 6, 2024

David, we probably need some kind of partitioning mechanism (using folders) and need to make sure items in a partition are processed in order.
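One way the generated source pipeline could support folder-based partitioning is by writing objects under per-partition key prefixes. The sketch below is only an illustration of that idea; the object_key/path_prefix option names and the partition_key field are assumptions about how this might be expressed, not a settled design.

```yaml
# Sketch only: write each event's object under a per-partition
# folder so the consuming pipeline can scan and process the items
# of one partition in order. Option names are illustrative.
my-pipeline-source:
  sink:
    - s3:
        bucket: mybucket
        object_key:
          path_prefix: partition-${/partition_key}/
```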

@kkondaka kkondaka added enhancement New feature or request and removed untriaged labels Aug 13, 2024