Is your feature request related to a problem? Please describe.
Data Prepper currently waits a period of time to flush the buffer on shutdown. The current logic is to wait for the entire buffer to drain or for the drain timeout to expire.
This logic does not account for end-to-end acknowledgements. If a sink is taking a while to send acknowledgements, but the buffer is empty, Data Prepper will think that the pipeline is ready for shutdown.
Because of this, Data Prepper may produce duplicate data when it is shut down in the middle of reading an S3 object (e.g. half the file is sent to the sink, but we shut down before the second half is completed).
Describe the solution you'd like
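Update Data Prepper to track the outstanding acknowledgement sets for a given pipeline. Consider these when performing the shutdown, so that the pipeline is treated as ready only once the buffer is empty and every acknowledgement set has completed (or the drain timeout has expired).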
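To illustrate the idea, here is a minimal sketch of a shutdown check that considers both the buffer and the pending acknowledgement sets. The class `ShutdownGate` and all of its method names are hypothetical, made up for this example. They are not Data Prepper's actual API, and a real implementation would likely hook into the existing acknowledgement set management rather than use a plain counter.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration only; not part of Data Prepper. */
final class ShutdownGate {
    // Number of acknowledgement sets created but not yet completed.
    private final AtomicInteger pendingAckSets = new AtomicInteger();

    void acknowledgementSetCreated() {
        pendingAckSets.incrementAndGet();
    }

    void acknowledgementSetCompleted() {
        pendingAckSets.decrementAndGet();
    }

    /** Ready only when the buffer is empty AND no acknowledgement set is pending. */
    boolean isReadyForShutdown(final BooleanSupplier bufferIsEmpty) {
        return bufferIsEmpty.getAsBoolean() && pendingAckSets.get() == 0;
    }

    /**
     * Polls until the pipeline is ready or the timeout expires.
     * Returns true if ready, false on timeout or interruption.
     */
    boolean awaitReady(final BooleanSupplier bufferIsEmpty,
                       final Duration timeout,
                       final Duration pollInterval) {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (!isReadyForShutdown(bufferIsEmpty)) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // drain timeout expired
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (final InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

With a check like this, an empty buffer alone would no longer be enough to allow shutdown: a sink that never acknowledges would hold the pipeline open until the timeout.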
Describe alternatives you've considered (Optional)
None
Additional context
I was working toward a solution to #4575 which would allow the S3 source to continue to keep the message visibility open while the sink flushed. Then, I found that the sink doesn't wait at all.
I used this pipeline and a local hold_forever sink (see #4737) to demonstrate:
data-prepper-config.yaml:
Data Prepper shut down immediately. It should have waited 5 minutes because the hold_forever sink is not sending any acknowledgements.