
Add byte-based ingestion limits to the queue and output #39776

Draft · faec wants to merge 121 commits into main

Conversation

@faec (Contributor) commented May 30, 2024

Allow the memory queue's size to be specified in bytes rather than event count. Add bulk_max_bytes to the Elasticsearch output config, to specify ingest request sizes in bytes.

The main technical difficulties in this change were:

  • dynamically growing the size of the memory queue's circular buffer, since there is no longer a hard limit on its length. This is now implemented with a new circularBuffer helper that handles the index arithmetic as the buffer grows, so that event indices can be used unchanged no matter the buffer's current size (a sketch of this index arithmetic follows the list).
  • letting incoming insert requests accumulate when they need to block. This involved linked-list boilerplate so similar to existing helpers that I merged them into a generic FIFO helper.
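
As a rough sketch of the circularBuffer idea (illustrative only, assuming a simple doubling policy; the names and details below are not taken from the PR): an entry's absolute index i always maps to slot i % len(buf), and the growth step copies each live entry to the slot its unchanged index maps to in the larger buffer.

// Minimal sketch of a growable ring buffer addressed by absolute entry
// index. Names and the doubling policy are illustrative, not the PR's code.
type circularBuffer struct {
	buf   []any
	first int // absolute index of the oldest stored entry
	count int // number of entries currently stored
}

// newCircularBuffer creates a buffer with a nonzero initial size.
func newCircularBuffer(initialSize int) *circularBuffer {
	return &circularBuffer{buf: make([]any, initialSize)}
}

// entry returns the slot for absolute index i; callers can keep using the
// same index before and after the buffer grows.
func (cb *circularBuffer) entry(i int) *any {
	return &cb.buf[i%len(cb.buf)]
}

// add stores value at the next absolute index, doubling the buffer when full.
// Growth copies each live entry to the slot its unchanged index maps to in
// the larger buffer, so existing indices remain valid.
func (cb *circularBuffer) add(value any) int {
	if cb.count == len(cb.buf) {
		grown := make([]any, 2*len(cb.buf))
		for i := cb.first; i < cb.first+cb.count; i++ {
			grown[i%len(grown)] = cb.buf[i%len(cb.buf)]
		}
		cb.buf = grown
	}
	i := cb.first + cb.count
	cb.buf[i%len(cb.buf)] = value
	cb.count++
	return i
}

Because indices stay stable across growth, producers and consumers can keep handing around plain entry indices without caring how large the buffer currently is.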

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

@faec added the enhancement and Team:Elastic-Agent-Data-Plane labels on May 30, 2024
@faec self-assigned this on May 30, 2024
@botelastic (bot) added and then removed the needs_team label on May 30, 2024
mergify bot (Contributor) commented May 30, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @faec? 🙏.
If so, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch, where /d is the version digit

mergify bot (Contributor) commented May 31, 2024

This pull request now has conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See the documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b queue-byte-limits upstream/queue-byte-limits
git merge upstream/main
git push upstream queue-byte-limits

@faec marked this pull request as ready for review on June 18, 2024 22:03
@faec requested a review from a team as a code owner on June 18, 2024 22:03
@elasticmachine (Collaborator) commented:

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@pierrehilbert requested review from leehinman and removed the review request for rdner on June 19, 2024 06:41
@leehinman (Contributor) left a comment

I still need to do a thorough review, but a couple of test cases came to mind, and I don't think we currently have tests covering them (a rough sketch follows the list):

  1. the queue is exactly full (from a byte perspective) and we try to add another event
  2. the queue is empty and we try to add an event that is larger than the queue can hold
  3. the queue is partially full and we try to add an event that would put us over the limit
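
A minimal, self-contained sketch of those three cases; it uses a tiny byteBudget stand-in for the queue's byte accounting rather than the real memqueue API, and assumes the intended behavior is to refuse the insert (the real tests would exercise the queue itself and pin down the actual semantics):

import "testing"

// byteBudget is a toy model of the queue's byte accounting, used only to
// spell out the boundary cases; it is not the queue implementation.
type byteBudget struct{ used, limit int }

// tryAdd accepts an event of n bytes only if it fits under the limit.
func (b *byteBudget) tryAdd(n int) bool {
	if b.used+n > b.limit {
		return false
	}
	b.used += n
	return true
}

func TestByteLimitBoundaries(t *testing.T) {
	// 1. Exactly full: any further event must be refused.
	full := byteBudget{used: 1024, limit: 1024}
	if full.tryAdd(1) {
		t.Error("expected add to fail on an exactly full queue")
	}

	// 2. Empty queue, single event larger than the whole limit.
	empty := byteBudget{limit: 1024}
	if empty.tryAdd(2048) {
		t.Error("expected an oversized event to be rejected")
	}

	// 3. Partially full, event would push usage over the limit.
	partial := byteBudget{used: 700, limit: 1024}
	if partial.tryAdd(400) {
		t.Error("expected add to fail when it would exceed the limit")
	}
}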

@cmacknz (Member) commented Jun 19, 2024

I am also interested in seeing if there are any performance changes for these two scenarios:

  1. A baseline comparison of this branch vs. the commit on main it was branched from, when using event-based parameters.
  2. A build from this branch configured to use event-based parameters vs. the same build configured to use byte-based parameters.

@@ -80,12 +81,12 @@ func NetworkClients(netclients []NetworkClient) []Client {
 // The first argument is expected to contain a queue config.Namespace.
 // The queue config is passed to assign the queue factory when
 // elastic-agent reloads the output.
-func SuccessNet(cfg config.Namespace, loadbalance bool, batchSize, retry int, encoderFactory queue.EncoderFactory, netclients []NetworkClient) (Group, error) {
+func SuccessNet(cfg config.Namespace, loadbalance bool, batchEvents, batchBytes, retry int, encoderFactory queue.EncoderFactory, netclients []NetworkClient) (Group, error) {
@cmacknz (Member) commented Jun 19, 2024

Is it worth defining a wrapper struct for batchEvents and batchBytes and passing through a copy of that so that nobody can ever accidentally reverse them in the argument list anywhere they are passed together like this?
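
A minimal sketch of that suggestion (the struct and field names here are illustrative, not from the PR):

// batchLimits bundles the two batch-sizing parameters so that call sites
// can't accidentally swap them in a long argument list.
type batchLimits struct {
	Events int // maximum number of events per batch; 0 means no event limit
	Bytes  int // maximum encoded size per batch in bytes; 0 means no byte limit
}

// SuccessNet (and similar functions) would then take a single batchLimits
// value instead of two adjacent ints.

Passing the struct by value keeps call sites roughly as cheap as passing two ints while making the meaning of each field explicit.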


func (c *config) Validate() error {
	if c.MaxGetRequest > c.Events {
		return errors.New("flush.min_events must be less events")
	if c.Bytes != nil && *c.Bytes < minQueueBytes {
Member

The validation and configuration could be unit tested; it is now more complex than before.

if broker.useByteLimits() {
	// The queue is using byte limits, start with a buffer of 2^10 and
	// we will expand it as needed.
	eventBufSize = 1 << 10
Member

If your memory no longer has space for the powers of two, this reads bigger than it is. Why not just use 1024 directly?

// The buffer position is the entry's index modulo the buffer size: for
// a queue with buffer size N, the entries stored in buf[0] will have
// entry indices 0, N, 2*N, 3*N, ...
func (l *runLoop) growEventBuffer() {
@cmacknz (Member) commented Jun 19, 2024

This seems reasonable, but it also looks like a good reason to introduce benchmark tests (testing.B).

@faec (Contributor, Author) replied:

Benchmarking growEventBuffer specifically? We could do that, although (since buffer growth is one-way) we expect this to be called a roughly constant number of times on any run. (If the settings and event sizes are such that the queue needs to store a million events simultaneously, that's still only 10 calls to growEventBuffer over the lifetime of the program; the main bottlenecks are still in actually processing the events.)

Member

Not necessarily growEventBuffer; I just wanted us to stop and evaluate whether there is any place where microbenchmarking would be beneficial and save us time going through iterations of the end-to-end performance framework. If the answer is that they aren't beneficial, that's fine.

I think in this case the most impactful choice is probably the starting size of the buffer, since that defines how many initial doublings need to happen to reach whatever the steady state is.
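
For what it's worth, a self-contained sketch of such a micro-benchmark (it exercises the double-and-copy pattern on a plain slice for a few starting sizes rather than calling the unexported growEventBuffer; the target and starting sizes are arbitrary):

import (
	"fmt"
	"testing"
)

func BenchmarkBufferGrowth(b *testing.B) {
	const target = 1 << 20 // steady-state capacity to reach
	for _, start := range []int{1 << 7, 1 << 10, 1 << 13} {
		b.Run(fmt.Sprintf("start=%d", start), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				buf := make([]int, start)
				// Repeat the double-and-copy step until the target is
				// reached, mirroring the work a growing ring buffer does.
				for len(buf) < target {
					grown := make([]int, 2*len(buf))
					copy(grown, buf)
					buf = grown
				}
			}
		})
	}
}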

@cmacknz (Member) commented Jul 19, 2024

As discussed, exposing these configurations immediately may complicate some of our future plans. We could still merge this while we finalize those plans so the code doesn't rot, but leave the options undocumented and explicitly mark them as technical preview so that we are free to change them if we need to.

@cmacknz (Member) commented Jul 19, 2024

I think if we merge this, any use of the byte-based options should log a warning that the feature is in technical preview.

@pierrehilbert marked this pull request as draft on August 30, 2024 16:39
Labels: enhancement, Team:Elastic-Agent-Data-Plane

4 participants