Add syslog batching implementation #491
In general, it looks fine to me. I don't see where it adds the newline character to delimit syslog lines, though...
Would love to see a demo at the next ARP WG meeting.
(I sort of disregarded that this was a POC at points in time and some of my comments are more implementation-focused, sorry about that 😅 )
The newline is already part of the syslog messages; it is added beforehand by another method (linked for anyone curious). This holds for all possible syslog messages, so I don't need to add it here, which is really convenient.
Addressed all the comments and additions by @ctlong above. If they are sufficient, please close the threads :)
I still have some specific concerns, which I've left as comments in this review.
In general, the implementation looks fine, though I'm not sure I understand why the new TriggerTimer struct is necessary.
@nicklas-dohrn, can you please sign the CLA? We can't merge this until you've done so.
@ctlong I will take care of the CLA. @nicklas-dohrn has to be added to one of our GitHub orgs.
Conceptually, I think this proof of concept is correct. Implementation-wise the timer still has some issues.
Once those are fixed, I would suggest rebasing this off #573 and testing the two changes together to see if it achieves the throughput you want. Then we're all ready for a real implementation (with tests).
🙏 Could you please also update the PR description, thanks.
Branch updated from c937231 to 21666c8.
I reimplemented the changes using an approach similar to the one @ctlong proposed.
I did extensive testing of the current and the new approach to syslog batching. The test setup sent logs from our dev CF landscape, with 4 Diego cells and 4 Loggregator agents, to a CLS instance.
Generally it looks fine; I found two small things.
I will wait for @ctlong's review.
This is a new approach to switching between HTTP and HTTP batching. It differs from the previous attempts only in this respect. Apart from that, it contains only refactorings.
I rebased onto the current main of loggregator agent and adapted all changes to work with HTTPS batching.
@nicklas-dohrn, the tests and linting are still failing. Also, I'm not sure what's going on with the history of this branch, but the full PR does not reflect the latest commit you added to adopt the changes I'd suggested. Like if you go to
Will try to get it working now.
I got all the tests passing again. The failures were only caused by settings I had changed for testing purposes and forgot to change back.
There is a real issue turned up by the linting errors:
This cannot be solved in the current architecture. The much better approach would be to invert batch creation and retry logic, but that is difficult because batches are created before the envelopes are unmarshalled. @ctlong, what is your opinion on this issue? P.S.: I could disable it by appending
I see what you mean, and I agree that finding a way to return an error does seem like something that should be added eventually. However, how badly do you want retries and error logs on write failures? If you want to disable this error with a TODO to come back and refactor it, I'm willing to approve that to get this change through, since you've been waiting a long time for it. Unfortunately, I don't think the "right" fix is very straightforward. It seems like this writer should either have its own retry and logging functionality, or else inline batch writes similar to how you had them before. The former approach would be more complex. The latter has an obvious downside: envelopes might never be sent if there isn't a constant stream of them. What do you think? Maybe you have a better idea?
For now, I would implement the retry directly within the HTTP batching code as a temporary measure. I would omit error reporting to the retry writer for now, essentially making it inert.
Hey @ctlong, I opened an issue to discuss the state of the retry stack, so that this PR is decoupled from it, since that is a separate problem to solve:
@nicklas-dohrn, this PR is still failing linting and unit tests.
Will fix that; I thought it was already fixed by the additions I made.
Description
This is our proposal to implement syslog batching for sending logs via https.
It includes a switch, via a syslog query parameter, between the normal one-log-per-request syslog mode and batching.
This can be done with https-batch. If you enable the syslog batching behaviour, it currently writes syslog batches in which single messages are newline-delimited (\n).
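Mode selection could be sketched as below. This sketch assumes that `https-batch` appears as the scheme of the drain URL. That is an assumption: the description above mentions a query parameter, so the real selection point may differ. `writerMode` and the mode strings are hypothetical:

```go
package main

import (
	"fmt"
	"net/url"
)

// writerMode is a hypothetical helper that picks a write mode from a drain
// URL, assuming batching is requested via an "https-batch" scheme.
func writerMode(drain string) (string, error) {
	u, err := url.Parse(drain)
	if err != nil {
		return "", err
	}
	switch u.Scheme {
	case "http", "https":
		return "single", nil // one syslog message per request
	case "https-batch":
		return "batch", nil // newline-delimited messages per request
	default:
		return "", fmt.Errorf("unsupported drain scheme %q", u.Scheme)
	}
}
```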
Currently, the batch size is hardwired to about 256 KB, which is already enough to speed up throughput by a factor of at least 10.
Making it configurable would be an option, but I have not seen the need so far.
Please let me know what you think of the current approach.
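Making the size configurable can be as simple as passing the limit as a parameter, with roughly 256 KB as the default. This is a minimal sketch. The constant name, the exact value 256 * 1024 and `splitIntoBatches` are assumptions for illustration. Unlike a size-then-flush approach, this version flushes before a batch would exceed the limit:

```go
package main

// defaultMaxBatchBytes approximates the ~256 KB hardwired size; the name
// and the exact value are assumptions.
const defaultMaxBatchBytes = 256 * 1024

// splitIntoBatches groups newline-terminated messages into batches that do
// not exceed maxBytes. The only exception is a single message that is
// larger than the limit on its own; it is sent as its own batch.
func splitIntoBatches(msgs [][]byte, maxBytes int) [][]byte {
	var batches [][]byte
	var cur []byte
	for _, m := range msgs {
		// Start a new batch if adding m would exceed the limit.
		if len(cur) > 0 && len(cur)+len(m) > maxBytes {
			batches = append(batches, cur)
			cur = nil
		}
		cur = append(cur, m...)
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}
```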