-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(host_metrics source): add a new collector for tcp stats #22057
Conversation
6efd703
to
32a5fe8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aryan9600 ! Exposing these new metrics seem useful. What do you think of putting them under the existing network
collector though? Exporting as network_tcp_*
.
i wanted to keep it as a separate collector, as these metrics are only available for linux. also, to avoid any changes to the output of users when they upgrade to the newer version with these changes. if its a blocker, then i can move it to the |
I think we already have a precedent of collectors having OS-specific metrics (e.g. |
tcp
collectornetwork
collector to expose TCP stats
39bc32d
to
bdb5596
Compare
network
collector to expose TCP statsnetwork
collector to expose TCP stats
7392558
to
0bcdd70
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @aryan9600, thank you for this PR!
I noticed big change in Cargo.lock
, I wonder if you can roll that file back and apply individual updates for the newly added crates only.
0bcdd70
to
9faccc7
Compare
a problem with having these TCP metrics inside the |
Hi @aryan9600, one test is failing. I will take another look at this PR once that is fixed. You can run any test locally with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/ci-run-unit-mac
6419d0b
to
9faccc7
Compare
Running this locally on macOS I get multiple compilation errors originating
|
Ah, good point, that could be actually be a reason to separate the collector if it is not filterable by network device. |
@jszwedko ack, moving these metrics back into its own dedicated collector. |
9faccc7
to
53d380a
Compare
@pront this should be fixed. |
tests are passing now, previous CI run failed because third party licenses were not up to date. this has been fixed now. |
network
collector to expose TCP statsThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 on the UX. I'll leave the code review to @pront
964de55
to
114e3e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of comments on diagnostics but otherwise this is good to go 🚀
9 => Ok(TcpState::LastAck), | ||
10 => Ok(TcpState::Listen), | ||
11 => Ok(TcpState::Closing), | ||
_ => Err(TcpError::InvalidTcpState { state: value }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we could pass the string representation of state
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm a bit confused, what would be the string representation of state
here, since value
doesn't correspond to a valid tcp state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah my bad, I see that this is a catch all for states that don't map to a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing one last check on my side because last time we tried to generate linux specific platform docs it broke component generation on macOS. This something for us (Vector team) to figure out. PR looks great otherwise.
9 => Ok(TcpState::LastAck), | ||
10 => Ok(TcpState::Listen), | ||
11 => Ok(TcpState::Closing), | ||
_ => Err(TcpError::InvalidTcpState { state: value }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah my bad, I see that this is a catch all for states that don't map to a string.
I think maybe the cue files were manually modified rather than using |
Though I do see the "Only available on Linux" bit of the doc comment is dropped. That might be an issue in the scaffolding that generates the cue file based on the structs. |
Thanks @jszwedko. @aryan9600 please take a look at the above. |
I will take a look at this. |
should i just apply the diff manually? but the diff drops the note about the metrics about linux only. i'm a bit confused about how to resolve this cleanly. |
I didn't have time to look into this but what Jesse mentioned is going to be a problem (even after we apply the patch). |
I believe that error occurs if you are running Ruby 2. Could try with Ruby 3? |
I tried Ref: #22120 This is the same issue, the options are removed when the command is ran on a platform that doesn't support them. Solving this is out of scope for this PR. We will temporarily assume the website is the
|
Add a new `tcp` collector to the `host_metrics` source to expose information about the systems's TCP stack. It exposes three metrics: * `tcp_connections_total`: The total number of TCP connections. It includes the `state` of the connection as a tag. * `tcp_tx_queued_bytes_total`: The sum of the number of bytes in the send queue across all connections. * `tcp_rx_queued_bytes_total`: The sum of the number of bytes in the receive queue across all connections. The collector is only enabled for Linux as it uses the netlink subsystem. Signed-off-by: Sanskar Jaiswal <[email protected]>
Signed-off-by: Sanskar Jaiswal <[email protected]>
Signed-off-by: Sanskar Jaiswal <[email protected]>
Signed-off-by: Sanskar Jaiswal <[email protected]>
Signed-off-by: Sanskar Jaiswal <[email protected]>
Signed-off-by: Sanskar Jaiswal <[email protected]>
3dd11e6
to
86539b8
Compare
We will address this issue. You can just run |
Thank you @aryan9600 for driving this to completion despite the docs hiccups. I created #22198 to address the problems we discovered here. |
Summary
Add a new tcp collector to the host_metrics source to expose information about the systems's TCP stack. It exposes three metrics:
tcp_connections_total
: The total number of TCP connections. It includes the state of the connection as a tag.tcp_tx_queued_bytes_total
: The sum of the number of bytes in the send queue across all connections.tcp_rx_queued_bytes_total
: The sum of the number of bytes in the receive queue across all connections.These metrics are only available for Linux systems as it uses the netlink subsystem.
Change Type
Is this a breaking change?
How did you test this PR?
ss -a -t
on both Linux x86 and arm systems.Does this PR include user facing changes?
Checklist
Cargo.lock
), pleaserun
dd-rust-license-tool write
to regenerate the license inventory and commit the changes (if any). More details here.References
closes #21972