Update metrics & ema before breaking the connection loop #4414

Merged: 2 commits, Jan 14, 2025

lijunwangs

Problem

active_streams stays non-zero even when there are no connections to the server. This happens because we skip the metrics update when a connection error occurs while handling chunks.

Summary of Changes

Update the metrics and the EMA before breaking out of the connection loop.

Fixes #

@lijunwangs
Author

Example stats:

[2025-01-11T21:54:36.003926852Z INFO solana_metrics::metrics] datapoint: bench_vote_metrics active_connections=0i active_streams=366i new_connections=0i new_streams=0i evictions=0i connection_added_from_staked_peer=0i connection_added_from_unstaked_peer=0i connection_add_failed=0i connection_add_failed_invalid_stream_count=0i connection_add_failed_staked_node=0i connection_add_failed_unstaked_node=0i connection_add_failed_on_pruning=0i connection_removed=0i connection_remove_failed=0i connection_setup_timeout=0i connection_setup_error=0i connection_setup_error_timed_out=0i connection_setup_error_closed=0i connection_setup_error_transport=0i connection_setup_error_app_closed=0i connection_setup_error_reset=0i connection_setup_error_locally_closed=0i connection_rate_limited_across_all=0i connection_rate_limited_per_ipaddr=0i invalid_stream_size=0i packets_allocated=0i packet_batches_allocated=0i packets_sent_for_batching=0i staked_packets_sent_for_batching=0i unstaked_packets_sent_for_batching=0i bytes_sent_for_batching=0i chunks_sent_for_batching=0i packets_sent_to_consumer=0i bytes_sent_to_consumer=0i chunks_processed_by_batcher=0i chunks_received=0i staked_chunks_received=0i unstaked_chunks_received=0i packet_batch_send_error=0i handle_chunk_to_packet_batcher_send_error=0i packet_batches_sent=0i packet_batch_empty=0i stream_read_errors=0i stream_read_timeouts=0i throttled_streams=0i stream_load_ema=917i stream_load_ema_overflow=0i stream_load_capacity_overflow=0i throttled_unstaked_streams=0i throttled_staked_streams=0i process_sampled_packets_us_90pct=0i process_sampled_packets_us_min=0i process_sampled_packets_us_max=0i process_sampled_packets_us_mean=0i process_sampled_packets_count=0i perf_track_overhead_us=0i connection_rate_limiter_length=2i outstanding_incoming_connection_attempts=1i total_incoming_connection_attempts=6916i quic_endpoints_count=32i open_connections=0i refused_connections_too_many_open_connections=0i

Note the persistent non-zero value of active_streams (366) while active_connections is 0.

@@ -1167,6 +1167,8 @@ async fn handle_connection(
CONNECTION_CLOSE_CODE_INVALID_STREAM.into(),
CONNECTION_CLOSE_REASON_INVALID_STREAM,
);
stats.total_streams.fetch_sub(1, Ordering::Relaxed);
stream_load_ema.update_ema_if_needed();

PR says update metrics but this is a logic change?

Author

This restores the logic that was broken in commit 2be7c2e. We need to do the correct bookkeeping of total_streams and the EMA before breaking out of the connection loop. For example, we did the following

    stats.total_streams.fetch_add(1, Ordering::Relaxed);

before the inner loop.

After the inner loop, we need to correct these counters.
Breaking directly out of the outer loop causes these updates to be skipped.
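
To illustrate the bookkeeping problem in isolation, here is a minimal sketch, not the actual server code. `Stats`, `Chunk`, `handle_connection`, and `leaked_streams` are hypothetical stand-ins invented for this example; only the `total_streams` counter and the `fetch_add`/`fetch_sub` calls mirror the snippet above. The EMA update is left out, since its internals are not shown in this thread.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the server's stats struct; only the counter
/// discussed above is modelled.
struct Stats {
    total_streams: AtomicUsize,
}

/// Outcome of handling one chunk in this toy model (an assumption, not the
/// real chunk type).
enum Chunk {
    Data,
    EndOfStream,
    Invalid,
}

/// Processes the streams of one simulated connection. `fix_applied` toggles
/// the decrement before the early exit so both behaviours can be compared.
fn handle_connection(stats: &Stats, streams: &[Vec<Chunk>], fix_applied: bool) {
    'conn: for stream in streams {
        // Incremented before the inner loop, as in the snippet above.
        stats.total_streams.fetch_add(1, Ordering::Relaxed);
        for chunk in stream {
            match chunk {
                Chunk::Data => {}
                // Leaves only the inner loop; the cleanup below still runs.
                Chunk::EndOfStream => break,
                Chunk::Invalid => {
                    if fix_applied {
                        // Correct the counter before leaving the outer loop.
                        stats.total_streams.fetch_sub(1, Ordering::Relaxed);
                    }
                    // Skips the cleanup after the inner loop.
                    break 'conn;
                }
            }
        }
        // Normal cleanup after the inner loop.
        stats.total_streams.fetch_sub(1, Ordering::Relaxed);
    }
}

/// Runs one simulated connection whose second stream hits an invalid chunk
/// and returns the counter value left behind once the connection is gone.
fn leaked_streams(fix_applied: bool) -> usize {
    let stats = Stats {
        total_streams: AtomicUsize::new(0),
    };
    let streams = vec![
        vec![Chunk::Data, Chunk::EndOfStream],
        vec![Chunk::Data, Chunk::Invalid],
        vec![Chunk::Data],
    ];
    handle_connection(&stats, &streams, fix_applied);
    stats.total_streams.load(Ordering::Relaxed)
}
```

In this toy model, calling `leaked_streams(false)` leaves one stream counted after the connection has ended, which is the same kind of stale value as the active_streams reading above, while `leaked_streams(true)` brings the counter back to zero.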

Yep, I understand. What I'm saying is that the PR title says "update metrics" but the most important thing in this PR is arguably updating the EMA.

Having both changes in the same PR is fine of course, but please update PR/commit to reflect what's actually being changed. Also we're backporting this to 2.1 yeah?

Author

okay -- updated the title to include ema.

@lijunwangs lijunwangs changed the title Update metrics before breaking connections Update metrics & ema before breaking connections Jan 13, 2025
@lijunwangs lijunwangs changed the title Update metrics & ema before breaking connections Update metrics & ema before breaking the connection loop Jan 13, 2025
@lijunwangs lijunwangs requested a review from sakridge January 13, 2025 09:45
@alessandrod alessandrod added the v2.1 Backport to v2.1 branch label Jan 14, 2025

mergify bot commented Jan 14, 2025

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

@lijunwangs lijunwangs merged commit 83919b8 into anza-xyz:master Jan 14, 2025
48 checks passed
mergify bot pushed a commit that referenced this pull request Jan 14, 2025
* Update metrics and ema before breaking connection loop

(cherry picked from commit 83919b8)
alessandrod pushed a commit that referenced this pull request Jan 15, 2025
…ort of #4414) (#4450)

Update metrics & ema before breaking the connection loop (#4414)

* Update metrics and ema before breaking connection loop

(cherry picked from commit 83919b8)

Co-authored-by: Lijun Wang <[email protected]>