Enabling tracing by usage of opentelemetry #99

rem1-dev · 2024-08-15T05:36:13Z

I've added tracing to the project, mainly by introducing the telemetry, opentelemetry and tokio crates. Tracing makes it possible to analyze bottleneck functions (functions that take up the majority of execution time).

RCasatta · 2024-08-19T13:27:19Z

Isn't prometheus trying to achieve the same thing? Should we remove that if we decide to use opentelemetry?

philippem · 2024-08-22T23:46:33Z

> Isn't prometheus trying to achieve the same thing? Should we remove that if we decide to use opentelemetry?

Prometheus gives us metrics (for example, there is a histogram of the time spent making RPC calls to bitcoind). Many alerts can be built directly on Prometheus metrics.
OpenTelemetry (OTLP) gives us tracing within and between processes (for example, we can look at a mempool update operation and see where the time is being spent).

They both enhance observability but are complementary.
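To make the distinction concrete, here is a std-only toy model of what a trace adds over a per-function timer: each span also records how deeply it is nested, so a mempool update can be broken down into its inner steps. This is an illustration only and assumes nothing about the `tracing` or OpenTelemetry APIs; the `Span` type and the step names (`fetch_txs`, `index_txs`, `Mempool::update`) are hypothetical.

```rust
use std::cell::{Cell, RefCell};
use std::time::{Duration, Instant};

thread_local! {
    // Nesting depth of the innermost open span on this thread.
    static DEPTH: Cell<usize> = Cell::new(0);
    // Finished spans as (name, depth, elapsed), in the order they closed.
    static FINISHED: RefCell<Vec<(&'static str, usize, Duration)>> = RefCell::new(Vec::new());
}

/// Toy span: opened on creation, closed and recorded on drop.
struct Span {
    name: &'static str,
    depth: usize,
    start: Instant,
}

impl Span {
    fn enter(name: &'static str) -> Span {
        let depth = DEPTH.with(|d| {
            let cur = d.get();
            d.set(cur + 1);
            cur
        });
        Span { name, depth, start: Instant::now() }
    }
}

impl Drop for Span {
    fn drop(&mut self) {
        DEPTH.with(|d| d.set(d.get() - 1));
        let record = (self.name, self.depth, self.start.elapsed());
        FINISHED.with(|f| f.borrow_mut().push(record));
    }
}

// Hypothetical steps of a mempool update.
fn fetch_txs() {
    let _span = Span::enter("fetch_txs");
}

fn index_txs() {
    let _span = Span::enter("index_txs");
}

fn update_mempool() {
    let _span = Span::enter("Mempool::update");
    fetch_txs();
    index_txs();
}

fn main() {
    update_mempool();
    FINISHED.with(|f| {
        for (name, depth, elapsed) in f.borrow().iter() {
            println!("{}{} took {:?}", "  ".repeat(*depth), name, elapsed);
        }
    });
}
```

Children close before their parent, so the parent's total time includes theirs; a real tracing backend exports the same parent/child structure, across processes as well.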

shesek · 2024-08-24T21:42:28Z

> Isn't prometheus trying to achieve the same thing? Should we remove that if we decide to use opentelemetry?
>
> ...
> They both enhance observability but are complementary.

I'm still not quite sure I understand the difference. Prometheus can be, and already is being, used to track function execution times, for example in these functions (via `_timer` and its `Drop`), where opentelemetry was added too:

```diff
+    #[instrument(skip_all, name="Mempool::utxo")]
     pub fn utxo(&self, scripthash: &[u8]) -> Vec<Utxo> {
         let _timer = self.latency.with_label_values(&["utxo"]).start_timer();
         let entries = match self.history.get(scripthash) {
@@ -209,6 +216,7 @@ impl Mempool {
             .collect()
     }
 
+    #[instrument(skip_all, name="Mempool::stats")]
     // @XXX avoid code duplication with ChainQuery::stats()?
     pub fn stats(&self, scripthash: &[u8]) -> ScriptStats {
         let _timer = self.latency.with_label_values(&["stats"]).start_timer();
@@ -258,12 +266,14 @@ impl Mempool {
         stats
     }
 
+    #[instrument(skip_all, name="Mempool::txids")]
     // Get all txids in the mempool
     pub fn txids(&self) -> Vec<&Txid> {
         let _timer = self.latency.with_label_values(&["txids"]).start_timer();
         self.txstore.keys().collect()
     }
```
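For reference, the `_timer` pattern works because the guard returned by `start_timer()` records the elapsed time into the histogram when it is dropped at the end of the function. A minimal std-only sketch of that idea; `LatencyHistogram` and `Timer` are hypothetical stand-ins, not the `prometheus` crate's types, and they keep raw durations per label instead of histogram buckets:

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical labelled latency store (raw observations, no buckets).
#[derive(Default)]
pub struct LatencyHistogram {
    observations: Mutex<HashMap<&'static str, Vec<Duration>>>,
}

/// RAII guard: records the elapsed time for `label` when dropped.
pub struct Timer<'a> {
    hist: &'a LatencyHistogram,
    label: &'static str,
    start: Instant,
}

impl LatencyHistogram {
    pub fn start_timer(&self, label: &'static str) -> Timer<'_> {
        Timer { hist: self, label, start: Instant::now() }
    }

    /// Number of observations recorded for `label`.
    pub fn count(&self, label: &str) -> usize {
        self.observations.lock().unwrap().get(label).map_or(0, Vec::len)
    }
}

impl Drop for Timer<'_> {
    fn drop(&mut self) {
        let elapsed = self.start.elapsed();
        let mut obs = self.hist.observations.lock().unwrap();
        obs.entry(self.label).or_default().push(elapsed);
    }
}

fn txids(latency: &LatencyHistogram) -> Vec<u32> {
    // Bound to `_timer` (not `_`) so the guard lives until the function returns.
    let _timer = latency.start_timer("txids");
    vec![1, 2, 3]
}

fn main() {
    let latency = LatencyHistogram::default();
    txids(&latency);
    txids(&latency);
    println!("txids observed {} times", latency.count("txids"));
}
```

Each label only accumulates a flat series of durations; nothing records which call ran inside which, which is the part tracing is meant to add.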


shesek · 2024-08-24T22:23:20Z

It's not critical, but it could aid review if the addition of the `#[instrument()]` attributes was separated into its own commit.


shesek · 2024-09-20T18:51:54Z

How do you feel about defining a no-op `#[instrument]` macro when the tracing feature is disabled, so that we don't have to wrap it in `#[cfg_attr(feature = "tracing", ...)]` every time?
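For context, this is roughly what the per-function wrapping looks like. A minimal sketch, assuming a feature named `tracing` as in the question; the function bodies are hypothetical. When the feature is off, `cfg_attr` drops the attribute entirely, so this compiles even without the `tracing` crate:

```rust
// With the "tracing" feature off, `cfg_attr` removes the attribute, so the
// `tracing::instrument` path is never resolved.
#[cfg_attr(feature = "tracing", tracing::instrument(skip_all, name = "Mempool::txids"))]
pub fn txids(store: &[u64]) -> Vec<&u64> {
    store.iter().collect()
}

// Every instrumented function repeats the same wrapper.
#[cfg_attr(feature = "tracing", tracing::instrument(skip_all, name = "Mempool::stats"))]
pub fn stats(store: &[u64]) -> usize {
    store.len()
}

fn main() {
    let store: [u64; 3] = [10, 20, 30];
    println!("{} txids, stats = {}", txids(&store).len(), stats(&store));
}
```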

rem1-dev · 2024-10-01T13:00:24Z

> How do you feel about defining a no-op `#[instrument]` macro when the tracing feature is disabled, so that we don't have to wrap it in `#[cfg_attr(feature = "tracing", ...)]` every time?

I found that, instead of defining my own procedural macro, I can just use the `tracing/max_level_off` feature to turn tracing off by default, which makes the `#[instrument]` macro a no-op by default. I've implemented it.

rem1-dev · 2024-10-01T16:33:56Z

Tracing is turned off by default and can be turned on by passing the following flags to both the `cargo build` and `cargo run` commands:

`--no-default-features --features otlp-tracing`
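A minimal sketch of how startup code can branch on that feature at compile time. `init_observability` and its return values are hypothetical, and the actual exporter/subscriber setup is left out because it depends on the external crates:

```rust
// Hypothetical startup hook: exactly one of these two definitions is
// compiled, depending on whether `otlp-tracing` is enabled.
#[cfg(feature = "otlp-tracing")]
fn init_observability() -> &'static str {
    // Real build: set up the OTLP exporter and tracing-subscriber here
    // (omitted, since it needs the optional crates).
    "otlp-tracing"
}

#[cfg(not(feature = "otlp-tracing"))]
fn init_observability() -> &'static str {
    // Default build: keep the existing logger; spans from `#[instrument]`
    // are statically disabled via `tracing/max_level_off`.
    "log"
}

fn main() {
    println!("observability backend: {}", init_observability());
}
```

A plain `cargo build` compiles only the second definition; `--no-default-features --features otlp-tracing` compiles only the first.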

shesek · 2024-10-01T22:31:23Z

It would be nice if it was possible to avoid the additional dependencies altogether, not just disable tracing

rem1-dev · 2024-10-03T10:19:15Z

> It would be nice if it was possible to avoid the additional dependencies altogether, not just disable tracing

I guess that would need, as you suggested, a new macro that is a no-op when the `otlp-tracing` feature flag is off. You're right that we still depend on new crates. But by default we depend on just one new crate, `tracing`, so that we can turn off the macro we use. Only if we turn on the feature flag do we pull in other dependencies in addition to `tracing`:

- `tracing-subscriber`
- `opentelemetry`
- `tracing-opentelemetry`
- `opentelemetry-otlp`
- `opentelemetry-semantic-conventions`

So we add only 1 new dependency (or 6 if one decides to use tracing). Is that still not acceptable, @shesek?

shesek · 2024-10-15T05:16:58Z

I had a preference to avoid additional dependencies altogether, and thought this should be as simple as:

```rust
use proc_macro::TokenStream;

#[cfg(not(feature = "otlp"))] // no-op: return the annotated item unchanged
#[proc_macro_attribute]
pub fn instrument(_attr: TokenStream, item: TokenStream) -> TokenStream { item }
```

However, I realized that this would require either making this a proc-macro crate, which would prevent us from exporting anything other than proc macros, or making a separate crate just for this, which overly complicates things... So yeah, let's stick with the one extra dependency.

shesek

Added a suggestion and some nits (with apologies for the newline pedantry 😅)

It's kind of unfortunate that we have to use a negative feature, which requires using --no-default-features to disable no-otlp-tracing when enabling otlp-tracing. It'll become more annoying if we ever add other default features, as they'll have to be re-enabled manually. But I don't see a way around it... is there? (/cc @RCasatta)

shesek · 2024-10-22T11:56:47Z

> It's kind of unfortunate that we have to use a negative feature ...

One thing we could do to make this less of a footgun is to add explicit, useful error messages when `--no-default-features` is used without re-enabling either of the otlp features, or when `otlp-tracing` is enabled without using `--no-default-features`:

```rust
// Exactly one of the two features must be enabled.
#[cfg(not(any(feature = "otlp-tracing", feature = "no-otlp-tracing")))]
compile_error!("Must enable one of the 'otlp-tracing' or 'no-otlp-tracing' features");
#[cfg(all(feature = "otlp-tracing", feature = "no-otlp-tracing"))]
compile_error!("Cannot enable both the 'otlp-tracing' and 'no-otlp-tracing' features");
```

RCasatta · 2024-10-22T13:21:33Z

> It's kind of unfortunate that we have to use a negative feature, which requires using --no-default-features to disable no-otlp-tracing when enabling otlp-tracing. It'll become more annoying if we ever add other default features, as they'll have to be re-enabled manually. But I don't see a way around it... is there? (/cc @RCasatta)

Looking at https://github.com/tokio-rs/tracing/blob/bdbaf8007364ed2a766cca851d63de31b7c47e72/tracing/src/level_filters.rs#L68, it seems that maximum tracing is the default, so we don't need to opt in to it? Thus, it seems to me we can avoid the positive feature?
It would require using code like `not(feature = "no-otlp-tracing")`, which doesn't look great, but at least we would need only 1 feature and would continue to guarantee that features are additive.
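A minimal sketch of that single-feature layout, assuming the existing `no-otlp-tracing` name: every gate is written as a negation, and the default build (no features) gets tracing. Note that Cargo features can only switch optional dependencies on, never off, so in this layout the OpenTelemetry crates could not be made optional. The function below is hypothetical:

```rust
/// Tracing is on unless the single negative feature is set.
#[cfg(not(feature = "no-otlp-tracing"))]
fn otlp_tracing_enabled() -> bool {
    true
}

#[cfg(feature = "no-otlp-tracing")]
fn otlp_tracing_enabled() -> bool {
    false
}

fn main() {
    // A plain `cargo build` (no features) compiles the first definition.
    println!("otlp tracing enabled: {}", otlp_tracing_enabled());
}
```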

shesek · 2024-10-22T14:53:38Z

> Thus, it seems to me we can avoid the positive feature?

The original goal for using the feature was to make (most) of the additional dependencies optional, which (I think?) we can't do with just a negative feature

RCasatta · 2024-10-23T08:34:39Z

I see... I understand the issue of having too many deps, but I think it matters more in "money-handling" libs such as wallets than in electrs, so I would bite the bullet and build some deps even if they are unused, rather than adding complexity.

rem1-dev · 2024-10-23T08:37:46Z

> Thus, it seems to me we can avoid the positive feature?
>
> The original goal for using the feature was to make (most) of the additional dependencies optional, which (I think?) we can't do with just a negative feature

Yes, that was the motivation behind this: keep additional dependencies to a minimum and also make the `#[instrument]` annotations a no-op by default. I couldn't find any other solution than `--no-default-features --features otlp-tracing` with tracing turned off in the default features.

- Avoid unnecessary copying of prev outpoints
- When looking for both mempool and on-chain txos, accumulate the set of outpoints that remain to be looked up, to avoid re-checking for them again later in the found set
- Refactored lookup_txos() to use lookup_txo() internally rather than the other way around, which was less efficient
- Look up txos in the mempool first, then on-chain
- ChainQuery::lookup_txos() now returns a Result instead of panicking when outpoints are missing
- Removed ChainQuery::lookup_avail_txos() and allow_missing, which are no longer necessary
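A std-only sketch of the lookup strategy described in that list: mempool first, then on-chain, shrinking the set of remaining outpoints after each source and returning a `Result` instead of panicking. The `Index` type and the `(txid, vout) -> value` maps are hypothetical simplifications, not the actual `ChainQuery`/`Mempool` code:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical simplified types: an outpoint is (txid, vout), a txo is its value.
type OutPoint = (u32, u32);
type TxOut = u64;

struct Index {
    mempool: HashMap<OutPoint, TxOut>,
    chain: HashMap<OutPoint, TxOut>,
}

impl Index {
    /// Look up txos in the mempool first, then on-chain. Returns the
    /// (sorted) missing outpoints as the error instead of panicking.
    fn lookup_txos(
        &self,
        outpoints: &HashSet<OutPoint>,
    ) -> Result<HashMap<OutPoint, TxOut>, Vec<OutPoint>> {
        let mut found = HashMap::with_capacity(outpoints.len());
        // Outpoints still to be looked up; shrinks after each source.
        let mut remaining: Vec<OutPoint> = outpoints.iter().copied().collect();
        for source in [&self.mempool, &self.chain] {
            remaining.retain(|op| match source.get(op) {
                Some(txo) => {
                    found.insert(*op, *txo);
                    false // found: drop it from `remaining`
                }
                None => true, // keep it for the next source
            });
        }
        if remaining.is_empty() {
            Ok(found)
        } else {
            remaining.sort();
            Err(remaining)
        }
    }
}

fn main() {
    let idx = Index {
        mempool: HashMap::from([((1, 0), 50)]),
        chain: HashMap::from([((2, 1), 70)]),
    };
    let wanted = HashSet::from([(1, 0), (2, 1), (9, 9)]);
    println!("{:?}", idx.lookup_txos(&wanted)); // Err([(9, 9)])
}
```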

In the nix env we now provide the env vars for the executables needed for integration testing, so we can enable the tests. To stay consistent with the electrum version used in nix, also upgrade the auto-downloaded one. Note that we had to change a test assertion: it seems electrum's behaviour changed, and it now updates balances before confirmation.

This actually hurts performance because the batched response has to be buffered on the bitcoind side, as @TheBlueMatt explains at romanz#373 (comment). Instead, send multiple individual RPC requests in parallel using a thread pool, with a separate RPC TCP connection for each thread. Also see romanz#374
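A minimal std-only sketch of that design: individual (non-batched) requests are distributed over a fixed set of worker threads, and each worker owns one connection for its whole lifetime. `RpcConn` is a hypothetical stand-in that only formats a string instead of talking to bitcoind, and the channel-based pool is one possible way to build this, not necessarily how electrs does it:

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for one persistent RPC TCP connection.
struct RpcConn {
    id: usize,
}

impl RpcConn {
    fn call(&mut self, txid: u32) -> String {
        // A real client would send one JSON-RPC request on its socket here.
        format!("tx {} via conn {}", txid, self.id)
    }
}

/// Fetch each txid with an individual request, spread over `n_threads`
/// workers that each own their own connection.
fn fetch_parallel(txids: Vec<u32>, n_threads: usize) -> Vec<(u32, String)> {
    let (job_tx, job_rx) = mpsc::channel::<u32>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (res_tx, res_rx) = mpsc::channel();

    let workers: Vec<_> = (0..n_threads)
        .map(|id| {
            let job_rx = Arc::clone(&job_rx);
            let res_tx = res_tx.clone();
            thread::spawn(move || {
                let mut conn = RpcConn { id }; // one connection per thread
                loop {
                    // The lock guard is a temporary, released at the semicolon.
                    let job = job_rx.lock().unwrap().recv();
                    match job {
                        Ok(txid) => res_tx.send((txid, conn.call(txid))).unwrap(),
                        Err(_) => break, // job channel closed: no more work
                    }
                }
            })
        })
        .collect();
    drop(res_tx); // results end once every worker has exited

    for txid in txids {
        job_tx.send(txid).unwrap();
    }
    drop(job_tx); // lets workers leave their loop

    let mut results: Vec<_> = res_rx.iter().collect();
    for w in workers {
        w.join().unwrap();
    }
    results.sort_by_key(|(txid, _)| *txid);
    results
}

fn main() {
    for (txid, resp) in fetch_parallel(vec![3, 1, 2], 2) {
        println!("{}: {}", txid, resp);
    }
}
```

This sketch builds a fresh pool per call; keeping the workers and their connections alive between sync runs is the follow-up change described in the "Reuse RPC threads and TCP connections" commit.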

The indexing process was adding transactions into the store so that prevouts funded & spent within the same batch could be looked up via Mempool::lookup_txos(). If the indexing process later failed for any reason, these transactions would remain in the store. With this change, we instead explicitly look for prevouts funded within the same batch, then look for the rest in the chain/mempool indexes and fail if any are missing, without keeping the transactions in the store.

Previously, if any of the mempool transactions were not available because they were evicted between getting the mempool txids and txs themselves, the mempool syncing operation would be aborted and tried again from scratch. With this change, we instead keep whatever transactions we were able to fetch, then get the updated list of mempool txids and re-try fetching missing ones continuously until we're able to get a full snapshot.
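The retry loop can be sketched with std only as follows. `MempoolSource` is a hypothetical trait standing in for the two bitcoind RPCs involved (list txids, fetch one tx), and `Flaky` is a toy source in which tx 2 is evicted mid-sync:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical RPC surface: current mempool txids, and one tx body
/// (None if it was evicted in the meantime).
trait MempoolSource {
    fn txids(&mut self) -> HashSet<u32>;
    fn tx(&mut self, txid: u32) -> Option<String>;
}

/// Keep whatever was fetched, refresh the txid list, and retry only the
/// missing txs until one pass yields a complete snapshot.
fn sync_mempool(src: &mut impl MempoolSource) -> HashMap<u32, String> {
    let mut fetched: HashMap<u32, String> = HashMap::new();
    loop {
        let current = src.txids();
        // Forget txs that have left the mempool since the last pass.
        fetched.retain(|txid, _| current.contains(txid));
        let mut complete = true;
        for txid in current {
            if fetched.contains_key(&txid) {
                continue; // already have it, no need to re-fetch
            }
            match src.tx(txid) {
                Some(tx) => {
                    fetched.insert(txid, tx);
                }
                None => complete = false, // evicted meanwhile; retry next pass
            }
        }
        if complete {
            return fetched;
        }
    }
}

/// Toy source: tx 2 disappears after the first txids() call.
struct Flaky {
    calls: usize,
    fetches: usize,
}

impl MempoolSource for Flaky {
    fn txids(&mut self) -> HashSet<u32> {
        self.calls += 1;
        if self.calls == 1 { HashSet::from([1, 2, 3]) } else { HashSet::from([1, 3, 4]) }
    }
    fn tx(&mut self, txid: u32) -> Option<String> {
        self.fetches += 1;
        if txid == 2 { None } else { Some(format!("tx{}", txid)) }
    }
}

fn main() {
    let snapshot = sync_mempool(&mut Flaky { calls: 0, fetches: 0 });
    let mut ids: Vec<u32> = snapshot.keys().copied().collect();
    ids.sort();
    println!("{:?}", ids); // prints [1, 3, 4]
}
```

On the second pass only tx 4 is fetched; txs 1 and 3 are kept from the first pass instead of restarting from scratch.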

- Reduce the logging level for bitcoind's JSONRPC response errors. These can happen pretty often for missing mempool txs and do not warrant warn-level logging. Unexpected RPC errors will bubble up and be handled appropriately.
- Add more verbose logging for mempool syncing

RCasatta changed the title ~~Enabling tracing by usage of opentelemtry~~ Enabling tracing by usage of opentelemetry Aug 15, 2024

shesek reviewed Aug 24, 2024

src/bin/electrs.rs (outdated)

shesek reviewed Aug 24, 2024

src/electrum/server.rs (outdated)

shesek reviewed Aug 24, 2024

src/config.rs

shesek reviewed Aug 24, 2024

Cargo.toml (outdated)

junderw pushed a commit to junderw/electrs that referenced this pull request Sep 4, 2024

Merge pull request Blockstream#99 from mempool/mononaut/anchor-output…

961a255

…-type Add support for anchor output type

rem1-dev force-pushed the trace-otlp branch from db37cef to 0441c07 September 20, 2024 10:16

philippem reviewed Sep 20, 2024

src/daemon.rs (outdated)

philippem reviewed Sep 20, 2024

Cargo.toml (outdated)

rem1-dev force-pushed the trace-otlp branch from e0d85fe to 9a22b04 October 7, 2024 13:24

shesek reviewed Oct 15, 2024

Cargo.toml (outdated)

rem1-dev force-pushed the trace-otlp branch 2 times, most recently from 1dd5bc8 to e871af5 October 21, 2024 12:06

shesek requested changes Oct 22, 2024

shesek reviewed Oct 23, 2024

src/bin/electrs.rs (outdated)

mariusz-reichert and others added 28 commits November 6, 2024 16:40

Instrumenting functions

f813909

Make telemetry optional

a5dc40a

Renaming flag for turning on tracing

28facfd

Make tracing macro no-op by default

26b006c

Restoring log init

ceeee99

nix: add binLiquid for flake.nix

3de782f

add binLiquid to the inherit list to enable specific builds

add support for blockchain.scripthash.unsubscribe

ee5ad54

introduce benchmarking

421730f

Using criterion so that we don't need nightly to run benchmarks. A "bench" feature is also introduced as a way to benchmark private methods through public methods gated behind this feature.

Avoid recomputing txids

6c7a8f6

on my machine: add_blocks time: [4.7562 ms 4.7754 ms 4.7972 ms] change: [-15.059% -14.650% -14.169%] (p = 0.00 < 0.05) Performance has improved.

Avoid recomputing txid in TxConfRow

0bed9ca

add_blocks time: [3.9458 ms 3.9586 ms 3.9717 ms] change: [-17.564% -17.103% -16.660%] (p = 0.00 < 0.05) Performance has improved.

Avoid recomputing txid in TxRow

f6f4f08

add_blocks time: [2.9788 ms 2.9938 ms 3.0101 ms] change: [-24.810% -24.373% -23.897%] (p = 0.00 < 0.05) Performance has improved.

upgrade electrumd dep

fa2d058

Avoid print logs in tests

7e8b2a0

To get logs as before, use `RUST_LOG=debug cargo test`.

print test logs in CI

d98e0d3

Use electrs as default run

3038504

This way, a plain `cargo run` launches the server, whereas currently `cargo run --bin electrs` is required.

Flag to enable compaction during initial sync

0122580

While compaction during initial sync makes things much slower, it may be preferable to not have to size the disk at double the final required space.

improve logging during initial sync

3c3a5fc

It is less interesting to see how many rows are written to the db than to know the last indexed height.

Make sure the chain tip doesn't move while fetching the mempool

bad4061

Reuse RPC threads and TCP connections

1f68ab0

Keep RPC TCP connections open between sync runs and reuse them, to reduce TCP connection initialization overhead.

Avoid recomputing txids when possible

ea061ce

Following Blockstream#89 (comment) and Blockstream#89 (comment)

removing redundant dependency

ae7b299

getblocks retry 5 times on 'Block not found on disk'

d1c9fda
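A std-only sketch of such a retry policy. The `RpcError` type, the `with_retry` helper and the fixed delay are hypothetical; "retry 5 times" is read here as at most five attempts in total, and only the "Block not found on disk" error is retried, while any other error is returned immediately:

```rust
use std::thread;
use std::time::Duration;

// Hypothetical error type for a getblocks-style RPC call.
#[derive(Debug, PartialEq)]
enum RpcError {
    BlockNotFoundOnDisk,
    Other(String),
}

const MAX_ATTEMPTS: u32 = 5;

/// Call `fetch` up to MAX_ATTEMPTS times, retrying (after `delay`) only on
/// the transient "Block not found on disk" error.
fn with_retry<T>(
    mut fetch: impl FnMut() -> Result<T, RpcError>,
    delay: Duration,
) -> Result<T, RpcError> {
    let mut attempt = 1;
    loop {
        match fetch() {
            Err(RpcError::BlockNotFoundOnDisk) if attempt < MAX_ATTEMPTS => {
                attempt += 1;
                thread::sleep(delay);
            }
            // Success, a non-retryable error, or the last attempt's error.
            result => return result,
        }
    }
}

fn main() {
    let mut calls = 0;
    let result = with_retry(
        || {
            calls += 1;
            // Simulate a block that only becomes readable on the third attempt.
            if calls < 3 { Err(RpcError::BlockNotFoundOnDisk) } else { Ok("block") }
        },
        Duration::from_millis(10),
    );
    println!("{:?} after {} attempts", result, calls);
}
```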

rem1-dev force-pushed the trace-otlp branch from 30d3dea to d1c9fda November 6, 2024 15:51

rem1-dev mentioned this pull request Nov 7, 2024

Enabling tracing by usage of opentelemetry #128

Draft
