feat: produce OpenTelemetry traces with hs-opentelemetry #3140

Draft
wants to merge 7 commits into base: main

Conversation

@develop7 (Collaborator) commented Jan 4, 2024

This PR introduces producing OpenTelemetry traces that include, among other things, the same metrics previously exposed in the Server-Timing header.

TODO:

Running:

I sort of gave up on deploying and configuring all the moving bits locally, so you'd need to create a honeycomb.io account for this one (or ask me for an invite). After that, it's quite straightforward:

  1. Build the PostgREST executable with stack build, and get its path with stack exec -- which postgrest
  2. Get a PostgreSQL server running (e.g. run nix-shell, then postgrest-with-postgresql-15 -- cat)
  3. Do the JWT dance (generate a JWT secret and encode the role into a token, e.g. postgrest_test_anonymous for the example DB from step 2)
  4. Run PostgREST server with
    OTEL_EXPORTER_OTLP_ENDPOINT='https://api.honeycomb.io/' \
    OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=<honeycomb_api_key>"  \
    OTEL_SERVICE_NAME='PostgREST' OTEL_LOG_LEVEL='debug' OTEL_TRACES_SAMPLER='always_on' \
    PGRST_DB_URI='<postgresql_server_url>'  PGRST_JWT_SECRET='<jwt_secret>'  \
    /path/to/compiled/bin/postgrest
    
  5. Request some data and check the Honeycomb dashboard for the traces:

[screenshot: Honeycomb dashboard showing PostgREST traces]

@steve-chavez (Member)

Awesome work! 🔥 🔥

> I sort of gave up deploying and configuring all the moving bits locally, so you'd need to create the honeycomb.io account for this one

Found this Nix flake that contains an OTel GUI: https://flakestry.dev/flake/github/FriendsOfOpenTelemetry/opentelemetry-nix/1.0.1

I'll try to integrate that once the PR is ready for review.

@develop7 (Collaborator, Author)

The recent problem I'm seemingly stuck with is that hs-opentelemetry uses UnliftIO, which doesn't compose well with our (implicit, correct?) monad stack. So the deeper in the call stack the instrumented code sits (the code I'm trying to wrap with inSpan), the more ridiculously complex the changes needed to instrument it become, e.g. https://github.com/PostgREST/postgrest/pull/3140/files#diff-5de3ff2b2d013b33dccece6ead9aeb61feffeb0fbd6e38779750511394cf9701R156-R157, up to the point where I have no idea how to proceed further (e.g. wrapping App.handleRequest's cases with their own spans, which would be semantically correct).
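For illustration, a minimal sketch of that friction, using ExceptT err IO as a stand-in for the handler stack (not the actual PostgREST types): inSpan from hs-opentelemetry-api wants MonadUnliftIO, which ExceptT lacks, so the instrumentation point has to unwrap to IO instead of sitting deeper inside the stack.

{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Except (ExceptT, runExceptT)
import OpenTelemetry.Trace (Tracer, defaultSpanArguments, inSpan)

-- inSpan :: (MonadUnliftIO m, HasCallStack) => Tracer -> Text -> SpanArguments -> m a -> m a
-- ExceptT err IO has no MonadUnliftIO instance, so the handler can only be
-- wrapped where we are willing to run it down to IO:
tracedHandler :: Tracer -> ExceptT err IO a -> IO (Either err a)
tracedHandler tracer action =
  inSpan tracer "handler" defaultSpanArguments (runExceptT action)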

There's a more straightforward, MonadIO-based opentelemetry library, with less activity and a quite different approach to telemetry data export (GHC eventlog → file/pipe, written by the GHC runtime). It looks like a less invasive approach, refactoring-wise, but requires jumping through more hoops to actually deliver traces to Honeycomb/Lightstep/whatnot (pull the eventlog → convert it to zipkin/jaeger/b3 → upload it somewhere for analysis).

It also seems to boil down to a conceptual choice between online and offline trace delivery, i.e. a push vs. a pull model.

@steve-chavez @wolfgangwalther @laurenceisla what do you think, guys?

@steve-chavez (Member)

@develop7 Would vault help? It was introduced in #1988; I recall it helped with IORef handling.

It's still used in:

jwtDurKey :: Vault.Key Double
jwtDurKey = unsafePerformIO Vault.newKey
{-# NOINLINE jwtDurKey #-}

getJwtDur :: Wai.Request -> Maybe Double
getJwtDur = Vault.lookup jwtDurKey . Wai.vault

I'm still not that familiar with OTel but the basic idea I had was to store these traces on AppState and export them async.
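For what it's worth, a minimal sketch of how the same vault trick could carry a per-request OTel span, mirroring jwtDurKey/getJwtDur above (spanKey/getRequestSpan are hypothetical names for illustration, not existing PostgREST code):

import qualified Data.Vault.Lazy as Vault
import qualified Network.Wai as Wai
import OpenTelemetry.Trace (Span)
import System.IO.Unsafe (unsafePerformIO)

-- A vault key lets middleware stash the request's root span so downstream
-- code can look it up from the Wai.Request instead of threading it around.
spanKey :: Vault.Key Span
spanKey = unsafePerformIO Vault.newKey
{-# NOINLINE spanKey #-}

getRequestSpan :: Wai.Request -> Maybe Span
getRequestSpan = Vault.lookup spanKey . Wai.vault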

@steve-chavez (Member)

@develop7 #3213 was recently merged; it logs schema cache stats to stderr. Perhaps that could be used for an introductory OTel integration instead? It might be easier since the scache stats are already in IO space.

@develop7 (Collaborator, Author)

> Would vault help?

hs-opentelemetry is using it already

> basic idea I had was to store these traces on AppState and export them async

Not only that, you want traces in tests too, for one.

The good news is that hs-opentelemetry-utils-exceptions seems to be just what we need; let me try it.
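For context, a rough sketch of why it fits, assuming that package's inSpanM only needs MonadIO plus MonadMask (from exceptions) rather than MonadUnliftIO; ExceptT err IO has both instances, so handlers could be wrapped in place:

{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Except (ExceptT)
import OpenTelemetry.Trace (Tracer, defaultSpanArguments)
import OpenTelemetry.Utils.Exceptions (inSpanM)

-- Unlike inSpan, this typechecks directly in the ExceptT stack; no unwrapping
-- to IO needed at the instrumentation point.
tracedStep :: Tracer -> ExceptT err IO a -> ExceptT err IO a
tracedStep tracer = inSpanM tracer "step" defaultSpanArguments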

> Perhaps that can be used for introductory OTel integration instead?

Good call, @steve-chavez, thank you for the suggestion. Will try that too.

@develop7 (Collaborator, Author)

[screenshot of the resulting traces]

it works!

@steve-chavez (Member) commented Feb 21, 2024

Since we now have an observer function and an Observation module:

handleRequest :: AuthResult -> AppConfig -> AppState.AppState -> Bool -> Bool -> PgVersion -> ApiRequest -> SchemaCache ->
  Maybe Double -> Maybe Double -> (Observation -> IO ()) -> Handler IO Wai.Response
handleRequest AuthResult{..} conf appState authenticated prepared pgVer apiReq@ApiRequest{..} sCache jwtTime parseTime observer =

data Observation
  = AdminStartObs (Maybe Int)
  | AppStartObs ByteString
  | AppServerPortObs NS.PortNumber

Perhaps we can add some observations for the timings?

Also the Logger is now used like:

logObservation :: LoggerState -> Observation -> IO ()
logObservation loggerState obs = logWithZTime loggerState $ observationMessage obs

CmdRun -> App.run appState (Logger.logObservation loggerState))

For OTel, maybe the following would make sense:

otelState <- Otel.init

App.run appState (Logger.logObservation loggerState >> OTel.tracer otelState)) 
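One note on that sketch: both observers have the shape Observation -> IO (), and in the ((->) obs) applicative a bare >> silently drops the first action, so the combination needs liftA2 (>>). A self-contained toy example of the composition (String stands in for Observation):

import Control.Applicative (liftA2)

logObserver, otelObserver :: String -> IO ()
logObserver obs = putStrLn ("log:  " <> obs)
otelObserver obs = putStrLn ("otel: " <> obs)

-- liftA2 (>>) runs both actions for every observation; a plain (>>) here
-- would only run otelObserver.
combined :: String -> IO ()
combined = liftA2 (>>) logObserver otelObserver

main :: IO ()
main = combined "AppStartObs"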

@develop7 (Collaborator, Author)

> Perhaps we can add some observations for the timings?

Agreed, server timings definitely belong there.

develop7 marked this pull request as ready for review on March 11, 2024 at 15:38.
@develop7 (Collaborator, Author)

Okay, the PR has been cooking for long enough; let's pull the plug and start small. Let's have it reviewed while I'm fixing the remaining CI failures.

cabal.project (outdated), review comment on lines 5 to 10:

source-repository-package
  type: git
  location: https://github.com/develop7/hs-opentelemetry.git
  tag: ec5a87729ad3ad99c59fdcdfa754bafc87edac57
  subdir: sdk api propagators/b3 propagators/w3c exporters/otlp utils/exceptions instrumentation/wai otlp
Member

I've been working hard lately to get rid of non-hackage dependencies and would not like to introduce them again. Why do we need a fork here?

Collaborator Author

The original hs-opentelemetry depends on unix, which fails to build on Windows. But there are only two functions it uses from unix: getProcessID and getEffectiveUserID. The former is provided by unix-compat, the latter isn't. So the fork replaces the unix dependency with unix-compat and drops collection of the "effective username" attribute, which makes it build on Windows here and now.

Member

I see. Do you plan to upstream your fixes into hs-opentelemetry itself?

Collaborator Author

I absolutely do, and I have no intention of maintaining the fork any longer than I need to. Windows support is tracked upstream at iand675/hs-opentelemetry#109.

@wolfgangwalther (Member)

hs-opentelemetry is, according to the repo, in an alpha state. According to the TODO list above, the issue tracker, and the repo description, it does not support:

  • GHC 9.8.x
  • Windows
  • Metrics or Logging

I don't think we should depend on this in its current state. And we should certainly not depend on an even-less-maintained fork of the same.

So to go forward here, there needs to be some effort put into the upstream package first, to make it usable for us.

develop7 force-pushed the feat_opentelemetry-traces branch 2 times, most recently from 590d142 to e809a65, on March 12, 2024 at 16:31.
develop7 marked this pull request as draft on March 29, 2024 at 16:43.
@develop7 (Collaborator, Author)

A status update:

  • GHC 9.8: hs-opentelemetry-sdk doesn't build against 9.8 because of the hs-opentelemetry-exporter-otlp → proto-lens dependency chain. Given that the upstream of the latter has been a bit unresponsive to suggestions to bump the upper bounds, I've managed to make it build for 9.8 in develop7/proto-lens@985290f, but I haven't figured out how to pick that up in the project, since it requires Google's protobuf compiler to be installed and the protobuf source checked out. Another approach is to not use hs-opentelemetry-sdk and hs-opentelemetry-exporter-otlp at all, which I probably should have tried way earlier.

@wolfgangwalther (Member)

> GHC 9.8: hs-opentelemetry-sdk doesn't build against 9.8 because of the hs-opentelemetry-exporter-otlp → proto-lens dependency chain. Given that the upstream of the latter has been a bit unresponsive to suggestions to bump the upper bounds, I've managed to make it build for 9.8 in develop7/proto-lens@985290f,

Hm. I looked at your fork. It depends on support for GHC 9.8 in ghc-source-gen. That repo has a PR for it, which was updated just 3 days ago. I wouldn't call that "unresponsive" yet. Once ghc-source-gen is GHC 9.8 compatible, you could open a PR to update the bounds in proto-lens itself. And since the last release, for GHC 9.6 support, was in December... I would not expect this to take too long to get a response. It certainly doesn't look like it's unmaintained.

I guess for GHC 9.8 support it's just a matter of time.

What about the other issues mentioned above? Were you able to make progress on those?

@mkleczek (Contributor) commented Apr 4, 2024

> The recent problem I'm seemingly stuck with is that hs-opentelemetry uses UnliftIO, which doesn't compose well with our (implicit, correct?) monad stack. So the deeper in the call stack the instrumented code sits (the code I'm trying to wrap with inSpan), the more ridiculously complex the changes needed to instrument it become, e.g. https://github.com/PostgREST/postgrest/pull/3140/files#diff-5de3ff2b2d013b33dccece6ead9aeb61feffeb0fbd6e38779750511394cf9701R156-R157, up to the point where I have no idea how to proceed further (e.g. wrapping App.handleRequest's cases with their own spans, which would be semantically correct).
>
> There's a more straightforward, MonadIO-based opentelemetry library, with less activity and a quite different approach to telemetry data export (GHC eventlog → file/pipe, written by the GHC runtime). It looks like a less invasive approach, refactoring-wise, but requires jumping through more hoops to actually deliver traces to Honeycomb/Lightstep/whatnot (pull the eventlog → convert it to zipkin/jaeger/b3 → upload it somewhere for analysis).
>
> It also seems to boil down to a conceptual choice between online and offline trace delivery, i.e. a push vs. a pull model.
>
> @steve-chavez @wolfgangwalther @laurenceisla what do you think, guys?

In my prototype I actually played with replacing the Hasql Session with a monad based on https://github.com/haskell-effectful/effectful to make it extensible:

https://github.com/mkleczek/hasql-api/blob/master/src/Hasql/Api/Eff/Session.hs#L37

Using it in PostgREST required some mixins usage in Cabal:

29b946e#diff-eb6a76805a0bd3204e7abf68dcceb024912d0200dee7e4e9b9bce3040153f1e1R140

Some work was required in the PostgREST startup/configuration code to set up the appropriate effect handlers and middlewares, but the changes were quite well isolated.

At the end of the day I think basing your monad stack on an effect library (effectful, cleff etc.) is the way forward as it makes the solution highly extensible and configurable.
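For illustration only, a minimal sketch of what such an effect could look like with effectful; the Trace/emitSpan names are made up for this example, and a real interpreter would call into hs-opentelemetry or the eventlog instead of printing:

{-# LANGUAGE DataKinds, GADTs, KindSignatures, OverloadedStrings #-}
{-# LANGUAGE TypeFamilies, TypeOperators #-}
import Control.Monad.IO.Class (liftIO)
import Data.Text (Text)
import qualified Data.Text.IO as T
import Effectful
import Effectful.Dispatch.Dynamic

-- A first-order tracing effect; interpreters decide where spans go.
data Trace :: Effect where
  EmitSpan :: Text -> Trace m ()

type instance DispatchOf Trace = Dynamic

emitSpan :: Trace :> es => Text -> Eff es ()
emitSpan = send . EmitSpan

-- One possible interpreter: print span names to stdout.
runTraceStdout :: IOE :> es => Eff (Trace : es) a -> Eff es a
runTraceStdout = interpret $ \_ (EmitSpan name) -> liftIO (T.putStrLn ("span: " <> name))

main :: IO ()
main = runEff . runTraceStdout $ emitSpan "handleRequest"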

@develop7 (Collaborator, Author) commented Nov 4, 2024

Update: rebased the PR against the latest master, updated hs-opentelemetry (with the Windows support merged!) and asked the hs-opentelemetry maintainers to cut a new release in iand675/hs-opentelemetry#154 so we don't have to depend on forks again.

@develop7 (Collaborator, Author) commented Nov 4, 2024

Building hs-opentelemetry from source introduces a build-time dependency on protoc, the protobuf compiler, which is why the Cabal & Stack builds fail.

delay <- AppState.getNextDelay appState
return $ addRetryHint delay response
respond resp
\req respond -> inSpanM (getOTelTracer appState) "respond" defaultSpanArguments $
Member

Hey @develop7!

QQ: this would only give us OTel traces for the JSON error responses, right? It would not send any other logs, and I think OTel is meant to send those as well?

@develop7 (Collaborator, Author) commented Nov 5, 2024

No explicit inSpan* calls means no traces at all. https://hackage.haskell.org/package/hs-opentelemetry-instrumentation-auto could help with that, but it requires MonadUnliftIO and, I guess, adopting mtl style?

> It would not send any other logs and I think otel is meant to send these as well?

No, it won't; not at the moment. While the OTel spec does have logs, hs-opentelemetry is yet to support them, as well as metrics.

@steve-chavez (Member)

I still wonder if we could do this in a less invasive way 🤔. We added Prometheus metrics in a simple way by using our Observation module:

let observer = liftA2 (>>) (Logger.observationLogger loggerState configLogLevel) (Metrics.observationMetrics metricsState)

Ideally we would add an Otel module, add an Otel.traces function there and that would be it.

I'm not sure if this can be done entirely with hs-opentelemetry, but in theory we would need a gRPC dependency (the format used by OTel) plus an HTTP client that does the work in a separate thread.

It seems better to maintain some of this functionality in-tree, especially since the dependency is not fully featured yet. That would also give us better control.
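A minimal sketch of what such an Otel.traces-style observer could look like if hs-opentelemetry turns out to be enough; the module and function names here (including PostgREST.Observation, and observationMessage yielding the span-name Text) are assumptions for illustration, and a real version would probably attach events to an existing request span rather than emit one span per observation:

import OpenTelemetry.Trace (Tracer, defaultSpanArguments, inSpan)
import PostgREST.Observation (Observation, observationMessage)

-- Each Observation becomes a span named after its message; meant to be
-- composed with the logger/metrics observers via liftA2 (>>) as above.
observationTraces :: Tracer -> Observation -> IO ()
observationTraces tracer obs =
  inSpan tracer (observationMessage obs) defaultSpanArguments (pure ())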

@develop7 (Collaborator, Author) commented Nov 8, 2024

@steve-chavez this could be even more preferable, since every observer invocation is accompanied by its tag, so I don't have to invent span labels. As long as observer is called in all/most of the interesting places, right?

> I'm not sure if this can be done entirely with hs-opentelemetry

It seems it can; I haven't sent any of these traces to Honeycomb myself yet, though. I might only need to fiddle with the call stack so it doesn't have observer on top, but even that will do.
