Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some endpoints are not instrumented in the configured tracing tool (OpenTelemetry) #19696

Open
mna opened this issue Jun 12, 2024 · 4 comments
Open
Assignees
Labels
~backend Backend-related issue. ~engineering-initiated Engineering-initiated story, such as a bug, refactor, or contributor experience improvement. #g-endpoint-ops Endpoint ops product group :incoming New issue in triage process. story A user story defining an entire feature

Comments

@mna
Copy link
Member

mna commented Jun 12, 2024

Fleet version:
4.51.0

Web browser and operating system:
N/A


Update (2024/11/18): The APM part of this issue is covered in #23902

💥  Actual behavior

APM (or opentelemetry) is setup for API routes here:

if config.Logging.TracingEnabled {
if config.Logging.TracingType == "opentelemetry" {
r.Use(otmiddleware.Middleware("fleet"))
} else {
apmgorilla.Instrument(r)
}
}

However, we have a number of routes that are handled outside of that Gorilla Mux:

  • health check, version and assets:

    fleet/cmd/fleet/serve.go

    Lines 967 to 970 in 9867991

    rootMux := http.NewServeMux()
    rootMux.Handle("/healthz", service.PrometheusMetricsHandler("healthz", health.Handler(httpLogger, healthCheckers)))
    rootMux.Handle("/version", service.PrometheusMetricsHandler("version", version.Handler()))
    rootMux.Handle("/assets/", service.PrometheusMetricsHandler("static_assets", service.ServeStaticAssets("/assets/")))
  • Prometheus metrics endpoint:

    fleet/cmd/fleet/serve.go

    Lines 1015 to 1028 in 9867991

    if config.Prometheus.BasicAuth.Username != "" && config.Prometheus.BasicAuth.Password != "" {
    rootMux.Handle("/metrics", basicAuthHandler(
    config.Prometheus.BasicAuth.Username,
    config.Prometheus.BasicAuth.Password,
    service.PrometheusMetricsHandler("metrics", promhttp.Handler()),
    ))
    } else {
    if config.Prometheus.BasicAuth.Disable {
    level.Info(logger).Log("msg", "metrics endpoint enabled with http basic auth disabled")
    rootMux.Handle("/metrics", service.PrometheusMetricsHandler("metrics", promhttp.Handler()))
    } else {
    level.Info(logger).Log("msg", "metrics endpoint disabled (http basic auth credentials not set)")
    }
    }
  • The frontend routes:
    rootMux.Handle("/", frontendHandler)
  • The debug routes:
    rootMux.Handle("/debug/", debugHandler)
  • The Apple MDM SCEP routes:
    mux.Handle(apple_mdm.SCEPPath, scepHandler)
  • The Apple MDM protocol routes:
    mux.Handle(apple_mdm.MDMPath, mdmHandler)

This ticket is to determine which of those (all? all but frontend? just the Apple MDM ones?) should be instrumented, and to add that instrumentation accordingly (same-ish way it is done for the API routes at the moment).

🕯️ More info (optional)

It looks like both Elastic APM and OpenTelemetry have a package to wrap a standard net/http handler, similar to the one they provide to wrap the gorilla mux used in our API routes, so those could probably be used:

We should take the time to check that they are setup in a consistent way vs the Gorilla one. Elastic APM can be tested locally (see https://github.com/fleetdm/fleet/tree/main/tools/apm-elastic).

@mna mna added bug Something isn't working as documented ~backend Backend-related issue. ~engineering-initiated Engineering-initiated story, such as a bug, refactor, or contributor experience improvement. :incoming New issue in triage process. labels Jun 12, 2024
@sharon-fdm sharon-fdm added #g-endpoint-ops Endpoint ops product group :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. labels Jun 14, 2024
@sharon-fdm
Copy link
Collaborator

@mna can you please provide some incentives why it's a bug (not improvement) ?
cc: @lukeheath to prioritize if an improvement.

@lukeheath lukeheath added the ~released bug This bug was found in a stable release. label Jun 14, 2024
@mna
Copy link
Member Author

mna commented Jun 17, 2024

@sharon-fdm @lukeheath Good question, I originally labeled it a bug as it looks like we wanted to instrument all routes and missed a few, but I could see it as an improvement too (we instrumented some endpoints, and now we want to extend that instrumentation).

I think for Fleet users that enable instrumentation, it would be surprising to only have a subset of endpoints show up in their dashboard - which makes it more a bug. But on the other hand I don't think we document much how to enable instrumentation (whether it's the open telemetry one or the Elastic APM one, I can't find any mention of how to enable them in https://fleetdm.com/docs/configuration/fleet-server-configuration).

@sharon-fdm
Copy link
Collaborator

sharon-fdm commented Jun 18, 2024

@mna, OK. Will treat as a bug and will estimate next estimation session.

@sharon-fdm sharon-fdm removed the bug Something isn't working as documented label Jun 19, 2024
@sharon-fdm
Copy link
Collaborator

sharon-fdm commented Jun 19, 2024

We estimate as 3 and think it's a feature that needs prioritization.
@mna Since this is more of a story, I'd like to see the user story (what's the pain/benefit?)
cc: @lukeheath

@sharon-fdm sharon-fdm added the story A user story defining an entire feature label Jun 19, 2024
@sharon-fdm sharon-fdm added this to the 4.54.0-tentative milestone Jun 24, 2024
@sharon-fdm sharon-fdm added :product Product Design department (shows up on 🦢 Drafting board) and removed ~released bug This bug was found in a stable release. :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. labels Jun 24, 2024
@noahtalerman noahtalerman removed the :product Product Design department (shows up on 🦢 Drafting board) label Jun 27, 2024
@lukeheath lukeheath removed this from the 4.54.0 milestone Jul 9, 2024
@getvictor getvictor self-assigned this Nov 15, 2024
getvictor added a commit that referenced this issue Nov 18, 2024
#19696 

# Checklist for submitter

- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
- [x] Manual QA for all new/changed functionality
@getvictor getvictor changed the title Some endpoints are not instrumented in the configured tracing tool (Elastic APM or OpenTelemetry) Some endpoints are not instrumented in the configured tracing tool (OpenTelemetry) Nov 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
~backend Backend-related issue. ~engineering-initiated Engineering-initiated story, such as a bug, refactor, or contributor experience improvement. #g-endpoint-ops Endpoint ops product group :incoming New issue in triage process. story A user story defining an entire feature
Development

No branches or pull requests

5 participants