Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extends GraphQL Spans #562

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

PascalSenn
Copy link

@PascalSenn PascalSenn commented Nov 27, 2023

Fixes #

Changes

This pull requests extends the capabilities of graphql open telemetry.

It adds a span for the execution of a GraphQL operation and a span for the resolvers.

I left a few comments in the definition to open a dicussion about the changes.

Merge requirement checklist

@PascalSenn PascalSenn requested review from a team November 27, 2023 16:14
Copy link

linux-foundation-easycla bot commented Nov 27, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@PascalSenn PascalSenn marked this pull request as draft November 27, 2023 16:15
brief: "The number of errors that occurred during the operation."
type: int
examples: 3
# TODO should we have something like outcome (success, failure)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not sure if we should have a property that signals if the execution was a success or a failure.
Then we would have to specify aswell what a 'success' and what a 'failure' is.
As a operation can fail completely ('data' is null) or partially (there are errors)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's an exception we are talking about here, there's already https://github.com/open-telemetry/semantic-conventions/blob/v1.23.0/docs/exceptions/exceptions-spans.md

In general if the request fails on the client side, the span status then is marked as error. See here for some example (for HTTP) https://github.com/open-telemetry/semantic-conventions/blob/main/docs/http/http-spans.md#status

type: int
examples: 3
# TODO should we have something like outcome (success, failure)
# TODO how do we specify errors?
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A GraphQL Operation can have multiple errors in the response:

{
  "data": null,
  "errors": [
    {
      "message": "Cannot return null for non-nullable field User",
      "locations": [
        {
          "line": 3,
          "column": 5
        }
      ],
      "path": [
        "createUser",
        "user"
      ],
      "extensions": {
        "code": "USERNAME_INVALID",
        "message": "username is invalid"
      }
    }
  ]
}

What is the best way to represent this in OpenTelemetry?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

examples: 3
# TODO should we have something like outcome (success, failure)
# TODO how do we specify errors?
# TODO Should we ref network.transport, network.type, server.address etc?
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would have to be done with caution though. as graphql is protocol agonstic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But those are still very protocol agnostic, no?

# TODO should we have something like outcome (success, failure)
# TODO how do we specify errors?
# TODO Should we ref network.transport, network.type, server.address etc?
# TODO There are more spans like validation, parsing, variable coercion and response formatting. Should we add them as separate span types?
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It provides a lot of value to know how long a specific operation is spending validating, parsing or formatting.
Should we add specific spans for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can model how the trace should like as we want - One thing to always keep in mind is verbosity. Spans costs money in the end. Creating too many may be a problem in some cases, so if it's not essential, maybe they can be turned off or opt-in.

If so, please open a separated issue explaining the value/what it should do so we can tackle it in a separate PR.

@@ -1,6 +1,6 @@
groups:
- id: graphql
prefix: graphql
- id: graphql.server.request
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This are just the graphql.server spans. Should we also specific the graphql.client spans?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is desired, yes we should. But most likely not as part of this PR. Also, maybe it's good to create an issue so maybe discussions can happen there.

- id: selection.field.isDeprecated
brief: "Whether the field that is beeing resolved is deprecated."
type: bool
examples: true
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we do with validation errors? If someone sends a invalid GraphQL requests that e.g. could not be parsed. Which span should be emitted

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same span as you would do, but in the case of an error, you set the exception attributes where the error can be recorded (as well as the span status). See HTTP for example https://github.com/open-telemetry/semantic-conventions/blob/main/docs/http/http-spans.md

- id: selection.field.isDeprecated
brief: "Whether the field that is beeing resolved is deprecated."
type: bool
examples: true
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing that is not yet specified are the subscriptions.

A subscription is a long running operation that returns an event stream. Similar to websockets/signalR etc.

Is there prior art in open telemetry how this could be handled?

A subscription starts with a "Subscribe" call that returns a event stream. Each event in this event stream is mapped to a graphql result. This means that, the subscribe call and the execution of each event, give different insights. These insights are more important than a arbitrary long root spans that spans the whole graphql execution.

Copy link

@tobias-tengler tobias-tengler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small fix to keep the example of Query.findBookById consistent.

model/trace/instrumentation/graphql.yml Outdated Show resolved Hide resolved
model/trace/instrumentation/graphql.yml Outdated Show resolved Hide resolved
PascalSenn and others added 2 commits November 27, 2023 21:02
Co-authored-by: Tobias Tengler <[email protected]>
Co-authored-by: Tobias Tengler <[email protected]>
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jan 25, 2024
Copy link

github-actions bot commented Feb 2, 2024

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Feb 2, 2024
@PascalSenn
Copy link
Author

@jsuereth Can this Pull Request be reopened? How can i get feedback for this?

@lmolkova lmolkova reopened this Jul 1, 2024
@PascalSenn PascalSenn marked this pull request as ready for review September 12, 2024 18:33
@github-actions github-actions bot removed the Stale label Sep 13, 2024
Copy link

github-actions bot commented Oct 4, 2024

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Oct 4, 2024
@PascalSenn
Copy link
Author

bump

@github-actions github-actions bot removed the Stale label Oct 5, 2024
@lmolkova lmolkova added the experts needed This issue or pull request is outside an area where general approvers feel they can approve label Oct 11, 2024
@lmolkova
Copy link
Contributor

Hi @PascalSenn,

Sorry for not reviewing the PR for a long time - unfortunately we don't have any GraphQL experts among semantic conventions approvers.

We have some folks who contributed #1389 recently and I wonder if they could give this one a review as well.

@kaylareopelle @robertlaurin would you mind taking a look and sharing ant feedback?

Thanks!

PS: happy to review from general semconv perspective sometime next week.

@@ -1,6 +1,6 @@
groups:
- id: graphql
prefix: graphql
- id: graphql.server.request
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is desired, yes we should. But most likely not as part of this PR. Also, maybe it's good to create an issue so maybe discussions can happen there.

persisted queries.
type: string
examples: "aa3e37c1bf54708e93f12c137afba004"
- id: error.count
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a generic "request count/duration" metric? With the error type attribute then one can find out how many errors occurred. https://github.com/open-telemetry/semantic-conventions/blob/v1.23.0/docs/http/http-metrics.md#metric-httpserverrequestduration

brief: "The number of errors that occurred during the operation."
type: int
examples: 3
# TODO should we have something like outcome (success, failure)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's an exception we are talking about here, there's already https://github.com/open-telemetry/semantic-conventions/blob/v1.23.0/docs/exceptions/exceptions-spans.md

In general if the request fails on the client side, the span status then is marked as error. See here for some example (for HTTP) https://github.com/open-telemetry/semantic-conventions/blob/main/docs/http/http-spans.md#status

type: int
examples: 3
# TODO should we have something like outcome (success, failure)
# TODO how do we specify errors?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

examples: 3
# TODO should we have something like outcome (success, failure)
# TODO how do we specify errors?
# TODO Should we ref network.transport, network.type, server.address etc?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But those are still very protocol agnostic, no?

# TODO should we have something like outcome (success, failure)
# TODO how do we specify errors?
# TODO Should we ref network.transport, network.type, server.address etc?
# TODO There are more spans like validation, parsing, variable coercion and response formatting. Should we add them as separate span types?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can model how the trace should like as we want - One thing to always keep in mind is verbosity. Spans costs money in the end. Creating too many may be a problem in some cases, so if it's not essential, maybe they can be turned off or opt-in.

If so, please open a separated issue explaining the value/what it should do so we can tackle it in a separate PR.

Comment on lines +52 to +80
- id: graphql.server.resolver
prefix: graphql
type: span
brief: >
This document defines semantic conventions to apply when instrumenting the GraphQL implementation.
They map GraphQL resolvers to attributes on a Span.
attributes:
- id: selection.name
brief: "The name of the selection that is being resolved. Either the field name or an alias."
type: string
examples: "findBookById"
- id: selection.type # selection.field.type ?
brief: "The type of the field that is beeing resolved"
type: string
examples: "Book"
- id: selection.path
brief: "The path of the selection that is beeing resolved."
type: string
examples: "/foo/bar/0/baz"
- id: selection.field.name
brief: "The name of the field that is beeing resolved."
type: string
examples: "findBookById"
- id: selection.field.declaringType
brief: "The type that declares the field that is beeing resolved."
type: string
examples: "Query"
- id: selection.field.coordinate
brief: "The coordinate of the field that is beeing resolved."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All attributes should be defined in the attributes registry, not directly where it is used. https://github.com/open-telemetry/semantic-conventions/blob/main/CONTRIBUTING.md#1-modify-the-yaml-model

Then here you use ref to use them.

- id: selection.field.isDeprecated
brief: "Whether the field that is beeing resolved is deprecated."
type: bool
examples: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same span as you would do, but in the case of an error, you set the exception attributes where the error can be recorded (as well as the span status). See HTTP for example https://github.com/open-telemetry/semantic-conventions/blob/main/docs/http/http-spans.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:graphql experts needed This issue or pull request is outside an area where general approvers feel they can approve
Projects
Development

Successfully merging this pull request may close these issues.

7 participants