Blog post for OpenTelemetry Generative AI updates #5575

Open
wants to merge 34 commits into
base: main
5c062d9
Initial draft
drewby Oct 25, 2024
e4735c4
Merge branch 'open-telemetry:main' into drewby/genai_blog
drewby Nov 9, 2024
5c9a04b
Add screenshots
drewby Nov 9, 2024
c3d7c9e
Update link for Python repo
drewby Nov 9, 2024
db18921
fix spelling error
drewby Nov 9, 2024
67f2bc9
Add link to Aspire Dashboard
drewby Nov 9, 2024
6a51e0c
Add link to Jaeger
drewby Nov 9, 2024
c66f6b4
Add words to cspell
drewby Nov 9, 2024
fdef792
Updates from PR review
drewby Nov 9, 2024
35c2487
Rename files
drewby Nov 9, 2024
1796b6e
Pin version
drewby Nov 9, 2024
8dd9a11
Update linting/spelling error
drewby Nov 9, 2024
4159b7d
Fix format error
drewby Nov 9, 2024
480235b
Updates to library intro and metric section
drewby Nov 9, 2024
aef7577
Update introduction
drewby Nov 10, 2024
16877ad
Use headers instead of bold
drewby Nov 10, 2024
c7e3ca9
Fix formatting
drewby Nov 10, 2024
9f248fe
Link to docs page
drewby Nov 11, 2024
9813dc8
Add links to spec and python projects
drewby Nov 11, 2024
c183cb6
Colon instead of period
drewby Nov 11, 2024
17ec754
Add issue and sig
drewby Nov 11, 2024
d421b70
Move text to flow better in outline
drewby Nov 11, 2024
915918b
Clarify library focus
drewby Nov 11, 2024
38b9619
Add comment about using Events
drewby Nov 11, 2024
4665137
Change Spans to Traces
drewby Nov 11, 2024
917e931
Specifics about the first Instrumentation Library
drewby Nov 12, 2024
35dca24
Use alert shortcode
drewby Nov 12, 2024
78607e1
Add link to instrumentation library
drewby Nov 13, 2024
be9608f
Merge branch 'open-telemetry:main' into drewby/genai_blog
drewby Nov 14, 2024
638e3b4
Move words from cspell to article header
drewby Nov 14, 2024
0303e36
Fix lint/link errors
drewby Nov 14, 2024
a1489fb
Update code sample
drewby Nov 14, 2024
730e1c1
fix format
drewby Nov 14, 2024
64cd559
Update code sample
drewby Nov 14, 2024
3 changes: 3 additions & 0 deletions .cspell/en-words.txt
@@ -144,3 +144,6 @@ wordpress
WSGI
zend
zipkin
Liudmila
Molkova
GENAI
163 changes: 163 additions & 0 deletions content/en/blog/2024/otel-generative-ai/index.md
@@ -0,0 +1,163 @@
---
title: OpenTelemetry for Generative AI
linkTitle: OpenTelemetry for Generative AI
date: 2024-11-09
Member

Putting this comment here to keep an eye on setting the date right when we finally publish. Do not resolve.

Suggested change
date: 2024-11-09
date: 2024-11-09

author: >-
[Drew Robbins](https://github.com/drewby) (Microsoft), [Liudmila
Molkova](https://github.com/lmolkova) (Microsoft)
issue: [#5581](https://github.com/open-telemetry/opentelemetry.io/issues/5581)
sig: SIG GenAI Observability
---

As organizations increasingly adopt Large Language Models (LLMs) and other
generative AI technologies, ensuring reliable performance, efficiency, and
safety is essential to meet user expectations, optimize resource costs, and
safeguard against unintended outputs. Effective observability for AI operations,
behaviors, and outcomes can help meet these goals. OpenTelemetry is being
enhanced to support these needs specifically for generative AI.

Two primary assets are in development to make this possible: **Semantic
Conventions** and an **Instrumentation Library**. The first instrumentation library targets OpenAI in Python.

[**Semantic Conventions**](https://opentelemetry.io/docs/concepts/semantic-conventions/)
establish standardized guidelines for how telemetry data is structured and
collected across platforms, defining inputs, outputs, and operational details.
For generative AI, these conventions streamline monitoring, troubleshooting, and
optimizing AI models by standardizing attributes such as model parameters,
response metadata, and token usage. This consistency supports better
observability across tools, environments, and APIs, helping organizations track
performance, cost, and safety with ease.

The
[**Instrumentation Library**](https://opentelemetry.io/docs/specs/otel/overview/#instrumentation-libraries)
is being developed within the
[OpenTelemetry Python Contrib](https://github.com/open-telemetry/opentelemetry-python-contrib) under [instrumentation-genai](https://github.com/open-telemetry/opentelemetry-python-contrib/tree/main/instrumentation-genai)

Check warning on line 34 in content/en/blog/2024/otel-generative-ai/index.md (GitHub Actions / SPELLING check): Unknown word (genai). Suggestions: (GENAI, genia, geai, gena, gebai)
project to automate telemetry collection for generative AI applications. The
first release is a Python library for instrumenting OpenAI client calls, given
Python's widespread use in AI development and the popularity of OpenAI. Designed
to integrate seamlessly with OpenAI's API, this library captures spans and
events, gathering essential data like model inputs, response metadata, and token
usage in a structured format.

## Key Signals for Generative AI

The
[Semantic Conventions for Generative AI](https://github.com/open-telemetry/semantic-conventions/tree/v1.28.0/docs/gen-ai)
Member

Suggested change
[Semantic Conventions for Generative AI](https://github.com/open-telemetry/semantic-conventions/tree/v1.28.0/docs/gen-ai)
[Semantic Conventions for Generative AI](/docs/specs/semconv/gen-ai/)

Member Author

We were thinking to pin the version here to prevent potential broken links in the future. cc @lmolkova

agree on pinning as this has drifted in the past, unless there's a process in the website to find bad links and automatically raise PRs to retarget them.

Member

We have a process to find bad links automatically, also this is a blog post not a documentation page, so it's expected for them to decay, see #3493, cc @chalin

focus on capturing insights into AI model behavior through three primary
signals: [Traces](https://opentelemetry.io/docs/concepts/signals/traces/),
[Metrics](https://opentelemetry.io/docs/concepts/signals/metrics/), and
[Events](https://opentelemetry.io/docs/specs/otel/logs/event-api/).

Together, these signals provide a comprehensive monitoring framework, enabling
better cost management, performance tuning, and request tracing.

### Traces: Tracing Model Interactions

Traces track each model interaction’s lifecycle, covering input parameters (for
example, temperature, top_p) and response details like token count or errors.
They provide visibility into each request, aiding in identifying bottlenecks and
analyzing the impact of settings on model output.
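
As a rough illustration of what such a trace might carry, the sketch below
builds the attribute set for a single chat span as a plain dictionary. The
attribute names follow the `gen_ai.*` naming used in the Semantic Conventions
for Generative AI; the helper `chat_span_attributes` and the sample values are
hypothetical, and a real instrumentation library would set these attributes on
an OpenTelemetry span rather than return a dictionary.

```python
def chat_span_attributes(model, temperature, top_p, input_tokens, output_tokens):
    """Return (span_name, attributes) for one chat request (illustrative only)."""
    attributes = {
        # Which provider and operation the span describes.
        "gen_ai.system": "openai",
        "gen_ai.operation.name": "chat",
        # Request parameters that influence the model output.
        "gen_ai.request.model": model,
        "gen_ai.request.temperature": temperature,
        "gen_ai.request.top_p": top_p,
        # Response details reported back by the provider.
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }
    # Assumed naming pattern: "<operation name> <request model>".
    span_name = f"{attributes['gen_ai.operation.name']} {model}"
    return span_name, attributes


# Sample values are made up for the example.
name, attrs = chat_span_attributes("gpt-4o-mini", 0.7, 1.0, 12, 48)
print(name)
for key in sorted(attrs):
    print(f"  {key} = {attrs[key]!r}")
```

Keeping request parameters and usage on the same span is what makes it possible
to compare, for example, how a change in temperature relates to output length.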

### Metrics: Monitoring Usage and Performance

Metrics aggregate high-level indicators like request volume, latency, and token
counts, essential for managing costs and performance. This data is particularly
critical for API-dependent AI applications with rate limits and cost
considerations.
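
To make the cost angle concrete, here is a small sketch that aggregates
per-request token counts and estimates spend. The prices and the `TokenUsage`
and `estimate_cost` names are hypothetical placeholders, not values from any
provider or from OpenTelemetry; in practice the counts would come from the
token usage recorded by the instrumentation.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Token counts for a single request (illustrative type)."""
    input_tokens: int
    output_tokens: int


# Hypothetical prices in USD per 1,000 tokens; substitute your provider's rates.
PRICE_PER_1K_INPUT = 0.5
PRICE_PER_1K_OUTPUT = 1.5


def estimate_cost(usages):
    """Sum token counts across requests and estimate the total cost."""
    total_in = sum(u.input_tokens for u in usages)
    total_out = sum(u.output_tokens for u in usages)
    cost = (total_in / 1000) * PRICE_PER_1K_INPUT + (total_out / 1000) * PRICE_PER_1K_OUTPUT
    return total_in, total_out, cost


requests = [TokenUsage(1000, 2000), TokenUsage(3000, 2000)]
total_in, total_out, cost = estimate_cost(requests)
print(f"input={total_in} output={total_out} estimated_cost=${cost:.2f}")
```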

### Events: Capturing Detailed Interactions

Events log detailed moments during model execution, such as user prompts and
model responses, providing a granular view of model interactions. These insights
are invaluable for debugging and optimizing AI applications where unexpected
behaviors may arise.

{{% alert title="Note" color="info" %}} We decided to use the newer
[Events API](https://opentelemetry.io/docs/specs/otel/logs/event-api/)
specification in the Semantic Conventions for Generative AI. The Events API
allows us to define specific
[semantic conventions](https://opentelemetry.io/docs/specs/semconv/general/events/)
for the user prompts and model responses that we capture. {{% /alert %}}
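
For a sense of what these events might look like, the sketch below models one
prompt and one response as plain dictionaries. The event names
`gen_ai.user.message` and `gen_ai.choice` follow the Semantic Conventions for
Generative AI; the exact body layout shown here is a simplified assumption, and
the `make_chat_events` helper is hypothetical rather than part of any
OpenTelemetry API.

```python
import json


def make_chat_events(prompt, completion, finish_reason="stop"):
    """Build illustrative event records for one prompt/response pair."""
    user_event = {
        "name": "gen_ai.user.message",
        "attributes": {"gen_ai.system": "openai"},
        # Assumed body shape: the prompt text under "content".
        "body": {"content": prompt},
    }
    choice_event = {
        "name": "gen_ai.choice",
        "attributes": {"gen_ai.system": "openai"},
        # Assumed body shape: choice index, finish reason, and the message.
        "body": {
            "index": 0,
            "finish_reason": finish_reason,
            "message": {"content": completion},
        },
    }
    return [user_event, choice_event]


events = make_chat_events(
    "Write a short poem on OpenTelemetry.",
    "Traces flow like rivers through the code.",
)
for event in events:
    print(json.dumps(event, indent=2))
```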

### Extending Observability with Vendor-Specific Attributes

The Semantic Conventions also define vendor-specific attributes for platforms
like OpenAI and Azure Inference API, ensuring telemetry captures both general
and provider-specific details. This added flexibility supports multi-platform
monitoring and in-depth insights.

## Building the Python Instrumentation Library for OpenAI

This Python-based library for OpenTelemetry captures key telemetry signals for
OpenAI models, providing developers with an out-of-the-box observability
solution tailored to AI workloads. The library,
[hosted within the OpenTelemetry Python Contrib repository](https://github.com/open-telemetry/opentelemetry-python-contrib/tree/opentelemetry-instrumentation-openai-v2%3D%3D2.0b0/instrumentation-genai/opentelemetry-instrumentation-openai-v2),
automatically collects telemetry from OpenAI model interactions, including
request and response metadata and token usage.

As generative AI applications grow, additional instrumentation libraries for
other languages will follow, extending OpenTelemetry support across more tools
and environments. The current library’s focus on OpenAI highlights its
popularity and demand within AI development, making it a valuable initial
implementation.

### Example Usage

Here’s an example of using the OpenTelemetry Python library to monitor a
generative AI application with the OpenAI client. Make sure you first install
the library:

```bash
pip install opentelemetry-instrumentation-openai-v2
```

Then include the following code in your Python application:
@codefromthecrypt codefromthecrypt Nov 13, 2024

this example is currently misleading, as it looks like it is zero code. The large setup logic is not visible in the blog. This is why I strongly don't agree with the optimization of time vs clean.

There are new facets people need to plumb for this to work as expected (e.g. log events). People in genai expect to see prompt/completion data. We should make it work (zero code approach) or make very visible the code required to have that happen.


```python
from openai import OpenAI
from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor

OpenAIInstrumentor().instrument()
```

Contributor

we probably need to add the code that configures the OTel SDK (or add a comment pointing to docs on how to set it up).

The code snippet:

```python
# NOTE: OpenTelemetry Python Logs and Events APIs are in beta
from opentelemetry import trace, _logs, _events
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk._logs import LoggerProvider
from opentelemetry.sdk._events import EventLoggerProvider

from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter())
)

_logs.set_logger_provider(LoggerProvider())
_logs.get_logger_provider().add_log_record_processor(BatchLogRecordProcessor(OTLPLogExporter()))
_events.set_event_logger_provider(EventLoggerProvider())
```

an existing example that does something similar - https://github.com/open-telemetry/opentelemetry-python/blob/stable/docs/examples/logs/example.py

/cc @lzchen @xrmx in case they have some suggestions

Member Author

That's a lot of extra code. Should I link to that code example or is there a better doc?

Contributor

let me add it to open-telemetry/opentelemetry-python-contrib#2988 and, once merged, let's just link it.

```python
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a short poem on OpenTelemetry."}],
)

# The library captures telemetry, including request and response metadata, token usage, and more.
```

With this simple instrumentation, you can begin capturing traces from your
generative AI application. Here is an example from the
[Aspire Dashboard](https://learn.microsoft.com/dotnet/aspire/fundamentals/dashboard/standalone?tabs=bash)
for local debugging.

![Chat trace in Aspire Dashboard](aspire-dashboard-trace.png)

I noticed both of these traces are not from the sample code, as they have spans that it wouldn't produce. Is this ok?


Here is a similar trace captured in
[Jaeger](https://www.jaegertracing.io/docs/next-release-v2/getting-started/#running):
Member

@yurishkuro yurishkuro Nov 14, 2024

Suggested change
[Jaeger](https://www.jaegertracing.io/docs/next-release-v2/getting-started/#running):
[Jaeger](https://www.jaegertracing.io/docs/1.63/getting-started/#all-in-one):


![Chat trace in Jaeger](jaeger-trace.png)

It's also easy to capture the content history of the chat for debugging and
improving your application. Simply set the environment variable
`OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` as follows:

```bash
export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=True
```

This turns on content capture, which collects OpenTelemetry events containing
the payload:

![Content Capture Aspire Dashboard](aspire-dashboard-content-capture.png)
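
A flag like this is typically read from the environment and compared
case-insensitively. The sketch below shows one way such a check could work; the
helper name `is_content_capture_enabled` is hypothetical, and the assumption
that only the value `true` (in any letter case) enables capture should be
verified against the library's own documentation.

```python
import os

ENV_VAR = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"


def is_content_capture_enabled(environ=None):
    """Return True only when the flag is set to 'true' (case-insensitive)."""
    env = os.environ if environ is None else environ
    # Assumption: capture is off by default and any other value leaves it off.
    return env.get(ENV_VAR, "false").strip().lower() == "true"


# Pass a dict to try different settings without touching the real environment.
print(is_content_capture_enabled({ENV_VAR: "True"}))
print(is_content_capture_enabled({}))
```

Defaulting to off is a reasonable design for prompt and completion content,
since it may contain sensitive user data.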

## Join Us in Shaping the Future of Generative AI Observability

Community collaboration is key to OpenTelemetry’s success. We invite developers,
AI practitioners, and organizations to contribute, share feedback, or
participate in discussions. Explore the OpenTelemetry Python Contrib project,
contribute code, or help shape observability for AI as it continues to evolve.
More information can be found at the
[Generative AI Observability project page](https://github.com/open-telemetry/community/blob/main/projects/gen-ai.md).
Member

@gyliu513 gyliu513 Nov 13, 2024

@drewby How about highlighting the current contributors at the end of the blog? This way, readers can quickly see who’s actively involved in this SIG.

Suggested change
[Generative AI Observability project page](https://github.com/open-telemetry/community/blob/main/projects/gen-ai.md).
[Generative AI Observability project page](https://github.com/open-telemetry/community/blob/main/projects/gen-ai.md), we now have contributors from [OpenLIT](https://openlit.io/), [Langtrace](https://www.langtrace.ai/), [Elastic](https://www.elastic.co/), [Microsoft](https://www.microsoft.com/), [Traceloop](https://www.traceloop.com/), [IBM](https://www.ibm.com), [Scorecard](https://www.scorecard.io/), [Google](https://www.google.com/), [Amazon](https://aws.amazon.com/) etc., welcome to join the community!

@Rutledge Rutledge Nov 13, 2024

Happy to include Scorecard (scorecard.io) as a supporter and adopter of GenAI OTEL!

Member Author

> @drewby How about highlighting the current contributors at the end of the blog? This way, readers can quickly see who’s actively involved in this SIG.

It's a good suggestion and I'd be delighted to include how we've had great collaboration and contribution across all these companies.

@svrnm I'm not sure how this fits with the community/blog guidelines?
