Blog post for OpenTelemetry Generative AI updates #5575
Conversation
Co-authored-by: Liudmila Molkova <[email protected]>
```python
from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor

OpenAIInstrumentor().instrument()
```
we probably need to add the code that configures the OTel SDK (or add a comment pointing to docs on how to set it up).

The code snippet:

```python
# NOTE: OpenTelemetry Python Logs and Events APIs are in beta
from opentelemetry import trace, _logs, _events
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk._logs import LoggerProvider
from opentelemetry.sdk._events import EventLoggerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter())
)

_logs.set_logger_provider(LoggerProvider())
_logs.get_logger_provider().add_log_record_processor(
    BatchLogRecordProcessor(OTLPLogExporter())
)
_events.set_event_logger_provider(EventLoggerProvider())
```
An existing example that does something similar: https://github.com/open-telemetry/opentelemetry-python/blob/stable/docs/examples/logs/example.py
That's a lot of extra code. Should I link to that code example or is there a better doc?
let me add it to open-telemetry/opentelemetry-python-contrib#2988 and, once merged, let's just link it.
content/en/blog/2024/otel-generative-ai/aspire_dashboard_trace.png
Cool one! Added some comments.
The below is about whether we can make this zero-code or not, and what remains to do so. I have a similar example I've tried locally, and I don't see a way to implicitly configure the logging provider yet. I'm not sure if we want to make a hybrid to reduce the amount of code, or just leave the explicit tracing and logging setup in until logging can be env-configured. cc @anuraaga and @xrmx in case I got the below wrong.
Best I could manage was to add hooks only for the log/event stuff:

```python
import os

from openai import OpenAI

# NOTE: OpenTelemetry Python Logs and Events APIs are in beta
from opentelemetry import _logs, _events
from opentelemetry.sdk._logs import LoggerProvider
from opentelemetry.sdk._events import EventLoggerProvider
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.exporter.otlp.proto.http._log_exporter import OTLPLogExporter

_logs.set_logger_provider(LoggerProvider())
_logs.get_logger_provider().add_log_record_processor(
    BatchLogRecordProcessor(OTLPLogExporter())
)
_events.set_event_logger_provider(EventLoggerProvider())


def main():
    client = OpenAI()
    messages = [
        {
            "role": "user",
            "content": "Answer in up to 3 words: Which ocean contains the falkland islands?",
        },
    ]

    model = os.getenv("CHAT_MODEL", "gpt-4o-mini")
    chat_completion = client.chat.completions.create(model=model, messages=messages)
    print(chat_completion.choices[0].message.content)


if __name__ == "__main__":
    main()
```

Then I get a warning about overriding the event provider, but at least the events do show up:

```console
$ dotenv run -- opentelemetry-instrument python main.py
Overriding of current EventLoggerProvider is not allowed
Indian Ocean
```
Not my call, but I think putting our best foot forward means getting the "zero code" approach into the blog. Having OTel infra code in the foreground of the project is not ideal, nor is having something look like it might work without code, only for readers to try the demo and find out it doesn't. I would go for best == zero code, and push for a patch. It is a small change to get us over the line and worth doing.
```shell
pip install opentelemetry-instrumentation-openai-v2
```

Then include the following code in your Python application:
This example is currently misleading, as it looks like it is zero-code. The large setup logic is not visible in the blog. This is why I strongly disagree with optimizing for time over cleanliness.

There are new facets people need to plumb for this to work as expected (e.g. log events). People in GenAI expect to see prompt/completion data. We should make it work (the zero-code approach) or make very visible the code required to make that happen.

I see that the issue is fixed and pending a release of the Python SDK. @drewby I will leave it up to you whether we wait for that or finalize this before.
participate in discussions. Explore the OpenTelemetry Python Contrib project,
contribute code, or help shape observability for AI as it continues to evolve.
More information can be found at the
[Generative AI Observability project page](https://github.com/open-telemetry/community/blob/main/projects/gen-ai.md).
@drewby How about highlighting the current contributors at the end of the blog? This way, readers can quickly see who’s actively involved in this SIG.
Suggested change:

[Generative AI Observability project page](https://github.com/open-telemetry/community/blob/main/projects/gen-ai.md), where we now have contributors from [OpenLIT](https://openlit.io/), [Langtrace](https://www.langtrace.ai/), [Elastic](https://www.elastic.co/), [Microsoft](https://www.microsoft.com/), [Traceloop](https://www.traceloop.com/), [IBM](https://www.ibm.com), [Scorecard](https://www.scorecard.io/), [Google](https://www.google.com/), [Amazon](https://aws.amazon.com/), etc. Welcome to join the community!
Happy to include Scorecard (scorecard.io) as a supporter and adopter of GenAI OTEL!
> @drewby How about highlighting the current contributors at the end of the blog? This way, readers can quickly see who’s actively involved in this SIG.
It's a good suggestion and I'd be delighted to include how we've had great collaboration and contribution across all these companies.
@svrnm I'm not sure how this fits with the community/blog guidelines?
[Aspire Dashboard](https://learn.microsoft.com/dotnet/aspire/fundamentals/dashboard/standalone?tabs=bash)
for local debugging.

![Chat trace in Aspire Dashboard](aspire-dashboard-trace.png)
I noticed both of these traces are not from the sample code, as they have spans that it wouldn't produce. Is this ok?
@codefromthecrypt I have updated the code example. Can you take a look?
I think this needs more work and another person to double-check. I checked personally, noticing the images don't match the main code. It failed in Jaeger, but I don't think it is about Jaeger, as the failure also happens in otel-tui. I think the manual setup has a glitch in it, and we shouldn't publish this blog until the code here matches the images and it doesn't produce exceptions exporting logs. If it is my mistake, please let me know what I did wrong.
I'm wondering if someone else can double-check this example as-is because it failed for me against jaeger v2 like so:
```console
$ OPENAI_API_KEY=not-tellin python main.py
Overriding of current EventLoggerProvider is not allowed
Exception while exporting logs.
Traceback (most recent call last):
  File "/Users/adriancole/oss/otel-ollama/examples/python/opentelemetry/openai/.venv/lib/python3.12/site-packages/opentelemetry/sdk/_logs/_internal/export/__init__.py", line 307, in _export_batch
    self._exporter.export(self._log_records[:idx])  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adriancole/oss/otel-ollama/examples/python/opentelemetry/openai/.venv/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/grpc/_log_exporter/__init__.py", line 111, in export
    return self._export(batch)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/adriancole/oss/otel-ollama/examples/python/opentelemetry/openai/.venv/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/grpc/exporter.py", line 300, in _export
    request=self._translate_data(data),
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adriancole/oss/otel-ollama/examples/python/opentelemetry/openai/.venv/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/grpc/_log_exporter/__init__.py", line 108, in _translate_data
    return encode_logs(data)
           ^^^^^^^^^^^^^^^^^
  File "/Users/adriancole/oss/otel-ollama/examples/python/opentelemetry/openai/.venv/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/common/_internal/_log_encoder/__init__.py", line 37, in encode_logs
    return ExportLogsServiceRequest(resource_logs=_encode_resource_logs(batch))
                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adriancole/oss/otel-ollama/examples/python/opentelemetry/openai/.venv/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/common/_internal/_log_encoder/__init__.py", line 71, in _encode_resource_logs
    pb2_log = _encode_log(sdk_log)
              ^^^^^^^^^^^^^^^^^^^^
  File "/Users/adriancole/oss/otel-ollama/examples/python/opentelemetry/openai/.venv/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/common/_internal/_log_encoder/__init__.py", line 57, in _encode_log
    body=_encode_value(log_data.log_record.body),
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/adriancole/oss/otel-ollama/examples/python/opentelemetry/openai/.venv/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/common/_internal/__init__.py", line 90, in _encode_value
    raise Exception(f"Invalid type {type(value)} of value {value}")
Exception: Invalid type <class 'NoneType'> of value None
```
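The failure at the bottom of that traceback is the OTLP encoder refusing a log record whose body is `None`. Below is a simplified, illustrative sketch of that kind of type-dispatching encoder; `encode_value` is a hypothetical name, not the real `_encode_value` in `opentelemetry-exporter-otlp-proto-common`, but it shows why a body-less record raises:

```python
def encode_value(value):
    # Dispatch on the body's Python type, as an OTLP AnyValue encoder would.
    # bool must be checked before int, since bool is a subclass of int.
    if isinstance(value, bool):
        return {"bool_value": value}
    if isinstance(value, str):
        return {"string_value": value}
    if isinstance(value, int):
        return {"int_value": value}
    if isinstance(value, float):
        return {"double_value": value}
    # There is no branch for None, so a log record with no body raises,
    # matching the exception in the traceback above.
    raise Exception(f"Invalid type {type(value)} of value {value}")


print(encode_value("hello"))  # {'string_value': 'hello'}
```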
I used the following requirements.txt:
```text
openai~=1.54.3
# 1.28.1 is required for Log Events API/SDK
opentelemetry-sdk~=1.28.1
opentelemetry-exporter-otlp-proto-grpc~=1.28.1
opentelemetry-distro~=0.49b1
opentelemetry-instrumentation-openai-v2~=2.0b0
```
and I used exactly the code as listed here:
```python
from opentelemetry import trace, _logs, _events
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk._logs import LoggerProvider
from opentelemetry.sdk._events import EventLoggerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter())
)

_logs.set_logger_provider(LoggerProvider())
_logs.get_logger_provider().add_log_record_processor(
    BatchLogRecordProcessor(OTLPLogExporter())
)
_events.set_event_logger_provider(EventLoggerProvider())

from openai import OpenAI
from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor

OpenAIInstrumentor().instrument()

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a short poem on OpenTelemetry."}],
)
```
For my Jaeger setup, I needed to use the following config, because you can no longer listen on all interfaces via an environment variable. I actually just wanted to override the listen ports, but I don't know how to do that, so I merged the receiver config with a file in the repo. cc @yurishkuro
```yaml
service:
  extensions: [jaeger_storage, jaeger_query, remote_sampling, healthcheckv2, expvar]
  pipelines:
    traces:
      receivers: [otlp, jaeger, zipkin]
      processors: [batch]
      exporters: [jaeger_storage_exporter]
  telemetry:
    resource:
      service.name: jaeger
    metrics:
      level: detailed
      address: 0.0.0.0:8888
    logs:
      level: info
    # TODO Initialize telemetry tracer once OTEL releases new feature.
    # https://github.com/open-telemetry/opentelemetry-collector/issues/10663

extensions:
  jaeger_query:
    storage:
      traces: some_storage
  jaeger_storage:
    backends:
      some_storage:
        memory:
          max_traces: 100000
  remote_sampling:
    # We can either use file or adaptive sampling strategy in remote_sampling
    file:
      path: ./cmd/jaeger/sampling-strategies.json
    # adaptive:
    #   sampling_store: some_store
    #   initial_sampling_probability: 0.1
    http:
    grpc:
  healthcheckv2:
    use_v2: true
    http:
      endpoint: "0.0.0.0:13133"
    grpc:
  expvar:
    port: 27777

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  jaeger:
    protocols:
      grpc:
      thrift_binary:
      thrift_compact:
      thrift_http:
  zipkin:

processors:
  batch:

exporters:
  jaeger_storage_exporter:
    trace_storage: some_storage
```
![Chat trace in Aspire Dashboard](aspire-dashboard-trace.png)

Here is a similar trace captured in
[Jaeger](https://www.jaegertracing.io/docs/next-release-v2/getting-started/#running):
Suggested change:

```diff
-[Jaeger](https://www.jaegertracing.io/docs/next-release-v2/getting-started/#running):
+[Jaeger](https://www.jaegertracing.io/docs/1.63/getting-started/#all-in-one):
```
I've updated the code sample to something similar to yours. I would love to reduce the amount of code, but also want to make sure it works. Do you think we need to include the requirements and/or pip install instructions for openai, opentelemetry, etc.?
So, the last code I pasted works fine, but only if you also set `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true`. Through trial and error, what I figured out is that if you have the boilerplate Python to configure logging, but you don't also set that variable, log export crashes.

So the current status is that I can get the example working on otel-tui, but only if I export that variable:

```console
$ OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true OPENAI_API_KEY=sk-not_telling python main.py
Overriding of current EventLoggerProvider is not allowed
```

That said, I get a failure on Jaeger v2, but @yurishkuro might want to double-check me, as maybe I messed something up:

```console
$ OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true OPENAI_API_KEY=sk-not_telling python main.py
Overriding of current EventLoggerProvider is not allowed
Failed to export logs to localhost:4317, error code: StatusCode.UNIMPLEMENTED
```
@drewby FYI I re-pasted the code from your latest commit, and it doesn't change the error when content capture is set to false.

The main discrepancy is the traces. The code only produces a single span, and both examples have multiple spans. It is probably least confusing to reproduce the traces based on the code in the example, so people don't wonder why their trace has only one span vs. the three in the images. WDYT?

On pip install vs. requirements, it is fine either way as long as the end result works. The main issue is that pip install is only doing the opentelemetry-instrumentation-openai-v2 part, so it is missing the other dependencies I mentioned in my requirements.txt. Make sense?

As mentioned earlier, the best case is that we can use the zero-code bootstrap, which reduces the number of commands and visible code. So the above is mostly about the manual trace infra code approach.
appropriate:

```bash
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
OTEL_SERVICE_NAME=python-opentelemetry-openai
OPENAI_API_KEY=<replace_with_your_openai_api_key>

# Set to false or remove to disable log events
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true
```
The main thing here is that if you set this to false, it will crash (at least for me).
open-telemetry/opentelemetry-python#4276 or similar will fix this, though I'm not sure whether all backends expect a log with no body.
```python
from opentelemetry._events import set_event_logger_provider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk._logs.export import SimpleLogRecordProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
```
While gRPC is OK, I'm not sure if all vendors support it. Also, it requires platform deps to build (gcc, which is slow to install). Should we switch this to HTTP instead?
I see now: it is because Aspire only supports gRPC. https://learn.microsoft.com/en-us/dotnet/aspire/fundamentals/dashboard/standalone?tabs=bash#configure-opentelemetry-sdk
appropriate:

```bash
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
```
TL;DR: IMHO, whatever we paste in a bash code block, people will expect to work by pasting it themselves. This doesn't work yet, even if we fix the pip deps, due to not exporting the variables.

Maybe it's worth removing these two OTEL_EXPORTER variables and leaving them to defaults based on the dependency, like opentelemetry-exporter-otlp-proto-grpc or opentelemetry-exporter-otlp-proto-http. WDYT?

Then we have only the service name, API key, and capture-content ones. Also, these need to be exported or set inline; if you don't, the python command won't work later. Many people put these into a .env file and use dotenv to load it, e.g.:

```console
$ pip install "python-dotenv[cli]"
$ dotenv run -- python main.py
```

However, we can also just inline them like:

```console
$ OTEL_SERVICE_NAME=python-opentelemetry-openai OPENAI_API_KEY=sk-not_tellin OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true python main.py
```
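For illustration, the essence of what `dotenv run` does can be sketched in a few lines of stdlib Python. This is a hypothetical `load_env` helper, not the real python-dotenv, which also handles quoting, interpolation, and multiline values:

```python
import os


def load_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


sample = """
OTEL_SERVICE_NAME=python-opentelemetry-openai
# Set to false or remove to disable log events
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true
"""

# Export the parsed variables before launching the app, as dotenv would.
os.environ.update(load_env(sample))
print(os.environ["OTEL_SERVICE_NAME"])  # python-opentelemetry-openai
```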
Follow-up for Jaeger: jaegertracing/documentation#778
Here's example code, which would stay in draft until the upcoming release of 1.28.2 and any feedback: open-telemetry/opentelemetry-python-contrib#3006
Here's some food for thought on Jaeger, which only accepts span data. Since the demo can send logs and traces, configuration care is needed.

Notably, `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=false` doesn't disable log export; rather, it just pares down the log events so that they have no body. This means you will still get errors as it attempts to send log messages to Jaeger, where they aren't implemented.

One way is to make an ENV that comments out everything here, so that it works with trace-only backends but still shows how to use it in Aspire. That said, this adds more steps for folks to see the prompt/completion data, and that's a big part of this instrumentation. Another way is to leave the demo setup optimized for collectors that accept logs, and mention that this implies commenting this part out if yours doesn't:

```bash
# Comment out the following if your backend doesn't support OTLP logs
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true
OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true
OTEL_LOGS_EXPORTER=otlp_proto_http
```

Regardless, when the code settles, someone should give second eyes to the Aspire approach, which appears to accept all signals. I've not yet.
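The flag's behavior described above — emitting the event either way and only dropping the body — can be illustrated with a short sketch. The helper names here (`capture_content_enabled`, `build_event_body`) are hypothetical, not the instrumentation's actual internals:

```python
import os


def capture_content_enabled() -> bool:
    # Env flags arrive as strings; treat anything other than "true"/"1"
    # (case-insensitive) as disabled.
    value = os.environ.get(
        "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT", "false"
    )
    return value.strip().lower() in ("true", "1")


def build_event_body(prompt: str):
    # The event is emitted either way; disabling capture only drops the body.
    # A body-less event still goes through log export, which is why a
    # trace-only backend can still reject it.
    return {"content": prompt} if capture_content_enabled() else None


os.environ["OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"] = "false"
print(build_event_body("hi"))  # None
os.environ["OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"] = "true"
print(build_event_body("hi"))  # {'content': 'hi'}
```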
Title: OpenTelemetry for Generative AI
This blog post introduces enhancements to OpenTelemetry specifically tailored for generative AI technologies, focusing on the development of Semantic Conventions and the Python Instrumentation Library.
Samples are in Python
SIG: GenAI Observability
Sponsors: @tedsuo @lmolkova
Closes: #5581