[8.14](backport #3963) Create Monitoring Azure OpenAI docs #4008

Merged
merged 1 commit into from
Jun 13, 2024
@@ -8,7 +8,7 @@

Elastic Observability offers powerful monitoring solutions to keep your Azure environments reliable and efficient, providing deep insights into the performance of your applications, services, and infrastructure components.

Learn how to use the Elastic Observability solution to observe and monitor a broad range of Azure resources and applications.

- <<monitor-azure-elastic-agent>>
- <<monitor-azure>>
@@ -24,3 +24,4 @@ include::monitor-azure-beats.asciidoc[]

include::monitor-azure-native.asciidoc[]

include::monitor-azure-openai.asciidoc[]
@@ -0,0 +1,204 @@
[discrete]
[[azure-openai-apm]]
=== Step 6: Monitor Microsoft Azure OpenAI APM with OpenTelemetry

The Azure OpenAI API returns usage data with each response, such as the model name and the prompt and completion token counts, that helps you monitor and understand your code.
Using OpenTelemetry, you can ingest this data into Elastic {observability}.
From there, you can view and analyze your data to monitor the cost and performance of your applications.

For this tutorial, we'll be using an https://github.com/mdbirnstiehl/AzureOpenAIAPMmonitoringOtel[example Python application] and the Python OpenTelemetry libraries to instrument the application and send data to {observability}.

[discrete]
[[azure-openai-apm-env-var]]
==== Set your environment variables

To start collecting APM data for your Azure OpenAI applications, gather the OpenTelemetry OTLP exporter endpoint and authentication header from your {ecloud} instance:

. From the {kib} homepage, click **Add integrations**.
. Select the **APM** integration.
. Scroll down to **APM Agents** and select the **OpenTelemetry** tab.
. Make note of the values for the following configuration settings:
* `OTEL_EXPORTER_OTLP_ENDPOINT`
* `OTEL_EXPORTER_OTLP_HEADERS`

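With the configuration values from the APM integration and your https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython-new&pivots=programming-language-python#retrieve-key-and-endpoint[Azure OpenAI API key and endpoint], set the following environment variables using the `export` command on the command line. The example application prepends `Authorization=Bearer` to `OTEL_EXPORTER_OTLP_AUTH_HEADER` itself, so if your `OTEL_EXPORTER_OTLP_HEADERS` value already starts with `Authorization=Bearer`, use only the token that follows it: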

[source,bash]
----
export AZURE_OPENAI_API_KEY="your-Azure-OpenAI-API-key"
export AZURE_OPENAI_ENDPOINT="your-Azure-OpenAI-endpoint"
export OPENAI_API_VERSION="your_api_version"
export OTEL_EXPORTER_OTLP_AUTH_HEADER="your-otel-exporter-auth-header"
export OTEL_EXPORTER_OTLP_ENDPOINT="your-otel-exporter-endpoint"
----
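
Before starting the application, it can help to confirm that all five variables are set. The following is a minimal sketch and is not part of the example application; the helper name `missing_env_vars` is hypothetical:

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_API_VERSION",
    "OTEL_EXPORTER_OTLP_AUTH_HEADER",
    "OTEL_EXPORTER_OTLP_ENDPOINT",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running this check in the same shell session where you ran the `export` commands catches a missing or misspelled variable before the exporter or the Azure OpenAI client fails at startup.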

[discrete]
[[azure-openai-apm-python-libraries]]
==== Install Python libraries

Install the following Python libraries using these commands:

[source,bash]
----
pip3 install opentelemetry-api
pip3 install opentelemetry-sdk
pip3 install opentelemetry-exporter-otlp
pip3 install opentelemetry-instrumentation
pip3 install opentelemetry-instrumentation-requests
pip3 install openai
pip3 install flask
----
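
To confirm the installation succeeded, you can check that the modules the example application imports are resolvable. This is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the module list is taken from the import statements used later in this tutorial:

```python
import importlib.util

# Modules imported by the example application in this tutorial.
MODULES = (
    "opentelemetry.trace",
    "opentelemetry.sdk.trace",
    "opentelemetry.exporter.otlp.proto.grpc.trace_exporter",
    "opentelemetry.instrumentation.requests",
    "openai",
    "flask",
)


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # Raised when a parent package of a dotted name is not installed.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_modules(MODULES) or "All modules found.")
```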

[discrete]
[[azure-openai-apm-instrument]]
==== Instrument the application

The following code is from the https://github.com/mdbirnstiehl/AzureOpenAIAPMmonitoringOtel[example application]. In a real use case, you would add the import statements to your code.

The app we're using in this tutorial is a simple example that calls the Azure OpenAI API with the following message: “How do I send my APM data to Elastic Observability?”

[source,python]
----

from openai import AzureOpenAI
import openai
from flask import Flask
import monitor # Import the module
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
import urllib
import os

from opentelemetry import trace
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.instrumentation.requests import RequestsInstrumentor

# Service name is required for most backends
resource = Resource(attributes={
    SERVICE_NAME: "your-service-name"
})

provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(OTLPSpanExporter(
    endpoint=os.getenv('OTEL_EXPORTER_OTLP_ENDPOINT'),
    headers="Authorization=Bearer%20" + os.getenv('OTEL_EXPORTER_OTLP_AUTH_HEADER')))

provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)
RequestsInstrumentor().instrument()



# Initialize the Flask app
app = Flask(__name__)

# Create the Azure OpenAI client
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("OPENAI_API_VERSION"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

@app.route("/completion")
@tracer.start_as_current_span("do_work")
def completion():
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "user", "content": "How do I send my APM data to Elastic Observability?"}
        ],
        max_tokens=20,
        temperature=0
    )

    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    app.run(host="localhost", port=8000, debug=True)
----

The https://github.com/mdbirnstiehl/AzureOpenAIAPMmonitoringOtel/blob/main/monitor.py[`monitor.py` file] in the example application instruments the application and can be used to instrument your own applications.

The `monitor.py` code uses monkey patching, a Python technique that changes the behavior of a class or module at runtime by replacing its attributes or methods. Here, it wraps the `chat.completions` call so that the response metrics are added to the OpenTelemetry spans:

[source,python]
----
import json
from functools import wraps

import openai
from opentelemetry import trace

# Running totals shared across calls (assumed shape; see monitor.py for the actual definition)
counters = {'completion_count': 0}

def count_completion_requests_and_tokens(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        counters['completion_count'] += 1
        response = func(*args, **kwargs)

        token_count = response.usage.total_tokens
        prompt_tokens = response.usage.prompt_tokens
        completion_tokens = response.usage.completion_tokens
        cost = calculate_cost(response)
        strResponse = json.dumps(response, default=str)

        # Set OpenTelemetry attributes
        span = trace.get_current_span()
        if span:
            span.set_attribute("completion_count", counters['completion_count'])
            span.set_attribute("token_count", token_count)
            span.set_attribute("prompt_tokens", prompt_tokens)
            span.set_attribute("completion_tokens", completion_tokens)
            span.set_attribute("model", response.model)
            span.set_attribute("cost", cost)
            span.set_attribute("response", strResponse)
        return response
    return wrapper

# Monkey-patch the openai.chat.completions.create function
openai.chat.completions.create = count_completion_requests_and_tokens(openai.chat.completions.create)
----

Adding this data to the span sends it to your OTLP endpoint, so you can search for the data in {observability} and build dashboards and visualizations.
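
To see the wrapping mechanism in isolation, the following minimal sketch applies the same pattern to stand-in objects instead of the real `openai` client and OpenTelemetry span. The names `fake_create`, `completions`, `count_calls`, and `recorded` are hypothetical and exist only for illustration:

```python
from functools import wraps
from types import SimpleNamespace

counters = {"completion_count": 0}
recorded = {}  # stands in for the attributes set on a span


def count_calls(func):
    """Wrap func so each call updates the counter and records token usage."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        counters["completion_count"] += 1
        response = func(*args, **kwargs)
        recorded["completion_count"] = counters["completion_count"]
        recorded["token_count"] = response.usage.total_tokens
        return response
    return wrapper


def fake_create(**kwargs):
    """Hypothetical stand-in for the API call; returns a canned response."""
    usage = SimpleNamespace(prompt_tokens=12, completion_tokens=20, total_tokens=32)
    return SimpleNamespace(model=kwargs.get("model", "gpt-4"), usage=usage)


# Stand-in for the object that exposes `create`.
completions = SimpleNamespace(create=fake_create)

# Monkey patch: rebind the attribute to the wrapped function.
completions.create = count_calls(completions.create)

response = completions.create(model="gpt-4")
print(recorded)
```

Callers keep using `completions.create` unchanged. Only the attribute binding differs, which is why the patch has to run before the first call that should be measured.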

The following function calculates the cost of a single request to the OpenAI APIs. The per-1,000-token rates are hard-coded, so verify them against current pricing for the models you use:

[source,python]
----
def calculate_cost(response):
    if response.model in ['gpt-4', 'gpt-4-0314']:
        cost = (response.usage.prompt_tokens * 0.03 + response.usage.completion_tokens * 0.06) / 1000
    elif response.model in ['gpt-4-32k', 'gpt-4-32k-0314']:
        cost = (response.usage.prompt_tokens * 0.06 + response.usage.completion_tokens * 0.12) / 1000
    elif 'gpt-3.5-turbo' in response.model:
        cost = response.usage.total_tokens * 0.002 / 1000
    elif 'davinci' in response.model:
        cost = response.usage.total_tokens * 0.02 / 1000
    elif 'curie' in response.model:
        cost = response.usage.total_tokens * 0.002 / 1000
    elif 'babbage' in response.model:
        cost = response.usage.total_tokens * 0.0005 / 1000
    elif 'ada' in response.model:
        cost = response.usage.total_tokens * 0.0004 / 1000
    else:
        cost = 0
    return cost
----
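
As a worked example of the arithmetic, the sketch below applies the `gpt-4` rates hard-coded above to a request with 15 prompt tokens and 20 completion tokens: 15 × 0.03 + 20 × 0.06 = 1.65, and dividing by 1,000 gives 0.00165 USD. The helper name `gpt4_cost` and the token counts are hypothetical:

```python
def gpt4_cost(prompt_tokens, completion_tokens):
    """Cost in USD using the gpt-4 rates hard-coded in calculate_cost above."""
    return (prompt_tokens * 0.03 + completion_tokens * 0.06) / 1000


# Token counts chosen for illustration only.
cost = gpt4_cost(prompt_tokens=15, completion_tokens=20)
print(f"{cost:.5f}")
```

Because prompt and completion tokens are billed at different rates for these models, the two counts are recorded as separate span attributes rather than only as a total.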

To download the example application and try it for yourself, go to the https://github.com/mdbirnstiehl/AzureOpenAIAPMmonitoringOtel[GitHub repo].

[discrete]
[[azure-openai-view-apm-data]]
==== View APM data from OpenTelemetry in {kib}

After ingesting your data, you can filter and explore it using Discover in {kib}.
Go to **Discover** from the {kib} menu under **Analytics**.
You can then filter by the fields sent to {observability} by OpenTelemetry, including:

* `numeric_labels.completion_count`
* `numeric_labels.completion_tokens`
* `numeric_labels.cost`
* `numeric_labels.prompt_tokens`
* `numeric_labels.token_count`

[role="screenshot"]
image::images/azure-openai-apm-discover.png[screenshot of the discover main page]

Then, use these fields to create visualizations and build dashboards. Refer to the {kibana-ref}/dashboard.html[Dashboard and visualizations] documentation for more information.

[role="screenshot"]
image::images/azure-openai-apm-dashboard.png[screenshot of the Azure OpenAI APM dashboard]
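
The same fields can also be aggregated with the {es} query DSL. The following sketch only builds a request body that sums `numeric_labels.cost` per model; it does not send the request. The `labels.model` field name is an assumption about how the string `model` span attribute is stored, and which index or data view to query depends on your deployment:

```python
import json


def cost_by_model_query(cost_field="numeric_labels.cost", model_field="labels.model"):
    """Build a search body that sums the cost field for each model value."""
    return {
        "size": 0,  # return aggregations only, no individual documents
        "query": {"exists": {"field": cost_field}},
        "aggs": {
            "by_model": {
                "terms": {"field": model_field},
                "aggs": {"total_cost": {"sum": {"field": cost_field}}},
            }
        },
    }


print(json.dumps(cost_by_model_query(), indent=2))
```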