OTEL Python does not always flush metrics to awsemf #851

Open
sarwaan001 opened this issue Aug 15, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@sarwaan001

Describe the bug
The OTel Python Lambda layer does not always flush metrics at the end of a Lambda invocation.

Steps to reproduce

  1. Deploy a Lambda function with the following Python code:
    handler.py
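"""Sample Lambda for testing"""
from opentelemetry import trace
from opentelemetry.metrics import get_meter

# The global tracer and meter providers are set up by the otel-instrument
# wrapper (AWS_LAMBDA_EXEC_WRAPPER); this module only obtains instruments.
trace.get_tracer_provider()
tracer = trace.get_tracer(__name__)

meter = get_meter(__name__)

counter = meter.create_counter(name="invocation_counter", description="A counter metric", unit="invocations")


def lambda_handler(event, _):
    """Sample Lambda for testing"""
    # One increment per invocation: 100 invocations should sum to 100.
    counter.add(1)
    return {"status_code": 200}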

config.yaml

#config.yaml in the root directory
#Set the environment variable 'OPENTELEMETRY_COLLECTOR_CONFIG_FILE' to '/var/task/config.yaml'

receivers:
  otlp:
    protocols:
      grpc:
      http:
exporters:
  logging:
    verbosity: detailed
  awsxray:
  awsemf:
    namespace: ${env:OTEL_NAMESPACE}
    dimension_rollup_option: 1
    resource_to_telemetry_conversion:
      enabled: false
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [awsxray]
    metrics:
      receivers: [otlp]
      exporters: [logging,awsemf]

Ensure the following configuration is set on the Lambda function:

  • Environment
    -- AWS_LAMBDA_EXEC_WRAPPER: /opt/otel-instrument
    -- OPENTELEMETRY_COLLECTOR_CONFIG_FILE: /var/task/config.yaml
    -- OTEL_INSTRUMENTATION_AWS_LAMBDA_FLUSH_TIMEOUT: 900
    -- OTEL_NAMESPACE: SampleNamespace
    -- OTEL_PROPAGATORS: xray
    -- OTEL_PYTHON_ID_GENERATOR: xray
  • Runtime - 3.9
  • Architecture - x86_64
  • handler: handler.lambda_handler
  • layers: arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-python-amd64-ver-1-18-0:1

Ensure the Lambda execution role has the following permissions:

  • xray:PutTelemetryRecords
  • xray:PutTraceSegments
  • cloudwatch:GetMetricData
  • cloudwatch:GetMetricStatistics
  • cloudwatch:GetMetricStream
  • cloudwatch:PutMetricData
  • cloudwatch:PutMetricStream
  • cloudwatch:StartMetricStreams
  • logs:CreateLogGroup
  • logs:CreateLogStream
  • logs:PutLogEvents
  2. Obtain the Lambda ARN.
  3. Ensure that you are logged in to the AWS CLI.
  4. Create the following pytest file and replace the placeholder ARN with the ARN of the Lambda you just created.
    test.py
"""
    Tests the sample Lambda by invoking it 100 times and expecting the counter to sum to 100.
"""
import json
import time
from datetime import datetime, timezone

import boto3


def test_sample_lambda():
    lambda_arn = "<insert lambda arn>"

    lambda_client = boto3.client('lambda')
    event = json.dumps({})

    # Use timezone-aware UTC timestamps: botocore treats naive datetimes as UTC,
    # so a naive datetime.now() on a non-UTC machine would shift the query window.
    start_time = datetime.now(timezone.utc)

    for _ in range(100):
        # InvocationType='Event' invokes asynchronously; Lambda replies 202 Accepted.
        response = lambda_client.invoke(
            FunctionName=lambda_arn,
            InvocationType='Event',
            LogType='None',
            Payload=event
        )
        assert response['StatusCode'] == 202

    # Wait 2 minutes for metrics to propagate + wait for the last Lambda
    time.sleep(2*60 + 2)

    cloudwatch_client = boto3.client('cloudwatch')

    metric_data = cloudwatch_client.get_metric_data(
        MetricDataQueries=[
            {
                'Id': 'integration_test',
                'MetricStat': {
                    'Metric': {
                        'Namespace': "SampleNamespace",
                        'MetricName': "invocation_counter",
                        # OTelLib carries the meter name, i.e. the handler module name.
                        'Dimensions': [{'Name': 'OTelLib', 'Value': 'handler'}]
                    },
                    'Period': 300,
                    'Stat': "Sum",
                }
            }
        ],
        StartTime=start_time,
        EndTime=datetime.now(timezone.utc),
    )

    # Add up every datapoint returned for the query window.
    otel_values = sum(metric_data['MetricDataResults'][0]['Values'])

    assert otel_values == 100

Ensure you have boto3 and pytest installed.

  5. Run pytest.
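The test's fixed `time.sleep(2*60 + 2)` is also the deadline for all 100 asynchronous invocations to run and for CloudWatch to ingest their metrics, so a shortfall could come from ingestion lag rather than lost metrics. A minimal sketch of a polling alternative follows, assuming a hypothetical helper `wait_for_total` and a caller-supplied `fetch_total` that reruns the `get_metric_data` query (with a fresh `EndTime`) and returns the summed values. The clock and sleep are injectable so the logic can be checked without AWS.

```python
import time
from typing import Callable


def wait_for_total(
    fetch_total: Callable[[], float],
    expected: float,
    timeout_s: float = 300.0,
    interval_s: float = 15.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll fetch_total() until it reaches `expected` or `timeout_s` elapses.

    Hypothetical helper. Returns the last observed total so the caller
    can assert on it, e.g. `assert wait_for_total(fetch, 100) == 100`.
    """
    deadline = clock() + timeout_s
    total = fetch_total()
    # Keep polling while the total is still short and time remains.
    while total < expected and clock() < deadline:
        sleep(interval_s)
        total = fetch_total()
    return total
```

If the total is still below 100 after a generous timeout, dropped metrics become a more likely explanation than slow propagation.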

What did you expect to see?
The invocation_counter metric in CloudWatch should sum to 100, and pytest should pass.

What did you see instead?
Fewer than 100 values reach CloudWatch. Sometimes, on warm Lambdas, all 100 arrive and the test passes.

What version of collector/language SDK version did you use?
arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-python-amd64-ver-1-18-0:1

What language layer did you use?
Python

Additional context
I suspect the Lambda layer sometimes does not flush EMF metrics before the Lambda execution environment is frozen.
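If that suspicion is right, one mitigation is to flush explicitly before the handler returns instead of relying only on the layer's post-invocation flush. Below is a minimal, standard-library-only sketch of that pattern; `flush_after` is a hypothetical helper, not part of the layer. With the OpenTelemetry SDK active, `flush` could be something like `lambda: get_meter_provider().force_flush(timeout_millis=5_000)`, since the SDK `MeterProvider` exposes `force_flush(timeout_millis=...)`; the timeout value there is an arbitrary example.

```python
import functools
from typing import Any, Callable


def flush_after(flush: Callable[[], Any]) -> Callable:
    """Decorate a Lambda handler so `flush` runs before the handler returns.

    Hypothetical helper: `flush` is any zero-argument callable.
    """
    def decorator(handler: Callable) -> Callable:
        @functools.wraps(handler)
        def wrapper(event, context):
            try:
                return handler(event, context)
            finally:
                # Runs after success or an exception, but still inside the
                # invocation, i.e. before the environment can be frozen.
                flush()
        return wrapper
    return decorator
```

Because the flush happens inside the invocation, it does not compete with the layer's own shared flush budget.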

sarwaan001 added the bug (Something isn't working) label on Aug 15, 2023
@stevemao

stevemao commented Feb 3, 2024

I do not see anything going to awsemf at all. I can see log output when using the logging exporter with the same code.
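When checking whether anything reaches awsemf, it helps to know what to look for: the exporter writes metrics as CloudWatch Embedded Metric Format (EMF) log events to CloudWatch Logs, and CloudWatch extracts metrics from them. The sketch below builds a record of roughly that shape for `invocation_counter` using only the standard library. `build_emf_record` is a hypothetical name, and the exact fields and dimensions awsemf emits may differ; the `OTelLib` dimension mirrors the one the test queries.

```python
import json
import time
from typing import Dict, Optional


def build_emf_record(namespace: str, metric_name: str, value: float,
                     dimensions: Dict[str, str],
                     timestamp_ms: Optional[int] = None) -> str:
    """Return one EMF log event as a JSON string (illustrative shape only)."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                # One dimension set containing every dimension key.
                "Dimensions": [sorted(dimensions)],
                # Unit is optional in EMF, so it is omitted here.
                "Metrics": [{"Name": metric_name}],
            }],
        },
        # Dimension values and the metric value are top-level members.
        **dimensions,
        metric_name: value,
    }
    return json.dumps(record)


print(build_emf_record("SampleNamespace", "invocation_counter", 1,
                       {"OTelLib": "handler"}, timestamp_ms=1692057600000))
```

Log events of this shape in the function's log groups would indicate that awsemf is exporting, even if the metric totals look wrong.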

@serkan-ozal
Contributor

serkan-ozal commented Sep 2, 2024

Hi @sarwaan001, I see that you set the flush timeout to 900 ms. I think this might not be enough on cold start (for functions with a small memory limit), because the total flush timeout is shared: traces are flushed first, and metrics get what is left.
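The shared-budget behaviour described above can be modelled in a few lines. `OTEL_INSTRUMENTATION_AWS_LAMBDA_FLUSH_TIMEOUT` is read in milliseconds, so the configured `900` means 0.9 s, not 900 s. This is a simplified model, not the layer's actual code: it assumes traces are flushed first and metrics receive only the remaining budget, skipping them entirely when none is left. `flush_with_shared_budget`, `read_flush_timeout_ms` and the 30000 ms fallback are assumptions for illustration.

```python
import os
import time
from typing import Callable, Mapping

# Assumed fallback when the variable is unset or not an integer.
DEFAULT_FLUSH_TIMEOUT_MS = 30_000


def read_flush_timeout_ms(environ: Mapping[str, str] = os.environ) -> int:
    """Read OTEL_INSTRUMENTATION_AWS_LAMBDA_FLUSH_TIMEOUT as milliseconds."""
    raw = environ.get("OTEL_INSTRUMENTATION_AWS_LAMBDA_FLUSH_TIMEOUT")
    if raw is None:
        return DEFAULT_FLUSH_TIMEOUT_MS
    try:
        return int(raw)
    except ValueError:
        return DEFAULT_FLUSH_TIMEOUT_MS


def flush_with_shared_budget(
    budget_ms: float,
    flush_traces: Callable[[float], None],
    flush_metrics: Callable[[float], None],
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Flush traces first, then give metrics whatever budget is left.

    Returns False when traces used the whole budget and metrics were skipped.
    """
    start = clock()
    flush_traces(budget_ms)
    remaining_ms = budget_ms - (clock() - start) * 1000
    if remaining_ms <= 0:
        return False  # No budget left: metrics are never exported.
    flush_metrics(remaining_ms)
    return True
```

Under this model, a cold-start trace flush that takes 950 ms against a 900 ms budget leaves nothing for metrics, which would match counts that fall short mostly on cold starts.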
