Support OpenTelemetry logs in S3 source #5028

Closed
danhli opened this issue Oct 7, 2024 · 1 comment · Fixed by #5030


danhli commented Oct 7, 2024

Is your feature request related to a problem? Please describe.
It would be nice to provide native support for OTLP JSON logs in the S3 source. OTLP JSON logs conform to the OTLP specification and are stored as JSON files (sample file). When the original log messages are in JSON format, the OTel Collector stores them as escaped JSON strings under logRecords/body/stringValue.

For example:
"logRecords": [
  {
    "body": {
      "stringValue": "{\"key1\":\"val1\",\"key2\":\"val2\"}"
    }
  }
]

This log format is currently supported by otel_logs_source when the OTel JSON logs are ingested through HTTP endpoints. However, if the OTLP JSON logs are stored as files in an S3 bucket, there is no way to parse the escaped JSON.
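To illustrate the problem, here is a minimal Python sketch (not Data Prepper code) that walks the OTLP JSON nesting and shows that the body is still a string after the file itself has been parsed. The payload is a hypothetical, trimmed-down example and `iter_bodies` is an illustrative helper name.

```python
import json

# Hypothetical OTLP JSON payload, trimmed to the fields discussed above.
otlp_text = json.dumps({
    "resourceLogs": [{
        "scopeLogs": [{
            "logRecords": [{
                "body": {"stringValue": json.dumps({"key1": "val1", "key2": "val2"})}
            }]
        }]
    }]
})


def iter_bodies(otlp_doc):
    """Yield body.stringValue for every log record in a parsed OTLP JSON document."""
    for resource_logs in otlp_doc.get("resourceLogs", []):
        for scope_logs in resource_logs.get("scopeLogs", []):
            for record in scope_logs.get("logRecords", []):
                yield record.get("body", {}).get("stringValue")


doc = json.loads(otlp_text)            # first parse: the file itself
raw_bodies = list(iter_bodies(doc))    # each body is still an escaped JSON string
decoded_bodies = [json.loads(b) for b in raw_bodies]  # second parse: the original message
```

The second `json.loads` call is the step that has no equivalent today when the files are read from S3.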

Describe the solution you'd like
Add a new codec option, otel_logs, to the S3 source configuration. With the following pipeline configuration, the OTLP JSON logs are ingested into OpenSearch with one document per log record. Each document contains all the fields of the original log message (e.g. key1, key2) together with the attributes.

For example:
version: "2"
s3-log-pipeline:
  source:
    s3:
      acknowledgments: true
      notification_type: "sqs"
      compression: "none"
      codec:
        otel_logs:
          format: "json"
      workers: 3
      sqs:
        queue_url: ""
        maximum_messages: 10
        visibility_timeout: "60s"
        visibility_duplication_protection: true
      aws:
        region: "us-west-2"
        sts_role_arn: ""
  processor:
    - parse_json:
        source: "/body"
    - delete_entries:
        with_keys: ["body"]
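As a rough illustration of what the two processors in this pipeline are meant to achieve, the following Python sketch emulates them on a single event. It assumes the codec emits the escaped message as a string under the `body` key and that `parse_json` writes the parsed keys to the root of the event; the functions are stand-ins written for this example, not Data Prepper APIs.

```python
import json


def parse_json(event, source):
    """Stand-in for the parse_json processor: parse the JSON string at
    `source` and merge its keys into the root of the event (assumed default)."""
    key = source.lstrip("/")
    merged = dict(event)
    merged.update(json.loads(event[key]))
    return merged


def delete_entries(event, with_keys):
    """Stand-in for the delete_entries processor: drop the listed keys."""
    return {k: v for k, v in event.items() if k not in with_keys}


# Hypothetical event as it might leave the otel_logs codec.
event = {
    "traceId": "",
    "spanId": "",
    "body": "{\"key1\":\"val1\",\"key2\":\"val2\"}",
}

result = delete_entries(parse_json(event, "/body"), ["body"])
```

After both steps the event carries `key1` and `key2` as top-level fields and no longer has the raw `body` string.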

Expected result

  • Each log record is stored in a separate document in an OpenSearch index
  • Each document contains all the attributes that are shared by all the log records, i.e. the attributes under resourceLogs/resource

For example:
{
  "_index": "my-index",
  "_id": "qyamVZIBXzpFSvHE1m14",
  "_score": 1,
  "_source": {
    "traceId": "",
    "spanId": "",
    "schemaUrl": "https://opentelemetry.io/schemas/1.6.1",
    "key1": "val1",
    "key2": "val2",
    "key3": "ke3",
    "resource.attributes.service.name": "my.service"
  }
}
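The expected result can be sketched in Python as follows. This is only an illustration of the requested behavior, not the codec's implementation: `to_documents` is a hypothetical helper, the `resource.attributes.<key>` naming is taken from the example document above, and the body is left unparsed because the pipeline's processors handle that step.

```python
def to_documents(otlp_doc):
    """Return one flat document per log record, each carrying the
    attributes shared under resourceLogs/resource."""
    docs = []
    for resource_logs in otlp_doc.get("resourceLogs", []):
        shared = {}
        for attr in resource_logs.get("resource", {}).get("attributes", []):
            # An OTLP attribute value is a one-key object such as {"stringValue": "..."}.
            value = next(iter(attr.get("value", {}).values()), None)
            shared["resource.attributes." + attr["key"]] = value
        for scope_logs in resource_logs.get("scopeLogs", []):
            for record in scope_logs.get("logRecords", []):
                doc = {
                    "traceId": record.get("traceId", ""),
                    "spanId": record.get("spanId", ""),
                    "schemaUrl": resource_logs.get("schemaUrl", ""),
                    "body": record.get("body", {}).get("stringValue"),
                }
                doc.update(shared)
                docs.append(doc)
    return docs


# Hypothetical input with two log records sharing one resource.
sample = {
    "resourceLogs": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "my.service"}}
        ]},
        "schemaUrl": "https://opentelemetry.io/schemas/1.6.1",
        "scopeLogs": [{"logRecords": [
            {"body": {"stringValue": "{\"key1\":\"val1\"}"}},
            {"body": {"stringValue": "{\"key2\":\"val2\"}"}},
        ]}],
    }]
}

documents = to_documents(sample)
```

Each of the two resulting documents carries `resource.attributes.service.name` alongside its own record fields.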

danhli added a commit to danhli/data-prepper that referenced this issue Oct 8, 2024
@danhli danhli changed the title Support OTLP JSON logs in S3 source Support OpenTelemetry logs in S3 source Oct 8, 2024
@oeyh oeyh added enhancement New feature or request and removed untriaged labels Oct 8, 2024
@oeyh oeyh assigned oeyh and danhli and unassigned oeyh Oct 8, 2024
@oeyh oeyh added this to the v2.10 milestone Oct 8, 2024
danhli added a commit to danhli/data-prepper that referenced this issue Oct 9, 2024
danhli added a commit to danhli/data-prepper that referenced this issue Oct 11, 2024
dlvenable pushed a commit that referenced this issue Oct 11, 2024
Support otel_logs codec in S3 source (#5028)

Signed-off-by: Daniel Li <[email protected]>
@github-project-automation github-project-automation bot moved this from Unplanned to Done in Data Prepper Tracking Board Oct 11, 2024
dlvenable commented

We took the approach of implementing this as:

codec:
  otel_logs:
    format: json

This will allow us to also support a binary version in the future.

codec:
  otel_logs:
    format: binary
