Modify aws-lambda plugin README
Signed-off-by: Srikanth Govindarajan <[email protected]>
srikanthjg committed Nov 2, 2024
1 parent 5108528 commit 478d7a9
Showing 1 changed file with 115 additions and 57 deletions.
172 changes: 115 additions & 57 deletions data-prepper-plugins/aws-lambda/README.md
@@ -1,42 +1,75 @@
# AWS Lambda Processor and Sink for Data Prepper

# Lambda Processor
This document provides the configuration details and usage instructions for integrating AWS Lambda with Data Prepper, both as a processor and as a sink.

This plugin enables you to send data from your Data Prepper pipeline directly to AWS Lambda functions for further processing.
----------------------------------------------------------------------------------------
## AWS Lambda Processor
Configuration
The aws_lambda processor allows you to invoke an AWS Lambda function in your Data Prepper pipeline to process events. This can be used for synchronous or asynchronous invocations based on your requirements.

## Usage
```aidl
lambda-pipeline:
  ...
  processor:
    - aws_lambda:
        aws:
          region: "us-east-1"
          sts_role_arn: "<arn>"
        function_name: "uploadToS3Lambda"
        max_retries: 3
        invocation_type: "RequestResponse"
        payload_model: "batch_event"
        batch:
          key_name: "osi_key"
          threshold:
            event_count: 10
            event_collect_timeout: 15s
            maximum_size: 3mb
Configuration Fields:

```
Field | Type | Required | Description
-------------------- | ------- | -------- | ----------------------------------------------------------------------------
function_name | String | Yes | The name of the AWS Lambda function to invoke.
invocation_type | String | Yes | Specifies the invocation type: either request-response or event.
aws.region | String | Yes | The AWS region where the Lambda function is located.
aws.sts_role_arn | String | No | ARN of the role to assume before invoking the Lambda function.
max_retries | Integer | No | Maximum number of retries if the invocation fails. Default is 3.
batch | Object | No | Optional batch settings for the Lambda invocations.
lambda_when | String | No | Conditional expression to determine when to invoke the Lambda processor.
response_codec | Object | No | Codec configuration for parsing Lambda responses.
tags_on_match_failure| List | No | A List of Strings that specifies the tags to be set in the event when lambda fails to match or an unknown exception occurs while matching.
sdk_timeout | Duration| No | How long the SDK maintains the connection to the client before timing out.
response_events_match| boolean | No | Whether events in the Lambda response are matched, in count and order, to the request events in the batch.
```

Example Configuration:
```
processor:
  - aws_lambda:
      function_name: "my-lambda-function"
      invocation_type: "request-response"
      response_events_match: false
      aws:
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/my-lambda-role"
      max_retries: 3
      batch:
        key_name: "events"
        threshold:
          event_count: 100
          maximum_size: "5mb"
          event_collect_timeout: PT10S
      lambda_when: "event['status'] == 'process'"
```

## Usage
Invocation Type:
- request-response: Waits for the Lambda function's response before continuing.
- event: Invokes the function asynchronously without waiting for a response.
Batching: If batching is enabled, events are grouped together and sent in bulk to reduce the number of Lambda invocations. The threshold settings within batch define the event count, size limit, or timeout at which a batch is sent.
Codec: Both the request and response codecs are currently JSON. The processor requires the Lambda function to return a JSON array only.
tags_on_match_failure: A list of strings that specifies the tags to set on the event when Lambda fails to match or an unknown exception occurs while matching. These tags may be used in conditional expressions in other parts of the configuration.

## Behaviour
When the AWS Lambda processor in Data Prepper is configured for batching, it groups multiple events together into a single request based on the batch thresholds (event count, size, or time). The entire batch is sent to the Lambda function as a single payload.

Set `invocation_type` to request-response when the response from AWS Lambda should come back to Data Prepper.
Lambda Response Handling:
The response_events_match setting defines how the events sent to Lambda in a batch relate to the events in the Lambda response.
- True: Lambda typically returns a JSON array containing the results for each event in the batch. Data Prepper maps this array back to the individual events, so that each event in the batch gets the corresponding part of the response from the array.
- False: Lambda may return one or more events in the response for all events in a batch, but they are not correlated back to the original events.
Here, correlation means that the original event's metadata and similar attributes are carried forward to the response events.
If response_events_match is set to true, the expectations are:
1) The Lambda function should return the same number of response events as request events.
2) The order of the events should be maintained.

The batch options also include an implicit threshold: when the accumulated event size reaches 3mb, the batch is flushed.
`payload_model` defines how the payload is constructed from a Data Prepper event when it is converted to JSON.
`payload_model` as batch_event is used when the output should be formed as a batch of multiple events; a key (key_name) is associated with the set of events.
`payload_model` as single_event is used when each event is sent to Lambda individually.
If the batch option is not specified along with payload_model: batch_event, the batch uses the following default options.
Default batch options:
batch_key: "events"
threshold:
  event_count: 10
  maximum_size: 3mb
  event_collect_timeout: 15s

## Limitations
- Payload limitation: 6mb payload limit
- Response codec: supports only the JSON codec


## Developer Guide
@@ -49,34 +82,59 @@ The following command runs the integration tests:
```
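
When pointing the processor at a test function, the function has to honor the response contract described under Behaviour: it returns a JSON array and, if `response_events_match` is true, one result per request event in the same order. The following is a minimal Python handler sketch, not part of the plugin. It assumes the batch arrives either as a bare JSON array or as an object whose `events` key (the `batch.key_name` from the example configuration) holds the array. The `extract_events` helper and the `processed` field are hypothetical names used only for illustration.

```python
BATCH_KEY = "events"  # assumption: same value as batch.key_name in the pipeline


def extract_events(payload):
    """Return the list of events carried by the invocation payload."""
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict) and isinstance(payload.get(BATCH_KEY), list):
        return payload[BATCH_KEY]
    # Fall back to treating the payload as one single event.
    return [payload]


def lambda_handler(event, context):
    """Return one result per input event, preserving the input order."""
    results = []
    for record in extract_events(event):
        enriched = dict(record)       # copy so the input is left untouched
        enriched["processed"] = True  # hypothetical enrichment
        results.append(enriched)
    # A Python list is serialized by the Lambda runtime as a JSON array.
    return results
```

Because the handler emits exactly one output per input and never reorders them, it meets both expectations listed for `response_events_match: true`.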

----------------------------------------------------------------------------------------

# Lambda Sink
## AWS Lambda Sink

This plugin enables you to send data from your Data Prepper pipeline directly to AWS Lambda functions for further processing.
```
Field | Type | Required | Description
----------------- | ------- | -------- | ----------------------------------------------------------------------------
function_name | String | Yes | The name of the AWS Lambda function to invoke.
invocation_type | String | No | Specifies the invocation type: event by default. RequestResponse is NOT supported in the sink.
aws.region | String | Yes | The AWS region where the Lambda function is located.
aws.sts_role_arn | String | No | ARN of the role to assume before invoking the Lambda function.
max_retries | Integer | No | Maximum number of retries if the invocation fails. Default is 3.
batch | Object | No | Optional batch settings for Lambda invocations.
lambda_when | String | No | Conditional expression to determine when to invoke the Lambda sink.
dlq | Object | No | Dead-letter queue (DLQ) configuration for failed invocations.
```

## Usage
```aidl
lambda-pipeline:
  ...
  sink:
    - aws_lambda:
        aws:
          region: "us-east-1"
          sts_role_arn: "<arn>"
        function_name: "uploadToS3Lambda"
        max_retries: 3
        batch:
          key_name: "osi_key"
          threshold:
            event_count: 3
            maximum_size: 6mb
            event_collect_timeout: 15s
        dlq:
          s3:
            bucket: test-bucket
            key_path_prefix: dlq/
Example Configuration:
```
sink:
  - aws_lambda:
      function_name: "my-lambda-sink"
      invocation_type: "event"
      aws:
        region: "us-west-2"
        sts_role_arn: "arn:aws:iam::123456789012:role/my-lambda-sink-role"
      max_retries: 5
      batch:
        key_name: "events"
        threshold:
          event_count: 50
          maximum_size: "3mb"
          event_collect_timeout: PT5S
      lambda_when: "event['type'] == 'log'"
      dlq:
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/my-sqs-role"
        bucket: "<<your-dlq-bucket-name>>"
```

Usage
Invocation Type:
- event: Invokes the function asynchronously without waiting for a response.
- request-response: Not supported in sink
Batching: Batching is enabled by default; events are grouped together based on the thresholds defined in the batch configuration.
Dead-Letter Queue (DLQ): A DLQ can be configured to handle failures in Lambda invocations. If the invocation still fails after retries, the failed events are sent to the specified DLQ.


## Additional Notes
IAM Role Assumption: Both the processor and sink can assume a specified IAM role (aws.sts_role_arn) before invoking Lambda functions. This allows for more secure handling of AWS resources.
Concurrency Considerations: When using the event invocation type, be mindful of Lambda concurrency limits to avoid throttling.
For further details on AWS Lambda integration with Data Prepper, refer to the AWS Lambda documentation: https://docs.aws.amazon.com/lambda

## Developer Guide

The integration tests for this plugin do not run as part of the Data Prepper build.
@@ -85,4 +143,4 @@ The following command runs the integration tests:
```
./gradlew :data-prepper-plugins:aws-lambda:integrationTest -Dtests.sink.lambda.region="us-east-1" -Dtests.sink.lambda.functionName="lambda_test_function" -Dtests.sink.lambda.sts_role_arn="arn:aws:iam::123456789012:role/dataprepper-role"
```
```
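
When sizing a test run, the batch thresholds (`event_count`, `maximum_size`, `event_collect_timeout`) can be easier to reason about with a small model. The Python sketch below is not the plugin's implementation. It only illustrates an assumed "flush when any threshold is reached" rule, using the default values listed in the processor section. The `BatchBuffer` class, its method names, and the reading of 3mb as 3 * 1024 * 1024 bytes are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BatchBuffer:
    """Illustrative buffer that flushes when any one threshold is reached."""
    max_events: int = 10               # threshold.event_count
    max_bytes: int = 3 * 1024 * 1024   # threshold.maximum_size (assumed 3mb)
    timeout_s: float = 15.0            # threshold.event_collect_timeout
    events: list = field(default_factory=list)
    size_bytes: int = 0
    started_at: Optional[float] = None

    def add(self, event: dict, now: float) -> None:
        """Append an event; the timer starts with the first event of a batch."""
        if not self.events:
            self.started_at = now
        self.events.append(event)
        self.size_bytes += len(json.dumps(event).encode("utf-8"))

    def should_flush(self, now: float) -> bool:
        """True when the count, size, or age of the batch hits its threshold."""
        if not self.events:
            return False
        return (
            len(self.events) >= self.max_events
            or self.size_bytes >= self.max_bytes
            or now - self.started_at >= self.timeout_s
        )

    def flush(self, key_name: str = "events") -> str:
        """Build the JSON payload under key_name and reset the buffer."""
        payload = json.dumps({key_name: self.events})
        self.events = []
        self.size_bytes = 0
        self.started_at = None
        return payload
```

Passing the current time in as `now` keeps the sketch deterministic, so threshold combinations can be checked without waiting for real timeouts.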
