Releases: opensearch-project/data-prepper
2.10.1
2.10.0
2024-10-15 Version 2.10.0
Features
- Kafka source: support SASL/SCRAM mechanisms (#4241)
- OpenSearch Bulk API Source (#248)
- Support AWS Kinesis Data Streams as a Source (#1082)
- Support OpenTelemetry logs in S3 source (#5028)
Enhancements
- Use HTML in JsonPropertyDescription instead of Markdown (#4984)
- Variable drain time when shutting down via shutdown API (#4966)
- Make max connections and acquire timeout configurable on S3 sink client (#4949)
- Support BigDecimal data type in expressions (#4817)
- Caching implementation of EventKeyFactory (#4843)
- Json codec changes with specific json input codec config (#5054)
Bug Fixes
- [BUG] Close Opensearch RestHighLevelClient in OpenSearchClientRefresher on shutdown and initialization failure (#4770)
Security
- CVE-2024-6345 (High) detected in setuptools-68.0.0-py3-none-any.whl (#4940)
- CVE-2023-46136 (High) detected in Werkzeug-2.2.3-py3-none-any.whl (#4938)
- CVE-2024-34069 (High) detected in Werkzeug-2.2.3-py3-none-any.whl (#4938)
- CVE-2024-37891 (Medium) detected in urllib3-2.0.7-py3-none-any.whl (#4937)
- CVE-2024-35195 (Medium) detected in requests-2.31.0-py3-none-any.whl (#4939)
- CVE-2024-5569 (Low) detected in zipp-3.15.0-py3-none-any.whl (#4936)
Maintenance
2.9.0
2024-08-28 Version 2.9.0
Features
- Support sets and set operations in Data Prepper expressions (#3854)
- Add startsWith expression function (#4840)
- Support default route option for Events that match no other route (#4615)
- Delete input for processors which expand the event (#3968)
- Dynamic Rule Detection (#4600)
- Kafka Source should support message headers (#4565)
- Aggregate processor: add option to allow raw events (#4598)
- Add support for start and end times in count and histogram aggregate actions (#4614)
- Add an option to count unique values of specified key(s) to CountAggregateAction (#4644)
- Flatten processor: option for keys without brackets (#4616)
- Modify Key Value processor to support string literal grouping (#4599)
- Make AWS credential management available in data-prepper-config.yaml (#2570)
Enhancements
- Support enhanced configuration of the Kafka source and buffer loggers (#4126)
- Update the rename_keys and delete_entries processors to use EventKey (#4636)
- Update the mutate string processors to use the EventKey. (#4649)
- OpenSearch Sink add support for sending pipeline parameter in BulkRequest (#4609)
- Add support for Kafka headers and timestamp in the Kafka Source (#4566)
Bug Fixes
- [BUG] Visibility duplication protection fails when using S3 source for large files and receiving 10 messages from SQS queue (#4812)
- [BUG] ChangeVisibilityTimeout call failure during pipeline shutdown. (#4575)
- [BUG] Service-map relationship should be created regardless of missing traceGroupName (#4821)
- [BUG] Unable to create stateful processors with multiple workers. (#4660)
- [BUG] Routes: regex doesn't work (#4763)
- [BUG] Grok plugin CLOUDFRONT_ACCESS_LOG pattern does not compile (#4604)
- [BUG] The user_agent processor throws exceptions with multiple threads. (#4618)
- [BUG] DynamoDB source export converts Numbers ending in 0 to scientific notation (#3840)
- Fix null document in DLQ object (#4814)
- Fix KeyValue Processor value grouping bug (#4606)
Security
- CVE-2024-6345 (High) detected in setuptools-68.0.0-py3-none-any.whl (#4738)
- CVE-2024-39689 (High) detected in certifi-2023.7.22-py3-none-any.whl (#4715)
- CVE-2024-5569 (Low) detected in zipp-3.15.0-py3-none-any.whl (#4714)
- CVE-2024-3651 (High) detected in idna-3.3-py3-none-any.whl (#4713)
- CVE-2024-35195 (Medium) detected in requests-2.31.0-py3-none-any.whl (#4562)
- CVE-2024-37891 (Medium) detected in urllib3-2.0.7-py3-none-any.whl (#4641)
Maintenance
2.8.1
2.8.0
2024-05-16 Version 2.8.0
Features
- Support Full load and CDC from AWS DocumentDB (#4534)
- Support conditional expression to evaluate based on the data type for a given field (#4478, #4523, #4500)
- Allow using event fields in s3 sink object_key (#3310)
- Support ndjson with a codec (#2700)
- Support S3 bucket ownership validation on the S3 sink (#4468)
- Support encoding JSON (#832, #4514)
- Support for Event Json input and output codecs (#4436)
- Add support for dynamic bucket and default bucket in S3 sink (#4402)
- Add support to export/full load MongoDB/DocumentDB collection with _id field of different data type (#4503)
Enhancements
- HTTP data chunking support for kafka buffer (#4475)
- ENH: automatic credential refresh in kafka source (#4258)
- Add creation and aggregation of dynamic S3 groups based on events (#4346)
- Truncate Processor: Add support to truncate all fields in an event (#4317)
- Provide validations of AWS accountIds (#4398)
- Better metrics on OpenSearch document errors (#4344)
- Better metrics for OpenSearch duplicate documents (#4343)
- Address route and subpipeline for pipeline transformation (#4528)
- Add support for BigDecimal in ConvertType processor (#4316)
- Checkpoint records at an interval for TPS case when AckSet is enabled (#4526)
- Write stream events that timeout to write to internal buffer in separate thread (#4524)
- Key value processor enhancements (#4521)
- Add bucket owner support to s3 sink (#4504)
- Initial work to support core data types in Data Prepper (#4496)
- Changing logging level for config transformation and fixing rule (#4466)
- Add folder-based partitioning for s3 scan source (#4455)
- Pipeline Configuration Transformation (#4446)
- Added support for multiple workers in S3 Scan Source (#4439)
- Bootstrap the RuleEngine package (#4442)
- Make s3 partition size configurable and add unit test for S3 partition creator classes (#4437)
- Remove creating S3 prefix path partition upfront (#4432)
- Change s3 sink client to async client (#4425)
- Create new codec for each s3 group in s3 sink (#4410)
- Validate the AWS account Id in the S3 source using a new annotation (#4400)
- Add server connections metric to http and otel sources (#4393)
- Log the User-Agent when Data Prepper shuts down from POST /shutdown (#4390)
- Add aggregate_threshold with maximum_size to s3 sink (#4385)
- Refactor PipelinesDataFlowModelParser to take in an InputStream instead of a file path (#4289)
- Add support to use old ddb stream image for REMOVE events (#4275)
Bug Fixes
- Fix count aggregation exemplar data (#4341)
- Revert HTTP data chunking changes for kafka buffer done in PR 4266 (#4329)
- Fix Router performance issue (#4327)
- Do not require field_split_characters to not be empty for key_value processor (#4358)
- Do not write empty lists of DlqObject to the DLQ (#4403)
- Fix transient test failure for subpipelines (#4479)
- Fix JacksonEvent to propagate ExternalOriginalTime if it's set at the time of construction (#4489)
- FIX: null certificate value should be valid in opensearch connection (#4494)
- [BUG] Incorrect Behavior of Obfuscate Processor with Predefined Pattern "%{CREDIT_CARD_NUMBER}" (#4340)
- [BUG] Empty DLQ entries when version conflicts occur (#4301)
- [BUG] otel sources should show a more clear exception when receiving data that cannot be processed based on the configured compression type (#4022)
- [BUG] : unable to set field_delimiter_regex (#2946)
- Fix aggregate processor local mode (#4529)
- Add long as a target type for convert_entry_type processor (#4120)
- Fix write json basic test (#4527)
- Fix depth field in template (#4509)
- Fix for S3PartitionCreatorScheduler ConcurrentModification Exception (#4473)
- Fix acknowledgements in DynamoDB (#4419)
- Fix DocumentDB source S3PathPrefix null or empty (#4472)
- Fix an issue that exception messages are masked (#4416)
- Fix bug where using upsert or update without routing parameter caused… (#4397)
- Fix bug in s3 sink dynamic bucket and catch invalid bucket message (#4413)
- Fix flaky PipelineConfigurationFileReaderTest (#4386)
- Aggregate Processor: local mode should work when there is no when condition (#4380)
Security
- CVE-2024-22201 on http2-common 9.4.51 version - autoclosed (#4452)
- CVE-2023-22102 (High) detected in mysql-connector-j-8.0.33.jar - autoclosed (#3920)
Maintenance
- Gradle 8.7 (#4417)
- Adds a Gradle convention plugin for Maven publication (#4421)
- MAINT: allow latest schema version if not specified in confluent schema (#4453)
- Publish expression and logstash-configuration to Maven (#4474)
- Create unit test report as html (#4384)
- Update Stream Ack Manager unit test and code refactor (#4383)
- Grpc exception handler: Modified to return BADREQUEST for some internal errors (#4387)
- Remove unexpected event handle message (#4388)
- Bump parquet version to 1.14.0. (#4520)
- Clear system property to disable s3 scan when stream worker exits, set s3 sink threshold to 15 s...
2.7.0
2024-03-27 Version 2.7.0
Features
- Add a GeoIP processor. (#253, #3941, #3942)
- Flatten json processor (#4128)
- Add select_entries processor (#4147)
- Decompress processor (#4016)
- Support parsing of XML fields in Events (#4165, #4024)
- Processor for parsing Amazon Ion documents (#3730)
- Append values to lists in an event (#4129)
- MapToList processor (#3935)
- Date processor to convert from epoch_second, epoch_milli, or epoch_nano (#2929, #4076)
- Support reading of old image for delete events on DynamoDB source (#4261)
- Add string truncate processor to the family of mutate string processor (#3925)
- Add join function (#4075)
Enhancements
- Support format expressions for routing in the opensearch sink (#3833)
- Allow . and @ characters to be part of json pointer in expressions (#4130)
- Support maximum request length configurations in the HTTP and OTel sources (#3931)
- Provide a config option to do node local aggregation (#4306)
- Allow peer forwarder to skip sending events to remote peer (#3996)
- Include encrypted data key in Kafka buffer message. (#3655)
- Support larger message sizes in Kafka Buffer (#3916)
- Modify S3 Source to allow multiple SQS workers (#4239)
- Add support for tracking performance of individual Events in the grok processor (#4196)
- Support codec on the file source to help with testing (#4018)
- Provide a delay processor to put a delay in the processor for debugging and testing (#3938)
- Support ByteCount in plugin parser (#3191)
- Add Buffer Latency Metric (#4237)
- Adds an append mode to the file sink (#3687)
Bug Fixes
- Attempting to evaluate if a key is null throws an Exception if the value is a List for conditional expressions (#4109)
- Data Prepper process threads stop when processors throw exceptions (#4103)
- Upsert action requires existing document in OpenSearch (#4036)
- Many Grok failures do not tag events (#4031)
- Using update, upsert, or delete actions without specifying document_id crashes the pipeline with NPE (#3988)
- OpenSearch Sink upsert action fails to create new document if it doesn't exist already (#3934)
- DynamoDb source global state not found for export (#3579)
- Missing Configuration details in Kafka documentation (#3157)
- File Source fails to process large files. (#707)
- Add key_value_when conditional to key_value processor (#4246)
- Adds Kafka producer metrics for buffer usage (#4139)
- Throw a more useful error when the S3 source is unable to determine bucket ownership (#4021)
- Add sts_header_overrides to s3 dlq configuration (#3845)
- Delay reading from the Kafka buffer as long as the circuit breaker is open (#4135)
- Use timer for sink latency metrics (#4174)
- Fix bug where process worker would shut down if a processor drops all events (#4262)
- Send acknowledgements to source when events are forwarded to remote peer (#4305)
- Injecting timestamp in index name that is not a suffix throws IllegalArgumentException (#3957)
Security
- Fixes CVE-2024-29133 (#4314)
- Fixes CVE-2024-29131 (#4313)
- Fixes CVE-2023-52428 (#4296)
- Fixes CVE-2024-23944 (#4290)
- Fixes CVE-2023-51775 (#4282)
- Fixes CVE-2024-22201 (#4186)
- Fixes CVE-2024-25710 (#4164)
- Fixes CVE-2024-26308 (#4163)
- Fixes CVE-2024-21634 (#3926)
- Fixes CVE-2023-50570 (#3870)
- Fixes CVE-2023-3635 (#3068)
Maintenance
2.6.2
2024-02-19 Version 2.6.2
Enhancements
- Add 4xx aggregate metric and shard progress metric for dynamodb source (#3913)
Bug Fixes
- S3 Scan has potential to filter out objects with the same timestamp (#4123)
- Kafka buffer attempts to create a topic when disabled (#4111)
- Grok processor match requests continue after timeout (#4026)
- Serialization error during peer-forwarding (#3981)
- BlockingBuffer.bufferUsage metric does not include records in-flight (#3936)
- Null Pointer Exception in Key Value Processor (#3928)
- Incomplete route set leads to duplicates when E2E ack is enabled. (#3866)
- Data Prepper is losing connections from S3 pool (#3809)
- Key value processor will throw NPE if source key does not exist in the Event (#3496)
- Exception in substitute string processor shuts down processor work but not pipeline (#2956)
- Add 4xx aggregate metric and shard progress metric for dynamodb source (#3921)
Security
- Fix GHSA-6g3j-p5g6-992f from OpenSearch jar (#3837)
- Fix CVE-2023-41329 (Medium) detected in wiremock-3.0.1.jar (#3954)
- Fix CVE-2023-51074 (Medium) detected in json-path-2.8.0.jar (#3919)
- Fix CVE-2023-50572 (Medium) detected in jline-3.9.0.jar, jline-3.22.0.jar (#3871)
- Require Mozilla Rhino 1.7.12 to fix SNYK-JAVA-ORGMOZILLA-1314295. (#3839)
2.6.1
2023-12-07 Version 2.6.1
Enhancements
- Add aggregate metrics for ddb source export and stream (#3728)
Bug Fixes
- Update and upsert bulk actions do not include changes from document_root_key, exclude_keys, etc. (#3745)
- S3 source processes SQS notification when S3 folder is created (#3727)
Security
- Fix CVE-2023-6378 and CVE-2023-6481 by updating logback to 1.4.14 (#3729, #3817)
- Require nimbus-jose-jwt 9.37.1 to fix CVE-2021-31684 and CVE-2023-1370 (#3731)
- Updates example analytics-service to Spring Boot 3.1.6 fixing CVE-2023-34055 (#3732)
2.6.0
2023-11-28 Version 2.6.0
Features
- Support DynamoDB as a source. (#2932)
- Use Kafka as a buffer (#3322)
- Support dynamically changing the visibility timeout for S3 Source with SQS queue (#2485)
- Create or update Amazon OpenSearch Serverless network policy (#3577)
- Sink level metric for end to end latency (#3494)
Enhancements
- Use Amazon Linux as base Docker image (#3505)
- Allow the Kafka buffer (and others that do not require the heap) to bypass the heap circuit breaker (#3616)
- Improve gRPC request exception logging (#3621)
- Configure the delay in the random string source (#3601)
- Add distribution_version flag to opensearch source (#3636)
Bug Fixes
- Data Prepper is writing empty DLQ objects (#3644)
- Bulk Operation Retry Strategy should print cause of error (#3504)
- ISM index rollover actions fail because of missing setting for otel-v1-apm-span-* indices (#3506)
- AWS opensearch source error: ElasticsearchVersionInfo.buildFlavor (#3640)
- No permissions for writing to Amazon OpenSearch Serverless collection only shows errors after max_retries limit is reached (#3508)
- Bulk Operation Retry Strategy should print cause of error (#3504)
- NullPointer exception in DefaultKafkaClusterConfigSupplier get API (#3528)
- Fix bug so global read-only items do not expire from TTL in DynamoDB source coordination store (#3703)
- Check if failedDeleteCount is positive before logging an SQS error (#3686)
- Docker image jre-jammy contains Berkeley DB (#3543)
- Race condition in DefaultEventHandle (#3617)
Security
- CVE-2023-44981 (Critical) detected in multiple libraries (#3491)
- CVE-2023-36478 (High) detected in http2-hpack-11.0.12.jar, jetty-http-11.0.12.jar (#3490)
- CVE-2023-4586 (High) detected in netty-handler-4.1.100.Final.jar (#3443)
- CVE-2023-5072 (High) detected in json-20230618.jar (#3522)
- CVE-2023-39410 (High) detected in avro-1.11.0.jar (#3430)
- CVE-2023-4043 (High) detected in parsson-1.1.2.jar (#3588)
- CVE-2023-46122 (High) detected in io_2.13-1.9.1.jar (#3547)
- CVE-2023-46136 (High) detected in Werkzeug-2.2.3-py3-none-any.whl (#3552)
- CVE-2023-26048 (Medium) detected in jetty-server-11.0.12.jar (#2533)
- CVE-2023-26049 (Medium) detected in jetty-http-11.0.12.jar, jetty-server-11.0.12.jar (#2532)
- CVE-2023-40167 (Medium) detected in jetty-http-11.0.12.jar (#3359)
- CVE-2023-36479 (Medium) detected in jetty-servlets-11.0.12.jar (#3367)
- WS-2023-0236 (Low) detected in jetty-xml-11.0.12.jar (#3072)
Maintenance
- Update to the Gradle 8.x version which supports Java 21. Gradle 8.3 is supporting up to Java 20. (#3330)
- Start building Data Prepper on Java 21 (#3329)
- Integration tests to validate data going to OpenSearch (#3678)
- Unit tests fail on Windows machine (#3459)
- Fix disabled E2E ack integration tests in PipelinesWithAcksIT.java (#3472)
- Remove the @Deprecated from Record (#3536)
- Remove all unnecessary projects in the 2.6 branch (#3605)
- Update end-to-end tests to run from the released Docker image (#3566)
2.5.0
2023-10-09 Version 2.5.0
Features
- Support OpenSearch as source. (#1985)
- Support translate processor. (#1914)
- Support dissect processor. (#3362)
- Support AWS secrets in pipeline and Data Prepper config as an experimental feature. (#2780)
Enhancements
- Support update, upsert, delete bulk actions in OpenSearch sink. (#3109)
- Support inline index templates in OpenSearch sink. (#3365)
- Add retry to Kafka consumer in source. (#3399)
- Support OpenTelemetry SeverityText for logs. (#3280)
- Merging PipelineDataflowModel instead of pipeline YAML files. (#3289)
- Support recursive feature in KeyValue processor. (#888)
Bug Fixes
- Fix NullPointerException in S3 scan when bucket key has null value. (#3316)
- Fix a bug where S3 source does not stop on pipeline shutdown. (#3341)
- Fix exemplar list in Histogram and Count aggregations. (#3364)
Security
- Fix CVE-2023-44487, HTTP/2 reset floods. (#3474)
- Fix CVE-2023-4586. (#3443)
- Fix CVE-2023-39410. (#3430)