[BUG] Parquet Codec requires path_prefix #3201

Closed
engechas opened this issue Aug 21, 2023 · 0 comments · Fixed by #3205

@engechas
Collaborator

Describe the bug
Data Prepper throws an exception on startup if path_prefix is not defined under the parquet codec.

Caused by: org.opensearch.dataprepper.model.plugin.InvalidPluginConfigurationException: Plugin parquet in pipeline null is configured incorrectly: pathPrefix must not be null
	at org.opensearch.dataprepper.plugin.PluginConfigurationConverter.convert(PluginConfigurationConverter.java:73) ~[data-prepper-core-2.4.0-SNAPSHOT.jar:?]
	at org.opensearch.dataprepper.plugin.DefaultPluginFactory.getConstructionContext(DefaultPluginFactory.java:115) ~[data-prepper-core-2.4.0-SNAPSHOT.jar:?]
	at org.opensearch.dataprepper.plugin.DefaultPluginFactory.loadPlugin(DefaultPluginFactory.java:74) ~[data-prepper-core-2.4.0-SNAPSHOT.jar:?]
	at org.opensearch.dataprepper.plugins.sink.s3.S3Sink.<init>(S3Sink.java:63) ~[s3-sink-2.4.0-SNAPSHOT.jar:?]
	... 41 more

The S3 sink configuration already has path_prefix defined under the object_key object, so the codec should not need its own copy. I think the correct behavior would be to remove path_prefix from the parquet codec configuration.

To Reproduce
Create an S3 sink pipeline using the parquet codec without path_prefix defined in the codec config

Example S3 sink config

sink:
    - s3:
        aws:
          region: "us-west-2"
          sts_role_arn: "<my role>"
        bucket: "my-sink-bucket"
        object_key:
          path_prefix: "perf-sink"
        threshold:
          event_collect_timeout: 600s
          maximum_size: 128mb
        codec:
          parquet:
            schema: ...
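
To make the failure mode concrete, here is a rough Python sketch of the behavior described above. It is a toy model, not Data Prepper's actual validation code: `validate_codec`, `starts_up`, and `InvalidPluginConfiguration` are hypothetical names, and the dict only mirrors the sink settings from the example config. It assumes the codec is rejected whenever a field it marks as required is missing.

```python
# Toy model of required-field validation. This is NOT Data Prepper's code;
# every name below is hypothetical and exists only for illustration.


class InvalidPluginConfiguration(Exception):
    """Hypothetical stand-in for InvalidPluginConfigurationException."""


def validate_codec(codec_name, codec_settings, required_fields):
    """Raise if any required field is missing (or None) in the codec settings."""
    for field in required_fields:
        if codec_settings.get(field) is None:
            raise InvalidPluginConfiguration(
                f"Plugin {codec_name} is configured incorrectly: "
                f"{field} must not be null"
            )


# The sink settings from the example config, written as a plain dict.
# path_prefix lives under object_key, not under the parquet codec.
s3_sink = {
    "bucket": "my-sink-bucket",
    "object_key": {"path_prefix": "perf-sink"},
    "codec": {"parquet": {"schema": "..."}},
}


def starts_up(required_codec_fields):
    """Return True if the parquet codec settings pass validation."""
    try:
        validate_codec(
            "parquet", s3_sink["codec"]["parquet"], required_codec_fields
        )
        return True
    except InvalidPluginConfiguration:
        return False


# Reported behavior: the codec itself requires path_prefix, so startup fails.
print(starts_up(["path_prefix"]))  # False

# Proposed behavior: the codec no longer requires path_prefix.
print(starts_up([]))  # True
```

In this model the sink-level prefix is untouched in both cases; only the codec's own requirement decides whether startup succeeds.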

Expected behavior
Data Prepper starts up successfully when path_prefix is not defined under the parquet codec.

@engechas engechas added bug Something isn't working untriaged labels Aug 21, 2023
@dlvenable dlvenable self-assigned this Aug 21, 2023
@dlvenable dlvenable added this to the v2.4 milestone Aug 21, 2023
dlvenable added a commit to dlvenable/data-prepper that referenced this issue Aug 22, 2023
…ill keep untested and errant code paths out of the project. Resolves opensearch-project#3201.

Signed-off-by: David Venable <[email protected]>
dlvenable added a commit that referenced this issue Aug 22, 2023
…ill keep untested and errant code paths out of the project. Resolves #3201. (#3205)

Signed-off-by: David Venable <[email protected]>
@github-project-automation github-project-automation bot moved this from Unplanned to Done in Data Prepper Tracking Board Aug 22, 2023