
[BUG] Data prepper cannot start if otel-v1-apm-span index exists #3342

Closed · juergen-walter opened this issue Sep 15, 2023 · 3 comments · Fixed by #3560
Labels: bug

Comments

@juergen-walter (Contributor)

Describe the bug

Data Prepper pods cannot (re)start (specifically, the opensearch sink plugin fails to initialize) when an index exists with the same name as an alias managed by Data Prepper. This happens when the index alias no longer exists (for example, because all OTel span indices were deleted), ingestion into Data Prepper continues, and Data Prepper is then restarted.

To Reproduce
Steps to reproduce the behavior:

  1. Set up OpenSearch and Data Prepper
  2. Ingest OTel traces/spans
  3. Delete all otel-v1-apm-span-.* indices (the otel-v1-apm-span alias is removed automatically when its backing indices are deleted)
  4. Ingest OTel traces/spans again (this auto-creates a plain index named otel-v1-apm-span; see the sketch after this list)
  5. Try to restart the Data Prepper pods. The pods fail to start with the following error (full stack trace below):
    An index exists with the same name as the reserved index alias name [otel-v1-apm-span], please delete or migrate the existing index
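
As an illustration of what happens between steps 3 and 5, here is a minimal sketch using the opensearch-java client (an assumed, already configured OpenSearchClient named client). These are not the exact requests Data Prepper issues, but they produce the same stray index.

import java.util.Map;
import org.opensearch.client.opensearch.OpenSearchClient;

public class ReproduceStrayIndex {
  public static void reproduce(final OpenSearchClient client) throws Exception {
    // Step 3: delete all rolling span indices; the otel-v1-apm-span alias
    // disappears together with its last backing index.
    client.indices().delete(d -> d.index("otel-v1-apm-span-*"));

    // Step 4: the next write (normally Data Prepper's _bulk request) no longer
    // finds the alias, so OpenSearch auto-creates a plain index named otel-v1-apm-span.
    client.index(i -> i
        .index("otel-v1-apm-span")
        .document(Map.of("traceId", "example")));

    // Step 5: restarting Data Prepper now fails, because the reserved alias
    // name is occupied by a concrete index.
  }
}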

Expected behavior

Data Prepper should be able to start in situations where users did not do anything obviously wrong.
One mitigation idea would be to automatically rename the otel-v1-apm-span index to otel-v1-apm-span-000001 and add an otel-v1-apm-span alias pointing to the renamed index (sketched below).
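
Since OpenSearch has no in-place index rename, one way to realize that mitigation is a reindex-then-alias sequence. The following is a rough sketch with the opensearch-java client (the client variable and error handling are assumed); it is not existing Data Prepper code.

import org.opensearch.client.opensearch.OpenSearchClient;

public class MigrateStrayIndex {
  public static void migrate(final OpenSearchClient client) throws Exception {
    // Copy the documents from the stray index into the expected rolling index.
    client.reindex(r -> r
        .source(s -> s.index("otel-v1-apm-span"))
        .dest(d -> d.index("otel-v1-apm-span-000001")));

    // Remove the stray index so the alias name becomes available again.
    client.indices().delete(d -> d.index("otel-v1-apm-span"));

    // Point the otel-v1-apm-span alias at the rolling index, restoring the
    // structure Data Prepper expects at startup.
    client.indices().putAlias(a -> a
        .index("otel-v1-apm-span-000001")
        .name("otel-v1-apm-span"));
  }
}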

Stack trace

2023-09-13T12:55:45,750 [raw-pipeline-sink-worker-8-thread-1] ERROR org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink - Failed to initialize OpenSearch sink due to a configuration error.
org.opensearch.dataprepper.model.plugin.InvalidPluginConfigurationException: An index exists with the same name as the reserved index alias name [otel-v1-apm-span], please delete or migrate the existing index
        at org.opensearch.dataprepper.plugins.sink.opensearch.index.AbstractIndexManager.checkAndCreateIndex(AbstractIndexManager.java:258) ~[opensearch-2.3.2.jar:?]
        at org.opensearch.dataprepper.plugins.sink.opensearch.index.AbstractIndexManager.setupIndex(AbstractIndexManager.java:198) ~[opensearch-2.3.2.jar:?]
        at org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink.doInitializeInternal(OpenSearchSink.java:180) ~[opensearch-2.3.2.jar:?]
        at org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink.doInitialize(OpenSearchSink.java:145) ~[opensearch-2.3.2.jar:?]
        at org.opensearch.dataprepper.model.sink.AbstractSink.initialize(AbstractSink.java:49) ~[data-prepper-api-2.3.2.jar:?]
        at org.opensearch.dataprepper.pipeline.Pipeline.isReady(Pipeline.java:195) ~[data-prepper-core-2.3.2.jar:?]
        at org.opensearch.dataprepper.pipeline.Pipeline.lambda$execute$2(Pipeline.java:243) ~[data-prepper-core-2.3.2.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]
2023-09-13T12:55:45,756 [raw-pipeline-sink-worker-8-thread-1] ERROR org.opensearch.dataprepper.pipeline.common.PipelineThreadPoolExecutor - Pipeline [raw-pipeline] process worker encountered a fatal exception, cannot proceed further
java.util.concurrent.ExecutionException: java.lang.RuntimeException: An index exists with the same name as the reserved index alias name [otel-v1-apm-span], please delete or migrate the existing index
        at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?]
        at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?]
        at org.opensearch.dataprepper.pipeline.common.PipelineThreadPoolExecutor.afterExecute(PipelineThreadPoolExecutor.java:70) [data-prepper-core-2.3.2.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1137) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.lang.RuntimeException: An index exists with the same name as the reserved index alias name [otel-v1-apm-span], please delete or migrate the existing index
        at org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink.doInitialize(OpenSearchSink.java:152) ~[opensearch-2.3.2.jar:?]
        at org.opensearch.dataprepper.model.sink.AbstractSink.initialize(AbstractSink.java:49) ~[data-prepper-api-2.3.2.jar:?]
        at org.opensearch.dataprepper.pipeline.Pipeline.isReady(Pipeline.java:195) ~[data-prepper-core-2.3.2.jar:?]
        at org.opensearch.dataprepper.pipeline.Pipeline.lambda$execute$2(Pipeline.java:243) ~[data-prepper-core-2.3.2.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        ... 2 more
Caused by: org.opensearch.dataprepper.model.plugin.InvalidPluginConfigurationException: An index exists with the same name as the reserved index alias name [otel-v1-apm-span], please delete or migrate the existing index
        at org.opensearch.dataprepper.plugins.sink.opensearch.index.AbstractIndexManager.checkAndCreateIndex(AbstractIndexManager.java:258) ~[opensearch-2.3.2.jar:?]
        at org.opensearch.dataprepper.plugins.sink.opensearch.index.AbstractIndexManager.setupIndex(AbstractIndexManager.java:198) ~[opensearch-2.3.2.jar:?]
        at org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink.doInitializeInternal(OpenSearchSink.java:180) ~[opensearch-2.3.2.jar:?]
        at org.opensearch.dataprepper.plugins.sink.opensearch.OpenSearchSink.doInitialize(OpenSearchSink.java:145) ~[opensearch-2.3.2.jar:?]
        at org.opensearch.dataprepper.model.sink.AbstractSink.initialize(AbstractSink.java:49) ~[data-prepper-api-2.3.2.jar:?]
        at org.opensearch.dataprepper.pipeline.Pipeline.isReady(Pipeline.java:195) ~[data-prepper-core-2.3.2.jar:?]
        at org.opensearch.dataprepper.pipeline.Pipeline.lambda$execute$2(Pipeline.java:243) ~[data-prepper-core-2.3.2.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        ... 2 more

Environment:

  • Data Prepper version 2.1.1
@KarstenSchnitter (Collaborator)

I had a quick look into the OpenSearch sink. My understanding is that when the alias is lost, the bulk operation creates an index with the next batch to be indexed in OpenSearch. The OpenSearch Bulk API [1] has a require_alias option that would prevent this index creation. It can be configured on the BulkRequest.Builder, which the OpenSearchSink creates via new BulkRequest.Builder() without further configuration:

if (isEstimateBulkSizeUsingCompression && isRequestCompressionEnabled) {
  final int maxLocalCompressionsForEstimation = openSearchSinkConfig.getIndexConfiguration().getMaxLocalCompressionsForEstimation();
  bulkRequestSupplier = () -> new JavaClientAccumulatingCompressedBulkRequest(new BulkRequest.Builder(), bulkSize, maxLocalCompressionsForEstimation);
} else if (isEstimateBulkSizeUsingCompression) {
  LOG.warn("Estimate bulk request size using compression was enabled but request compression is disabled. " +
          "Estimating bulk request size without compression.");
  bulkRequestSupplier = () -> new JavaClientAccumulatingUncompressedBulkRequest(new BulkRequest.Builder());
} else {
  bulkRequestSupplier = () -> new JavaClientAccumulatingUncompressedBulkRequest(new BulkRequest.Builder());
}

I suggest making require_alias configurable, with true at least for the tracing setup, which is built on an alias and rotating-index strategy (see the sketch below). If OpenSearch rejects a bulk request because of a missing alias, the initialisation code that sets up the alias and indices should be run again to recreate the proper structures.

[1] https://opensearch.org/docs/1.2/opensearch/rest-api/document-apis/bulk/
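
A rough sketch of that suggestion, written as a modification of the quoted snippet above. It assumes the opensearch-java BulkRequest.Builder exposes the Bulk API's require_alias flag as requireAlias(...); the requireAlias variable stands in for a value derived from the sink/index configuration and is not an existing Data Prepper option.

// Hypothetical: derive this from the index configuration (true for the
// alias-based tracing indices, false for plain indices).
final boolean requireAlias = true;

bulkRequestSupplier = () -> new JavaClientAccumulatingUncompressedBulkRequest(
    new BulkRequest.Builder().requireAlias(requireAlias));

With require_alias set, a bulk request against the missing otel-v1-apm-span alias is rejected instead of silently creating an index, which is the failure condition the re-initialisation could react to.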

@juergen-walter (Contributor, Author) commented Oct 12, 2023

The user experience is quite ugly: users would assume complete data loss, because the OpenSearch Dashboards observability plugin shows "Trace Analytics not set up" since otel-v1-apm-span does not match the otel-v1-apm-span-* pattern (which otel-v1-apm-span-000001 would match).

This is a huge blocker for us, and the proposal by @KarstenSchnitter looks quite nice. We would appreciate some feedback.

@dlvenable added this to the v2.6 milestone on Oct 18, 2023
@dlvenable (Member)

@KarstenSchnitter, @juergen-walter, thank you both for looking into this issue. If I understand the situation, it is:

  1. Data Prepper runs and successfully indexes documents.
  2. The index and index alias are deleted.
  3. Data Prepper continues to run, but now _bulk requests create an index (not an alias) with the name of the alias.
  4. Restarting Data Prepper fails because the name of the alias is now used as an index.

I think your proposal to use require_alias is a good solution here. I ran a quick test, and with it the documents fail to be written to the index at all. It may also be appropriate to use this failure condition (index not found) to re-create the index alias.

From an implementation perspective, I think we'd want to connect this flag to the IsmPolicyManagementStrategy interface. This interface has two implementations, one for aliases and one for non-aliased indexes, so it should determine whether require_alias is set to true or false (sketched below).
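
A hypothetical sketch of that direction; the isRequireAlias() method and the AliasIsmPolicyManagement class name are invented here for illustration and are not current Data Prepper code.

interface IsmPolicyManagementStrategy {
  // ... existing methods omitted ...

  // Whether bulk writes performed under this strategy must target an alias.
  default boolean isRequireAlias() {
    return false;  // plain, non-aliased indexes keep the current behaviour
  }
}

class AliasIsmPolicyManagement implements IsmPolicyManagementStrategy {
  @Override
  public boolean isRequireAlias() {
    return true;  // the tracing setup writes through an alias with rotating backing indices
  }
}

The sink could then ask the active strategy for this flag when building the bulkRequestSupplier, so require_alias is only enforced where an alias is expected.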
