
Rename os source rate/job_count to interval/count, acquire UNASSIGNED partitions before CLOSED partitions #3327

Merged
merged 2 commits into from
Sep 16, 2023


@graytaylor0 (Member) commented Sep 12, 2023

Description

  • Rename the OpenSearch source scheduling options rate and job_count to interval and count
  • Change the order in which source coordination stores acquire partitions so that they check for UNASSIGNED partitions before checking CLOSED partitions. This has the following benefits:
    1. Pipelines that do not use the CLOSED functionality (such as S3 scan) no longer spend an extra query against the store every time a partition is acquired.
    2. CLOSED partitions can no longer starve UNASSIGNED partitions when the reOpenAt time is small; instead, all UNASSIGNED partitions are guaranteed to be processed before any CLOSED partitions.
  • Remove the query option from the OpenSearch source configuration, since it is unused in the code
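To make the new acquisition order concrete, here is a minimal in-memory sketch. It is not the Data Prepper source coordination code: PartitionStatus, Partition, InMemoryPartitionStore, and acquireAvailablePartition are hypothetical names chosen for illustration, and the query counter exists only to show which lookups run. The store looks for an UNASSIGNED partition first and only falls back to CLOSED partitions whose reOpenAt time has passed when none is available.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical partition states mirroring the names used in this PR.
enum PartitionStatus { UNASSIGNED, ASSIGNED, CLOSED, COMPLETED }

// Hypothetical partition record; the real store item types differ.
final class Partition {
    final String key;
    PartitionStatus status;
    final Instant reOpenAt; // only meaningful for CLOSED partitions

    Partition(String key, PartitionStatus status, Instant reOpenAt) {
        this.key = key;
        this.status = status;
        this.reOpenAt = reOpenAt;
    }
}

// Hypothetical store that counts lookups so the saved query is visible.
final class InMemoryPartitionStore {
    private final List<Partition> partitions = new ArrayList<>();
    int queryCount = 0;

    void add(Partition partition) {
        partitions.add(partition);
    }

    Optional<Partition> findFirstUnassigned() {
        queryCount++;
        return partitions.stream()
                .filter(p -> p.status == PartitionStatus.UNASSIGNED)
                .findFirst();
    }

    Optional<Partition> findFirstClosedReadyToReopen(Instant now) {
        queryCount++;
        return partitions.stream()
                .filter(p -> p.status == PartitionStatus.CLOSED && !p.reOpenAt.isAfter(now))
                .findFirst();
    }

    // UNASSIGNED first; CLOSED is only queried when no UNASSIGNED partition exists.
    Optional<Partition> acquireAvailablePartition(Instant now) {
        Optional<Partition> candidate = findFirstUnassigned();
        if (!candidate.isPresent()) {
            // Pipelines that never close partitions (such as S3 scan) skip this
            // query entirely while UNASSIGNED work remains.
            candidate = findFirstClosedReadyToReopen(now);
        }
        candidate.ifPresent(p -> p.status = PartitionStatus.ASSIGNED);
        return candidate;
    }
}
```

With a CLOSED partition whose reOpenAt has already passed and a fresh UNASSIGNED partition in the store, the first acquisition returns the UNASSIGNED one after a single lookup; the CLOSED one is only picked up once no UNASSIGNED partitions remain. Under the previous order (CLOSED checked first), a short reOpenAt could keep handing back the same CLOSED partition ahead of new work.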

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

… partitions before CLOSED partitions

Signed-off-by: Taylor Gray <[email protected]>

@Min(1)
@JsonProperty("job_count")
private int jobCount = 1;
@JsonProperty("count")
Member

Is there some other noun modifier we can use instead of "job"? On its own, "count" is vague, and it is hard to understand what is being counted here.

Member Author

It really represents the number of times each index will be processed. Do you have a name suggestion? index_processing_count seems a little verbose.

Member

It might be verbose, but I can understand it. Perhaps index_read_count? Or maybe processing_count to avoid the "per-index" idea.

@graytaylor0 (Member Author) commented Sep 15, 2023

It is per index, though. I like index_read_count.
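As a rough illustration of the semantics discussed in this thread, the sketch below models the two renamed scheduling options: interval (formerly rate) and index_read_count (formerly job_count, briefly count), where index_read_count is the number of times each index is processed. The class SchedulingSettings and the method shouldReadAgain are hypothetical, and treating interval as the minimum gap between successive reads of the same index is an assumption, not something this PR states.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical holder for the renamed scheduling options; not the plugin's config class.
final class SchedulingSettings {
    final Duration interval;   // assumed: minimum time between reads of the same index
    final int indexReadCount;  // number of times each index is processed

    SchedulingSettings(Duration interval, int indexReadCount) {
        if (indexReadCount < 1) {
            // Mirrors the @Min(1) constraint on the original job_count field.
            throw new IllegalArgumentException("index_read_count must be at least 1");
        }
        this.interval = interval;
        this.indexReadCount = indexReadCount;
    }

    // Hypothetical helper: should this index be read again, given how many reads
    // have completed and when the last one finished?
    boolean shouldReadAgain(int completedReads, Instant lastReadAt, Instant now) {
        if (completedReads >= indexReadCount) {
            return false; // this index has already been processed index_read_count times
        }
        // The first read happens immediately; later reads wait for the interval to elapse.
        return completedReads == 0 || !now.isBefore(lastReadAt.plus(interval));
    }
}
```

Because the count is tracked per index, an index added later still gets its full index_read_count reads, which is the "per index" behavior the author points out above.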

@dlvenable dlvenable merged commit dca14b4 into opensearch-project:main Sep 16, 2023
24 checks passed
asifsmohammed pushed a commit to asifsmohammed/data-prepper that referenced this pull request Sep 27, 2023
… partitions before CLOSED partitions (opensearch-project#3327)

* Rename os source rate/job_count to interval/count, acquire UNASSIGNED partitions before CLOSED partitions
Signed-off-by: Taylor Gray <[email protected]>

* Rename count to index_read_count

Signed-off-by: Taylor Gray <[email protected]>

---------

Signed-off-by: Taylor Gray <[email protected]>

3 participants