Scheduled batch supervisor #17353
Open
abhishekrb19
wants to merge
68
commits into
apache:master
github-actions
bot
added
Area - Batch Ingestion
Area - Querying
Area - Dependencies
Area - Ingestion
Area - MSQ
For multi stage queries - https://github.com/apache/druid/issues/12262
labels
Oct 15, 2024
Please enter the commit message for your changes. Lines starting
…g with other server clients.
…okerServiceModule.
4 tasks
…work as intended.
abhishekrb19
added a commit
that referenced
this pull request
Dec 6, 2024
… reusability (#17542)

This PR contains non-functional / refactoring changes of the following classes in the sql module:

1. Move ExplainPlan and ExplainAttributes from sql/src/main/java/org/apache/druid/sql/http to processing/src/main/java/org/apache/druid/query/explain.
2. Move sql/src/main/java/org/apache/druid/sql/SqlTaskStatus.java -> processing/src/main/java/org/apache/druid/query/http/SqlTaskStatus.java.
3. Add a new class processing/src/main/java/org/apache/druid/query/http/ClientSqlQuery.java that is effectively a thin POJO version of SqlQuery in the sql module but without any of the Calcite functionality and business logic.
4. Move BrokerClient, BrokerClientImpl and Broker classes from sql/src/main/java/org/apache/druid/sql/client to server/src/main/java/org/apache/druid/client/broker.
5. Remove BrokerServiceModule that provided the BrokerClient. The functionality is now contained in ServiceClientModule in the server package itself, which provides all the other clients as well.

This is done so that we can reuse these classes in #17353 without bringing Calcite and other dependencies into the Overlord.
The jobsExecutor is responsible for submitting jobs to the Broker for all scheduled batch supervisors. The cronExecutor simply decoupled the submission of jobs from the scheduled running of jobs. The service client library already has async threads, so we remove the cronExecutor to keep things simple, since the service client handles submission asynchronously. If we ever see evidence of bottlenecks around task submission, we should be able to make jobsExecutor multi-threaded instead of single-threaded.
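The threading model described in this commit message can be sketched roughly as follows. This is a minimal illustration only, not the code in this PR: `JobsExecutorSketch` and the `asyncClient` function are hypothetical stand-ins (the latter for an asynchronous Broker client), and only the `jobsExecutor` name comes from the message above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical sketch: one single-threaded executor hands jobs to an async client.
class JobsExecutorSketch {
    // Single worker thread, mirroring the single-threaded jobsExecutor described above.
    private final ExecutorService jobsExecutor = Executors.newSingleThreadExecutor();
    // Assumed stand-in for an async Broker client: SQL in, future task id out.
    private final Function<String, CompletableFuture<String>> asyncClient;

    JobsExecutorSketch(Function<String, CompletableFuture<String>> asyncClient) {
        this.asyncClient = asyncClient;
    }

    // Queues the submission; the worker thread only starts the async call and never blocks on it.
    CompletableFuture<String> submit(String sql) {
        CompletableFuture<String> result = new CompletableFuture<>();
        jobsExecutor.execute(() -> {
            try {
                asyncClient.apply(sql).whenComplete((taskId, error) -> {
                    if (error != null) {
                        result.completeExceptionally(error);
                    } else {
                        result.complete(taskId);
                    }
                });
            } catch (RuntimeException e) {
                // The client failed before returning a future.
                result.completeExceptionally(e);
            }
        });
        return result;
    }

    void shutdown() {
        jobsExecutor.shutdown();
    }

    public static void main(String[] args) {
        JobsExecutorSketch sketch = new JobsExecutorSketch(
            sql -> CompletableFuture.supplyAsync(() -> "task-for-" + sql.hashCode()));
        System.out.println(sketch.submit("REPLACE INTO ...").join());
        sketch.shutdown();
    }
}
```

Because the worker thread only kicks off the asynchronous call, a second executor that decouples scheduling from submission adds little here; swapping `newSingleThreadExecutor()` for a fixed thread pool would be the obvious change if submission ever became a bottleneck.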
Labels
Area - Batch Ingestion
Area - Dependencies
Area - Ingestion
Area - MSQ
For multi stage queries - https://github.com/apache/druid/issues/12262
Area - Querying
Design Review
This change introduces a scheduled batch supervisor in Druid. The supervisor periodically wakes up to submit an MSQ ingest query, allowing users to automate batch ingestion directly within Druid. Think of it as simple batch task workflows natively integrated into Druid, though it doesn't replace more sophisticated workflow management systems like Apache Airflow. This is an experimental feature.
Summary of changes:

The `scheduled_batch` supervisor can be configured as follows:

The supervisor will submit the `REPLACE` SQL query repeatedly every 5 minutes. The supervisor supports two types of cron scheduler configurations:

1. `unix`: for example, `*/5 * * * *` to schedule the SQL task every 5 minutes. Macros such as `@daily`, `@hourly`, `@monthly`, etc. are also supported.
2. `quartz`: for example, `0 0 0 ? 3,6,9,12 MON-FRI` to schedule tasks at midnight on weekdays during March, June, September, and December.

Key points:

- The user specifies the `query` along with any context in the `spec`. This structure is identical to what the MSQ task engine accepts.
- The supervisor submits the `spec` as-is on its schedule.

Some implementation details:

- The `indexing-service` module now depends on the `druid-sql` module. This allows the scheduled batch supervisor, running on the Overlord, to communicate with the Broker to:
  a. Validate and parse the user-specified query.
  b. Submit MSQ queries to the `/druid/v2/sql/task/` endpoint.
- A `ScheduledBatchScheduler` is injected in the Overlord, which is responsible for scheduling and unscheduling all scheduled batch instances.
- A `BrokerClient` implementation has been added, leveraging the `ServiceClient` functionality.
- `SqlTaskStatus` and its unit test `SqlTaskStatusTest` have been moved from the msq module to the sql module so they can be reused by the `BrokerClient` implementation in the sql module.
- An `ExplainPlanInformation` class, which is used to deserialize the explain plan response from the Broker.

The status API response for the supervisor contains the scheduler state along with active and completed tasks:
Future Improvements:
This PR has:
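
For readers unfamiliar with cron syntax, the `unix` schedule example from the summary (`*/5 * * * *`) can be illustrated with a small, self-contained sketch. This is not the scheduler used by this PR: `UnixCronSketch` is a hypothetical class that handles only `*`, `*/n`, single numbers, and comma lists, treats all five fields as AND conditions, and ignores macros such as `@daily`, ranges, names, and Quartz syntax.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical, simplified matcher for 5-field unix cron expressions.
class UnixCronSketch {
    private final String[] fields; // minute, hour, day-of-month, month, day-of-week

    UnixCronSketch(String expression) {
        fields = expression.trim().split("\\s+");
        if (fields.length != 5) {
            throw new IllegalArgumentException("Expected 5 fields: " + expression);
        }
    }

    // Supports "*", "*/n", "n", and comma-separated combinations of those.
    private static boolean fieldMatches(String field, int value, int min) {
        for (String part : field.split(",")) {
            if (part.equals("*")) {
                return true;
            }
            if (part.startsWith("*/")) {
                int step = Integer.parseInt(part.substring(2));
                if (step <= 0) {
                    throw new IllegalArgumentException("Step must be positive: " + part);
                }
                if ((value - min) % step == 0) {
                    return true;
                }
            } else if (Integer.parseInt(part) == value) {
                return true;
            }
        }
        return false;
    }

    boolean matches(LocalDateTime t) {
        return fieldMatches(fields[0], t.getMinute(), 0)
            && fieldMatches(fields[1], t.getHour(), 0)
            && fieldMatches(fields[2], t.getDayOfMonth(), 1)
            && fieldMatches(fields[3], t.getMonthValue(), 1)
            // Java uses 1=Monday..7=Sunday; this sketch uses 0=Sunday..6=Saturday.
            && fieldMatches(fields[4], t.getDayOfWeek().getValue() % 7, 0);
    }

    // Walks forward minute by minute until the expression matches (bounded to about 5 years).
    LocalDateTime nextAfter(LocalDateTime from) {
        LocalDateTime candidate = from.truncatedTo(ChronoUnit.MINUTES).plusMinutes(1);
        for (int i = 0; i < 5 * 366 * 24 * 60; i++) {
            if (matches(candidate)) {
                return candidate;
            }
            candidate = candidate.plusMinutes(1);
        }
        throw new IllegalStateException("No matching time found");
    }

    public static void main(String[] args) {
        UnixCronSketch everyFiveMinutes = new UnixCronSketch("*/5 * * * *");
        System.out.println(everyFiveMinutes.nextAfter(LocalDateTime.now()));
    }
}
```

With `*/5 * * * *`, `nextAfter` always lands on the next minute divisible by five, which is the "every 5 minutes" behavior the summary describes for the `REPLACE` query.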