Driver Pod Creation Delay When Submitting Spark Jobs with Kubeflow Spark Operator #2374
Comments
We are also facing a similar issue: the SparkApplication waits around 10 minutes before the Spark Operator schedules it.
Are you using any custom Helm values?
@ramsinghtmdc How many Spark jobs are submitted at the same time?
I suggest configuring proper values. For a large, consistently loaded operator, we are using the following for the Spark Operator controller pod:
Worker queue config
Controller config
@ChenYi015 More detailed documentation about these properties/values and their use cases would be beneficial. While the values file includes some one-line descriptions, it takes time to work out which combination of worker settings suits a given node group size. Clearer guidance would save a lot of effort! @bnetzi The default value of
What is this property used for? Since every driver launches multiple executors, why track only one?
@nitishtw As far as I understand, the only benefit of tracking executors is that their status also appears in the SparkApplication object. For our needs, 1 is sufficient to easily spot cases where the executors failed to start; tracking more than that is just noise.
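To illustrate the reasoning above, here is a minimal Go sketch of capping how many executor states get copied into an application's status. `capExecutorStatus`, the ID-to-state map, and the lowest-ID selection rule are all hypothetical; this is not how the Spark Operator actually chooses or stores tracked executors. It only shows why a cap of 1 can still surface an application whose executors never started.

```go
package main

import (
	"fmt"
	"sort"
)

// capExecutorStatus keeps at most maxTracked entries from a map of
// executor ID -> state, picking the lowest IDs so the result is
// deterministic. Hypothetical helper, not the operator's actual code.
func capExecutorStatus(all map[int]string, maxTracked int) map[int]string {
	ids := make([]int, 0, len(all))
	for id := range all {
		ids = append(ids, id)
	}
	sort.Ints(ids) // map iteration order is random in Go

	tracked := map[int]string{}
	for _, id := range ids {
		if len(tracked) >= maxTracked {
			break // cap reached; remaining executors are not recorded
		}
		tracked[id] = all[id]
	}
	return tracked
}

func main() {
	// Hypothetical application whose executors never started.
	executors := map[int]string{1: "PENDING", 2: "PENDING", 3: "PENDING"}
	fmt.Println(capExecutorStatus(executors, 1)) // map[1:PENDING]
}
```

With a cap of 1, the status stays small regardless of executor count, yet a stuck or failed first executor is still visible; a larger cap mainly adds more entries of the same kind.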
@bnetzi I tried the same configuration you mentioned above; my Spark Operator is running on a … All jobs go directly to the … In the Spark Operator logs, it says that it failed to find the jar dependencies:
Is this expected? Is some kind of throttling happening when fetching from the Maven Central repository?
What question do you want to ask?
Thanks.