
[FEA] Explore shuffle config heuristics in the plugin/auto-tuner to reduce spilling but increase throughput #12122

Open
revans2 opened this issue Feb 12, 2025 · 1 comment
Labels
? - Needs Triage Need team to review and classify feature request New feature or request

Comments

@revans2
Collaborator

revans2 commented Feb 12, 2025

Is your feature request related to a problem? Please describe.
Spark has lots of configs related to shuffle. The following configs are intended to give some control over the size of the data that each task can receive.

spark.sql.shuffle.partitions
spark.sql.adaptive.coalescePartitions.initialPartitionNum
spark.sql.adaptive.advisoryPartitionSizeInBytes
spark.sql.adaptive.coalescePartitions.minPartitionSize
spark.sql.adaptive.coalescePartitions.minPartitionNum
spark.sql.adaptive.coalescePartitions.parallelismFirst
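
For context, AQE's partition coalescing roughly works by merging adjacent small shuffle partitions until each merged partition approaches `spark.sql.adaptive.advisoryPartitionSizeInBytes`, while respecting a minimum size. A much-simplified illustration of that idea (this is a stand-in sketch, not Spark's actual `ShufflePartitionsUtil` algorithm):

```python
def coalesce_partitions(sizes, advisory_bytes, min_bytes):
    """Greedily merge adjacent shuffle partitions until each coalesced
    partition reaches `advisory_bytes`. Simplified stand-in for the AQE
    coalescing behavior, for illustration only."""
    coalesced = []
    current = 0
    for s in sizes:
        current += s
        if current >= advisory_bytes:
            coalesced.append(current)
            current = 0
    if current > 0:
        # fold a too-small trailing remainder into its neighbor
        if coalesced and current < min_bytes:
            coalesced[-1] += current
        else:
            coalesced.append(current)
    return coalesced

# ten 10-byte partitions with a 25-byte advisory size merge into
# three ~30-byte partitions plus a small remainder
print(coalesce_partitions([10] * 10, 25, 5))
```

The key point for this issue is that the advisory size is the main knob: raising it gives each GPU task more data (better throughput, more spill risk), and lowering it does the opposite.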

Ideally we want to be able to give the GPU lots of data and not have to worry too much about running out of GPU memory, which can cause spilling and increase the run time of a query. But we know that spilling when we get lots of data is going to be inevitable in some situations.

So the plan is to:

  1. Finish implementing [FEA] triple buffering/pipelining for SQL #11343, which should hopefully reduce/eliminate the round-robin memory pressure problem.
  2. See if we can override/augment the AQE planning that coalesces partitions. The idea is that we could look at the plan and know which nodes cache a significant amount of data on the GPU, which operators increase the size of the data on the GPU, and which might decrease it. With that we could then, in theory, adjust the target shuffle size to avoid overloading GPU memory.

Please note that a lot of heuristics would need to be developed as part of this, specifically size estimation for the various stages of the plan. #12121 would be a great addition here, but we probably need a reasonable set of heuristics to start with, as we cannot guarantee those estimates will be available.
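
To make the shape of such a heuristic concrete, here is one possible sketch: estimate how much each operator in a plan grows or shrinks the data resident on the GPU, take the peak amplification, and derive an advisory shuffle size from the GPU memory budget. The operator names and growth factors below are made-up placeholders for illustration, not values or APIs from the plugin:

```python
# Hypothetical per-operator growth factors relative to shuffle input size.
# These numbers are illustrative assumptions, not measured values.
GROWTH_FACTORS = {
    "GpuHashJoin": 2.0,   # joins can roughly double resident data
    "GpuSort": 1.5,       # sorts buffer the whole partition
    "GpuFilter": 0.5,     # filters usually shrink data
    "GpuProject": 1.0,
}

def estimated_peak_factor(plan_ops):
    """Walk the operators bottom-up, tracking the running size multiplier,
    and return the peak multiplier seen anywhere in the pipeline."""
    factor = 1.0
    peak = 1.0
    for op in plan_ops:
        factor *= GROWTH_FACTORS.get(op, 1.0)
        peak = max(peak, factor)
    return peak

def target_shuffle_size(gpu_budget_bytes, plan_ops, concurrent_tasks=2):
    """Pick an advisory partition size so that `concurrent_tasks` tasks,
    each at estimated peak residency, still fit in the GPU budget."""
    per_task = gpu_budget_bytes / concurrent_tasks
    return int(per_task / estimated_peak_factor(plan_ops))

# e.g. with an 8 GiB budget, a join+sort pipeline gets a smaller
# advisory size than a filter-only pipeline
print(target_shuffle_size(8 * 1024**3, ["GpuHashJoin", "GpuSort"]))
print(target_shuffle_size(8 * 1024**3, ["GpuFilter", "GpuProject"]))
```

The real work in this issue is replacing those hard-coded factors with per-operator size estimation (which is exactly where #12121 would help).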

@revans2 revans2 added ? - Needs Triage Need team to review and classify feature request New feature or request labels Feb 12, 2025
@binmahone
Collaborator

Besides configs, should we also think about exploiting hints in SQL (https://spark.apache.org/docs/3.5.4/sql-ref-syntax-qry-select-hints.html)? In the context of the auto-tuner we already have real metrics from the vanilla Spark run; it might be beneficial to modify the query by adding hints to it and then run it on the GPU.
