-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Respect GPU resource specifications #172
Conversation
WalkthroughThe changes involve modifications to the Changes
Sequence Diagram(s)sequenceDiagram
participant Job as JobExecutorInterface
participant Executor as Executor
participant SLURM as SLURM System
Job->>Executor: Submit Job
Executor->>Executor: Check for GPU resources
alt GPU resources found
Executor->>SLURM: Submit with --gres=gpu:<count>
else No GPU resources
Executor->>SLURM: Submit without GPU
end
SLURM-->>Executor: Job ID
Executor-->>Job: Return Job ID or error
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)
195-196
: Refactor Suggestion: Usestartswith()
Method for String Prefix CheckFor better readability and to handle strings shorter than 7 characters without error, use the
startswith()
method instead of slicing when checking the GPU model prefix.Apply this diff:
- if gpu[:7] == "nvidia-": + if gpu.startswith("nvidia-"):
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)
189-197
: LGTM! GPU resource handling looks good.The implementation correctly handles GPU resource specifications and follows SLURM's GRES format requirements. Good job on supporting both generic "gpu" and NVIDIA-specific "nvidia_gpu" resource keys.
Consider adding input validation and documentation:
gpu_count = job.resources.get("gpu", job.resources.get("nvidia_gpu", -1)) +# Validate gpu_count if specified +if gpu_count != -1 and gpu_count <= 0: + raise ValueError(f"Invalid GPU count specified: {gpu_count}. Must be positive.") if gpu_count != -1:Also consider adding a comment explaining the dual key support:
+# Support both generic 'gpu' and backward-compatible 'nvidia_gpu' resource keys gpu_count = job.resources.get("gpu", job.resources.get("nvidia_gpu", -1))
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
snakemake_executor_plugin_slurm/__init__.py
(1 hunks)
🔇 Additional comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)
188-197
: Verify GPU allocation behavior with SLURM.
The GPU resource handling is well-integrated with the existing SLURM workflow. However, it would be good to verify the behavior when combined with other resource specifications.
✅ Verification successful
GPU resource handling is correctly integrated with SLURM specifications
The implementation follows SLURM's standard resource specification patterns:
- Correctly handles both generic GPU requests (
gpu
count) and NVIDIA-specific requests (nvidia_gpu
) - Properly formats
--gres=gpu
specifications with optional GPU model - Integrates well with other resource specifications (placed between node specification and task count)
- Follows SLURM's expected syntax for GPU resource allocation
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any existing GPU-related configurations or constraints in SLURM
# Check if any partitions are configured for GPU access
echo "=== Checking GPU partitions ==="
sinfo -o "%P %G" | grep -i "gpu"
# Check for any GPU-related constraints
echo "=== Checking GPU constraints ==="
sinfo -o "%P %F" | grep -i "gpu"
# Look for any existing GPU resource specifications in job templates or configurations
echo "=== Checking for GPU resource patterns in configuration files ==="
fd -e conf -e cfg . | xargs rg -l "gres.*gpu"
Length of output: 535
Script:
#!/bin/bash
# Let's analyze the codebase for GPU-related resource handling patterns
# Check for GPU resource handling patterns in Python files
echo "=== Checking GPU resource handling patterns ==="
rg -A 3 -B 3 "gres.*gpu"
# Look for other GPU-related resource configurations
echo "=== Checking other GPU-related configurations ==="
rg -A 3 -B 3 "gpu_|nvidia_gpu"
# Check for SLURM resource specification patterns
echo "=== Checking SLURM resource specification patterns ==="
rg -A 3 -B 3 "(--gres|--gpus)"
Length of output: 3248
Thank you for this PR.
I will only approve this particular PR, if it becomes a) generic (dropped nvidia specialities) and b) supports a |
Closed in favour of #173 |
Snakemake supports specification of required GPU resources in the Snakefile, i.e.
Before this patch, slurm executor ignored these specifications and unless the user manually made sure this doesn't happen, the jobs would run on CPU nodes.
This is relatively easy to fix, because like Snakemake, SLURM supports per-job GPU resource specification.
This patch ensures that GPU requirements from the Snakefile are relayed to SLURM via
Summary by CodeRabbit
New Features
Bug Fixes