Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: improved GPU job support #173

Open
wants to merge 15 commits into
base: main
Choose a base branch
from
Open

feat!: improved GPU job support #173

wants to merge 15 commits into from

Conversation

cmeesters
Copy link
Member

@cmeesters cmeesters commented Nov 25, 2024

In the light of more and more accelerator applications (AI, base mapping, ...) the fall-back onto slurm_extra becomes a bit tedious to use. Hence, the resource support for gres.

Addresses issue #52 (and to a minor extent: #18 and #104). Supersedes PR #172 .

Summary by CodeRabbit

  • New Features

    • Enhanced handling of SLURM job submissions for GPU resources.
    • Introduced a new function for constructing GRES strings based on job specifications.
    • Added documentation section on "GPU Jobs" to guide users on requesting GPU resources.
  • Bug Fixes

    • Added stricter checks for GPU resource specifications.
    • Improved error handling and reporting for job submission processes.

Copy link
Contributor

coderabbitai bot commented Nov 25, 2024

Walkthrough

The changes enhance the SLURM job submission process within the Snakemake executor plugin by improving the handling of GPU resources and Generic Resource (GRES) specifications. A new method, set_gres_string, is introduced for constructing GRES strings based on job specifications. The logic for determining task numbers has been updated to differentiate between GPU and non-GPU jobs, and error handling has been refined to provide clearer feedback on submission failures.

Changes

File Change Summary
snakemake_executor_plugin_slurm/__init__.py - Modified run_job method to handle GPU and GRES resources
- Added set_gres_string function for GRES string construction
- Updated get_cpu_setting method for flexible CPU allocation
- Enhanced error handling for SLURM job submissions
snakemake_executor_plugin_slurm/utils.py - Introduced set_gres_string function with validation for GRES and GPU inputs
- Added import for JobExecutorInterface
docs/further.md - Added new section on "GPU Jobs" detailing GPU resource requests and sample workflow profiles

Possibly related PRs

  • feat: custom log file behaviour #159: The changes in this PR also modify the run_job method in the Executor class within snakemake_executor_plugin_slurm/__init__.py, enhancing its functionality related to SLURM job management, which is directly related to the modifications made in the main PR.

Suggested reviewers

  • fgvieira
  • johanneskoester
  • dlaehnemann

Poem

🐰 In SLURM's realm, where tasks align,
A GRES string crafted, oh so fine!
With GPUs dancing, resources in play,
Errors now clear, come what may!
A rabbit's delight, in code we trust,
For smooth job runs, it's a must! 🖥️

Finishing Touches

  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
docs/further.md (2)

121-121: Enhance the GRES resource documentation with more details and examples

While the current description is accurate, it would be helpful to expand it with:

  • Common GRES types (e.g., gpu, mps, fpga)
  • Multiple resource syntax (e.g., gpu:tesla:2)
  • Resource property specifications
  • Links to SLURM's GRES documentation

Consider updating the table entry to something like:

-| `--gres`  | `gres` | generic resource specification, e.g. `gres=gpu:1`, allows for arbitray strings after the `=` |
+| `--gres`  | `gres` | Generic Resource Scheduling specification. Examples: `gres=gpu:1` (1 GPU), `gres=gpu:tesla:2` (2 Tesla GPUs), `gres=mps:100` (100% MPS). See [SLURM GRES Guide](https://slurm.schedmd.com/gres.html) for details. |

Line range hint 123-130: Add a dedicated GRES example for AI/ML workloads

Given the PR's focus on GRES support for AI applications, it would be valuable to add a specific example showing how to request GPU resources.

Consider adding this example after the current generic example:

 rule:
     input: ...
     output: ...
     resources:
         partition=<partition name>
         runtime=<some number>
+
+# Example: GPU-accelerated deep learning training
+rule train_model:
+    input: "data/training_set.h5"
+    output: "models/trained_model.pt"
+    resources:
+        # Request 2 NVIDIA A100 GPUs
+        gres="gpu:a100:2",
+        # Ensure sufficient CPU cores and memory for data preprocessing
+        cpus_per_task=8,
+        mem_mb=32000
snakemake_executor_plugin_slurm/__init__.py (1)

Line range hint 313-315: Consider making status_attempts configurable

The number of status check attempts is hardcoded. Consider making it configurable through the ExecutorSettings class to allow users to adjust based on their cluster's characteristics.

+    status_check_attempts: int = field(
+        default=5,
+        metadata={
+            "help": "Number of attempts to check job status before giving up",
+            "env_var": False,
+            "required": False,
+        },
+    )

-        status_attempts = 5
+        status_attempts = self.workflow.executor_settings.status_check_attempts
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 531ebc6 and b58dc97.

📒 Files selected for processing (2)
  • docs/further.md (1 hunks)
  • snakemake_executor_plugin_slurm/__init__.py (1 hunks)
🔇 Additional comments (2)
snakemake_executor_plugin_slurm/__init__.py (2)

Line range hint 238-251: LGTM: Robust error handling implementation

The error handling for job submission is comprehensive, covering both return code errors and submission failure messages, with clear error reporting.


Line range hint 174-198: LGTM: Comprehensive resource management

The implementation handles various SLURM resources effectively with appropriate warnings and fallback mechanisms. The code properly manages:

  • Runtime constraints
  • Memory allocation (both per-job and per-CPU)
  • Constraint specifications

snakemake_executor_plugin_slurm/__init__.py Outdated Show resolved Hide resolved
@cmeesters
Copy link
Member Author

@CodeRabbit no, this PR only adds gres as a possible resource specification. As stated by you, other PRs address other features.

Copy link
Contributor

coderabbitai bot commented Nov 25, 2024

Tip

For best results, initiate chat on the files or code changes.

@cmeesters: Understood, thank you for the clarification. I acknowledge that this PR only adds gres as a possible resource specification, and other features are addressed in other PRs.


✏️ Learnings added
Learnt from: cmeesters
URL: https://github.com/snakemake/snakemake-executor-plugin-slurm/pull/173

Timestamp: 2024-11-25T21:10:22.150Z
Learning: PR #173 adds `gres` as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

161-170: LGTM! Well-implemented GRES support.

The implementation correctly validates the GRES format and provides clear error messages. The regex pattern accurately matches SLURM's GRES syntax requirements.

Consider adding code comments with common GRES examples to help users:

 if job.resources.get("gres"):
+    # Common GRES examples:
+    # gpu:1         - Request 1 GPU of any type
+    # gpu:tesla:2   - Request 2 Tesla GPUs
+    # mps:100       - Request 100% of MPS (CUDA Multi-Process Service)
     # Validate GRES format (e.g., "gpu:1", "gpu:tesla:2")
     gres = job.resources.gres
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b58dc97 and 4f9035f.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_slurm/__init__.py (1 hunks)

Copy link
Contributor

@johanneskoester johanneskoester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable. In addition, I would however suggest to further support two resources gpu: int and gpu_model: str, which can be internally translated into a gres parametrization (while an explicit gres specification takes always precedence). This way, we are consistent with an emerging common practice, which is currently also being implemented for kubernetes here: snakemake/snakemake-executor-plugin-kubernetes#31

@cmeesters
Copy link
Member Author

@johanneskoester SLURM's generic approach is to write --gres=gpu:<hardwaremodel>:n in this case. Alas, it allows for two different kinds of plugins, to alter this behaviour. This is why I first settled for the slurm_extra string as a catch-all. Meanwhile, more and more admins settled for the generic approach. Anyone on a deviating cluster will still have to use slurm_extra.

I'm fine with providing two more resources for the sake of consistency. However, we need to keep an eye on ease, too.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
snakemake_executor_plugin_slurm/__init__.py (2)

161-169: Enhance GRES format validation and resource specification flexibility

The current regex enforces [a-zA-Z0-9]+:([a-zA-Z0-9]+:)?\d+, disallowing underscores or hyphens in resource names. SLURM setups sometimes include underscores (e.g., gpu:tesla_v100:2). Consider extending the pattern to capture these valid resource naming conventions. Also, use caution if resource values could come from untrusted sources; adding shell quoting enhances safety.

- if not re.match(r"^[a-zA-Z0-9]+:([a-zA-Z0-9]+:)?\d+$", gres):
+ if not re.match(r"^[a-zA-Z0-9_]+(:[a-zA-Z0-9_]+){0,1}:\d+$", gres):
    raise WorkflowError(
        f"Invalid GRES format: {gres}. Expected format: "
        "'<name>:<number>' or '<name>:<type>:<number>'"
    )

170-183: Consider supporting GPU without specifying a model

Right now, if gpu is set but gpu_model is missing, this block does nothing. You might allow a simpler gpu:<number> usage for users who don’t need a specific GPU model. If that’s not intended, it might help to clarify via error or docstring that both fields are required.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4f9035f and 350577c.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_slurm/__init__.py (1 hunks)
🧰 Additional context used
📓 Learnings (1)
📓 Common learnings
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2024-11-25T21:10:22.280Z
Learning: PR #173 adds `gres` as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.

@cmeesters
Copy link
Member Author

@coderabbitai jolly good remark in your last comment. I hope the last commit is an improvement. Now, the code base is growing too big, IMO, and needs to be outsourced to a different module.

Copy link
Contributor

coderabbitai bot commented Jan 2, 2025

Tip

For best results, initiate chat on the files or code changes.

@cmeesters: Thank you for the update! Your latest commit does appear to address the prior remark, and it’s great to see improvement. Regarding the code base growing too large, would you like to open a separate PR or GitHub issue to track refactoring to a dedicated module? I’m happy to assist as needed. Let me know how you’d like to proceed.


🧠 Learnings used
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2024-11-25T21:10:22.280Z
Learning: PR #173 adds `gres` as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
snakemake_executor_plugin_slurm/__init__.py (3)

86-87: Consider using a more strict regex pattern for GRES validation.

The current GRES regex pattern allows consecutive underscores and potentially invalid combinations. Consider using a more restrictive pattern:

-    gres_re = re.compile(r"^[a-zA-Z0-9_]+:([a-zA-Z0-9_]+:)?\d+$")
+    gres_re = re.compile(r"^[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)*:(?:[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)*:)?\d+$")

173-184: Consider improving error handling and variable scope.

The implementation is correct but could be improved:

  1. Initialize gres_string at the beginning of the method to avoid potential undefined variable issues
  2. Add more context to the GPU integer validation error message
+    gres_string = ""  # Initialize at the beginning
     if job.resources.get("gpu"):
         if job.resources.get("gres"):
             raise WorkflowError(
                 "GRES and GPU are set. Please only set one of them."
             )
         if not isinstance(job.resources.gpu, int):
             raise WorkflowError(
-                "The 'gpu' resource must be an integer. "
-                f"Got: {job.resources.gpu} ({type(job.resources.gpu)})."
+                f"The 'gpu' resource must be an integer specifying the number of GPUs. "
+                f"Got: {job.resources.gpu} of type {type(job.resources.gpu).__name__}."
             )

186-198: Enhance GPU model validation error message.

The GPU model validation is correct, but the error message could be more helpful.

     if not Executor.gpu_model_re.match(job.resources.gpu_model):
         raise WorkflowError(
-            f"Invalid GPU model format: {job.resources.gpu_model}. "
-            "Expected format: '<name>'"
+            f"Invalid GPU model format: {job.resources.gpu_model}. "
+            "GPU model must contain only alphanumeric characters and underscores. "
+            "Example: 'tesla' or 'a100'"
         )
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 350577c and 8d22386.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_slurm/__init__.py (2 hunks)
🧰 Additional context used
📓 Learnings (1)
📓 Common learnings
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2024-11-25T21:10:22.280Z
Learning: PR #173 adds `gres` as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.
🔇 Additional comments (2)
snakemake_executor_plugin_slurm/__init__.py (2)

164-172: LGTM! Well-implemented GRES validation.

The GRES validation is thorough and provides clear error messages.


199-201: LGTM! Clean conditional assignment of GRES string.

The final GRES string assignment is well-structured and handles all cases correctly.

@cmeesters
Copy link
Member Author

@johanneskoester the code ought to be ok-ish. As mentioned, I rather refactor in the future to avoid an extended single file. That aside, please review. I would prefer to merge the main into this PR after PR #159 is done, because of the various changes in the docs and the help strings, which will only lead to merge conflicts, if done twice. (And I have plenty of other things on my mind, too.)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
snakemake_executor_plugin_slurm/__init__.py (3)

117-118: LGTM! Consider adding pattern documentation.

The regex patterns are well-defined and efficiently compiled at class level.

Consider adding docstrings to explain the expected formats:

 class Executor(RemoteExecutor):
+    # Validates GRES format (e.g., "gpu:1", "gpu:tesla:2")
     gres_re = re.compile(r"^[a-zA-Z0-9_]+:([a-zA-Z0-9_]+:)?\d+$")
+    # Validates GPU model names (e.g., "tesla", "a100")
     gpu_model_re = re.compile(r"^[a-zA-Z0-9_]+$")

234-262: Fix variable naming inconsistency.

The code uses both gpus and gpu in different contexts. On line 294, it checks for gpus but the resource is defined as gpu.

Apply this change for consistency:

-        gpu_job = job.resources.get("gpus") or "gpu" in job.resources.get("gres", "")
+        gpu_job = job.resources.get("gpu") or "gpu" in job.resources.get("gres", "")

309-311: Enhance comment clarity for CPU allocation.

The current comment could be more descriptive about the CPU allocation strategy.

-        # we need to set cpus-per-task OR cpus-per-gpu, the function
-        # will return a string with the corresponding value
+        # Determine CPU allocation strategy:
+        # - For GPU jobs: sets --cpus-per-gpu based on job resources
+        # - For non-GPU jobs: sets --cpus-per-task based on job resources
+        # The get_cpus_per_task function returns the appropriate SLURM parameter
         call += f" {get_cpus_per_task(job, gpu_job)}"
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d22386 and 2b50601.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_slurm/__init__.py (4 hunks)
🧰 Additional context used
📓 Learnings (1)
📓 Common learnings
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2024-11-25T21:10:22.280Z
Learning: PR #173 adds `gres` as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: formatting
  • GitHub Check: testing
🔇 Additional comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

225-233: LGTM! Robust GRES validation implementation.

The GRES validation logic is thorough with clear error messages and follows SLURM's specifications.

@cmeesters
Copy link
Member Author

@johanneskoester and @coderabbitai - this PR is now depended on snakemake/snakemake-executor-plugin-slurm-jobstep#28 . A change in the cpu-settings function of the jobstep-Executor module was required, too.

Copy link
Contributor

coderabbitai bot commented Jan 13, 2025

Tip

For best results, initiate chat on the files or code changes.

@cmeesters: Thank you for informing about the dependency on PR #28 in the snakemake-executor-plugin-slurm-jobstep repository and the required changes in the cpu-settings function. I've noted this information for future reference.


✏️ Learnings added
Learnt from: cmeesters
URL: https://github.com/snakemake/snakemake-executor-plugin-slurm/pull/173

Timestamp: 2025-01-13T09:54:22.793Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2024-11-25T21:10:22.280Z
Learning: PR #173 adds `gres` as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2b50601 and 0e18736.

📒 Files selected for processing (2)
  • snakemake_executor_plugin_slurm/__init__.py (4 hunks)
  • snakemake_executor_plugin_slurm/utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • snakemake_executor_plugin_slurm/init.py
🧰 Additional context used
📓 Learnings (1)
📓 Common learnings
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2024-11-25T21:10:22.280Z
Learning: PR #173 adds `gres` as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.
🪛 Ruff (0.8.2)
snakemake_executor_plugin_slurm/utils.py

67-67: Undefined name WorkflowError

(F821)


76-76: Undefined name WorkflowError

(F821)


79-79: Undefined name WorkflowError

(F821)


87-87: Undefined name WorkflowError

(F821)


90-90: Undefined name WorkflowError

(F821)


99-99: Undefined name WorkflowError

(F821)


106-106: Undefined name WorkflowError

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: testing
  • GitHub Check: formatting

snakemake_executor_plugin_slurm/utils.py Outdated Show resolved Hide resolved
snakemake_executor_plugin_slurm/utils.py Outdated Show resolved Hide resolved
@cmeesters cmeesters changed the title feat: added the job resource 'gres' to be considered feat!: improved GPU job support Jan 13, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
snakemake_executor_plugin_slurm/utils.py (1)

51-63: Enhance function documentation with format examples

While the regular expressions are well-defined, the docstring could be more helpful by including example formats and explaining the relationship between gres, gpu, and gpu_model parameters.

Add examples to the docstring:

 def set_gres_string(job: JobExecutorInterface) -> str:
     """
     Function to set the gres string for the SLURM job
     based on the resources requested in the job.
+
+    Examples:
+        - gres="gpu:1"  # Request 1 GPU
+        - gres="gpu:tesla:2"  # Request 2 Tesla GPUs
+        - gpu=2  # Request 2 GPUs
+        - gpu=1, gpu_model="tesla"  # Request 1 Tesla GPU
     """
docs/further.md (2)

93-94: Fix grammatical issue in documentation

The sentence structure needs improvement.

Apply this diff:

-SLURM allows to specify GPU request with the `--gres` or `--gpus` flags, Snakemake takes a similar approach. Resources can be asked for with
+SLURM allows you to specify GPU requests using the `--gres` or `--gpus` flags, and Snakemake takes a similar approach. Resources can be requested using:
🧰 Tools
🪛 LanguageTool

[grammar] ~93-~93: Did you mean “specifying”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun.
Context: ...tribution. ### GPU Jobs SLURM allows to specify GPU request with the --gresor--gpu...

(ALLOW_TO)


102-114: Enhance documentation examples

While the current examples are good, they could be more comprehensive to cover all supported scenarios.

Consider adding these additional examples:

 ```YAML
 set-resources:
     gres_request_rule:
         gres: "gpu:1"
 
     multi_gpu_rule:
         gpus: 2
         cpus_per_gpu: 4
 
     no_cpu_gpu_rule:
         gpus: 1
         cpus_per_gpu: -1
+
+    gpu_model_rule:
+        gpu_model: "tesla"
+        gpu: 2
+        cpus_per_gpu: 2
+
+    gres_advanced_rule:
+        gres: "gpu:tesla:4"

</blockquote></details>

</blockquote></details>

<details>
<summary>📜 Review details</summary>

**Configuration used: .coderabbit.yaml**
**Review profile: CHILL**
**Plan: Pro**

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 0e1873603d3b3bce48e9ba09ee7579fe4737191b and df62a6f8d243999ff8e80f4c9ce1ab95767bab05.

</details>

<details>
<summary>📒 Files selected for processing (2)</summary>

* `docs/further.md` (1 hunks)
* `snakemake_executor_plugin_slurm/utils.py` (2 hunks)

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>📓 Learnings (2)</summary>

<details>
<summary>📓 Common learnings</summary>

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2024-11-25T21:10:22.280Z
Learning: PR #173 adds gres as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.


Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.


</details>
<details>
<summary>snakemake_executor_plugin_slurm/utils.py (1)</summary>

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2024-11-25T21:10:22.280Z
Learning: PR #173 adds gres as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.


</details>

</details><details>
<summary>🪛 LanguageTool</summary>

<details>
<summary>docs/further.md</summary>

[grammar] ~93-~93: Did you mean “specifying”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun.
Context: ...tribution`.  ### GPU Jobs  SLURM allows to specify GPU request with the `--gres` or `--gpu...

(ALLOW_TO)

---

[uncategorized] ~95-~95: Loose punctuation mark.
Context: ...esources can be asked for with  - `gres`, e.g. `gres=gpu:1` or `gres=gpu:tesla:2`...

(UNLIKELY_OPENING_PUNCTUATION)

</details>

</details>

</details>

<details>
<summary>⏰ Context from checks skipped due to timeout of 90000ms (1)</summary>

* GitHub Check: testing

</details>

<details>
<summary>🔇 Additional comments (2)</summary><blockquote>

<details>
<summary>snakemake_executor_plugin_slurm/utils.py (2)</summary>

`4-10`: **LGTM! Imports are well-organized and complete.**

All necessary imports are present, including the previously missing WorkflowError import.

---

`64-74`: _:warning: Potential issue_

**Fix potential control flow issue with gres_string**

The gres_string is set when job.resources.get("gres") is True, but there's no else clause. This could lead to undefined variable errors in subsequent code.

Apply this diff to fix the issue:
```diff
     if job.resources.get("gres"):
         # Validate GRES format (e.g., "gpu:1", "gpu:tesla:2")
         gres = job.resources.gres
         if not gres_re.match(gres):
             raise WorkflowError(
                 f"Invalid GRES format: {gres}. Expected format: "
                 "'<name>:<number>' or '<name>:<type>:<number>' "
                 "(e.g., 'gpu:1' or 'gpu:tesla:2')"
             )
         gres_string = f" --gres={job.resources.gres}"
+    else:
+        gres_string = ""

Likely invalid or redundant comment.

Comment on lines +75 to +100
if job.resources.get("gpus"):
# ensure that gres is not set, if gpu and gpu_model are set
if job.resources.get("gres"):
raise WorkflowError("GRES and GPU are set. Please only set one" " of them.")
# validate GPU format
if not gpus_re.match(str(job.resources.gpu)):
raise WorkflowError(
f"Invalid GPU format: {job.resources.gpu}. "
"Expected format: '<number>' or '<name>:<number>' "
"(e.g., '1' or 'tesla:2')"
)
gres_string = f" --gpus={job.resources.gpu}"
elif job.resources.get("gpu_model") and job.resources.get("gpu"):
# validate GPU model format
if not gpu_model_re.match(job.resources.gpu_model):
raise WorkflowError(
f"Invalid GPU model format: {job.resources.gpu_model}. "
"Expected format: '<name>' "
"(e.g., 'tesla')"
)
gres_string = f" --gpus:{job.resources.gpu_model}:{job.resources.gpu}"
elif job.resources.get("gpu_model") and not job.resources.get("gpu"):
raise WorkflowError(
"GPU model is set, but no GPU number is given. " "Please set 'gpu' as well."
)
return gres_string if job.resources.get("gres") or job.resources.get("gpu") else ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Simplify GPU resource handling logic

The current implementation has overlapping conditions and could be simplified using a more structured approach.

Consider this refactored version:

-    if job.resources.get("gpus"):
-        # ensure that gres is not set, if gpu and gpu_model are set
-        if job.resources.get("gres"):
-            raise WorkflowError("GRES and GPU are set. Please only set one" " of them.")
-        # validate GPU format
-        if not gpus_re.match(str(job.resources.gpu)):
-            raise WorkflowError(
-                f"Invalid GPU format: {job.resources.gpu}. "
-                "Expected format: '<number>' or '<name>:<number>' "
-                "(e.g., '1' or 'tesla:2')"
-            )
-        gres_string = f" --gpus={job.resources.gpu}"
-    elif job.resources.get("gpu_model") and job.resources.get("gpu"):
-        # validate GPU model format
-        if not gpu_model_re.match(job.resources.gpu_model):
-            raise WorkflowError(
-                f"Invalid GPU model format: {job.resources.gpu_model}. "
-                "Expected format: '<name>' "
-                "(e.g., 'tesla')"
-            )
-        gres_string = f" --gpus:{job.resources.gpu_model}:{job.resources.gpu}"
-    elif job.resources.get("gpu_model") and not job.resources.get("gpu"):
-        raise WorkflowError(
-            "GPU model is set, but no GPU number is given. " "Please set 'gpu' as well."
-        )
+    if not job.resources.get("gres"):
+        if job.resources.get("gpu_model"):
+            if not job.resources.get("gpu"):
+                raise WorkflowError(
+                    "GPU model is set, but no GPU number is given. Please set 'gpu' as well."
+                )
+            if not gpu_model_re.match(job.resources.gpu_model):
+                raise WorkflowError(
+                    f"Invalid GPU model format: {job.resources.gpu_model}. "
+                    "Expected format: '<name>' (e.g., 'tesla')"
+                )
+            gres_string = f" --gpus:{job.resources.gpu_model}:{job.resources.gpu}"
+        elif job.resources.get("gpu"):
+            if not gpus_re.match(str(job.resources.gpu)):
+                raise WorkflowError(
+                    f"Invalid GPU format: {job.resources.gpu}. "
+                    "Expected format: '<number>' or '<name>:<number>' "
+                    "(e.g., '1' or 'tesla:2')"
+                )
+            gres_string = f" --gpus={job.resources.gpu}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if job.resources.get("gpus"):
# ensure that gres is not set, if gpu and gpu_model are set
if job.resources.get("gres"):
raise WorkflowError("GRES and GPU are set. Please only set one" " of them.")
# validate GPU format
if not gpus_re.match(str(job.resources.gpu)):
raise WorkflowError(
f"Invalid GPU format: {job.resources.gpu}. "
"Expected format: '<number>' or '<name>:<number>' "
"(e.g., '1' or 'tesla:2')"
)
gres_string = f" --gpus={job.resources.gpu}"
elif job.resources.get("gpu_model") and job.resources.get("gpu"):
# validate GPU model format
if not gpu_model_re.match(job.resources.gpu_model):
raise WorkflowError(
f"Invalid GPU model format: {job.resources.gpu_model}. "
"Expected format: '<name>' "
"(e.g., 'tesla')"
)
gres_string = f" --gpus:{job.resources.gpu_model}:{job.resources.gpu}"
elif job.resources.get("gpu_model") and not job.resources.get("gpu"):
raise WorkflowError(
"GPU model is set, but no GPU number is given. " "Please set 'gpu' as well."
)
return gres_string if job.resources.get("gres") or job.resources.get("gpu") else ""
if not job.resources.get("gres"):
if job.resources.get("gpu_model"):
if not job.resources.get("gpu"):
raise WorkflowError(
"GPU model is set, but no GPU number is given. Please set 'gpu' as well."
)
if not gpu_model_re.match(job.resources.gpu_model):
raise WorkflowError(
f"Invalid GPU model format: {job.resources.gpu_model}. "
"Expected format: '<name>' (e.g., 'tesla')"
)
gres_string = f" --gpus:{job.resources.gpu_model}:{job.resources.gpu}"
elif job.resources.get("gpu"):
if not gpus_re.match(str(job.resources.gpu)):
raise WorkflowError(
f"Invalid GPU format: {job.resources.gpu}. "
"Expected format: '<number>' or '<name>:<number>' "
"(e.g., '1' or 'tesla:2')"
)
gres_string = f" --gpus={job.resources.gpu}"
return gres_string if job.resources.get("gres") or job.resources.get("gpu") else ""

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants