feat!: improved GPU job support #173

cmeesters · 2024-11-25T20:57:50Z

In the light of more and more accelerator applications (AI, base mapping, ...) the fall-back onto slurm_extra becomes a bit tedious to use. Hence, the resource support for gres.

Addresses issue #52 (and to a minor extent: #18 and #104). Supersedes PR #172 .

Summary by CodeRabbit

New Features
- Enhanced handling of SLURM job submissions for GPU resources.
- Introduced a new function for constructing GRES strings based on job specifications.
- Added documentation section on "GPU Jobs" to guide users on requesting GPU resources.
Bug Fixes
- Added stricter checks for GPU resource specifications.
- Improved error handling and reporting for job submission processes.

coderabbitai · 2024-11-25T20:57:56Z

Walkthrough

The changes enhance the SLURM job submission process within the Snakemake executor plugin by improving the handling of GPU resources and Generic Resource (GRES) specifications. A new method, set_gres_string, is introduced for constructing GRES strings based on job specifications. The logic for determining task numbers has been updated to differentiate between GPU and non-GPU jobs, and error handling has been refined to provide clearer feedback on submission failures.

Changes

File	Change Summary
`snakemake_executor_plugin_slurm/__init__.py`	- Modified `run_job` method to handle GPU and GRES resources - Added `set_gres_string` function for GRES string construction - Updated `get_cpu_setting` method for flexible CPU allocation - Enhanced error handling for SLURM job submissions
`snakemake_executor_plugin_slurm/utils.py`	- Introduced `set_gres_string` function with validation for GRES and GPU inputs - Added import for `JobExecutorInterface`
`docs/further.md`	- Added new section on "GPU Jobs" detailing GPU resource requests and sample workflow profiles

Possibly related PRs

feat: custom log file behaviour #159: The changes in this PR also modify the run_job method in the Executor class within snakemake_executor_plugin_slurm/__init__.py, enhancing its functionality related to SLURM job management, which is directly related to the modifications made in the main PR.

Suggested reviewers

fgvieira
johanneskoester
dlaehnemann

Poem

🐰 In SLURM's realm, where tasks align,
A GRES string crafted, oh so fine!
With GPUs dancing, resources in play,
Errors now clear, come what may!
A rabbit's delight, in code we trust,
For smooth job runs, it's a must! 🖥️

Finishing Touches

📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)

docs/further.md (2)
121-121: Enhance the GRES resource documentation with more details and examples

While the current description is accurate, it would be helpful to expand it with:

Common GRES types (e.g., gpu, mps, fpga)

Multiple resource syntax (e.g., gpu:tesla:2)

Resource property specifications

Links to SLURM's GRES documentation

Consider updating the table entry to something like:
-| `--gres`  | `gres` | generic resource specification, e.g. `gres=gpu:1`, allows for arbitray strings after the `=` |
+| `--gres`  | `gres` | Generic Resource Scheduling specification. Examples: `gres=gpu:1` (1 GPU), `gres=gpu:tesla:2` (2 Tesla GPUs), `gres=mps:100` (100% MPS). See [SLURM GRES Guide](https://slurm.schedmd.com/gres.html) for details. |
Line range hint 123-130: Add a dedicated GRES example for AI/ML workloads

Given the PR's focus on GRES support for AI applications, it would be valuable to add a specific example showing how to request GPU resources.

Consider adding this example after the current generic example:
 rule:
     input: ...
     output: ...
     resources:
         partition=<partition name>
         runtime=<some number>
+
+# Example: GPU-accelerated deep learning training
+rule train_model:
+    input: "data/training_set.h5"
+    output: "models/trained_model.pt"
+    resources:
+        # Request 2 NVIDIA A100 GPUs
+        gres="gpu:a100:2",
+        # Ensure sufficient CPU cores and memory for data preprocessing
+        cpus_per_task=8,
+        mem_mb=32000
snakemake_executor_plugin_slurm/__init__.py (1)
Line range hint 313-315: Consider making status_attempts configurable

The number of status check attempts is hardcoded. Consider making it configurable through the ExecutorSettings class to allow users to adjust based on their cluster's characteristics.
+    status_check_attempts: int = field(
+        default=5,
+        metadata={
+            "help": "Number of attempts to check job status before giving up",
+            "env_var": False,
+            "required": False,
+        },
+    )

-        status_attempts = 5
+        status_attempts = self.workflow.executor_settings.status_check_attempts

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 531ebc6 and b58dc97.

📒 Files selected for processing (2)

docs/further.md (1 hunks)
snakemake_executor_plugin_slurm/__init__.py (1 hunks)

🔇 Additional comments (2)

snakemake_executor_plugin_slurm/__init__.py (2)

Line range hint 238-251: LGTM: Robust error handling implementation

The error handling for job submission is comprehensive, covering both return code errors and submission failure messages, with clear error reporting.

Line range hint 174-198: LGTM: Comprehensive resource management

The implementation handles various SLURM resources effectively with appropriate warnings and fallback mechanisms. The code properly manages:

Runtime constraints
Memory allocation (both per-job and per-CPU)
Constraint specifications

snakemake_executor_plugin_slurm/__init__.py

cmeesters · 2024-11-25T21:09:56Z

@CodeRabbit no, this PR only adds gres as a possible resource specification. As stated by you, other PRs address other features.

coderabbitai · 2024-11-25T21:10:23Z

Tip

For best results, initiate chat on the files or code changes.

@cmeesters: Understood, thank you for the clarification. I acknowledge that this PR only adds gres as a possible resource specification, and other features are addressed in other PRs.

✏️ Learnings added

Learnt from: cmeesters
URL: https://github.com/snakemake/snakemake-executor-plugin-slurm/pull/173

Timestamp: 2024-11-25T21:10:22.150Z
Learning: PR #173 adds `gres` as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)

snakemake_executor_plugin_slurm/__init__.py (1)
161-170: LGTM! Well-implemented GRES support.

The implementation correctly validates the GRES format and provides clear error messages. The regex pattern accurately matches SLURM's GRES syntax requirements.

Consider adding code comments with common GRES examples to help users:
 if job.resources.get("gres"):
+    # Common GRES examples:
+    # gpu:1         - Request 1 GPU of any type
+    # gpu:tesla:2   - Request 2 Tesla GPUs
+    # mps:100       - Request 100% of MPS (CUDA Multi-Process Service)
     # Validate GRES format (e.g., "gpu:1", "gpu:tesla:2")
     gres = job.resources.gres

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b58dc97 and 4f9035f.

📒 Files selected for processing (1)

snakemake_executor_plugin_slurm/__init__.py (1 hunks)

johanneskoester

Sounds reasonable. In addition, I would however suggest to further support two resources gpu: int and gpu_model: str, which can be internally translated into a gres parametrization (while an explicit gres specification takes always precedence). This way, we are consistent with an emerging common practice, which is currently also being implemented for kubernetes here: snakemake/snakemake-executor-plugin-kubernetes#31

cmeesters · 2024-11-29T09:14:50Z

@johanneskoester SLURM's generic approach is to write --gres=gpu:<hardwaremodel>:n in this case. Alas, it allows for two different kinds of plugins, to alter this behaviour. This is why I first settled for the slurm_extra string as a catch-all. Meanwhile, more and more admins settled for the generic approach. Anyone on a deviating cluster will still have to use slurm_extra.

I'm fine with providing two more resources for the sake of consistency. However, we need to keep an eye on ease, too.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

snakemake_executor_plugin_slurm/__init__.py (2)
161-169: Enhance GRES format validation and resource specification flexibility

The current regex enforces [a-zA-Z0-9]+:([a-zA-Z0-9]+:)?\d+, disallowing underscores or hyphens in resource names. SLURM setups sometimes include underscores (e.g., gpu:tesla_v100:2). Consider extending the pattern to capture these valid resource naming conventions. Also, use caution if resource values could come from untrusted sources; adding shell quoting enhances safety.
- if not re.match(r"^[a-zA-Z0-9]+:([a-zA-Z0-9]+:)?\d+$", gres):
+ if not re.match(r"^[a-zA-Z0-9_]+(:[a-zA-Z0-9_]+){0,1}:\d+$", gres):
    raise WorkflowError(
        f"Invalid GRES format: {gres}. Expected format: "
        "'<name>:<number>' or '<name>:<type>:<number>'"
    )
170-183: Consider supporting GPU without specifying a model

Right now, if gpu is set but gpu_model is missing, this block does nothing. You might allow a simpler gpu:<number> usage for users who don’t need a specific GPU model. If that’s not intended, it might help to clarify via error or docstring that both fields are required.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4f9035f and 350577c.

📒 Files selected for processing (1)

snakemake_executor_plugin_slurm/__init__.py (1 hunks)

🧰 Additional context used

📓 Learnings (1)

📓 Common learnings

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2024-11-25T21:10:22.280Z
Learning: PR #173 adds `gres` as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.

cmeesters · 2025-01-02T19:59:52Z

@coderabbitai jolly good remark in your last comment. I hope the last commit is an improvement. Now, the code base is growing too big, IMO, and needs to be outsourced to a different module.

coderabbitai · 2025-01-02T20:00:33Z

Tip

For best results, initiate chat on the files or code changes.

@cmeesters: Thank you for the update! Your latest commit does appear to address the prior remark, and it’s great to see improvement. Regarding the code base growing too large, would you like to open a separate PR or GitHub issue to track refactoring to a dedicated module? I’m happy to assist as needed. Let me know how you’d like to proceed.

🧠 Learnings used

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2024-11-25T21:10:22.280Z
Learning: PR #173 adds `gres` as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

snakemake_executor_plugin_slurm/__init__.py (3)

86-87: Consider using a more strict regex pattern for GRES validation.

The current GRES regex pattern allows consecutive underscores and potentially invalid combinations. Consider using a more restrictive pattern:

-    gres_re = re.compile(r"^[a-zA-Z0-9_]+:([a-zA-Z0-9_]+:)?\d+$")
+    gres_re = re.compile(r"^[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)*:(?:[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)*:)?\d+$")

173-184: Consider improving error handling and variable scope.

The implementation is correct but could be improved:

Initialize gres_string at the beginning of the method to avoid potential undefined variable issues
Add more context to the GPU integer validation error message

+    gres_string = ""  # Initialize at the beginning
     if job.resources.get("gpu"):
         if job.resources.get("gres"):
             raise WorkflowError(
                 "GRES and GPU are set. Please only set one of them."
             )
         if not isinstance(job.resources.gpu, int):
             raise WorkflowError(
-                "The 'gpu' resource must be an integer. "
-                f"Got: {job.resources.gpu} ({type(job.resources.gpu)})."
+                f"The 'gpu' resource must be an integer specifying the number of GPUs. "
+                f"Got: {job.resources.gpu} of type {type(job.resources.gpu).__name__}."
             )

186-198: Enhance GPU model validation error message.

The GPU model validation is correct, but the error message could be more helpful.

     if not Executor.gpu_model_re.match(job.resources.gpu_model):
         raise WorkflowError(
-            f"Invalid GPU model format: {job.resources.gpu_model}. "
-            "Expected format: '<name>'"
+            f"Invalid GPU model format: {job.resources.gpu_model}. "
+            "GPU model must contain only alphanumeric characters and underscores. "
+            "Example: 'tesla' or 'a100'"
         )

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 350577c and 8d22386.

📒 Files selected for processing (1)

snakemake_executor_plugin_slurm/__init__.py (2 hunks)

🧰 Additional context used

📓 Learnings (1)

📓 Common learnings

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2024-11-25T21:10:22.280Z
Learning: PR #173 adds `gres` as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.

🔇 Additional comments (2)

snakemake_executor_plugin_slurm/__init__.py (2)

164-172: LGTM! Well-implemented GRES validation.

The GRES validation is thorough and provides clear error messages.

199-201: LGTM! Clean conditional assignment of GRES string.

The final GRES string assignment is well-structured and handles all cases correctly.

cmeesters · 2025-01-03T10:11:42Z

@johanneskoester the code ought to be ok-ish. As mentioned, I rather refactor in the future to avoid an extended single file. That aside, please review. I would prefer to merge the main into this PR after PR #159 is done, because of the various changes in the docs and the help strings, which will only lead to merge conflicts, if done twice. (And I have plenty of other things on my mind, too.)

…-gpu - needs adjustment in jobstep executor code, too.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

snakemake_executor_plugin_slurm/__init__.py (3)
117-118: LGTM! Consider adding pattern documentation.

The regex patterns are well-defined and efficiently compiled at class level.

Consider adding docstrings to explain the expected formats:
 class Executor(RemoteExecutor):
+    # Validates GRES format (e.g., "gpu:1", "gpu:tesla:2")
     gres_re = re.compile(r"^[a-zA-Z0-9_]+:([a-zA-Z0-9_]+:)?\d+$")
+    # Validates GPU model names (e.g., "tesla", "a100")
     gpu_model_re = re.compile(r"^[a-zA-Z0-9_]+$")
234-262: Fix variable naming inconsistency.

The code uses both gpus and gpu in different contexts. On line 294, it checks for gpus but the resource is defined as gpu.

Apply this change for consistency:
-        gpu_job = job.resources.get("gpus") or "gpu" in job.resources.get("gres", "")
+        gpu_job = job.resources.get("gpu") or "gpu" in job.resources.get("gres", "")
309-311: Enhance comment clarity for CPU allocation.

The current comment could be more descriptive about the CPU allocation strategy.
-        # we need to set cpus-per-task OR cpus-per-gpu, the function
-        # will return a string with the corresponding value
+        # Determine CPU allocation strategy:
+        # - For GPU jobs: sets --cpus-per-gpu based on job resources
+        # - For non-GPU jobs: sets --cpus-per-task based on job resources
+        # The get_cpus_per_task function returns the appropriate SLURM parameter
         call += f" {get_cpus_per_task(job, gpu_job)}"

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d22386 and 2b50601.

📒 Files selected for processing (1)

snakemake_executor_plugin_slurm/__init__.py (4 hunks)

🧰 Additional context used

📓 Learnings (1)

📓 Common learnings

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2024-11-25T21:10:22.280Z
Learning: PR #173 adds `gres` as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.

⏰ Context from checks skipped due to timeout of 90000ms (2)

GitHub Check: formatting
GitHub Check: testing

🔇 Additional comments (1)

snakemake_executor_plugin_slurm/__init__.py (1)

225-233: LGTM! Robust GRES validation implementation.

The GRES validation logic is thorough with clear error messages and follows SLURM's specifications.

…nwieldy

cmeesters · 2025-01-13T09:54:08Z

@johanneskoester and @coderabbitai - this PR is now depended on snakemake/snakemake-executor-plugin-slurm-jobstep#28 . A change in the cpu-settings function of the jobstep-Executor module was required, too.

coderabbitai · 2025-01-13T09:54:24Z

Tip

For best results, initiate chat on the files or code changes.

@cmeesters: Thank you for informing about the dependency on PR #28 in the snakemake-executor-plugin-slurm-jobstep repository and the required changes in the cpu-settings function. I've noted this information for future reference.

✏️ Learnings added

Learnt from: cmeesters
URL: https://github.com/snakemake/snakemake-executor-plugin-slurm/pull/173

Timestamp: 2025-01-13T09:54:22.793Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

🧠 Learnings used

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2024-11-25T21:10:22.280Z
Learning: PR #173 adds `gres` as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2b50601 and 0e18736.

📒 Files selected for processing (2)

snakemake_executor_plugin_slurm/__init__.py (4 hunks)
snakemake_executor_plugin_slurm/utils.py (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

snakemake_executor_plugin_slurm/init.py

🧰 Additional context used

📓 Learnings (1)

📓 Common learnings

Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2024-11-25T21:10:22.280Z
Learning: PR #173 adds `gres` as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.

🪛 Ruff (0.8.2)

snakemake_executor_plugin_slurm/utils.py

67-67: Undefined name WorkflowError

(F821)

76-76: Undefined name WorkflowError

(F821)

79-79: Undefined name WorkflowError

(F821)

87-87: Undefined name WorkflowError

(F821)

90-90: Undefined name WorkflowError

(F821)

99-99: Undefined name WorkflowError

(F821)

106-106: Undefined name WorkflowError

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (2)

GitHub Check: testing
GitHub Check: formatting

snakemake_executor_plugin_slurm/utils.py

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

snakemake_executor_plugin_slurm/utils.py (1)
51-63: Enhance function documentation with format examples

While the regular expressions are well-defined, the docstring could be more helpful by including example formats and explaining the relationship between gres, gpu, and gpu_model parameters.

Add examples to the docstring:
 def set_gres_string(job: JobExecutorInterface) -> str:
     """
     Function to set the gres string for the SLURM job
     based on the resources requested in the job.
+
+    Examples:
+        - gres="gpu:1"  # Request 1 GPU
+        - gres="gpu:tesla:2"  # Request 2 Tesla GPUs
+        - gpu=2  # Request 2 GPUs
+        - gpu=1, gpu_model="tesla"  # Request 1 Tesla GPU
     """
docs/further.md (2)
93-94: Fix grammatical issue in documentation

The sentence structure needs improvement.

Apply this diff:
-SLURM allows to specify GPU request with the `--gres` or `--gpus` flags, Snakemake takes a similar approach. Resources can be asked for with
+SLURM allows you to specify GPU requests using the `--gres` or `--gpus` flags, and Snakemake takes a similar approach. Resources can be requested using:
🧰 Tools

🪛 LanguageTool

[grammar] ~93-~93: Did you mean “specifying”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun.
Context: ...tribution. ### GPU Jobs SLURM allows to specify GPU request with the --gresor--gpu...

(ALLOW_TO)

102-114: Enhance documentation examples

While the current examples are good, they could be more comprehensive to cover all supported scenarios.

Consider adding these additional examples:
 ```YAML
 set-resources:
     gres_request_rule:
         gres: "gpu:1"
 
     multi_gpu_rule:
         gpus: 2
         cpus_per_gpu: 4
 
     no_cpu_gpu_rule:
         gpus: 1
         cpus_per_gpu: -1
+
+    gpu_model_rule:
+        gpu_model: "tesla"
+        gpu: 2
+        cpus_per_gpu: 2
+
+    gres_advanced_rule:
+        gres: "gpu:tesla:4"
</blockquote></details>

</blockquote></details>

<details>
<summary>📜 Review details</summary>

**Configuration used: .coderabbit.yaml**
**Review profile: CHILL**
**Plan: Pro**

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 0e1873603d3b3bce48e9ba09ee7579fe4737191b and df62a6f8d243999ff8e80f4c9ce1ab95767bab05.

</details>

<details>
<summary>📒 Files selected for processing (2)</summary>

* `docs/further.md` (1 hunks)
* `snakemake_executor_plugin_slurm/utils.py` (2 hunks)

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>📓 Learnings (2)</summary>

<details>
<summary>📓 Common learnings</summary>
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2024-11-25T21:10:22.280Z
Learning: PR #173 adds gres as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2025-01-13T09:54:22.950Z
Learning: PR #173 (adding gres resource specification) depends on PR #28 in snakemake-executor-plugin-slurm-jobstep repository, as changes were required in the cpu-settings function of the jobstep-Executor module.
</details>
<details>
<summary>snakemake_executor_plugin_slurm/utils.py (1)</summary>
Learnt from: cmeesters
PR: snakemake/snakemake-executor-plugin-slurm#0
File: :0-0
Timestamp: 2024-11-25T21:10:22.280Z
Learning: PR #173 adds gres as a possible resource specification in the Snakemake executor plugin for SLURM; other features are addressed in separate PRs.
</details>

</details><details>
<summary>🪛 LanguageTool</summary>

<details>
<summary>docs/further.md</summary>

[grammar] ~93-~93: Did you mean “specifying”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun.
Context: ...tribution`.  ### GPU Jobs  SLURM allows to specify GPU request with the `--gres` or `--gpu...

(ALLOW_TO)

---

[uncategorized] ~95-~95: Loose punctuation mark.
Context: ...esources can be asked for with  - `gres`, e.g. `gres=gpu:1` or `gres=gpu:tesla:2`...

(UNLIKELY_OPENING_PUNCTUATION)

</details>

</details>

</details>

<details>
<summary>⏰ Context from checks skipped due to timeout of 90000ms (1)</summary>

* GitHub Check: testing

</details>

<details>
<summary>🔇 Additional comments (2)</summary><blockquote>

<details>
<summary>snakemake_executor_plugin_slurm/utils.py (2)</summary>

`4-10`: **LGTM! Imports are well-organized and complete.**

All necessary imports are present, including the previously missing WorkflowError import.

---

`64-74`: _:warning: Potential issue_

**Fix potential control flow issue with gres_string**

The gres_string is set when job.resources.get("gres") is True, but there's no else clause. This could lead to undefined variable errors in subsequent code.

Apply this diff to fix the issue:
```diff
     if job.resources.get("gres"):
         # Validate GRES format (e.g., "gpu:1", "gpu:tesla:2")
         gres = job.resources.gres
         if not gres_re.match(gres):
             raise WorkflowError(
                 f"Invalid GRES format: {gres}. Expected format: "
                 "'<name>:<number>' or '<name>:<type>:<number>' "
                 "(e.g., 'gpu:1' or 'gpu:tesla:2')"
             )
         gres_string = f" --gres={job.resources.gres}"
+    else:
+        gres_string = ""
Likely invalid or redundant comment.

coderabbitai · 2025-01-13T13:07:35Z

snakemake_executor_plugin_slurm/utils.py

+    if job.resources.get("gpus"):
+        # ensure that gres is not set, if gpu and gpu_model are set
+        if job.resources.get("gres"):
+            raise WorkflowError("GRES and GPU are set. Please only set one" " of them.")
+        # validate GPU format
+        if not gpus_re.match(str(job.resources.gpu)):
+            raise WorkflowError(
+                f"Invalid GPU format: {job.resources.gpu}. "
+                "Expected format: '<number>' or '<name>:<number>' "
+                "(e.g., '1' or 'tesla:2')"
+            )
+        gres_string = f" --gpus={job.resources.gpu}"
+    elif job.resources.get("gpu_model") and job.resources.get("gpu"):
+        # validate GPU model format
+        if not gpu_model_re.match(job.resources.gpu_model):
+            raise WorkflowError(
+                f"Invalid GPU model format: {job.resources.gpu_model}. "
+                "Expected format: '<name>' "
+                "(e.g., 'tesla')"
+            )
+        gres_string = f" --gpus:{job.resources.gpu_model}:{job.resources.gpu}"
+    elif job.resources.get("gpu_model") and not job.resources.get("gpu"):
+        raise WorkflowError(
+            "GPU model is set, but no GPU number is given. " "Please set 'gpu' as well."
+        )
+    return gres_string if job.resources.get("gres") or job.resources.get("gpu") else ""


🛠️ Refactor suggestion

Simplify GPU resource handling logic

The current implementation has overlapping conditions and could be simplified using a more structured approach.

Consider this refactored version:

- if job.resources.get("gpus"): - # ensure that gres is not set, if gpu and gpu_model are set - if job.resources.get("gres"): - raise WorkflowError("GRES and GPU are set. Please only set one" " of them.") - # validate GPU format - if not gpus_re.match(str(job.resources.gpu)): - raise WorkflowError( - f"Invalid GPU format: {job.resources.gpu}. " - "Expected format: '<number>' or '<name>:<number>' " - "(e.g., '1' or 'tesla:2')" - ) - gres_string = f" --gpus={job.resources.gpu}" - elif job.resources.get("gpu_model") and job.resources.get("gpu"): - # validate GPU model format - if not gpu_model_re.match(job.resources.gpu_model): - raise WorkflowError( - f"Invalid GPU model format: {job.resources.gpu_model}. " - "Expected format: '<name>' " - "(e.g., 'tesla')" - ) - gres_string = f" --gpus:{job.resources.gpu_model}:{job.resources.gpu}" - elif job.resources.get("gpu_model") and not job.resources.get("gpu"): - raise WorkflowError( - "GPU model is set, but no GPU number is given. " "Please set 'gpu' as well." - ) + if not job.resources.get("gres"): + if job.resources.get("gpu_model"): + if not job.resources.get("gpu"): + raise WorkflowError( + "GPU model is set, but no GPU number is given. Please set 'gpu' as well." + ) + if not gpu_model_re.match(job.resources.gpu_model): + raise WorkflowError( + f"Invalid GPU model format: {job.resources.gpu_model}. " + "Expected format: '<name>' (e.g., 'tesla')" + ) + gres_string = f" --gpus:{job.resources.gpu_model}:{job.resources.gpu}" + elif job.resources.get("gpu"): + if not gpus_re.match(str(job.resources.gpu)): + raise WorkflowError( + f"Invalid GPU format: {job.resources.gpu}. " + "Expected format: '<number>' or '<name>:<number>' " + "(e.g., '1' or 'tesla:2')" + ) + gres_string = f" --gpus={job.resources.gpu}"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if job.resources.get("gpus"):

# ensure that gres is not set, if gpu and gpu_model are set

if job.resources.get("gres"):

raise WorkflowError("GRES and GPU are set. Please only set one" " of them.")

# validate GPU format

if not gpus_re.match(str(job.resources.gpu)):

raise WorkflowError(

f"Invalid GPU format: {job.resources.gpu}. "

"Expected format: '<number>' or '<name>:<number>' "

"(e.g., '1' or 'tesla:2')"

)

gres_string = f" --gpus={job.resources.gpu}"

elif job.resources.get("gpu_model") and job.resources.get("gpu"):

# validate GPU model format

if not gpu_model_re.match(job.resources.gpu_model):

raise WorkflowError(

f"Invalid GPU model format: {job.resources.gpu_model}. "

"Expected format: '<name>' "

"(e.g., 'tesla')"

)

gres_string = f" --gpus:{job.resources.gpu_model}:{job.resources.gpu}"

elif job.resources.get("gpu_model") and not job.resources.get("gpu"):

raise WorkflowError(

"GPU model is set, but no GPU number is given. " "Please set 'gpu' as well."

)

return gres_string if job.resources.get("gres") or job.resources.get("gpu") else ""

if not job.resources.get("gres"):

if job.resources.get("gpu_model"):

if not job.resources.get("gpu"):

raise WorkflowError(

"GPU model is set, but no GPU number is given. Please set 'gpu' as well."

)

if not gpu_model_re.match(job.resources.gpu_model):

raise WorkflowError(

f"Invalid GPU model format: {job.resources.gpu_model}. "

"Expected format: '<name>' (e.g., 'tesla')"

)

gres_string = f" --gpus:{job.resources.gpu_model}:{job.resources.gpu}"

elif job.resources.get("gpu"):

if not gpus_re.match(str(job.resources.gpu)):

raise WorkflowError(

f"Invalid GPU format: {job.resources.gpu}. "

"Expected format: '<number>' or '<name>:<number>' "

"(e.g., '1' or 'tesla:2')"

)

gres_string = f" --gpus={job.resources.gpu}"

return gres_string if job.resources.get("gres") or job.resources.get("gpu") else ""

feat: added the job resource 'gres' to be considered

b58dc97

cmeesters requested review from johanneskoester and dlaehnemann November 25, 2024 20:57

cmeesters mentioned this pull request Nov 25, 2024

Respect GPU resource specifications #172

Closed

coderabbitai bot reviewed Nov 25, 2024

View reviewed changes

snakemake_executor_plugin_slurm/__init__.py Outdated Show resolved Hide resolved

feat: added regex based validator for gres

4f9035f

coderabbitai bot reviewed Nov 25, 2024

View reviewed changes

johanneskoester requested changes Nov 29, 2024

View reviewed changes

feat: supporting selection from gpu:int and gpu_module, too

350577c

coderabbitai bot reviewed Jan 2, 2025

View reviewed changes

feat: providing better gpu model validation and selection

71eb4fe

cmeesters added 2 commits January 2, 2025 21:01

fix: bug associated with not reading my own last commit

1a5c1b1

fix: satisfiying the linter

8d22386

coderabbitai bot reviewed Jan 3, 2025

View reviewed changes

cmeesters mentioned this pull request Jan 7, 2025

get_cpus_per_task() assigns 1 to cpus_per_task when not specified snakemake/snakemake-executor-plugin-slurm-jobstep#27

Open

cmeesters added 5 commits January 8, 2025 14:09

fix: resolved merge conflict

97aca06

feat: using , if gpu resources are selected

00ab13c

feat: auto distinction between ntasks and ntasks_per_gpu

cb94eb9

feat: first half of auto selection between cpus-per-task and cpus-per…

3f36f86

…-gpu - needs adjustment in jobstep executor code, too.

fix: missing colon

2b50601

coderabbitai bot reviewed Jan 10, 2025

View reviewed changes

feat: moved gres/gpu support to utils file - else the executor gets u…

0e18736

…nwieldy

coderabbitai bot reviewed Jan 13, 2025

View reviewed changes

snakemake_executor_plugin_slurm/utils.py Outdated Show resolved Hide resolved

snakemake_executor_plugin_slurm/utils.py Outdated Show resolved Hide resolved

cmeesters added 2 commits January 13, 2025 10:57

fix: added missing WorkflowError import

8914b0f

fix: removed duplicate check

3bcbe05

cmeesters mentioned this pull request Jan 13, 2025

feat: cpu-function altered to support cpus-per-gpu, too snakemake/snakemake-executor-plugin-slurm-jobstep#28

Open

cmeesters changed the title ~~feat: added the job resource 'gres' to be considered~~ feat!: improved GPU job support Jan 13, 2025

docs: added docs for the new features

df62a6f

coderabbitai bot reviewed Jan 13, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat!: improved GPU job support #173

feat!: improved GPU job support #173

cmeesters commented Nov 25, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 25, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

coderabbitai bot left a comment

cmeesters commented Nov 25, 2024

coderabbitai bot commented Nov 25, 2024

coderabbitai bot left a comment

johanneskoester left a comment

cmeesters commented Nov 29, 2024

coderabbitai bot left a comment

cmeesters commented Jan 2, 2025

coderabbitai bot commented Jan 2, 2025

coderabbitai bot left a comment

cmeesters commented Jan 3, 2025

coderabbitai bot left a comment

cmeesters commented Jan 13, 2025

coderabbitai bot commented Jan 13, 2025

coderabbitai bot left a comment

coderabbitai bot left a comment

coderabbitai bot Jan 13, 2025

feat!: improved GPU job support #173

Are you sure you want to change the base?

feat!: improved GPU job support #173

Conversation

cmeesters commented Nov 25, 2024 • edited by coderabbitai bot Loading

Summary by CodeRabbit

coderabbitai bot commented Nov 25, 2024 • edited Loading

Walkthrough

Changes

Possibly related PRs

Suggested reviewers

Poem

Finishing Touches

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

cmeesters commented Nov 25, 2024

coderabbitai bot commented Nov 25, 2024

coderabbitai bot left a comment

Choose a reason for hiding this comment

johanneskoester left a comment

Choose a reason for hiding this comment

cmeesters commented Nov 29, 2024

coderabbitai bot left a comment

Choose a reason for hiding this comment

cmeesters commented Jan 2, 2025

coderabbitai bot commented Jan 2, 2025

coderabbitai bot left a comment

Choose a reason for hiding this comment

cmeesters commented Jan 3, 2025

coderabbitai bot left a comment

Choose a reason for hiding this comment

cmeesters commented Jan 13, 2025

coderabbitai bot commented Jan 13, 2025

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Jan 13, 2025

Choose a reason for hiding this comment

cmeesters commented Nov 25, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 25, 2024 •

edited

Loading