fix environment variable export bug for MultiNodeRunner (#5878)
In some multi-node environments such as SLURM, some environment
variables contain special characters that trigger errors when they are
exported.

For example, requesting two nodes with 64 CPUs each through SLURM sets
`SLURM_JOB_CPUS_PER_NODE=64(x2)`.
Exporting this variable with `runner.add_export` adds the command `export
SLURM_JOB_CPUS_PER_NODE=64(x2)` to the launch command for subprocesses.
This causes a bash syntax error because an unquoted `(` is a shell
metacharacter, for example:
```
[2024-08-07 16:56:24,651] [INFO] [runner.py:568:main] cmd = pdsh -S -f 1024 -w server22,server27 export PYTHONPATH=/public/home/grzhang/code/CLIP-2;  export SLURM_JOB_CPUS_PER_NODE=64(x2); ...
server22: bash: -c: line 0: syntax error near unexpected token `('
```
This PR simply wraps each exported value in a pair of `"` so that bash
treats it as a single string.
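
The effect of the quoting can be reproduced outside DeepSpeed with a minimal sketch. `quote_value` mirrors the one-line change to `add_export`. `build_export_prefix` and `bash_accepts` are hypothetical helpers, not DeepSpeed functions, and the `export KEY=VALUE; ` layout is an assumption modeled on the logged command above.

```python
import shutil
import subprocess


def quote_value(var):
    # Mirrors the patched add_export: strip the value, then wrap it in double quotes.
    return f"\"{var.strip()}\""


def build_export_prefix(exports):
    # Hypothetical helper: emits "export KEY=VALUE; " per entry, as seen in the log.
    return "".join(f"export {key}={val}; " for key, val in exports.items())


def bash_accepts(cmd):
    # True/False when bash is installed; None when no bash is available to test with.
    bash = shutil.which("bash")
    if bash is None:
        return None
    return subprocess.run([bash, "-c", cmd], capture_output=True).returncode == 0


value = "64(x2)"
before = build_export_prefix({"SLURM_JOB_CPUS_PER_NODE": value})
after = build_export_prefix({"SLURM_JOB_CPUS_PER_NODE": quote_value(value)})

print(before)  # unquoted value: bash rejects the "("
print(after)   # quoted value: bash parses it as one string
print(bash_accepts(before + "true"), bash_accepts(after + "true"))
```

Double quotes are enough for the `(` case in this report, but bash still expands `$` and backticks inside them. A stricter alternative, which this PR does not use, would be `shlex.quote` from the Python standard library.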

Co-authored-by: Logan Adams <[email protected]>
TideDra and loadams authored Sep 7, 2024
1 parent 2a647c5 commit fc22d96
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion deepspeed/launcher/multinode_runner.py
@@ -34,7 +34,7 @@ def get_cmd(self, environment, active_resources):
         """Return the command to execute on node"""

     def add_export(self, key, var):
-        self.exports[key.strip()] = var.strip()
+        self.exports[key.strip()] = f"\"{var.strip()}\""

     def parse_user_args(self):
         return self.args.user_args
