Cannot set certain XLA_ARGS for PythonProcess #36

Open
samlobel opened this issue Mar 8, 2023 · 2 comments

samlobel commented Mar 8, 2023

When using local_mp, each process that uses jax spawns a huge number of threads. I'm running 128 actors, and each one spawns ~500 threads, meaning the program spawns over 50,000 threads!

This puts me over the ulimit for my university cluster, and I suspect it isn't performant either. The recommended solution is to set XLA_FLAGS="--xla_cpu_multi_thread_eigen=false intra_op_parallelism_threads=1". But for some reason this isn't working with PythonProcess. Here's the PythonProcess I use for each of my nodes:

      PythonProcess(env={
        "CUDA_VISIBLE_DEVICES": str(-1),
        "XLA_FLAGS": "--xla_cpu_multi_thread_eigen=false intra_op_parallelism_threads=1",
      })

This results in the following error in each process that uses a local resource with those envs: bash: line 1: XLA_FLAGS=--xla_cpu_multi_thread_eigen=false intra_op_parallelism_threads=1: command not found. Why is the environment variable being treated as a command here? I've also tried enclosing the value in quotes, which did not work. Thank you!


samlobel commented Mar 8, 2023

I've confirmed that the problem is caused by spaces in the value.

      PythonProcess(env={
        "CUDA_VISIBLE_DEVICES": str(-1),
        "DUMMY_ARG": "isspace theproblem",
      })

This errors in the same way.

@samlobel

For anyone else who wants to set XLA_FLAGS, I found a workaround that involves editing your site-packages. I'm using the "tmux launcher" (file launchpad/launch/run_locally/local_tmux_launcher), which internally calls the (undocumented) subprocess.list2cmdline function on a list that looks like ["env1=val1", "env2=val2", "/path/to/python", "command_name.py"]. Ideally this turns into a command like env1=val1 env2=val2 /path/to/python command_name.py. But if there are spaces in any of the env values, it puts quotes around the whole key/val pair: env1=val1 "env2=spaced value" /path/to/python command_name.py. Because the quote comes before the variable name, bash no longer sees an assignment. This doesn't set the environment variable env2, but instead tries to run env2=spaced value as a bash command.
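The quoting behavior described above can be reproduced outside of launchpad. This is a minimal sketch that only calls subprocess.list2cmdline on a list shaped like the one in this comment; the env names and paths are the placeholder values from the comment, not real launchpad output.

```python
import subprocess

# Same shape as the list the tmux launcher is described as building:
# env assignments first, then the interpreter and the script.
args = ["env1=val1", "env2=spaced value", "/path/to/python", "command_name.py"]

# list2cmdline wraps any argument containing a space or tab in double quotes.
# It quotes the whole "KEY=value" item, not just the value.
cmd = subprocess.list2cmdline(args)
print(cmd)
# prints: env1=val1 "env2=spaced value" /path/to/python command_name.py
```

Since bash only treats a word as a variable assignment when the name and the = sign are unquoted, the quoted word ends up in the command position, which matches the "command not found" error.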

Maybe that's the intended behavior of subprocess.list2cmdline, but it prevents you from setting env variables with spaces in them. So I just edited the launcher to strip the quotation marks: cmd = cmd.replace('"', ""). I also backslash-escaped the spaces inside the XLA_FLAGS value.

It would be great to get this fixed or documented, as XLA_FLAGS must be a common use case for launchpad!
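As a possible alternative to stripping quotes, the command string could be built so that only the value is quoted, which keeps the KEY= part visible to bash as an assignment. The sketch below is an assumption about how a fix might look, not launchpad's actual code: build_command is a hypothetical helper, and it relies only on shlex.quote from the standard library.

```python
import shlex

def build_command(env, argv):
    """Hypothetical helper: join env assignments and argv into one shell line.

    Only the value of each assignment is quoted, so the shell still sees
    KEY=... as an assignment even when the value contains spaces.
    """
    assignments = [f"{key}={shlex.quote(value)}" for key, value in env.items()]
    return " ".join(assignments + [shlex.quote(arg) for arg in argv])

env = {
    "CUDA_VISIBLE_DEVICES": "-1",
    "XLA_FLAGS": "--xla_cpu_multi_thread_eigen=false intra_op_parallelism_threads=1",
}
cmd = build_command(env, ["/path/to/python", "command_name.py"])
print(cmd)
```

With this approach no backslash escaping is needed in the XLA_FLAGS value itself, since shlex.quote wraps the value in single quotes whenever it contains characters the shell would otherwise split on.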
