An os.fork() error with the multithreaded JAX #686

Open
Zonkil9 opened this issue Aug 16, 2024 · 8 comments

@Zonkil9

Zonkil9 commented Aug 16, 2024

Hi, I stumbled upon an error while running the command airsenal_run_pipeline. Everything goes well until:

[...]

Fitting player model for FWD ...
/usr/lib/python3.11/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = os.fork()
2024-08-16 17:54:55.109484: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:116] Non-OK-status: cuda::ToStatus(cuCtxSetCurrent(cuda_context->context()), "Failed setting context")
Status: INTERNAL: CUDA error: Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-08-16 17:54:55.109483: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:116] Non-OK-status: cuda::ToStatus(cuCtxSetCurrent(cuda_context->context()), "Failed setting context")
Status: INTERNAL: CUDA error: Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-08-16 17:54:55.109485: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:116] Non-OK-status: cuda::ToStatus(cuCtxSetCurrent(cuda_context->context()), "Failed setting context")
Status: INTERNAL: CUDA error: Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-08-16 17:54:55.109483: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:116] Non-OK-status: cuda::ToStatus(cuCtxSetCurrent(cuda_context->context()), "Failed setting context")
Status: INTERNAL: CUDA error: Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-08-16 17:54:55.109958: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:116] Non-OK-status: cuda::ToStatus(cuCtxSetCurrent(cuda_context->context()), "Failed setting context")
Status: INTERNAL: CUDA error: Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-08-16 17:54:55.110125: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:116] Non-OK-status: cuda::ToStatus(cuCtxSetCurrent(cuda_context->context()), "Failed setting context")
Status: INTERNAL: CUDA error: Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-08-16 17:54:55.110294: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:116] Non-OK-status: cuda::ToStatus(cuCtxSetCurrent(cuda_context->context()), "Failed setting context")
Status: INTERNAL: CUDA error: Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-08-16 17:54:55.110774: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:116] Non-OK-status: cuda::ToStatus(cuCtxSetCurrent(cuda_context->context()), "Failed setting context")
Status: INTERNAL: CUDA error: Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
==================================================
PREDICTED TOP 5 PLAYERS FOR GAMEWEEK(S) [1, 2, 3]:
==================================================
GK:
1. David Raya Martin, 0.00pts (£5.5m, ARS)
2. Alisson Ramses Becker, 0.00pts (£5.5m, LIV)
3. Ederson Santana de Moraes, 0.00pts (£5.5m, MCI)
4. Stefan Ortega Moreno, 0.00pts (£5.5m, MCI)
5. Emiliano Martínez Romero, 0.00pts (£5.0m, AVL)
-------------------------
DEF:
1. Trent Alexander-Arnold, 0.00pts (£7.0m, LIV)
2. Benjamin White, 0.00pts (£6.5m, ARS)
3. Gabriel dos Santos Magalhães, 0.00pts (£6.0m, ARS)
4. William Saliba, 0.00pts (£6.0m, ARS)
5. Riccardo Calafiori, 0.00pts (£6.0m, ARS)
-------------------------
MID:
1. Mohamed Salah, 0.00pts (£12.5m, LIV)
2. Cole Palmer, 0.00pts (£10.5m, CHE)
3. Bukayo Saka, 0.00pts (£10.0m, ARS)
4. Son Heung-min, 0.00pts (£10.0m, TOT)
5. Kevin De Bruyne, 0.00pts (£9.5m, MCI)
-------------------------
FWD:
1. Erling Haaland, 0.00pts (£15.0m, MCI)
2. Ollie Watkins, 0.00pts (£9.0m, AVL)
3. Alexander Isak, 0.00pts (£8.5m, NEW)
4. Kai Havertz, 0.00pts (£8.0m, ARS)
5. Ivan Toney, 0.00pts (£7.5m, BRE)
-------------------------
Prediction complete..
Generating a squad..

[...]

Additional info:

This error does not occur when I run the commands separately, one after another: airsenal_run_optimization --weeks_ahead 3 and airsenal_run_prediction --weeks_ahead 3.

Also, I installed JAX for CUDA 12.6 with

pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

and, of course, the newest CUDA 12.6 from the NVIDIA repo. My GPU is an NVIDIA MX450.

@jack89roberts
Contributor

Hi @Zonkil9, thanks for reporting. It is a bit fiddly to get multiprocessing, JAX and SQLAlchemy playing nicely together. Does it make a difference if you run without CUDA/GPU? I wouldn't be surprised if that is causing issues too (and I vaguely remember that it may actually make AIrsenal run slower, as it's not really optimised for GPU). The reason you may be seeing a difference between the pipeline script and the individual scripts is that the pipeline defaults to using all threads available on your system, whilst the others default to 4, I think.
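A general pattern for avoiding the fork warning above (a sketch, not AIrsenal's actual code) is to create worker processes with the "spawn" start method, so each child starts from a fresh interpreter instead of inheriting the parent's JAX threads and CUDA context via os.fork(). The worker function predict_for_position is hypothetical, and the default cap of 4 workers is an assumption mirroring the default mentioned above for the individual scripts.

```python
import multiprocessing as mp
import os


def predict_for_position(position: str) -> str:
    # Hypothetical stand-in for per-position model fitting. In real code, JAX
    # would be imported inside the worker so that each child process sets up
    # its own JAX threads and CUDA context instead of inheriting the parent's.
    return f"fitted {position}"


def run_all(positions: list[str], max_workers: int = 4) -> list[str]:
    # "spawn" starts every child from a fresh interpreter rather than copying
    # the parent with os.fork(), which is what triggers the JAX warning.
    ctx = mp.get_context("spawn")
    # Cap the pool size (assumed default of 4) instead of using every thread.
    n_workers = max(1, min(max_workers, len(positions), os.cpu_count() or 1))
    with ctx.Pool(processes=n_workers) as pool:
        return pool.map(predict_for_position, positions)


if __name__ == "__main__":
    # The __main__ guard is required with "spawn": each child re-imports this
    # module, and the guard stops it from starting a pool of its own.
    print(run_all(["GK", "DEF", "MID", "FWD"]))
```

Spawning is slower to start than forking, because each child re-imports its modules, but it avoids sharing JAX/CUDA state between the parent and its children.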

@Zonkil9
Author

Zonkil9 commented Aug 17, 2024

You are right: the code runs slower on the GPU than on the CPU. I'll just revert to single-threaded JAX on the CPU.
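A minimal sketch of forcing JAX onto the CPU from inside a Python script, equivalent to running export JAX_PLATFORMS=cpu in the shell before launching AIrsenal. It assumes the variable is set before JAX is first imported in the process; the find_spec check only keeps the sketch runnable on machines without JAX installed.

```python
import importlib.util
import os

# Equivalent of `export JAX_PLATFORMS=cpu`. Set it before `import jax`:
# JAX reads this variable from the environment when it is imported.
os.environ["JAX_PLATFORMS"] = "cpu"

if importlib.util.find_spec("jax") is not None:
    import jax

    # Expected to report "cpu", even on a machine with a CUDA-enabled jaxlib.
    print("JAX backend:", jax.default_backend())
else:
    print("JAX is not installed; JAX_PLATFORMS=cpu will apply once it is.")
```

If JAX has already been imported, jax.config.update("jax_platforms", "cpu") should have a similar effect as long as no computation has run yet; it is worth checking this against the JAX configuration docs for your version.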

Also, I noticed a slight difference when I ran predictions for 38 fixtures. The predicted optimal players were the same, but there were around 0.5 absolute differences in points for the players.

@jack89roberts
Contributor

> Also, I noticed a slight difference when I ran predictions for 38 fixtures. The predicted optimal players were the same, but there were around 0.5 absolute differences in points for the players.

This is strange. There is some randomness in the predictions, but 0.5pts is quite a lot. Did you mean the difference between predicting for 3 weeks and optimising for 3 weeks vs. predicting for 38 weeks and optimising for 3 weeks, or something along those lines?

@Zonkil9
Author

Zonkil9 commented Aug 21, 2024

In order to compute on the CPU, I ran the following commands:

airsenal_update_db
export JAX_PLATFORMS=cpu
airsenal_run_prediction --weeks_ahead 37

and I got this:

==================================================
PREDICTED TOP 5 PLAYERS FOR GAMEWEEK(S) [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38]:
==================================================
GK:
1. Alisson Ramses Becker, 153.32pts (£5.5m, LIV)
2. David Raya Martin, 148.51pts (£5.5m, ARS)
3. André Onana, 144.44pts (£5.0m, MUN)
4. Bernd Leno, 134.71pts (£5.0m, FUL)
5. José Malheiro de Sá, 134.46pts (£4.5m, WOL)
-------------------------
DEF:
1. Joško Gvardiol, 179.78pts (£6.0m, MCI)
2. Andrew Robertson, 170.23pts (£6.0m, LIV)
3. Pedro Porro, 169.51pts (£5.5m, TOT)
4. Virgil van Dijk, 148.78pts (£6.0m, LIV)
5. Rúben Gato Alves Dias, 144.81pts (£5.5m, MCI)
-------------------------
MID:
1. Mohamed Salah, 258.24pts (£12.5m, LIV)
2. Kevin De Bruyne, 240.34pts (£9.5m, MCI)
3. Son Heung-min, 217.66pts (£10.0m, TOT)
4. Cole Palmer, 214.14pts (£10.5m, CHE)
5. Bukayo Saka, 191.27pts (£10.0m, ARS)
-------------------------
FWD:
1. Erling Haaland, 269.93pts (£15.0m, MCI)
2. Alexander Isak, 204.36pts (£8.5m, NEW)
3. Kai Havertz, 168.23pts (£8.0m, ARS)
4. Rodrigo Muniz Carvalho, 167.37pts (£6.0m, FUL)
5. Ollie Watkins, 166.03pts (£9.0m, AVL)
-------------------------

To compute on the GPU, I opened a new terminal session and ran:

airsenal_update_db
sudo nvidia-smi
airsenal_run_prediction --weeks_ahead 37

and I got this result:

==================================================
PREDICTED TOP 5 PLAYERS FOR GAMEWEEK(S) [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38]:
==================================================
GK:
1. Alisson Ramses Becker, 153.39pts (£5.5m, LIV)
2. David Raya Martin, 148.59pts (£5.5m, ARS)
3. André Onana, 143.81pts (£5.0m, MUN)
4. Bernd Leno, 134.83pts (£5.0m, FUL)
5. José Malheiro de Sá, 134.50pts (£4.5m, WOL)
-------------------------
DEF:
1. Joško Gvardiol, 180.07pts (£6.0m, MCI)
2. Andrew Robertson, 170.22pts (£6.0m, LIV)
3. Pedro Porro, 169.04pts (£5.5m, TOT)
4. Virgil van Dijk, 148.80pts (£6.0m, LIV)
5. Rúben Gato Alves Dias, 145.06pts (£5.5m, MCI)
-------------------------
MID:
1. Mohamed Salah, 257.96pts (£12.5m, LIV)
2. Kevin De Bruyne, 240.55pts (£9.5m, MCI)
3. Son Heung-min, 217.58pts (£10.0m, TOT)
4. Cole Palmer, 214.06pts (£10.5m, CHE)
5. Bukayo Saka, 191.23pts (£10.0m, ARS)
-------------------------
FWD:
1. Erling Haaland, 270.16pts (£15.0m, MCI)
2. Alexander Isak, 204.46pts (£8.5m, NEW)
3. Kai Havertz, 168.18pts (£8.0m, ARS)
4. Rodrigo Muniz Carvalho, 167.34pts (£6.0m, FUL)
5. Ollie Watkins, 165.87pts (£9.0m, AVL)
-------------------------

As you can see, individual players' scores differ, usually by around 0.2 points, but sometimes by more, e.g. Pedro Porro (169.51 on the CPU vs. 169.04 on the GPU).
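To put numbers on the comparison, a small sketch that computes the absolute CPU vs. GPU difference for each of the 20 players listed in the two runs above (values copied from the posted output; no other data is assumed):

```python
import statistics

# Season-long predicted points from the two runs above: (CPU, GPU).
POINTS = {
    "Alisson Ramses Becker": (153.32, 153.39),
    "David Raya Martin": (148.51, 148.59),
    "André Onana": (144.44, 143.81),
    "Bernd Leno": (134.71, 134.83),
    "José Malheiro de Sá": (134.46, 134.50),
    "Joško Gvardiol": (179.78, 180.07),
    "Andrew Robertson": (170.23, 170.22),
    "Pedro Porro": (169.51, 169.04),
    "Virgil van Dijk": (148.78, 148.80),
    "Rúben Gato Alves Dias": (144.81, 145.06),
    "Mohamed Salah": (258.24, 257.96),
    "Kevin De Bruyne": (240.34, 240.55),
    "Son Heung-min": (217.66, 217.58),
    "Cole Palmer": (214.14, 214.06),
    "Bukayo Saka": (191.27, 191.23),
    "Erling Haaland": (269.93, 270.16),
    "Alexander Isak": (204.36, 204.46),
    "Kai Havertz": (168.23, 168.18),
    "Rodrigo Muniz Carvalho": (167.37, 167.34),
    "Ollie Watkins": (166.03, 165.87),
}

# Absolute difference between the CPU and GPU predictions for each player.
diffs = {name: abs(cpu - gpu) for name, (cpu, gpu) in POINTS.items()}

# Show the five largest differences, then summary statistics.
for name, d in sorted(diffs.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(f"{name:25s} {d:.2f}")
print(f"median |CPU - GPU|: {statistics.median(diffs.values()):.2f}")
print(f"max    |CPU - GPU|: {max(diffs.values()):.2f}")
```

On the numbers above, the largest gaps are André Onana (about 0.63 points) and Pedro Porro (about 0.47 points).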

@jack89roberts
Contributor

How about between two runs on the CPU? I can see the GPU potentially adding more randomness, but it's interesting, so thanks for sending!

@Zonkil9
Author

Zonkil9 commented Aug 21, 2024

The results of the two runs on the CPU are exactly the same. They are identical to those I posted above.

@jack89roberts
Contributor

Cool, that puts it in the realm of the discussion here (and elsewhere, for GPUs more generally): jax-ml/jax#10674
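One commonly cited mechanism behind this kind of GPU-only variation (the general cause of GPU nondeterminism, not something verified here for AIrsenal's model) is that floating-point addition is not associative, so a parallel reduction that sums the same numbers in a different order can produce a slightly different total. A minimal illustration in plain Python:

```python
import math
import random

# Floating-point addition is not associative: the grouping changes rounding.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6
print(left, right, left == right)

# Summing the same values in a different order, as a parallel GPU reduction
# may do from run to run, can change the last bits of the result.
rng = random.Random(0)
values = [rng.uniform(-1e3, 1e3) for _ in range(10_000)]
shuffled = values[:]
rng.shuffle(shuffled)
print("naive sum difference:", sum(values) - sum(shuffled))

# math.fsum is correctly rounded, so its result does not depend on order.
print("fsum equal:", math.fsum(values) == math.fsum(shuffled))
```

Tiny per-operation differences like these can accumulate over a long sampling run, which would be consistent with the CPU runs being identical while the GPU runs differ slightly.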

@Zonkil9
Author

Zonkil9 commented Aug 21, 2024

Interesting! So I tried that:

airsenal_update_db
sudo nvidia-smi
export XLA_FLAGS=--xla_gpu_deterministic_ops=true
airsenal_run_prediction --weeks_ahead 37

And... my GPU became unimaginably slow! After 20 minutes of computation, I was only at:

warmup:   5%| | 69/1500

I gave up for now... 😆

EDIT:

It finally finished:

==================================================
PREDICTED TOP 5 PLAYERS FOR GAMEWEEK(S) [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38]:
==================================================
GK:
1. Alisson Ramses Becker, 153.54pts (£5.5m, LIV)
2. David Raya Martin, 148.70pts (£5.5m, ARS)
3. André Onana, 144.28pts (£5.0m, MUN)
4. Bernd Leno, 134.81pts (£5.0m, FUL)
5. José Malheiro de Sá, 134.43pts (£4.5m, WOL)
-------------------------
DEF:
1. Joško Gvardiol, 179.78pts (£6.0m, MCI)
2. Andrew Robertson, 170.45pts (£6.0m, LIV)
3. Pedro Porro, 169.20pts (£5.5m, TOT)
4. Virgil van Dijk, 149.01pts (£6.0m, LIV)
5. Rúben Gato Alves Dias, 144.86pts (£5.5m, MCI)
-------------------------
MID:
1. Mohamed Salah, 258.32pts (£12.5m, LIV)
2. Kevin De Bruyne, 240.15pts (£9.5m, MCI)
3. Son Heung-min, 217.20pts (£10.0m, TOT)
4. Cole Palmer, 214.03pts (£10.5m, CHE)
5. Bukayo Saka, 191.20pts (£10.0m, ARS)
-------------------------
FWD:
1. Erling Haaland, 269.66pts (£15.0m, MCI)
2. Alexander Isak, 204.69pts (£8.5m, NEW)
3. Kai Havertz, 168.13pts (£8.0m, ARS)
4. Rodrigo Muniz Carvalho, 167.28pts (£6.0m, FUL)
5. Ollie Watkins, 166.04pts (£9.0m, AVL)
-------------------------

The results are closer to the CPU ones, but not identical.
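Running export XLA_FLAGS=--xla_gpu_deterministic_ops=true in the shell replaces any XLA flags already set in that variable. When launching from Python instead, a minimal sketch that appends the flag rather than overwriting XLA_FLAGS is shown below; the add_xla_flag helper is hypothetical and not part of JAX or AIrsenal, and it assumes it runs before JAX is imported.

```python
import os
from collections.abc import MutableMapping
from typing import Optional


def add_xla_flag(flag: str, env: Optional[MutableMapping[str, str]] = None) -> str:
    """Append `flag` to XLA_FLAGS (if not already present) and return the result."""
    env = os.environ if env is None else env
    flags = env.get("XLA_FLAGS", "").split()
    if flag not in flags:
        flags.append(flag)  # keep any flags that were already set
    env["XLA_FLAGS"] = " ".join(flags)
    return env["XLA_FLAGS"]


# Run this before `import jax`: XLA parses XLA_FLAGS once, so later changes
# in the same process are generally not picked up.
add_xla_flag("--xla_gpu_deterministic_ops=true")
print(os.environ["XLA_FLAGS"])
```

Calling the helper twice with the same flag leaves XLA_FLAGS unchanged, so it is safe to call from more than one entry point.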
