'airsenal_run_pipeline' command fails to finish. #632

Open
morialo3 opened this issue Sep 19, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@morialo3

morialo3 commented Sep 19, 2023

Running airsenal_run_pipeline gives an error when it reaches the points predictions:

Fitting player model for FWD ...
Points prediction for player Kevin De Bruyne
gameweek: 6 vs NFO at home
Expected points: 0.00
gameweek: 7 vs WOL away
Expected points: 0.00
gameweek: 8 vs ARS away
Expected points: 0.00
Points prediction for player Harry Kane
gameweek: 6 vs ARS away
Expected points: 0.00
gameweek: 7 vs LIV at home
Expected points: 0.00
gameweek: 8 vs LUT away
Expected points: 0.00
2023-09-19 13:21:22.286525: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:149] Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2023-09-19 13:21:22.286584: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:149] Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2023-09-19 13:21:22.286579: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:149] Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2023-09-19 13:21:22.286723: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:149] Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2023-09-19 13:21:22.288209: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:149] Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2023-09-19 13:21:22.288729: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:149] Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2023-09-19 13:21:22.288752: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:149] Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
Points prediction for player Trent Alexander-Arnold
gameweek: 6 vs WHU at home
Expected points: 0.00
gameweek: 7 vs TOT away
Expected points: 0.00
gameweek: 8 vs BHA away
Expected points: 0.00
2023-09-19 13:21:22.292415: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:149] Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2023-09-19 13:21:22.292877: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:149] Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
Points prediction for player Ivan Toney
gameweek: 6 vs EVE at home
Expected points: 0.00
gameweek: 7 vs NFO away
Expected points: 0.00
gameweek: 8 vs MUN away
Expected points: 0.00
2023-09-19 13:21:22.296673: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:149] Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2023-09-19 13:21:22.296852: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:149] Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2023-09-19 13:21:22.299751: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:149] Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2023-09-19 13:21:22.424797: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:149] Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2023-09-19 13:21:22.475967: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:149] Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2023-09-19 13:21:22.487848: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:149] Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
2023-09-19 13:21:22.489218: F external/xla/xla/stream_executor/cuda/cuda_driver.cc:149] Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error

The process then carries on for a while before stopping and throwing a traceback without completing the task:

Traceback (most recent call last):
  File "/home/amro/miniconda3/envs/airsenalenv/bin/airsenal_run_pipeline", line 8, in <module>
    sys.exit(run_pipeline())
             ^^^^^^^^^^^^^^
  File "/home/amro/miniconda3/envs/airsenalenv/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amro/miniconda3/envs/airsenalenv/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/amro/miniconda3/envs/airsenalenv/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amro/miniconda3/envs/airsenalenv/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amro/miniconda3/envs/airsenalenv/lib/python3.11/site-packages/airsenal/scripts/airsenal_run_pipeline.py", line 174, in run_pipeline
    opt_ok = run_optimize_squad(
             ^^^^^^^^^^^^^^^^^^^
  File "/home/amro/miniconda3/envs/airsenalenv/lib/python3.11/site-packages/airsenal/scripts/airsenal_run_pipeline.py", line 287, in run_optimize_squad
    run_optimization(
  File "/home/amro/miniconda3/envs/airsenalenv/lib/python3.11/site-packages/airsenal/scripts/fill_transfersuggestion_table.py", line 595, in run_optimization
    fill_suggestion_table(baseline_score, best_strategy, season, fpl_team_id)
  File "/home/amro/miniconda3/envs/airsenalenv/lib/python3.11/site-packages/airsenal/framework/optimization_utils.py", line 252, in fill_suggestion_table
    best_score = best_strat["total_score"]
                 ~~~~~~~~~~^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not subscriptable
Total progress: 5%|███▏ | 1/22 [03:59<1:23:47, 239.41s/it]

I've attached the full output of airsenal_run_pipeline in this gist:
https://gist.github.com/morialo3/cd7fde8e69027b69ac472427cf4986ee
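The final TypeError looks like a downstream symptom rather than the root cause: fill_suggestion_table indexes best_strat, which is None here, presumably because no strategy was produced after the earlier CUDA failures. A minimal sketch of a defensive check that would surface this more clearly is below. The helper name extract_best_score is hypothetical and not part of AIrsenal; only best_strat and the "total_score" key come from the traceback.

```python
def extract_best_score(best_strat):
    """Return the total score of the best strategy, or fail with a clear message.

    Hypothetical helper, not AIrsenal's actual code: it only illustrates
    checking for None before indexing, as the traceback suggests is needed.
    """
    if best_strat is None:
        # Fail with a message that points at the likely upstream problem
        # instead of a bare "'NoneType' object is not subscriptable".
        raise RuntimeError(
            "Optimisation produced no strategy; check the log above for "
            "errors raised earlier in the pipeline."
        )
    return best_strat["total_score"]


# Example usage with a made-up strategy dictionary.
print(extract_best_score({"total_score": 12.5}))
```

A check like this would not fix the underlying CUDA problem, but it would make the failure point at the earlier errors rather than at the indexing line.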

@morialo3 morialo3 changed the title 'airsnal_run_pipeline' command failure to finish. 'airsnal_run_pipeline' command fails to finish. Sep 19, 2023
@nbarlowATI
Member

Hi @morialo3, interesting - I haven't seen this before. My guess is that JAX (which numpyro uses in the points prediction) is somehow configured to try to use the GPU. I don't know why this would be, or whether running on the GPU is possible (or even beneficial), but a potential way to force it to use the CPU would be something like:

import jax
jax.config.update("jax_default_device", jax.devices("cpu")[0])

in airsenal/framework/bpl_interface.py, before the line from bpl import ExtendedDixonColesMatchPredictor...
You'd then need to rerun pip install . so that the change takes effect.
Could you try this and let us know if it works?
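An alternative that avoids editing the installed package might be JAX's JAX_PLATFORMS environment variable, which restricts the backends JAX will try to initialise. The sketch below is untested against AIrsenal itself and assumes the variable is set before jax is imported for the first time anywhere in the process.

```python
import os

# Assumption: JAX reads JAX_PLATFORMS when it initialises its backends,
# so this has to happen before the first `import jax` in the process.
os.environ["JAX_PLATFORMS"] = "cpu"

try:
    import jax

    # With the variable set, only CPU devices should be listed here.
    print("JAX devices:", jax.devices())
except ImportError:
    print("jax is not installed in this environment")
```

The same effect should be obtainable from the shell by prefixing the command, for example JAX_PLATFORMS=cpu airsenal_run_pipeline, since worker processes inherit the parent's environment.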

@morialo3
Author

I patched airsenal/framework/bpl_interface.py, and it works with no issues. I usually run airsenal_{update_db,run_prediction,run_optimization} individually, so I never encountered this issue. Apparently jax needs some adjustments to work with GPU offload; any suggestions on that part? Thanks for the quick response.

@jack89roberts jack89roberts reopened this Sep 23, 2023
@jack89roberts jack89roberts added the bug Something isn't working label Sep 23, 2023
@jack89roberts
Contributor

jack89roberts commented Sep 23, 2023

I usually run airsenal_{update_db,run_prediction,run_optimization} individually, so I never encountered this issue

Based on this, my guess is that it has something to do with multiprocessing and jax/GPUs not playing together nicely. I believe airsenal_run_prediction defaults to running single-threaded, but airsenal_run_pipeline defaults to running on four threads.

So airsenal_run_pipeline --num_thread 1 might work for you, but that would be worse than Nick's fix above because the majority of AIrsenal's runtime is in the optimisation, and running the optimisation single-threaded will be much slower (and there's nothing in the optimisation code that would make use of a GPU).

We might just want to include the workaround to force jax to always use CPU by default in AIrsenal.
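One way such a default could look, as a rough sketch only: fall back to the CPU unless the user has explicitly chosen a platform through JAX_PLATFORMS. The function names default_jax_platform and apply_default_platform are hypothetical and not part of AIrsenal; the only real interface assumed is the JAX_PLATFORMS environment variable, which would need to be set before jax is first imported.

```python
import os


def default_jax_platform(env):
    """Return the platform to use: the user's explicit choice, else "cpu".

    Hypothetical helper. `env` is any mapping shaped like os.environ.
    """
    return env.get("JAX_PLATFORMS") or "cpu"


def apply_default_platform(env=os.environ):
    """Write the chosen platform back so a later `import jax` picks it up."""
    env["JAX_PLATFORMS"] = default_jax_platform(env)
    return env["JAX_PLATFORMS"]


# Example usage with plain dictionaries standing in for the environment.
print(default_jax_platform({}))                         # no user setting
print(default_jax_platform({"JAX_PLATFORMS": "cuda"}))  # user opted in to GPU
```

Keeping the decision in a small function that takes the environment as an argument leaves users a way to opt back in to the GPU, and makes the default easy to test without JAX installed.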
