Crash on Windows when CUDA_VISIBLE_DEVICES is set to -1 #11283

Open
RalfG opened this issue Feb 25, 2025 · 3 comments

Comments

@RalfG

RalfG commented Feb 25, 2025

In a project where we combine XGBoost with TensorFlow within the same process, we ran into the following issue:

When the environment variable CUDA_VISIBLE_DEVICES is set to -1, the XGBoost predict step crashes after about a minute of predicting. Strangely enough, the crash appears to be stochastic. It only occurs once prediction has been running for a while, which can be achieved either by setting nthread to a low value or by repeating the same predict step many times. Running the predict step once usually completes without a crash, but not always.

The crash does not produce any error messages and only happens on Windows, as far as I can tell.

Here's a script to reproduce:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

from rich.progress import track
import xgboost as xgb

def main():
    features = xgb.DMatrix("features.buffer")
    model_filenames = ["model_20210416_HCD2021_B.xgboost"]

    prediction_list = []
    for model_filename in track(model_filenames * 100):
        xgb_model = xgb.Booster(model_file=model_filename)
        prediction_list.append(xgb_model.predict(features))

    print("Done without crashes!")


if __name__ == "__main__":
    main()

Commenting out the first two lines makes it work again.

pip freeze output:

numpy==2.0.2
scipy==1.13.1
xgboost==2.1.4

And with optional rich install for the progress bar (does not change the crash behavior):

markdown-it-py==3.0.0
mdurl==0.1.2
numpy==2.0.2
Pygments==2.19.1
rich==13.9.4
scipy==1.13.1
typing_extensions==4.12.2
xgboost==2.1.4

Files used: https://1drv.ms/u/c/cc884c602a30d109/ET6oclsK3PpLqnj6p4W0h40BU2vIMXQzQnOWRLl5SfecFw?e=eCoexz

@trivialfis
Member

trivialfis commented Feb 25, 2025

May I ask why you need to set that environment variable to -1?

@RalfG
Author

RalfG commented Feb 25, 2025

It's part of a dependency that uses TensorFlow, and that dependency is used before the one that uses XGBoost. In short, it's a pipeline that combines multiple machine learning predictors, each with its own purpose.

As a simple workaround we can definitely remove the environment variable before predicting with XGBoost. Nevertheless, it seemed sensible to report the issue.
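
The workaround described above could be sketched as a small context manager that removes the variable for the duration of the XGBoost call and restores it afterwards. This is only a sketch: the helper name `without_env_var` is hypothetical, and it is an untested assumption that unsetting the variable at runtime is enough to avoid the crash, since a library may read it only once when it initializes.

```python
import os
from contextlib import contextmanager


@contextmanager
def without_env_var(name):
    """Temporarily remove an environment variable, restoring it on exit.

    Hypothetical helper for illustration; not part of XGBoost or TensorFlow.
    """
    saved = os.environ.pop(name, None)  # None if the variable was not set
    try:
        yield
    finally:
        # Restore the original value only if there was one.
        if saved is not None:
            os.environ[name] = saved


# Usage sketch (xgb_model and features as in the reproduction script):
# with without_env_var("CUDA_VISIBLE_DEVICES"):
#     predictions = xgb_model.predict(features)
```

The `finally` clause ensures the variable is restored for later pipeline stages even if the prediction raises an exception.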

@trivialfis
Member

Thank you for sharing, I will try to look into it. Not familiar with debugging on Windows ...
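
Since the crash produces no error message, one low-effort way to gather more information might be Python's standard `faulthandler` module, which dumps the Python traceback of every thread when the process hits a fatal error. The sketch below is a suggestion rather than something tried in this thread; the helper name `enable_crash_log` and the log file name are hypothetical, and it is not known whether this particular crash is one that `faulthandler` can catch.

```python
import faulthandler


def enable_crash_log(path):
    """Write Python tracebacks to `path` if the process hits a fatal error.

    Returns the open file object; it must stay open for as long as
    faulthandler is enabled. Hypothetical helper for illustration.
    """
    log_file = open(path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Usage sketch: call this at the top of the reproduction script, before
# importing xgboost, and inspect the log file after a crash.
# crash_log = enable_crash_log("xgboost_crash.log")
```

The traceback only shows where the Python side was at the time of the crash, so it would narrow down the failing call rather than explain the native fault itself.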
