Crash on Windows when `CUDA_VISIBLE_DEVICES` is set to `-1` #11283

RalfG · 2025-02-25T17:55:55Z

In a project where we combine XGBoost with TensorFlow within the same process, we ran into the following issue:

When the environment variable `CUDA_VISIBLE_DEVICES` is set to `-1`, the XGBoost predict step crashes after about a minute of predicting. Strangely enough, the crash seems to be stochastic. It only occurs after predicting for a while, which can be reached either by setting `nthread` to a low value or by repeating the same predict step many times. Running the predict step once usually works without the crash, but not always.

The crash does not produce any error messages and only happens on Windows, as far as I can tell.

Here's a script to reproduce:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

from rich.progress import track
import xgboost as xgb

def main():
    features = xgb.DMatrix("features.buffer")
    model_filenames = ["model_20210416_HCD2021_B.xgboost"]

    prediction_list = []
    for model_filename in track(model_filenames * 100):
        xgb_model = xgb.Booster(model_file=model_filename)
        prediction_list.append(xgb_model.predict(features))

    print("Done without crashes!")


if __name__ == "__main__":
    main()

Commenting out the first two lines makes it work again.

pip freeze output:

numpy==2.0.2
scipy==1.13.1
xgboost==2.1.4

And with the optional rich install for the progress bar (which does not change the crash behavior):

markdown-it-py==3.0.0
mdurl==0.1.2
numpy==2.0.2
Pygments==2.19.1
rich==13.9.4
scipy==1.13.1
typing_extensions==4.12.2
xgboost==2.1.4

Files used: https://1drv.ms/u/c/cc884c602a30d109/ET6oclsK3PpLqnj6p4W0h40BU2vIMXQzQnOWRLl5SfecFw?e=eCoexz

trivialfis · 2025-02-25T18:18:30Z

May I ask why you need to set that environment variable to -1?

RalfG · 2025-02-25T19:39:30Z

The variable is set by a dependency that uses TensorFlow, and that dependency runs before the one that uses XGBoost. In short, it's a pipeline that combines multiple machine learning predictors, each with its own purpose.

As a simple workaround we can definitely remove the environment variable before predicting with XGBoost. Nevertheless, it seemed sensible to report the issue.
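
A minimal sketch of that workaround, assuming that unsetting the variable only around the XGBoost call is enough to avoid the crash (this has not been verified, and it is not known at which point the variable is actually read). `without_env_var` is a hypothetical helper written for illustration; it is not part of XGBoost or TensorFlow.

```python
import os
from contextlib import contextmanager


@contextmanager
def without_env_var(name):
    """Temporarily remove an environment variable and restore it on exit."""
    saved = os.environ.pop(name, None)  # None if the variable was not set
    try:
        yield
    finally:
        # Put the original value back so later TensorFlow code still sees it.
        if saved is not None:
            os.environ[name] = saved


# Hypothetical usage, with `xgb_model` and `features` as in the script above:
# with without_env_var("CUDA_VISIBLE_DEVICES"):
#     predictions = xgb_model.predict(features)
```

Restoring the value in a `finally` block keeps the rest of the pipeline unchanged even if the predict call raises an exception.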

trivialfis · 2025-02-25T20:44:23Z

Thank you for sharing, I will try to look into it. Not familiar with debugging on Windows ...
