
Discrepancies between ONNX and sklearn probabilities with isotonic CalibratedClassifierCV #1151

Open
cja-halfspace opened this issue Jan 7, 2025 · 3 comments


cja-halfspace commented Jan 7, 2025

Hello, and thank you for your work on this great library!
I'm seeing a pretty big difference in probabilities when using CalibratedClassifierCV with isotonic regression together with RandomForestClassifier.
It only seems to happen when the max_depth parameter is set high enough.

I've provided a small snippet to reproduce the issue, with the following versions of libraries:

  • scikit-learn==1.6.0
  • skl2onnx==1.18.0
  • onnxruntime==1.20.1

import numpy as np
import onnxruntime as ort
from numpy.testing import assert_almost_equal
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(
    n_samples=400_000,
    n_features=15,
    n_informative=15,
    n_redundant=0,
    n_classes=2,
    n_clusters_per_class=2,
    random_state=30,
)
X = X.astype(np.float32)

rf = RandomForestClassifier(
    max_depth=10,
    n_jobs=-1,
    random_state=1234,
).fit(X, y)

model = CalibratedClassifierCV(rf, method="isotonic", cv="prefit").fit(
    X, y
)

model_onnx = convert_sklearn(
    model,
    initial_types=[("input", FloatTensorType([None, X.shape[1]]))],
    target_opset=15,
    options={"zipmap": False},
)

session = ort.InferenceSession(model_onnx.SerializeToString())

output = session.run(
    ["probabilities"],
    {"input": X},
)
onnx_probs = output[0][:, 1]
model_probs = model.predict_proba(X)[:, 1].astype(np.float32)

assert_almost_equal(onnx_probs, model_probs, decimal=5)

The result is:

> Mismatched elements: 4485 / 400000 (1.12%)
> Max absolute difference among violations: 0.01261032
> Max relative difference among violations: 0.11618411

I see that IsotonicRegression is not listed on https://onnx.ai/sklearn-onnx/supported.html, but I would expect CalibratedClassifierCV to be supported with both calibration methods.


xadupre commented Jan 8, 2025

It is supported; otherwise you would see a much larger number of mismatches. The issue probably comes from the use of float in the trees instead of double. You can read this to understand where it comes from: https://onnx.ai/sklearn-onnx/auto_tutorial/plot_ebegin_float_double.html. We should implement the latest ONNX standard to fix that.
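To illustrate how a float32 cast can flip a tree branch, here is a minimal sketch (the threshold value is hypothetical, chosen so that it and a nearby sample round to the same float32):

```python
import numpy as np

# Hypothetical split threshold, as learned by sklearn in double precision.
threshold = 0.30000001
x = 0.3  # feature value of a sample that sits just below the split

# sklearn compares in float64: the sample goes left.
print(x < threshold)  # True

# TreeEnsembleClassifier stores the threshold as float32; both values
# round to the same float32, so the comparison flips and the sample
# ends up in a different leaf, hence a different probability.
print(np.float32(x) < np.float32(threshold))  # False
```

A handful of near-threshold samples flipping branches like this, then passing through the steep steps of the isotonic map, is consistent with a small fraction of mismatches with differences up to ~1e-2.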

@cja-halfspace
Author

Thanks for the fast reply! Do I understand correctly that this issue would be fixed by the switch to TreeEnsemble?


xadupre commented Jan 13, 2025

Yes
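For context: if I read the spec correctly, the newer TreeEnsemble operator (ai.onnx.ml opset 5) can store split thresholds in double precision, which avoids the float32 rounding flips. A minimal sketch with a hypothetical near-threshold value:

```python
import numpy as np

# Hypothetical split: the threshold and a nearby sample round to the
# same float32, so a float32 comparison flips the branch.
threshold, x = 0.30000001, 0.3

print(np.float32(x) < np.float32(threshold))  # float32 op: False (flipped)
print(np.float64(x) < np.float64(threshold))  # double op:  True (matches sklearn)
```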
