Probability Predictions for DecisionTreeClassifier get set to 1.0 for Onnx Model #1080

Open
nkinnaird opened this issue Mar 8, 2024 · 2 comments

nkinnaird commented Mar 8, 2024

I have a scikit-learn DecisionTreeClassifier model which I've converted to ONNX and am running within Python. When I get the prediction probabilities from the scikit-learn model, they can take floating-point values between 0 and 1. When I get the prediction probabilities from the converted ONNX model, however, any non-zero probability gets returned as 1.0. The predicted classes appear to be consistent between the two models, but the predicted probabilities are not. This seems super odd, and I thought for a while that I had some data-type issue, but I haven't been able to figure it out, so I'm posting here.

Relevant code is below:

import pickle
import numpy as np
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import onnxruntime as rt

with open('./models-dts/decision_tree_latest02012024.bin','rb') as f:
    trained_dt_model = pickle.load(f)

print(type(trained_dt_model)) # <class 'sklearn.tree._classes.DecisionTreeClassifier'>
print(trained_dt_model.n_features_in_) # 27

test_input1 = np.array([[0.59828803, 0.48945989, 0.50988916, 0.96838321, 0.86380612,
        0.16252359, 0.2610167 , 0.07852967, 0.89160985, 0.25703237,
        0.24157511, 0.24354545, 0.41300357, 0.65447316, 0.35243772,
        0.1305043 , 0.83504563, 0.49549272, 0.90585665, 0.90951762,
        0.21333307, 0.02261209, 0.22308332, 0.38713686, 0.02835888,
        0.29879675, 0.03562193]])

print(trained_dt_model.predict(test_input1.astype(np.float32))) # [[0. 1.]]
print(trained_dt_model.predict_proba(test_input1.astype(np.float32)))
# predict_proba returns a list of arrays (one per output); probabilities can be floats other than 0 and 1:
# [array([[0.5, 0.5]]),
#  array([[0., 1.]])]


initial_type = [('float_input', FloatTensorType([None, 27]))]
options = {id(trained_dt_model): {"zipmap": False}}
onx = convert_sklearn(trained_dt_model, initial_types=initial_type, options=options)
with open("trained_dt.onnx", "wb") as f:
    f.write(onx.SerializeToString())

sess = rt.InferenceSession("trained_dt.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
output_name_1 = sess.get_outputs()[0].name
output_name_2 = sess.get_outputs()[1].name
print(input_name, output_name_1, output_name_2) # float_input label probabilities

pred_onx = sess.run([output_name_1, output_name_2], {input_name: test_input1.astype(np.float32)})

print(pred_onx[0].shape) # (1, 2)
print(pred_onx[0]) # [[0 1]] - predicted labels are consistent

print(pred_onx[1].shape) # (2, 1, 2)
print(pred_onx[1]) # predicted probabilities are not consistent - any non-zero probability gets set to 1:
# [[[1. 1.]]
#  [[0. 1.]]]

Packages installed via pip to run this include:

scikit-learn==1.3.0
skl2onnx
onnxruntime
ipykernel

The model file types aren't supported for upload, otherwise I would attach them. I can provide them via Google Drive if someone wants to take a look.

I'm hoping there is something simple I'm missing, but I'm not sure whether the issue lies with the decision tree model itself or with the way I've created the ONNX model.

Small edit: in the vast majority of cases the predicted probabilities from the decision tree are 0 and 1. It is the rare cases where the predictions are some float between 0 and 1 that the scikit-learn and ONNX models diverge - hence the hardcoded input vector pasted above, and the random scan sketched below.
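
One way to surface these rare diverging cases is to scan random inputs and compare the two models' probabilities. A minimal sketch, reusing trained_dt_model and sess from the snippet above (the input range, sample count, and tolerance are arbitrary choices, not from the actual data):

import numpy as np

# Generate random feature vectors and flag any case where the sklearn
# and ONNX probabilities disagree beyond a small tolerance.
rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.random((1, 27)).astype(np.float32)
    skl_proba = trained_dt_model.predict_proba(x)  # list of arrays, one per output
    onx_proba = sess.run(["probabilities"], {"float_input": x})[0]  # shape (n_outputs, 1, 2)
    for i, p in enumerate(skl_proba):
        if not np.allclose(p, onx_proba[i], atol=1e-5):
            print("divergence at input:", x)
            print("  sklearn:", p, " onnx:", onx_proba[i])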

xadupre (Collaborator) commented Apr 4, 2024

I'm hesitating between a runtime issue and a conversion issue - most probably a conversion issue. Did you try the Python reference runtime (https://onnx.ai/onnx/api/reference.html) to see if you get the same results?
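
A minimal sketch of that check with onnx.reference.ReferenceEvaluator (requires onnx >= 1.13; the model path and test_input1 are taken from the snippet above):

import numpy as np
from onnx.reference import ReferenceEvaluator

# Run the converted model with the pure-Python reference runtime
# instead of onnxruntime, using the same hardcoded input.
ref = ReferenceEvaluator("trained_dt.onnx")
ref_out = ref.run(None, {"float_input": test_input1.astype(np.float32)})
print(ref_out[0])  # labels
print(ref_out[1])  # probabilities - compare against onnxruntime's output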

nkinnaird (Author) commented

Apologies - was on vacation and getting back to this now.

Predicting with ReferenceEvaluator(...).run gives me the same results, so I'm thinking it's probably a conversion issue. I'm going to try again and see if changing various sklearn or skl2onnx versions resolves the problem; maybe I have a version misalignment somewhere throwing things off.
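
For reference while testing combinations, a minimal sketch for recording the exact versions in play (it just prints each package's __version__):

import sklearn
import skl2onnx
import onnx
import onnxruntime

# Print the installed version of each package involved in the
# conversion and inference pipeline.
for mod in (sklearn, skl2onnx, onnx, onnxruntime):
    print(mod.__name__, mod.__version__)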

xadupre self-assigned this Jun 21, 2024
Status: Can Fix but Waiting for an Answer