Probability Predictions for DecisionTreeClassifier get set to 1.0 for Onnx Model #1080

Open
nkinnaird opened this issue Mar 8, 2024 · 2 comments

nkinnaird commented Mar 8, 2024

I have a scikit-learn DecisionTreeClassifier model which I've converted to ONNX and am running within Python. When I get the prediction probabilities from the scikit-learn model, they can take floating-point values between 0 and 1. When I get the prediction probabilities from the converted ONNX model, however, any non-zero probability gets returned as 1.0. The predicted classes appear to be consistent between the two models, but the predicted probabilities are not. This seems super odd, and I thought for a while that I had some data-type issue, but I haven't been able to figure it out, so I'm posting here.

Relevant code is below:

import pickle
import numpy as np
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import onnxruntime as rt

with open('./models-dts/decision_tree_latest02012024.bin','rb') as f:
    trained_dt_model = pickle.load(f)

print(type(trained_dt_model)) # <class 'sklearn.tree._classes.DecisionTreeClassifier'>
print(trained_dt_model.n_features_in_) # 27

test_input1 = np.array([[0.59828803, 0.48945989, 0.50988916, 0.96838321, 0.86380612,
        0.16252359, 0.2610167 , 0.07852967, 0.89160985, 0.25703237,
        0.24157511, 0.24354545, 0.41300357, 0.65447316, 0.35243772,
        0.1305043 , 0.83504563, 0.49549272, 0.90585665, 0.90951762,
        0.21333307, 0.02261209, 0.22308332, 0.38713686, 0.02835888,
        0.29879675, 0.03562193]])

print(trained_dt_model.predict(test_input1.astype(np.float32))) # [[0. 1.]]
print(trained_dt_model.predict_proba(test_input1.astype(np.float32)))
# predict_proba returns a list of arrays (one per output); probabilities can be floats other than 0 and 1:
# [array([[0.5, 0.5]]),
#  array([[0., 1.]])]


initial_type = [('float_input', FloatTensorType([None, 27]))]
options = {id(trained_dt_model): {"zipmap": False}}
onx = convert_sklearn(trained_dt_model, initial_types=initial_type, options=options)
with open("trained_dt.onnx", "wb") as f:
    f.write(onx.SerializeToString())

sess = rt.InferenceSession("trained_dt.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
output_name_1 = sess.get_outputs()[0].name
output_name_2 = sess.get_outputs()[1].name
print(input_name, output_name_1, output_name_2) # float_input label probabilities

pred_onx = sess.run([output_name_1, output_name_2], {input_name: test_input1.astype(np.float32)})

print(pred_onx[0].shape) # (1, 2)
print(pred_onx[0]) # [[0 1]] - predicted labels are consistent

print(pred_onx[1].shape) # (2, 1, 2)
print(pred_onx[1]) # predicted probabilities are not consistent - any non-zero probability gets set to 1:
# [[[1. 1.]]
#  [[0. 1.]]]

Packages installed via pip to run this include:

scikit-learn==1.3.0
skl2onnx
onnxruntime
ipykernel

The model file types aren't supported for upload, otherwise I would attach them. I can provide them via Google Drive if someone wants to take a look.

I'm hoping there is something simple I'm missing, but I'm not sure whether the issue lies with the decision tree model itself or with the way I've created the ONNX model.

Small edit: in the vast majority of cases the predicted probabilities from the decision tree are 0 and 1. It is the rare cases where the predictions are some float between 0 and 1 that the scikit-learn and ONNX models diverge - hence the hardcoded input vector pasted above, and the random scan sketched below.
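
One way to surface these rare diverging cases is to scan random inputs and compare the two models' probabilities. A minimal sketch, reusing trained_dt_model and sess from the snippet above (the input range, sample count, and tolerance are arbitrary choices, not from the actual data):

import numpy as np

# Generate random feature vectors and flag any case where the sklearn
# and ONNX probabilities disagree beyond a small tolerance.
rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.random((1, 27)).astype(np.float32)
    skl_proba = trained_dt_model.predict_proba(x)  # list of arrays, one per output
    onx_proba = sess.run(["probabilities"], {"float_input": x})[0]  # shape (n_outputs, 1, 2)
    for i, p in enumerate(skl_proba):
        if not np.allclose(p, onx_proba[i], atol=1e-5):
            print("divergence at input:", x)
            print("  sklearn:", p, " onnx:", onx_proba[i])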

xadupre (Collaborator) commented Apr 4, 2024

I'm hesitating between a runtime issue and a conversion issue - most probably a conversion issue. Did you try the Python reference runtime (https://onnx.ai/onnx/api/reference.html) to see if you get the same results?
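
A minimal sketch of that check with onnx.reference.ReferenceEvaluator (requires onnx >= 1.13; the model path and test_input1 are taken from the snippet above):

import numpy as np
from onnx.reference import ReferenceEvaluator

# Run the converted model with the pure-Python reference runtime
# instead of onnxruntime, using the same hardcoded input.
ref = ReferenceEvaluator("trained_dt.onnx")
ref_out = ref.run(None, {"float_input": test_input1.astype(np.float32)})
print(ref_out[0])  # labels
print(ref_out[1])  # probabilities - compare against onnxruntime's output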

nkinnaird (Author) commented

Apologies - was on vacation and getting back to this now.

Predicting with ReferenceEvaluator(...).run gives me the same results, so I'm thinking it's probably a conversion issue. I'm going to try again and see if changing various sklearn or skl2onnx versions resolves the problem; maybe I have a version misalignment somewhere throwing things off.
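
For reference while testing combinations, a minimal sketch for recording the exact versions in play (it just prints each package's __version__):

import sklearn
import skl2onnx
import onnx
import onnxruntime

# Print the installed version of each package involved in the
# conversion and inference pipeline.
for mod in (sklearn, skl2onnx, onnx, onnxruntime):
    print(mod.__name__, mod.__version__)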

xadupre self-assigned this Jun 21, 2024
Status: Can Fix but Waiting for an Answer