Issue when converting CalibratedClassifierCV #1082

Open
paolo-sofia opened this issue Mar 14, 2024 · 4 comments

@paolo-sofia

Hi, I'm having trouble converting a CalibratedClassifierCV model to ONNX. This is the error I get:

RuntimeError: For operator SklearnCalibratedClassifierCV (type: SklearnCalibratedClassifierCV), at most 1 input(s) is(are) supported but we got 33 input(s) which are [...]

The estimator is a Pipeline containing a OneHotEncoder, an OrdinalEncoder and a RobustScaler, followed by a RandomForestClassifier as the classifier.

[Screenshot from 2024-03-14 12-10-17]

If I export only the Pipeline, I don't get this error. Does CalibratedClassifierCV work only with numerical data? My DataFrame currently contains both numerical and categorical columns. How can I fix this problem?

I'm currently using:
Python 3.12
sklearn==1.4.1.post1
skl2onnx==1.16.0
onnx==1.15.0
onnxruntime==1.17.1
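A small sketch for collecting the versions listed above in one go, using only the standard library's `importlib.metadata`. The helper name `environment_report` is hypothetical.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names ("scikit-learn" is the distribution behind "import sklearn").
PACKAGES = ("scikit-learn", "skl2onnx", "onnx", "onnxruntime")


def environment_report(packages=PACKAGES):
    """Return the Python version and the installed version of each package."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


if __name__ == "__main__":
    for name, ver in environment_report().items():
        print(f"{name}=={ver}")
```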

@xadupre
Collaborator

xadupre commented Apr 4, 2024

I fixed a similar issue yesterday, and I suspect this is the same one. If you could share your pipeline, that would help. Otherwise, I think I can reproduce the issue I had with VotingClassifier using this one and fix it the same way.

@paolo-sofia
Author

Hi @xadupre, thanks for the help. I tried the latest version, which contains your fixes, but I still get the same result. Here's the pipeline I'm using:

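from skl2onnx import to_onnx
from sklearn.calibration import CalibratedClassifierCV
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# CATEGORICAL_COLUMNS, X_train, y_train, X_test and y_test (pandas objects) are defined elsewhere.
column_transformer: ColumnTransformer = make_column_transformer(
    (OneHotEncoder(), CATEGORICAL_COLUMNS),
    remainder="passthrough",
    n_jobs=-1,
    verbose=True
)

classifier: RandomForestClassifier = RandomForestClassifier(
    n_jobs=-1,
    random_state=42,
    verbose=1,
    warm_start=False,
)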

pipeline: Pipeline = Pipeline(
    steps=[
        ("column_transformer", column_transformer),
        ("classifier", classifier)
    ],
    verbose=True
)

pipeline.fit(X_train, y_train)

calibrated_classifier: CalibratedClassifierCV = CalibratedClassifierCV(estimator=pipeline, n_jobs=-1, cv="prefit")
calibrated_classifier.fit(X_test, y_test)


onx = to_onnx(calibrated_classifier, X_train[:1], options={CalibratedClassifierCV: {"zipmap": False}})
with open("classifier.onnx", "wb") as f:
    f.write(onx.SerializeToString())

@xadupre
Collaborator

xadupre commented May 20, 2024

Sorry for the delay. I can't find this error message in the code. Could you share the full call stack?
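The full call stack can also be captured as text, for example to paste into the issue. A minimal sketch built on the standard `traceback` module; `run_and_capture` is a hypothetical helper, not part of skl2onnx.

```python
import traceback


def run_and_capture(fn, *args, **kwargs):
    """Call fn; on failure return the formatted traceback instead of raising."""
    try:
        return fn(*args, **kwargs), None
    except Exception:
        # format_exc() renders every frame of the active exception, like the
        # default interpreter output.
        return None, traceback.format_exc()
```

For instance, `result, stack = run_and_capture(to_onnx, calibrated_classifier, X_train[:1])` followed by `print(stack)`. Passing `verbose=1` to `to_onnx` (it appears in the signature shown in the traceback below) additionally prints the conversion steps.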

@paolo-sofia
Author

Sorry for the delay. Here's the full stack trace:

     25 calibrated_classifier: CalibratedClassifierCV = CalibratedClassifierCV(estimator=pipeline, n_jobs=-1, cv="prefit")
     26 calibrated_classifier.fit(X_test, y_test)
---> 29 onx = to_onnx(calibrated_classifier, X_train[:1], options={CalibratedClassifierCV: {"zipmap": False}})
     30 with open("classifier.onnx", "wb") as f:
     31     f.write(onx.SerializeToString())

File ~/git/sklearn-onnx-issue/.venv/lib/python3.12/site-packages/skl2onnx/convert.py:304, in to_onnx(model, X, name, initial_types, target_opset, options, white_op, black_op, final_types, dtype, naming, model_optim, verbose)
    302 if verbose >= 1:
    303     print("[to_onnx] initial_types=%r" % initial_types)
--> 304 return convert_sklearn(
    305     model,
    306     initial_types=initial_types,
    307     target_opset=target_opset,
    308     name=name,
    309     options=options,
    310     white_op=white_op,
    311     black_op=black_op,
    312     final_types=final_types,
    313     dtype=dtype,
    314     verbose=verbose,
    315     naming=naming,
    316     model_optim=model_optim,
    317 )

File ~/git/sklearn-onnx-issue/.venv/lib/python3.12/site-packages/skl2onnx/convert.py:206, in convert_sklearn(model, name, initial_types, doc_string, target_opset, custom_conversion_functions, custom_shape_calculators, custom_parsers, options, intermediate, white_op, black_op, final_types, dtype, naming, model_optim, verbose)
    204 if verbose >= 1:
    205     print("[convert_sklearn] convert_topology")
--> 206 onnx_model = convert_topology(
    207     topology,
    208     name,
    209     doc_string,
    210     target_opset,
    211     options=options,
    212     remove_identity=model_optim and not intermediate,
    213     verbose=verbose,
    214 )
    215 if verbose >= 1:
    216     print("[convert_sklearn] end")

File ~/git/sklearn-onnx-issue/.venv/lib/python3.12/site-packages/skl2onnx/common/_topology.py:1533, in convert_topology(topology, model_name, doc_string, target_opset, options, remove_identity, verbose)
   1522 container = ModelComponentContainer(
   1523     target_opset,
   1524     options=options,
   (...)
   1528     verbose=verbose,
   1529 )
   1531 # Traverse the graph from roots to leaves
   1532 # This loop could eventually be parallelized.
-> 1533 topology.convert_operators(container=container, verbose=verbose)
   1534 container.ensure_topological_order()
   1536 if len(container.inputs) == 0:

File ~/git/sklearn-onnx-issue/.venv/lib/python3.12/site-packages/skl2onnx/common/_topology.py:1350, in Topology.convert_operators(self, container, verbose)
   1347 for variable in operator.outputs:
   1348     _check_variable_out_(variable, operator)
-> 1350 self.call_shape_calculator(operator)
   1351 self.call_converter(operator, container, verbose=verbose)
   1353 # If an operator contains a sequence of operators,
   1354 # output variables are not necessarily known at this stage.

File ~/git/sklearn-onnx-issue/.venv/lib/python3.12/site-packages/skl2onnx/common/_topology.py:1165, in Topology.call_shape_calculator(self, operator)
   1163 else:
   1164     logger.debug("[Shape2] call infer_types for %r", operator)
-> 1165     operator.infer_types()

File ~/git/sklearn-onnx-issue/.venv/lib/python3.12/site-packages/skl2onnx/common/_topology.py:654, in Operator.infer_types(self)
    644     raise MissingShapeCalculator(
    645         "Unexpected shape calculator for alias '{}' "
    646         "and type '{}'.".format(self.type, type(self.raw_operator))
    647     )
    648 logger.debug(
    649     "[Shape-a] %r fed %r - %r",
    650     self,
    651     "".join(str(i.is_fed) for i in self.inputs),
    652     "".join(str(i.is_fed) for i in self.outputs),
    653 )
--> 654 shape_calc(self)
    655 logger.debug(
    656     "[Shape-b] %r inputs=%r - outputs=%r", self, self.inputs, self.outputs
    657 )

File ~/git/sklearn-onnx-issue/.venv/lib/python3.12/site-packages/skl2onnx/common/shape_calculator.py:31, in calculate_linear_classifier_output_shapes(operator)
     20 def calculate_linear_classifier_output_shapes(operator):
     21     """
     22     This operator maps an input feature vector into a scalar label if
     23     the number of outputs is one. If two outputs appear in this
   (...)
     29 
     30     """
---> 31     _calculate_linear_classifier_output_shapes(operator)

File ~/git/sklearn-onnx-issue/.venv/lib/python3.12/site-packages/skl2onnx/common/shape_calculator.py:43, in _calculate_linear_classifier_output_shapes(operator, decision_path, decision_leaf, enable_type_checking)
     41     n_out += 1
     42 out_range = [2, 2 + n_out]
---> 43 check_input_and_output_numbers(
     44     operator, input_count_range=1, output_count_range=out_range
     45 )
     46 if enable_type_checking:
     47     check_input_and_output_types(
     48         operator,
     49         good_input_types=[
   (...)
     54         ],
     55     )

File ~/git/sklearn-onnx-issue/.venv/lib/python3.12/site-packages/onnxconverter_common/utils.py:295, in check_input_and_output_numbers(operator, input_count_range, output_count_range)
    290     raise RuntimeError(
    291         'For operator %s (type: %s), at least %s input(s) is(are) required but we got %s input(s) which are %s'
    292         % (operator.full_name, operator.type, min_input_count, len(operator.inputs), operator.input_full_names))
    294 if max_input_count is not None and len(operator.inputs) > max_input_count:
--> 295     raise RuntimeError(
    296         'For operator %s (type: %s), at most %s input(s) is(are) supported but we got %s input(s) which are %s'
    297         % (operator.full_name, operator.type, max_input_count, len(operator.inputs), operator.input_full_names))
    299 if min_output_count is not None and len(operator.outputs) < min_output_count:
    300     raise RuntimeError(
    301         'For operator %s (type: %s), at least %s output(s) is(are) produced but we got %s output(s) which are %s'
    302         % (operator.full_name, operator.type, min_output_count, len(operator.outputs), operator.output_full_names))

RuntimeError: For operator SklearnCalibratedClassifierCV (type: SklearnCalibratedClassifierCV), at most 1 input(s) is(are) supported but we got 8 input(s) which are ['xx', 'xx', 'xx', 'xx', 'xx', 'xx', 'xx', 'xx']

I replaced the real column names with the placeholder "xx".
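If a model is exported with one input per column, as the 8 named inputs above suggest, inference has to be fed the same way. A sketch of building such a feed for onnxruntime, assuming float columns were declared as FloatTensorType, integer columns as Int64TensorType and everything else as StringTensorType; `dataframe_feeds` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def dataframe_feeds(df):
    """Build an onnxruntime feed dict with one (n_rows, 1) array per column."""
    feeds = {}
    for name in df.columns:
        dtype = df[name].dtype
        values = df[name].to_numpy()
        if pd.api.types.is_float_dtype(dtype):
            values = values.astype(np.float32)  # FloatTensorType expects float32
        elif pd.api.types.is_integer_dtype(dtype):
            values = values.astype(np.int64)  # Int64TensorType expects int64
        else:
            values = values.astype(str).astype(object)  # string tensors as object arrays
        feeds[str(name)] = values.reshape(-1, 1)
    return feeds
```

The result would then go to `onnxruntime.InferenceSession(onx.SerializeToString(), providers=["CPUExecutionProvider"]).run(None, dataframe_feeds(X_test))`.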

@xadupre self-assigned this on Jun 21, 2024
Labels: None yet
Projects: Status: To do
Development: No branches or pull requests
Participants: 2