Ability to suppress the (default-) Output element #180

Open
liuhuanshuo opened this issue Nov 9, 2022 · 2 comments

@liuhuanshuo

liuhuanshuo commented Nov 9, 2022

With a PMML file converted by sklearn2pmml, the default output is [y, probability(1), probability(0)].

Is there a way to change the default column names, such as renaming probability(1) to proba?

Or can I select only the columns that I want? For example, I only need the y column; I don't need the default probability(1) and probability(0) outputs.

@vruusmann
Member

> With a PMML file converted by sklearn2pmml, the default output is [y, probability(1), probability(0)].

All three values are calculated in a single pass. Therefore, dropping the probability output fields brings no performance benefit; the only gain is visual (e.g. keeping the on-screen results focused).

In Scikit-Learn, it would take two passes (first predict(X), then predict_proba(X)) to create such a results data matrix.
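For comparison, the two-pass Scikit-Learn approach mentioned above might look roughly like this. It is a minimal sketch: the synthetic dataset and the `LogisticRegression` classifier are assumptions chosen for illustration, not taken from the issue.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (illustrative assumption)
X, y = make_classification(n_samples=20, n_features=4, random_state=0)
classifier = LogisticRegression().fit(X, y)

# Pass 1: predicted class labels
labels = classifier.predict(X)
# Pass 2: class probabilities, one column per entry in classifier.classes_
proba = classifier.predict_proba(X)

# Results matrix with columns: y, probability(0), probability(1)
results = np.column_stack([labels, proba])
```

Note that the probability columns here follow the order of `classifier.classes_`, which may differ from the column order of the PMML output.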

> Is there a way to change the default column names, such as renaming probability(1) to proba?

Column renaming is covered in these recently opened issues: jpmml/sklearn2pmml#359 and jpmml/sklearn2pmml#361

There is a special API for renaming transformer fields, but not for renaming model fields.

> Or can I select only the columns that I want? For example, I only need the y column

yt = evaluator.evaluateAll(X)

# THIS! Keep only the "y" column, dropping the probability columns
yt = yt["y"]

You may consider wrapping the evaluator.evaluateAll(X) call in a separate helper function, which adds or removes result columns as you wish.
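Such a helper could be sketched as follows. The names `evaluate_selected` and `StubEvaluator` are hypothetical; the stub only mimics an evaluator whose `evaluateAll(X)` returns a pandas DataFrame with the three default columns, so that the example runs without a PMML file.

```python
import pandas as pd


class StubEvaluator:
    """Hypothetical stand-in for a PMML evaluator; returns fixed results."""

    def evaluateAll(self, X):
        n = len(X)
        return pd.DataFrame({
            "y": [1] * n,
            "probability(1)": [0.8] * n,
            "probability(0)": [0.2] * n,
        })


def evaluate_selected(evaluator, X, columns=None, rename=None):
    """Evaluate X, then keep only `columns` and apply the `rename` mapping."""
    yt = evaluator.evaluateAll(X)
    if columns is not None:
        yt = yt[list(columns)]
    if rename:
        yt = yt.rename(columns=rename)
    return yt


X = pd.DataFrame({"x1": [0.1, 0.2, 0.3]})
result = evaluate_selected(
    StubEvaluator(), X,
    columns=["y", "probability(1)"],
    rename={"probability(1)": "proba"},
)
```

With a real evaluator in place of the stub, the same call would cover both requests from the question: selecting columns and renaming probability(1) to proba.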

@vruusmann changed the title from "Is there a way to change the column name of the pmml output?" to "Ability to suppress the (default-) Output element" on Nov 9, 2022
@vruusmann
Member

I can see the benefit of adding a special-purpose API for disabling the generation of default Output elements.

The easiest approach would be for the end user to signal their intent by setting a pmml_output = False attribute on the (fitted) model object:

from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

classifier = ...

pipeline = PMMLPipeline([
  ("classifier", classifier)
])
pipeline.fit(X, y)

# Default config - the Output element is created
sklearn2pmml(pipeline, "classifier.pmml")

classifier.pmml_output = False

# Custom config - the Output element is not created
sklearn2pmml(pipeline, "classifier-no_proba.pmml")
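Independently of any such attribute, a similar effect can be approximated by post-processing the generated PMML document. The sketch below is an assumption-laden illustration, not part of sklearn2pmml: `strip_top_level_output` and `local_name` are hypothetical helpers, and the sample document is hand-written rather than real converter output. It removes the Output element from top-level models only, because Output fields of nested models may be referenced elsewhere in the document.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def strip_top_level_output(pmml_text):
    """Return the PMML text without the Output element of top-level models."""
    root = ET.fromstring(pmml_text)
    # Keep the PMML namespace as the default namespace when serializing
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    for model in list(root):
        for child in list(model):
            if local_name(child.tag) == "Output":
                model.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hand-written minimal document (illustrative, not valid for scoring)
sample = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary/>
  <RegressionModel functionName="classification">
    <MiningSchema/>
    <Output>
      <OutputField name="probability(1)" feature="probability" value="1"/>
    </Output>
  </RegressionModel>
</PMML>"""

stripped = strip_top_level_output(sample)
```

It is worth validating the stripped file with an evaluator before relying on it.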
