Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add custom converter for ClassificationProgressiveLearner #1041

Closed

Conversation

certh-knowledge-project
Copy link

Hi,

This pull request implements a custom converter regarding the ClassificationProgressiveLearner estimator as is defined in the neurodata proglearn repository. This package is designed for exploring and using Progressive learning algorithms. More specifically, Progressive learning is a concept in machine learning where the model learns from a series of tasks progressively, leveraging the knowledge gained from previous tasks to improve performance on later tasks. This approach can be particularly useful in scenarios where data comes in a sequential manner or when it’s beneficial to transfer knowledge from one task to another. If you are unfamiliar with the concept or wish to explore it in greater detail, you can find more information in the accompanying paper by Jayanta Dey et al.

PR Details

The Progressive learning implementation is split into two separate categories, TreeClassificationTransformer and NeuralClassificationTransformer, each utilizing a different underlying estimator. TreeClassificationTransformer is based on an ensemble of scikit-learn DecisionTreeClassifier instances (converters already implemented in skl2onnx) whereas the latter is centered around tensorflow/keras neural networks. Since skl2onnx solely focuses on scikit-learn estimators, this contribution emphasizes only on TreeClassificationTransformer to avoid conflicts. Concretely, this contribution is loosely based on the pyod.models.iforest.IForest custom converter, following its main set of instructions adapted to our needs.

Requirements

In an effort to keep the core skl2onnx dependencies pure, the original requirements.txt file is not modified, however, as a note the necessary additional dependencies are presented below (excluding the ones already required by skl2onnx).

scikit-learn>=1.0
tensorflow>=2.4.0
proglearn>=0.0.7

Testing

The main entry point should be the test_proglearn.py file, inside the tests folder, which defines a total of 7 independent test cases, after registering the external converters/shape calculators. It should be noted at this point that a version of onnxruntime>=1.8.0 is required to perform these tests.

Note: There is a known backtracking issue with pip which may result in multiple versions of the packages being installed. A quick fix is to use the --use-deprecated legacy-resolver option when running pip install.

Thank you for taking the time to review this pull request! Looking forward to your insights and suggestions.

pandoup and others added 8 commits November 7, 2023 10:50
Registers the custom shape calculator for the ClassificationProgressiveLearner class and the DictWrapper class (dependency of the ClassificationProgressiveLearner)

Signed-off-by: Panos Doupidis <[email protected]>
Adds proglearn to the registered shape calculators.

Signed-off-by: Panos Doupidis <[email protected]>
Implements the custom converter and parser for the ClassificationProgressiveLearner class.

Signed-off-by: Panos Doupidis <[email protected]>
added ClassificationProgressiveLearner dependency.

Signed-off-by: AvraamBardos <[email protected]>
The purpose of this file is to provide a helper class (DictWrapper), that wraps a dictionary as a scikit-learn estimator and a function to fill missing indices.

Signed-off-by: AvraamBardos <[email protected]>
Import DictWrapper and fill_missing_indices.

Signed-off-by: AvraamBardos <[email protected]>
This unittest is meant to test the functionality and the validity of the registered converters for the ClassificationProgressiveLearner class. In total, 7 unique test cases are assessed, targeting multiple parameter combinations.

Signed-off-by: Panos Doupidis <[email protected]>
Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

@xadupre
Copy link
Collaborator

xadupre commented Nov 17, 2023

I would prefer to see these converters in a separate package. Taking a dependency on tensorflow and proglearn is a huge change. I assume more converters would be added in the future. I'm reluctant to start fixing the converters when they fail because a new version of proglearn was released. sklearn-onnx has an API to register custom converters. Creating a separate package should not be an issue.

@certh-knowledge-project
Copy link
Author

Hi @xadupre,

Thank you for taking the time to review the pull request and for your valuable feedback. We understand your concerns about adding a dependency on like TensorFlow and ProgLearn, and the potential issues that could arise with future updates.

Your suggestion to create a separate package for the converters is a good one, and we appreciate your insight about the API in sklearn-onnx to register custom converters. We will take your feedback into consideration as we continue to improve our project. Thanks again for your thoughtful review.

@xadupre xadupre closed this Dec 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants