Add custom converter for ClassificationProgressiveLearner #1041

certh-knowledge-project · 2023-11-07T11:29:50Z

Hi,

This pull request implements a custom converter for the ClassificationProgressiveLearner estimator as defined in the neurodata proglearn repository, a package for exploring and using progressive learning algorithms. In progressive learning, a model learns a series of tasks one after another, reusing the knowledge gained on earlier tasks to improve performance on later ones. The approach is particularly useful when data arrives sequentially or when knowledge can be transferred from one task to another. If you are unfamiliar with the concept or wish to explore it in more detail, see the accompanying paper by Jayanta Dey et al.

PR Details

The progressive learning implementation offers two transformer types, TreeClassificationTransformer and NeuralClassificationTransformer, each built on a different underlying estimator. TreeClassificationTransformer is based on an ensemble of scikit-learn DecisionTreeClassifier instances (for which skl2onnx already provides converters), whereas NeuralClassificationTransformer is built around TensorFlow/Keras neural networks. Since skl2onnx focuses solely on scikit-learn estimators, this contribution covers only TreeClassificationTransformer, to avoid conflicts. The converter is loosely based on the custom converter for pyod.models.iforest.IForest, following its main steps adapted to our needs.
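
To illustrate what an ONNX graph for this learner has to reproduce at inference time, here is a minimal pure-Python sketch. It assumes that each task contributes a voter producing a class-posterior vector over the same class list, and that the final prediction is the argmax of the averaged posteriors (the behavior suggested by proglearn's SimpleArgmaxAverage decider). The function names are hypothetical, and this is not the converter implemented in this PR.

```python
from typing import Any, List, Sequence


def average_posteriors(posteriors: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-posterior vectors produced by several voters.

    Assumption: every voter returns a vector over the same, complete
    list of classes (same length, same class order).
    """
    if not posteriors:
        raise ValueError("need at least one posterior vector")
    n_classes = len(posteriors[0])
    if any(len(p) != n_classes for p in posteriors):
        raise ValueError("all posterior vectors must have the same length")
    # Element-wise mean over voters, one entry per class.
    return [sum(p[k] for p in posteriors) / len(posteriors) for k in range(n_classes)]


def argmax_average(posteriors: Sequence[Sequence[float]], classes: Sequence[Any]) -> Any:
    """Return the class label whose averaged posterior is highest."""
    avg = average_posteriors(posteriors)
    if len(classes) != len(avg):
        raise ValueError("one class label is needed per posterior entry")
    # On ties, max() keeps the first (lowest) index.
    best = max(range(len(avg)), key=avg.__getitem__)
    return classes[best]


# Usage: two task-specific voters scoring the same three classes.
votes = [[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]]
print(average_posteriors(votes))  # roughly [0.5, 0.35, 0.15]
print(argmax_average(votes, ["a", "b", "c"]))  # a
```

In an ONNX graph, the same idea would roughly map to a mean over the per-voter probability tensors followed by an ArgMax.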

Requirements

To keep the core skl2onnx dependencies unchanged, the original requirements.txt file is not modified. For reference, the additional dependencies required by this contribution (excluding those already required by skl2onnx) are:

scikit-learn>=1.0
tensorflow>=2.4.0
proglearn>=0.0.7

Testing

The main entry point is test_proglearn.py in the tests folder. After registering the external converters and shape calculators, it defines 7 independent test cases. Note that onnxruntime>=1.8.0 is required to run these tests.

Note: there is a known backtracking issue with pip's dependency resolver, which may cause pip to go through multiple versions of the same packages during installation. A quick workaround is to pass the --use-deprecated legacy-resolver option to pip install.

Thank you for taking the time to review this pull request! Looking forward to your insights and suggestions.

This unit test checks the functionality and validity of the registered converters for the ClassificationProgressiveLearner class. In total, 7 test cases are assessed, covering multiple parameter combinations. Signed-off-by: Panos Doupidis <[email protected]>

github-advanced-security

CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

xadupre · 2023-11-17T14:33:56Z

I would prefer to see these converters in a separate package. Taking a dependency on tensorflow and proglearn is a huge change. I assume more converters would be added in the future. I'm reluctant to start fixing the converters when they fail because a new version of proglearn was released. sklearn-onnx has an API to register custom converters. Creating a separate package should not be an issue.

certh-knowledge-project · 2023-11-20T08:15:12Z

Hi @xadupre,

Thank you for taking the time to review the pull request and for your valuable feedback. We understand your concerns about adding dependencies on TensorFlow and ProgLearn, and about the issues that future releases of those packages could cause.

Your suggestion to create a separate package for the converters is a good one, and we appreciate your insight about the API in sklearn-onnx to register custom converters. We will take your feedback into consideration as we continue to improve our project. Thanks again for your thoughtful review.

pandoup and others added 8 commits November 7, 2023 10:50

Create proglearn.py

0204730

Registers the custom shape calculator for the ClassificationProgressiveLearner class and the DictWrapper class (dependency of the ClassificationProgressiveLearner) Signed-off-by: Panos Doupidis <[email protected]>

Update __init__.py

8c176b0

Adds proglearn to the registered shape calculators. Signed-off-by: Panos Doupidis <[email protected]>

Create proglearn.py

7ed2581

Implements the custom converter and parser for the ClassificationProgressiveLearner class. Signed-off-by: Panos Doupidis <[email protected]>

Update __init__.py

212f760

added ClassificationProgressiveLearner dependency. Signed-off-by: AvraamBardos <[email protected]>

Create dict_wrapper

546ec47

The purpose of this file is to provide a helper class (DictWrapper) that wraps a dictionary as a scikit-learn estimator, and a function to fill missing indices. Signed-off-by: AvraamBardos <[email protected]>

Update __init__.py

880c620

Import DictWrapper and fill_missing_indices. Signed-off-by: AvraamBardos <[email protected]>

Rename dict_wrapper to dict_wrapper.py

8183b01

Signed-off-by: AvraamBardos <[email protected]>
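
The dict_wrapper.py commit describes a helper that wraps a dictionary as a scikit-learn estimator, plus a function to fill missing indices, but the file itself is not reproduced here. The sketch below is a guess at the idea with hypothetical signatures: DictWrapper only mimics get_params/set_params without importing scikit-learn, and fill_missing_indices is assumed to expand a probability row computed over the classes a tree actually saw to the full class list, since a scikit-learn DecisionTreeClassifier only returns columns for the classes present in its training data.

```python
from typing import Any, Dict, List, Sequence


class DictWrapper:
    """Hypothetical sketch: expose a plain dict through an estimator-like API."""

    def __init__(self, data: Dict[Any, Any]):
        # Stored as-is, following the scikit-learn convention for constructor params.
        self.data = data

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        return {"data": self.data}

    def set_params(self, **params: Any) -> "DictWrapper":
        if "data" in params:
            self.data = params["data"]
        return self

    def __getitem__(self, key: Any) -> Any:
        return self.data[key]

    def __len__(self) -> int:
        return len(self.data)


def fill_missing_indices(
    proba: Sequence[float],
    seen_classes: Sequence[Any],
    all_classes: Sequence[Any],
    fill_value: float = 0.0,
) -> List[float]:
    """Hypothetical: expand a probability row over seen_classes to one over all_classes."""
    if len(proba) != len(seen_classes):
        raise ValueError("proba and seen_classes must have the same length")
    lookup = dict(zip(seen_classes, proba))
    # Classes the tree never saw get fill_value (0.0 by default).
    return [lookup.get(c, fill_value) for c in all_classes]


# Usage: a tree trained only on classes 0 and 2 of a 3-class problem.
print(fill_missing_indices([0.25, 0.75], [0, 2], [0, 1, 2]))  # [0.25, 0.0, 0.75]
```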

github-advanced-security bot found potential problems Nov 17, 2023

xadupre closed this Dec 7, 2023
