Serialzy is a library for serializing Python objects into portable, interoperable data formats where possible.
Suppose you have a CatBoost model:
from catboost import CatBoostClassifier
model = CatBoostClassifier()
model.fit(...)
First, find a suitable serializer for the CatBoost model type or for the corresponding data format:
from serialzy.registry import DefaultSerializerRegistry
registry = DefaultSerializerRegistry()
serializer = registry.find_serializer_by_type(type(model))  # or: registry.find_serializer_by_data_format("cbm")
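To make the two lookup paths above concrete, here is a minimal sketch of how a registry could index serializers by type and by data format. It is not serialzy's implementation: `ToyRegistry`, `IntSerializer`, `supported_type`, `register`, and the `toy_int` format name are all hypothetical names invented for illustration. Only the two `find_serializer_by_*` method names mirror the calls shown above.

```python
from typing import Any, BinaryIO, Dict, Optional, Type


class IntSerializer:
    """Hypothetical serializer: stores an int as its string representation."""

    def supported_type(self) -> Type:
        return int

    def data_format(self) -> str:
        return "toy_int"  # made-up format name, for illustration only

    def serialize(self, obj: Any, dest: BinaryIO) -> None:
        dest.write(str(obj).encode("utf-8"))

    def deserialize(self, source: BinaryIO) -> Any:
        return int(source.read().decode("utf-8"))


class ToyRegistry:
    """Illustrative registry; not serialzy's DefaultSerializerRegistry."""

    def __init__(self) -> None:
        self._by_type: Dict[Type, Any] = {}
        self._by_format: Dict[str, Any] = {}

    def register(self, serializer: Any) -> None:
        self._by_type[serializer.supported_type()] = serializer
        self._by_format[serializer.data_format()] = serializer

    def find_serializer_by_type(self, typ: Type) -> Optional[Any]:
        # Walk the MRO so a subclass resolves to a serializer
        # registered for one of its base classes.
        for base in typ.__mro__:
            if base in self._by_type:
                return self._by_type[base]
        return None

    def find_serializer_by_data_format(self, data_format: str) -> Optional[Any]:
        return self._by_format.get(data_format)


toy_registry = ToyRegistry()
toy_registry.register(IntSerializer())
by_type = toy_registry.find_serializer_by_type(int)
by_format = toy_registry.find_serializer_by_data_format("toy_int")
```

In this sketch both lookups return the same serializer instance, and a lookup for an unregistered type or format returns `None`.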
Serializers have several properties:
serializer.available()  # whether the serializer can be used in the current environment
serializer.requirements()  # libraries that must be installed to use this serializer
serializer.stable()  # whether the data format is portable
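One way to use these properties is to check them before writing any data. The sketch below assumes only that a serializer exposes `available()`, `stable()`, and `requirements()` as described above. `StubSerializer` and `ensure_usable` are hypothetical helpers, and the value returned by the stub's `requirements()` is made up, so it may not match the shape serialzy actually returns.

```python
from typing import Any


class StubSerializer:
    """Hypothetical stand-in exposing the three properties described above."""

    def __init__(self, available: bool, stable: bool) -> None:
        self._available = available
        self._stable = stable

    def available(self) -> bool:
        return self._available

    def stable(self) -> bool:
        return self._stable

    def requirements(self) -> Any:
        # Assumption: this return value is invented for the sketch.
        return {"catboost": "any version"}


def ensure_usable(serializer: Any, require_stable: bool = True) -> None:
    """Raise RuntimeError if the serializer cannot be used as requested."""
    if not serializer.available():
        raise RuntimeError(
            f"Serializer is not available; requirements: {serializer.requirements()}"
        )
    if require_stable and not serializer.stable():
        raise RuntimeError("Serializer does not produce a portable data format")


# A usable, stable serializer passes the check without raising.
ensure_usable(StubSerializer(available=True, stable=True))
```

Passing `require_stable=False` accepts a serializer whose format is not portable, which may be good enough for short-lived local caching.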
Serializers can report the data format and the schema for a type:
serializer.data_format()
serializer.schema(type(model))
Serialization:
with open('model.cbm', 'wb') as file:
    serializer.serialize(model, file)
Deserialization:
with open('model.cbm', 'rb') as file:
    deserialized_obj = serializer.deserialize(file)
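The serialize and deserialize steps above can be combined into a round trip through an in-memory buffer, which is handy for tests. This is a sketch under the assumption that `serialize` and `deserialize` accept any binary file-like object, as the `open(...)` calls above suggest. `StandInStrSerializer` and `round_trip` are hypothetical names, and the stand-in simply encodes a string as UTF-8 rather than reproducing any real serialzy format.

```python
import io
from typing import Any, BinaryIO


class StandInStrSerializer:
    """Hypothetical stand-in with the same serialize/deserialize call shape."""

    def serialize(self, obj: str, dest: BinaryIO) -> None:
        dest.write(obj.encode("utf-8"))

    def deserialize(self, source: BinaryIO) -> str:
        return source.read().decode("utf-8")


def round_trip(serializer: Any, obj: Any) -> Any:
    """Serialize obj into memory, then read it back with the same serializer."""
    buffer = io.BytesIO()
    serializer.serialize(obj, buffer)
    buffer.seek(0)  # rewind so deserialize reads from the start
    return serializer.deserialize(buffer)


restored = round_trip(StandInStrSerializer(), "hello")
```

Comparing `restored` with the original object is a quick sanity check that a serializer preserves the value.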
Library | Types | Data format |
---|---|---|
Python std lib | int, str, float, bool, None | string representation |
Python std lib | List, Tuple | custom format |
CatBoost | CatBoostRegressor, CatBoostClassifier, CatBoostRanker | cbm |
CatBoost | Pool | quantized pool |
Tensorflow.Keras | Sequential, Model with subclasses | tf_keras |
Tensorflow | Checkpoint, Module with subclasses | tf_pure |
LightGBM | LGBMClassifier, LGBMRegressor, LGBMRanker | lgbm |
XGBoost | XGBClassifier, XGBRegressor, XGBRanker | xgb |
Torch | Module with subclasses | pt |
ONNX | ModelProto | onnx |