Code for training models with the metric learning approach.
The project is based on the library open-metric-learning.
Python is required (tested with Python versions >=3.10, <3.11).
Create a virtual environment, for example, using conda. Then install the dependencies:
pip install -r requirements.txt
Additional dependencies (logging, feed parsing, duplicate removal, export to ONNX) are listed in the file requirements_optional.txt.
If any of the above-mentioned features are needed (refer to examples), execute:
pip install -r requirements_optional.txt
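To see whether the optional dependencies are already present before running an example, a small check like the one below may help. It is only a sketch: the helper `missing_modules` and the list `OPTIONAL_MODULES` are hypothetical and not part of this project, and the list contains only the two modules imported elsewhere in this README, not the full contents of requirements_optional.txt.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Hypothetical selection: only modules that this README imports explicitly.
OPTIONAL_MODULES = ["onnx", "mlflow"]

if __name__ == "__main__":
    missing = missing_modules(OPTIONAL_MODULES)
    if missing:
        print("Missing optional modules:", ", ".join(missing))
        print("Run: pip install -r requirements_optional.txt")
    else:
        print("All checked optional modules are available.")
```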
Before usage, familiarize yourself with the library open-metric-learning.
Particular attention should be given to the configuration file.
The dataset should have a specific format.
When training a text model, the column path should be replaced with text and contain the textual description of the object.
Examples of data preparation are available in the examples directory:
- For text models - examples/bert_converter.py.
- For visual models - examples/vit_converter.py.
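As a rough illustration of the text variant of the dataset format, the sketch below builds a CSV in which the text column takes the place of path. The remaining column names (label, split, is_query, is_gallery) and the convention of leaving the query and gallery flags empty for training rows are assumptions based on the open-metric-learning dataset format and should be checked against the library documentation. The helpers `build_rows` and `write_csv` are hypothetical and do not reproduce the converter scripts listed above.

```python
import csv
import io

# Assumption: column names other than `text` follow the open-metric-learning
# dataset format; verify them against the library documentation.
TEXT_COLUMNS = ["label", "text", "split", "is_query", "is_gallery"]


def build_rows(items):
    """Turn (label, description, split) tuples into dataset rows."""
    rows = []
    for label, description, split in items:
        is_val = split == "validation"
        rows.append({
            "label": label,
            "text": description,  # used instead of the `path` column
            "split": split,
            # Assumption: validation items serve as both queries and gallery,
            # while training rows leave both flags empty.
            "is_query": True if is_val else "",
            "is_gallery": True if is_val else "",
        })
    return rows


def write_csv(rows, stream):
    """Write the rows to `stream` with a header line."""
    writer = csv.DictWriter(stream, fieldnames=TEXT_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_csv(build_rows([(0, "red running shoe", "train"),
                          (0, "crimson sneaker", "validation")]), buffer)
    print(buffer.getvalue())
```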
After making the necessary changes to the configuration files in the configs directory, execute:
python train_bert.py
# OR
python train_vit.py
Model optimization may be required for deployment.
The following examples can be used:
- Conversion to onnx format - examples/vit_to_onnx.py or examples/bert_to_onnx.py.
- Quantization - examples/quantize.py.
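For readers unfamiliar with quantization, the sketch below shows the basic idea on a plain list of floats: values are mapped to 8-bit integer codes with a single scale factor and can be approximately restored afterwards. It is a conceptual illustration only; the functions `quantize_int8` and `dequantize` are hypothetical, and examples/quantize.py may use a different scheme or rely on library tooling.

```python
def quantize_int8(values):
    """Symmetric quantization of floats to integer codes in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale for the whole list; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.25, 1.0]
    codes, scale = quantize_int8(weights)
    print(codes, scale)
    print(dequantize(codes, scale))
```

The rounding error per value is at most half of the scale, which is the trade-off accepted in exchange for a smaller model.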
To register a model in mlflow, you can use the following example:
import mlflow
import onnx

mlflow.set_tracking_uri("http://localhost:8000")
model_path = "./ViTExtractor.onnx"
artifact_path = "./artifacts"

# Load the exported ONNX model from disk.
onnx_model = onnx.load(model_path)

with mlflow.start_run(experiment_id="1") as run:
    # Log the model and the additional artifacts to the active run.
    mlflow.onnx.log_model(onnx_model, "model", save_as_external_data=False)
    mlflow.log_artifact(artifact_path)
In this example, a model in ONNX format is registered.