This repo demonstrates some first steps with MLflow:
- Tracking
- Models
- Model Registry
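As a first taste of the tracking API, here is a minimal sketch against a running server (the experiment name and logged values are made up):

```python
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # the server set up below
mlflow.set_experiment("first-steps")              # hypothetical experiment name

with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)   # example hyperparameter
    mlflow.log_metric("rmse", 0.42)  # example metric
```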
To use MLflow, one generally needs:
- a server on which mlflow runs (incl. the ui)
- an artifact store
- a database as well as a connector (e.g. sqlite)
Note that a database is not mandatory for tracking: if no backend store is specified, MLflow creates a folder structure on disk instead. However, the Model Registry cannot be used in that case.
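For illustration, a minimal sketch of logging and registering a model; the toy model, data, and model name are made up, and scikit-learn is assumed to be installed. This only works against a database-backed server:

```python
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LinearRegression

mlflow.set_tracking_uri("http://localhost:5000")

with mlflow.start_run():
    # toy model on toy data, purely for illustration
    model = LinearRegression().fit([[0.0], [1.0]], [0.0, 1.0])
    # registered_model_name triggers registration in the Model Registry
    mlflow.sklearn.log_model(model, "model", registered_model_name="demo_model")
```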
Note that the pyspark serving notebook is optional. If you want to use it, install pyspark and pyarrow as defined in the requirements. A compatible Java version needs to be installed as well to run Spark; a sketch of what such serving can look like is shown below.
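A rough sketch of scoring via Spark, assuming a model named demo_model has been registered (model URI, column name, and data are made up):

```python
from pyspark.sql import SparkSession
import mlflow.pyfunc

spark = SparkSession.builder.getOrCreate()

# load the registered model as a Spark UDF (hypothetical model URI)
predict = mlflow.pyfunc.spark_udf(spark, "models:/demo_model/1")

df = spark.createDataFrame([(0.5,)], ["x"])  # toy input data
df.withColumn("prediction", predict("x")).show()
```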
Here, localhost simulates a cloud on which MLflow is running. A dedicated folder and a database simulate the artifact store and the remote database, respectively.
First, set up a virtual environment based on the requirements.
Then create an empty database, e.g. via SQLite, which is built in on macOS.
cd cloud_mock
sqlite3
.save mlflow.db
.exit
Then, with your virtual environment active and cloud_mock as your working directory, start the MLflow server (which includes the UI).
mlflow server \
--backend-store-uri sqlite:///mlflow.db \
--default-artifact-root ./../cloud_mock/artifacts \
--host 127.0.0.1
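The tracking UI should then be reachable in the browser at http://127.0.0.1:5000 (MLflow's default port).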
Note that a minimal setup can be reached via
mlflow ui
but this has the disadvantages described above (in particular, no Model Registry).
Once a model is registered, one can serve it:
mlflow models serve -m "models:/{model_name}/{model_version}" -p yourport
Make sure the tracking URI is set in the corresponding terminal:
export MLFLOW_TRACKING_URI='http://localhost:5000'
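Once the model server is up, predictions can be requested via HTTP. A minimal sketch using requests; the input column and values are made up, and the payload key assumes MLflow 2.x (older versions expect a different JSON layout):

```python
import requests

# hypothetical single-feature input
payload = {"dataframe_split": {"columns": ["x"], "data": [[0.5]]}}

resp = requests.post(
    "http://localhost:1234/invocations",  # replace 1234 with yourport
    json=payload,
)
print(resp.json())
```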
Note that there are a few other options, e.g. building a Docker image (mlflow models build-docker) or building platform-specific images to deploy the model to different cloud platforms.