This repo demonstrates some first steps with MLflow:
- Tracking
- Models
- Model Registry
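As a first taste of the tracking API, here is a minimal sketch against a running server (the experiment name and logged values are made up):

```python
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # the server set up below
mlflow.set_experiment("first-steps")              # hypothetical experiment name

with mlflow.start_run():
    mlflow.log_param("alpha", 0.5)   # example hyperparameter
    mlflow.log_metric("rmse", 0.42)  # example metric
```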
To use MLflow, one generally needs:
- a server on which mlflow runs (incl. the ui)
- an artifact store
- a database as well as a connector (e.g. sqlite)
Note that a database is not mandatory for tracking: if no backend store is specified, MLflow creates a folder structure on disk instead. However, the Model Registry cannot be used in that case.
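For illustration, a minimal sketch of logging and registering a model; the toy model, data, and model name are made up, and scikit-learn is assumed to be installed. This only works against a database-backed server:

```python
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LinearRegression

mlflow.set_tracking_uri("http://localhost:5000")

with mlflow.start_run():
    # toy model on toy data, purely for illustration
    model = LinearRegression().fit([[0.0], [1.0]], [0.0, 1.0])
    # registered_model_name triggers registration in the Model Registry
    mlflow.sklearn.log_model(model, "model", registered_model_name="demo_model")
```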
Note that the pyspark serving notebook is optional. If you want to use it, install pyspark and pyarrow as defined in the requirements. A compatible Java version needs to be installed as well to run Spark; a sketch of what such serving can look like is shown below.
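A rough sketch of scoring via Spark, assuming a model named demo_model has been registered (model URI, column name, and data are made up):

```python
from pyspark.sql import SparkSession
import mlflow.pyfunc

spark = SparkSession.builder.getOrCreate()

# load the registered model as a Spark UDF (hypothetical model URI)
predict = mlflow.pyfunc.spark_udf(spark, "models:/demo_model/1")

df = spark.createDataFrame([(0.5,)], ["x"])  # toy input data
df.withColumn("prediction", predict("x")).show()
```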
Here, localhost simulates a cloud on which MLflow is running. A dedicated folder and a database simulate the artifact store and the remote database, respectively.
First, set up a virtual environment based on the requirements.
Then create an empty database, e.g. via SQLite, which is built in on macOS.
cd cloud_mock
sqlite3
.save mlflow.db
.exit
Then, with your virtual environment active and cloud_mock as your working directory, start the MLflow server (which includes the UI).
mlflow server \
--backend-store-uri sqlite:///mlflow.db \
--default-artifact-root ./../cloud_mock/artifacts \
--host 127.0.0.1
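The tracking UI should then be reachable in the browser at http://127.0.0.1:5000 (MLflow's default port).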
Note that a minimal setup can be reached via
mlflow ui
but this has the disadvantages described above (in particular, no Model Registry).
Once a model is registered, one can serve it:
mlflow models serve -m "models:/{model_name}/{model_version}" -p yourport
Make sure the tracking URI is set in the corresponding terminal:
export MLFLOW_TRACKING_URI='http://localhost:5000'
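Once the model server is up, predictions can be requested via HTTP. A minimal sketch using requests; the input column and values are made up, and the payload key assumes MLflow 2.x (older versions expect a different JSON layout):

```python
import requests

# hypothetical single-feature input
payload = {"dataframe_split": {"columns": ["x"], "data": [[0.5]]}}

resp = requests.post(
    "http://localhost:1234/invocations",  # replace 1234 with yourport
    json=payload,
)
print(resp.json())
```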
Note that there are a few other options, e.g. building a Docker image (mlflow models build-docker) or building platform-specific images to deploy the model to different cloud platforms.