
Distributed Deep Learning Workshop

In this workshop, we will train a deep learning model in a distributed manner using Databricks. We will discuss how to leverage Delta Lake to prepare structured, semi-structured, or unstructured datasets, and how to use Petastorm to feed those datasets to training processes efficiently across a cluster. We will also cover how to use Horovod for distributed training on both CPU- and GPU-based hardware. This example aims to serve as a reusable template that you can tailor to your specific modeling needs.

Workshop structure

The workshop consists of a series of Databricks notebooks split into two parts.

In part 1, we look at how to leverage the parallelism of Spark to train deep learning models in a distributed manner. The notebooks outline the following:

  • Data Prep
    • How to create a Delta table from JPEG image sources using the binary file data source reader (see the Delta sketch after this list).
  • Single node training
  • Distributed training with Horovod (see the Horovod sketch after this list)
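
As a reference for the data-prep step, here is a minimal Delta sketch, assuming a Databricks cluster with Spark and Delta Lake available; the source and destination paths are hypothetical placeholders.

```python
# Read JPEG files with Spark's binary file data source. Each row carries
# the file's path, modificationTime, length, and raw bytes in `content`.
images_df = (
    spark.read.format("binaryFile")
    .option("pathGlobFilter", "*.jpg")       # keep only JPEG files
    .option("recursiveFileLookup", "true")   # descend into subdirectories
    .load("/tmp/images/raw")                 # hypothetical source directory
)

# Persist the images as a Delta table for versioned, reliable downstream access.
images_df.write.format("delta").mode("overwrite").save("/tmp/images/delta")
```

On Databricks, the `spark` session is predefined in notebooks; elsewhere you would create one with `SparkSession.builder.getOrCreate()`.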
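
And here is a minimal Horovod sketch of the distributed-training pattern, assuming Databricks ML Runtime (which ships `horovod` and `sparkdl.HorovodRunner`); the model and training loop are hypothetical placeholders.

```python
def train():
    # Imports happen inside the function because HorovodRunner pickles it
    # and runs it on the cluster's worker nodes.
    import torch
    import horovod.torch as hvd

    hvd.init()  # one Horovod process per CPU/GPU slot
    if torch.cuda.is_available():
        torch.cuda.set_device(hvd.local_rank())

    model = torch.nn.Linear(10, 2)  # hypothetical placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

    # Average gradients across workers, and start all workers from
    # identical parameters broadcast from rank 0.
    optimizer = hvd.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters()
    )
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)

    # ... per-worker training loop over a sharded dataset goes here ...

from sparkdl import HorovodRunner

# np=2 requests two parallel training processes on the cluster;
# a negative np runs locally on the driver for debugging.
HorovodRunner(np=2).run(train)
```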

In part 2, we look at how to parallelize both hyperparameter tuning and model inference. We illustrate:

  • Model tuning with Hyperopt
    • Tuning a single-node DL model with Hyperopt (see the Hyperopt sketch after this list)
    • Tuning a distributed Horovod process with Hyperopt
  • Distributed model inference
    • How to package a custom Pyfunc with preprocessing/post-processing steps (see the Pyfunc sketch after this list)
    • Applying that logged custom Pyfunc in a single node inference setting
    • Applying that logged custom Pyfunc in a distributed inference setting
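
For the single-node tuning case, here is a minimal Hyperopt sketch, assuming Databricks ML Runtime (which ships Hyperopt with `SparkTrials`); the objective function is a hypothetical stand-in for training a model and returning its validation loss.

```python
from hyperopt import STATUS_OK, SparkTrials, fmin, hp, tpe

def objective(params):
    # Train a model with `params` and return its validation loss here;
    # this quadratic is a hypothetical placeholder.
    loss = (params["lr"] - 0.01) ** 2
    return {"loss": loss, "status": STATUS_OK}

search_space = {"lr": hp.loguniform("lr", -7, 0)}

# SparkTrials evaluates trials in parallel across the cluster, which is
# how tuning of a single-node model gets parallelized.
best = fmin(
    fn=objective,
    space=search_space,
    algo=tpe.suggest,
    max_evals=16,
    trials=SparkTrials(parallelism=4),
)
```

Tuning a distributed Horovod process is a different setup, since each trial is itself a multi-worker job rather than a task that can be fanned out with `SparkTrials`; the part 2 notebooks cover that case.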
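
For inference, here is a minimal Pyfunc sketch of packaging a custom MLflow model and applying it both on a single node and as a Spark UDF for distributed scoring, assuming MLflow is available (it ships with Databricks ML Runtime); the wrapper's logic and the column names are hypothetical placeholders.

```python
import mlflow
import mlflow.pyfunc
import pandas as pd

class ImageClassifierWrapper(mlflow.pyfunc.PythonModel):
    """Hypothetical wrapper bundling preprocessing and post-processing."""

    def load_context(self, context):
        # Load the trained model from the logged artifacts here.
        self.model = None  # placeholder

    def predict(self, context, model_input):
        # Decode raw image bytes, run the model, post-process the outputs;
        # this constant output is a placeholder.
        return pd.Series([0.0] * len(model_input))

# Log the wrapper so it can be loaded for single-node or distributed inference.
with mlflow.start_run() as run:
    mlflow.pyfunc.log_model("model", python_model=ImageClassifierWrapper())

# Single-node inference: load the Pyfunc back and call predict on a DataFrame.
loaded = mlflow.pyfunc.load_model(f"runs:/{run.info.run_id}/model")

# Distributed inference: wrap the same logged model as a Spark UDF and apply
# it across the cluster (input columns depend on the model's expected schema;
# `images_df` and its `content` column come from the earlier Delta sketch).
predict_udf = mlflow.pyfunc.spark_udf(spark, f"runs:/{run.info.run_id}/model")
scored_df = images_df.withColumn("prediction", predict_udf("content"))
```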

Requirements

Databricks ML Runtime >= 7.3 LTS is recommended. Please use the Repos feature to clone this repository into your workspace and access the notebooks.