Please see https://docs.rapids.ai/deployment/stable/examples/ for up-to-date examples.
RAPIDS is a suite of open-source libraries that bring GPU acceleration to data science pipelines. Users building cloud-based machine learning experiments can take advantage of this acceleration throughout their workloads to build models faster, cheaper, and more easily on the cloud platform of their choice.
This repository provides example notebooks and "getting started" code samples to help you integrate RAPIDS with the hyperparameter optimization (HPO) services from Azure ML, AWS SageMaker, Google Cloud, and Databricks. The directory for each cloud contains a step-by-step guide to launching an example hyperparameter optimization job. Each example job uses RAPIDS cuDF to load and preprocess data and uses cuML or XGBoost for GPU-accelerated model training. RAPIDS also integrates easily with MLflow to track and orchestrate experiments from any of these frameworks.
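To make the idea of an HPO job concrete, here is a minimal sketch of a random search loop in plain Python. It is not the code used by any of the cloud services above, which may apply other strategies such as Bayesian optimization. All names here (`random_search`, `sample_params`, `toy_objective`) are hypothetical, and `toy_objective` is only a stand-in for the real work of loading data with cuDF, training with cuML or XGBoost, and returning a validation score.

```python
import random
from typing import Callable, Dict, Optional, Tuple

# Hypothetical search space format: parameter name -> (low, high) bounds.
SearchSpace = Dict[str, Tuple[float, float]]
Params = Dict[str, float]


def sample_params(space: SearchSpace, rng: random.Random) -> Params:
    """Draw one candidate by sampling each parameter uniformly within its bounds."""
    return {name: rng.uniform(low, high) for name, (low, high) in space.items()}


def random_search(
    objective: Callable[[Params], float],
    space: SearchSpace,
    n_trials: int = 20,
    seed: int = 0,
) -> Tuple[Optional[Params], float]:
    """Evaluate n_trials random candidates and return the best (params, score)."""
    rng = random.Random(seed)
    best_params: Optional[Params] = None
    best_score = float("-inf")
    for _ in range(n_trials):
        params = sample_params(space, rng)
        score = objective(params)  # higher is better
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params: Params) -> float:
    """Stand-in for a real training run; peaks at learning_rate=0.1, subsample=0.8."""
    return -((params["learning_rate"] - 0.1) ** 2) - (params["subsample"] - 0.8) ** 2


if __name__ == "__main__":
    space = {"learning_rate": (0.01, 0.3), "subsample": (0.5, 1.0)}
    best_params, best_score = random_search(toy_objective, space, n_trials=50, seed=42)
    print(best_params, best_score)
```

In a real job, each call to the objective would be one training run, and the cloud HPO service would schedule those runs across instances rather than in a local loop.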
For large datasets, you can find example notebooks using Dask to load data and train models on multiple GPUs in the same instance or in a multi-node multi-GPU cluster.
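As a rough illustration of the single-instance multi-GPU case, the sketch below starts one Dask worker per GPU, reads CSV files with `dask_cudf`, and trains an XGBoost model through its Dask interface. It is not taken from the notebooks in this repository. The function names, the CSV path, and the label column are hypothetical, and the code assumes `dask-cuda`, `dask-cudf`, and a GPU build of `xgboost` are installed on a machine with at least `n_gpus` GPUs.

```python
def gpu_ids(n_gpus: int) -> str:
    """Build a CUDA_VISIBLE_DEVICES-style string such as "0,1,2,3"."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return ",".join(str(i) for i in range(n_gpus))


def train_multi_gpu(csv_glob: str, label_col: str, n_gpus: int = 2, num_boost_round: int = 100):
    """Train an XGBoost model on all selected GPUs of one machine (sketch)."""
    # Imports are deferred so that gpu_ids() can be used without a GPU stack.
    import dask_cudf
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster
    from xgboost import dask as dxgb

    # Assumption: "gpu_hist" suits older XGBoost releases; newer releases
    # document {"device": "cuda", "tree_method": "hist"} instead.
    params = {"tree_method": "gpu_hist", "objective": "binary:logistic"}

    # One Dask worker is started per visible GPU.
    with LocalCUDACluster(CUDA_VISIBLE_DEVICES=gpu_ids(n_gpus)) as cluster, Client(cluster) as client:
        ddf = dask_cudf.read_csv(csv_glob)  # partitions are loaded into GPU memory
        X = ddf.drop(columns=[label_col])
        y = ddf[label_col]
        dtrain = dxgb.DaskDMatrix(client, X, y)
        result = dxgb.train(client, params, dtrain, num_boost_round=num_boost_round)
        return result["booster"]


if __name__ == "__main__":
    # Hypothetical path and column name, for illustration only.
    booster = train_multi_gpu("data/part-*.csv", label_col="target", n_gpus=2)
```

For a multi-node cluster, the `LocalCUDACluster` would be replaced by a `Client` connected to an existing Dask scheduler; the data loading and training calls stay the same.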
Notebooks marked ✅ are fully functional as of RAPIDS release 22.08; notebooks marked ❌ require an update or replacement.
Cloud / Framework | HPO Example | Multi-node multi-GPU Example |
---|---|---|
Microsoft Azure | Azure ML HPO ❌ | Multi-node multi-GPU cuML on Azure ❌ |
Amazon Web Services (AWS) | AWS SageMaker HPO ✅; Scaling up hyperparameter optimization with Kubernetes and XGBoost GPU algorithm ✅ | |
Google Cloud Platform (GCP) | Google AI Platform HPO ❌; Scaling up hyperparameter optimization with Kubernetes and XGBoost GPU algorithm ✅ | Multi-node multi-GPU XGBoost and cuML on Google Kubernetes Engine (GKE) ✅ |
Dask | Dask-ML HPO ✅ | Multi-node multi-GPU XGBoost and cuML ✅ |
Databricks | Hyperopt and MLflow on Databricks ✅ | |
MLflow | Hyperopt and MLflow on GKE ✅ | |
Optuna | Dask-Optuna HPO ✅; Optuna on Azure ML ❌ | |
Ray Tune | Ray Tune HPO ❌ | |
The Cloud ML Docker Repository provides a ready-to-run Docker container with RAPIDS and the libraries/SDKs for the AWS SageMaker, Azure ML, and Google AI Platform HPO examples.
```bash
docker pull rapidsai/rapidsai-cloud-ml:22.10-cuda11.5-base-ubuntu20.04-py3.9
```
From the root cloud-ml-examples directory:
```bash
docker build --tag rapidsai-cloud-ml:latest --file ./common/docker/Dockerfile.training.unified ./
```
In addition to public cloud HPO options, the repository also includes "BYOC" sample notebooks that can be run on the public cloud or private infrastructure of your choice. These leverage Ray Tune or Dask-ML for distributed infrastructure.
Check out the RAPIDS HPO webpage for video tutorials and blog posts.