Kubeflow Pipeline distributed training support

kfp-dist-train provides utilities for Kubeflow Pipelines that let you write distributed training code directly with the Kubeflow Pipelines SDK.

Get Started

Set up a Kubeflow environment (for example, with https://github.com/alauda/kubeflow-chart).
Upload the example kfp-dist-train.ipynb into a Notebook instance, or set up local pipeline submission.
Execute the example to submit a workflow. You can configure the number of workers in the Kubeflow web UI. The submitted job should look like the one below:

[Screenshot of the submitted job]
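For the local submission option mentioned above, a minimal sketch using the Kubeflow Pipelines SDK might look like the following. It assumes the `kfp` package is installed and a Pipelines API endpoint is reachable. The `num_workers` parameter name and the helper functions `build_arguments` and `submit_pipeline` are hypothetical names chosen for illustration, not part of kfp-dist-train.

```python
from typing import Any, Callable, Dict


def build_arguments(num_workers: int) -> Dict[str, Any]:
    """Build run arguments; `num_workers` is a hypothetical pipeline parameter."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return {"num_workers": num_workers}


def submit_pipeline(pipeline_func: Callable, host: str, num_workers: int = 2) -> str:
    """Submit `pipeline_func` to the Kubeflow Pipelines API reachable at `host`."""
    # Imported lazily so build_arguments() can be used without kfp installed.
    import kfp

    client = kfp.Client(host=host)
    run = client.create_run_from_pipeline_func(
        pipeline_func,
        arguments=build_arguments(num_workers),
    )
    return run.run_id
```

Here `host` is whatever URL your Pipelines API is exposed at, and `pipeline_func` is the pipeline function defined in the example notebook. If the pipeline names its worker-count parameter differently, adjust `build_arguments` to match.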

Roadmap

Support a kfpdist.component(dist=True) decorator as a wrapper around dsl.component.
Support the parameter server strategy.
Support PyTorch.
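To illustrate the first roadmap item, here is a rough sketch of the decorator-wrapping pattern it describes. This is not the kfpdist implementation: `base_component` is a stand-in for `dsl.component` so the example runs without kfp installed, and the `dist` attribute it attaches is a made-up convention.

```python
from typing import Callable, Optional


def base_component(func: Callable) -> Callable:
    """Stand-in for dsl.component so the sketch runs without kfp installed."""
    func.is_component = True
    return func


def component(func: Optional[Callable] = None, *, dist: bool = False,
              base: Callable = base_component) -> Callable:
    """Hypothetical wrapper: apply `base`, then record whether the step is distributed."""

    def decorate(f: Callable) -> Callable:
        wrapped = base(f)
        # Tag the component so a later pipeline-building step could fan it
        # out into several workers. The attribute name is an assumption.
        wrapped.dist = dist
        return wrapped

    # Support both the bare form @component and the call form @component(dist=True).
    if func is not None:
        return decorate(func)
    return decorate


@component(dist=True)
def train(epochs: int) -> int:
    return epochs


@component
def evaluate() -> str:
    return "ok"
```

With the real `dsl.component`, the decorated object is not necessarily a plain function, so an actual wrapper would likely need a different way to carry the `dist` flag through to pipeline compilation.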
