Example Spark workflow on Flyte adapted from the pyspark pi example.
The workflow contains two tasks:
- A Spark task to calculate pi
- A Python task to print out the result
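The calculation behind the two tasks can be sketched in plain Python. This is a minimal illustration of the Monte Carlo method used by the pyspark pi example, not the code in this repo: it uses neither Spark nor flytekit, and the names `estimate_pi`, `format_result` and `pi_workflow` are hypothetical.

```python
import random


def estimate_pi(partitions: int = 2, samples_per_partition: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of pi (stand-in for the Spark task).

    Draws points uniformly in the square [-1, 1] x [-1, 1] and counts how
    many land inside the unit circle. The pyspark example spreads this
    loop across partitions; here it runs sequentially.
    """
    rng = random.Random(seed)
    n = samples_per_partition * partitions
    inside = 0
    for _ in range(n):
        x = rng.random() * 2 - 1
        y = rng.random() * 2 - 1
        if x * x + y * y <= 1:
            inside += 1
    # Area ratio circle/square is pi/4, so scale the hit rate by 4.
    return 4.0 * inside / n


def format_result(pi_estimate: float) -> str:
    """Build the message to print (stand-in for the Python task)."""
    return f"Pi is roughly {pi_estimate:.4f}"


def pi_workflow(partitions: int = 2) -> str:
    """Chain the two steps in the same order as the workflow."""
    message = format_result(estimate_pi(partitions))
    print(message)
    return message


if __name__ == "__main__":
    pi_workflow()
```

In the Flyte version, each function would be a task and `pi_workflow` would be the workflow that wires the Spark task's output into the Python task.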
Requirements:
- make
- node (required for pyright; install via brew install node)
- python >= 3.7
- flytectl (install via brew install flyteorg/homebrew-tap/flytectl)
Run make run to run the workflow locally.
Follow these steps to run the workflow inside the Flyte sandbox.
- Create and start the sandbox, mounting the current dir (i.e. this repo):
  make sandbox-create
- Enable the Spark backend plugin (restarts flytepropeller):
  make enable-spark
- Build the docker container inside the sandbox:
  version=v1 make build
- Package and register:
  version=v1 make package register
- Create the execution spec from the launchplan:
  make launchplan
- Execute and watch for new ingress to the Spark UI:
  make exec watch-sparkui
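If you run the sandbox steps often, they can be chained from a small script. The following is a rough sketch, assuming the make targets behave as described above; the helper names `sandbox_steps` and `run_steps` are hypothetical and not part of this repo. It defaults to a dry run that only prints the commands.

```python
import os
import subprocess


def sandbox_steps(version: str = "v1"):
    """Return (extra_env, command) pairs in the order listed above."""
    tagged = {"version": version}  # passed as an environment variable to make
    return [
        ({}, ["make", "sandbox-create"]),
        ({}, ["make", "enable-spark"]),
        (tagged, ["make", "build"]),
        (tagged, ["make", "package", "register"]),
        ({}, ["make", "launchplan"]),
        ({}, ["make", "exec", "watch-sparkui"]),
    ]


def run_steps(version: str = "v1", dry_run: bool = True) -> None:
    """Print each command; execute it too when dry_run is False."""
    for extra_env, command in sandbox_steps(version):
        prefix = [f"{key}={value}" for key, value in extra_env.items()]
        print(" ".join(prefix + command))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(command, env={**os.environ, **extra_env}, check=True)


if __name__ == "__main__":
    run_steps()
```

Calling `run_steps(dry_run=False)` from the repo root would execute the steps in sequence and stop at the first failure.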
To get started, run make install. This will:
- install git hooks for formatting & linting on git push
- create the virtualenv in .venv/
- install this package in editable mode
Then run make to see the options for running checks, tests, etc.
To activate the virtualenv, run . .venv/bin/activate
When the requirements in setup.py change, the virtualenv is updated by the make targets that use it.