tekumara/flyte-spark-example

fspark

Example Spark workflow on Flyte adapted from the pyspark pi example.

The workflow contains two tasks:

  1. A Spark task to calculate pi
  2. A Python task to print out the result
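
The calculation behind the two tasks can be sketched in plain Python. This is only an illustration, assuming the usual Monte Carlo approach of the pyspark pi example: it uses the standard library's map/reduce in place of Spark, and the function names (inside_unit_circle, estimate_pi, print_result) are hypothetical rather than taken from this repo.

```python
import random
from functools import reduce
from operator import add


def inside_unit_circle(_: int) -> int:
    """Sample a point in the square [-1, 1] x [-1, 1]; return 1 if it lands in the unit circle."""
    x = random.random() * 2 - 1
    y = random.random() * 2 - 1
    return 1 if x * x + y * y <= 1 else 0


def estimate_pi(n: int) -> float:
    """Stand-in for the Spark task: map each sample to 0 or 1, then reduce by summing.

    In Spark the map/reduce would be distributed across partitions; here it runs
    in a single process (an assumption made to keep the sketch self-contained).
    """
    count = reduce(add, map(inside_unit_circle, range(n)), 0)
    # Area of the circle / area of the square = pi / 4
    return 4.0 * count / n


def print_result(pi: float) -> str:
    """Stand-in for the Python task: print the result."""
    message = f"Pi is roughly {pi:.6f}"
    print(message)
    return message
```

Calling print_result(estimate_pi(100_000)) mirrors the order of the two tasks: the estimate is computed first and then passed to the printing step.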

Prerequisites

  • make
  • node (required by pyright; install via brew install node)
  • python >= 3.7
  • flytectl (install via brew install flyteorg/homebrew-tap/flytectl)

Usage

Run make run to run the workflow locally.

Sandbox

Follow these steps to run the workflow inside the Flyte sandbox.

  1. Create and start the sandbox, mounting the current directory (i.e. this repo)

    make sandbox-create
    
  2. Enable the Spark backend plugin (restarts flytepropeller)

    make enable-spark
    
  3. Build the docker container inside the sandbox

    version=v1 make build
    
  4. Package and register

    version=v1 make package register
    
  5. Create an execution spec from the launch plan

    make launchplan
    
  6. Execute the workflow and watch for the new ingress to the Spark UI

    make exec watch-sparkui
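
The six steps above can also be driven from a small script. The following is a minimal sketch, not part of this repo: it assumes the make targets behave as described above and that version is read from the environment, as in version=v1 make build. The helper names sandbox_steps and run_steps are hypothetical.

```python
import os
import subprocess
from typing import Dict, List, Tuple

# One step = (extra environment variables, command line)
Step = Tuple[Dict[str, str], List[str]]


def sandbox_steps(version: str = "v1") -> List[Step]:
    """Return the sandbox steps in the order listed above."""
    tagged = {"version": version}  # same effect as prefixing the command with version=v1
    return [
        ({}, ["make", "sandbox-create"]),
        ({}, ["make", "enable-spark"]),
        (tagged, ["make", "build"]),
        (tagged, ["make", "package", "register"]),
        ({}, ["make", "launchplan"]),
        ({}, ["make", "exec", "watch-sparkui"]),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each step in turn, stopping at the first failure (check=True raises)."""
    for extra_env, argv in steps:
        subprocess.run(argv, env={**os.environ, **extra_env}, check=True)
```

Calling run_steps(sandbox_steps("v1")) from the repo root would run the steps back to back. Using the same version for build and for package register keeps the image tag and the registered workflow version aligned.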
    

Development

To get started run make install. This will:

  • install git hooks for formatting & linting on git push
  • create the virtualenv in .venv/
  • install this package in editable mode

Then run make to see the options for running checks, tests etc.

To activate the virtualenv, run . .venv/bin/activate. When the requirements in setup.py change, the make targets that use the virtualenv update it automatically.
