Skip to content

Commit

Permalink
Use celery for message queues (#24)
Browse files Browse the repository at this point in the history
* use celery worker

* update README

* v0.2

* clean up a bit

* update tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use always eager testing

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update README.md

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix mdformat by updating?

* try new line for fix

* disable mdformat

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move into modules

* forgot to lock

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* new pre-release

* support different queue labels

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix file not found

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lint

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* not too complex

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* do not run nodes multiple times

* use hirarchical parallelisation approach

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bump pre-release

* fix gitignore

* lint

* improve CLI and docs a bit

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update README.md

* bump pre-release version

* shorter line

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update README.md

* print the graph

* print nodes that have name

* debug

* bugfix

* rename CLI to `submit` and `worker`

* bump pre-release

* feature request

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add short sleep and fix tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* review

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bump version to release

* line too long

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  • Loading branch information
PythonFZ and pre-commit-ci[bot] authored Nov 6, 2024
1 parent 80ea608 commit bbd77c8
Show file tree
Hide file tree
Showing 13 changed files with 1,836 additions and 1,747 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -162,3 +162,4 @@ cython_debug/
#.idea/
tmp/
.parrafin_*
.paraffin/
11 changes: 6 additions & 5 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -33,8 +33,9 @@ repos:
args: [ --fix ]
# Run the formatter.
- id: ruff-format
- repo: https://github.com/executablebooks/mdformat
rev: 0.7.17
hooks:
- id: mdformat
args: ["--wrap=80"]
# messes with the warning tag in the README right now
# - repo: https://github.com/executablebooks/mdformat
# rev: 0.7.18
# hooks:
# - id: mdformat
# args: ["--wrap=80"]
86 changes: 54 additions & 32 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,8 @@ documentation on
[parallel stage execution](https://dvc.org/doc/command-reference/repro#parallel-stage-execution).

> [!WARNING]
> `paraffin` is still very experimental. Do not use it for production workflows.
> `paraffin` is still very experimental.
> Do not use it for production workflows.
## Installation

Expand All @@ -23,51 +24,72 @@ pip install paraffin

## Usage

To use Paraffin, you can run the following to run up to 4 DVC stages in
parallel:
The `paraffin submit` command mirrors `dvc repro`, enabling you to queue and execute your entire pipeline or selected stages with parallelization.
If no parameters are specified, the entire graph will be queued and executed via `dvc repro --single-item`.

```bash
paraffin -n 4 <stage names>
paraffin submit <stage name> <stage name> ... <stage name>
# Example: run with a maximum of 4 parallel jobs
paraffin worker --concurrency=4
```

If you have `pip install dash` you can also access the dashboard by running

```bash
paraffin --dashboard <stage names>
### Parallel Execution
Due to limitations in Celery’s graph handling (see [Celery discussion](https://github.com/celery/celery/discussions/9376)), complete parallelization is not always achievable. Paraffin will display parallel-ready stages in a flowchart format.
All stages are visualized in a [Mermaid](https://mermaid.js.org/) flowchart.

```mermaid
flowchart TD
subgraph Level0:1
A_X_ParamsToOuts
A_X_ParamsToOuts_1
A_Y_ParamsToOuts
A_Y_ParamsToOuts_1
end
subgraph Level0:2
A_X_AddNodeNumbers
A_Y_AddNodeNumbers
end
subgraph Level0:3
A_SumNodeAttributes
end
Level0:1 --> Level0:2
Level0:2 --> Level0:3
subgraph Level1:1
B_X_ParamsToOuts
B_X_ParamsToOuts_1
B_Y_ParamsToOuts
B_Y_ParamsToOuts_1
end
subgraph Level1:2
B_X_AddNodeNumbers
B_Y_AddNodeNumbers
end
subgraph Level1:3
B_SumNodeAttributes
end
Level1:1 --> Level1:2
Level1:2 --> Level1:3
```

For more information, run:

```bash
paraffin --help
```

## Labels

You can run `paraffin` in multiple processes (e.g. on different hardware with a
shared file system). To specify where a `stage` should run, you can assign
labels to each worker.
## Queue Labels

```
paraffin --labels GPU # on a GPU node
paraffin --label CPU intel # on a CPU node
```

To configure the stages you need to create a `paraffin.yaml` file as follows:
To fine-tune execution, you can assign stages to specific Celery queues, allowing you to manage execution across different environments or hardware setups.
Define queues in a `paraffin.yaml` file:

```yaml
labels:
GPU_TASK:
- GPU
CPU_TASK:
- CPU
SPECIAL_CPU_TASK:
- CPU
- intel
queue:
"B_X*": BQueue
"A_X_AddNodeNumbers": AQueue
```
Then, start a worker with specified queues, such as celery (default) and AQueue:
```bash
paraffin worker -q AQueue,celery
```
All `stages` not assigned to a queue in `paraffin.yaml` will default to the `celery` queue.

All `stages` that are not part of the `paraffin.yaml` will choose any of the
available workers.

> [!TIP]
> If you are building Python-based workflows with DVC, consider trying
Expand Down
5 changes: 5 additions & 0 deletions paraffin/abc.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
import typing as t

from dvc.stage import PipelineStage

HirachicalStages = t.Dict[int, t.List[PipelineStage]]
Loading

0 comments on commit bbd77c8

Please sign in to comment.