diff --git a/docs/user_guide/advanced_composition/chaining_flyte_entities.md b/docs/user_guide/advanced_composition/chaining_flyte_entities.md index f51b45a2d0..29bc71e1bf 100644 --- a/docs/user_guide/advanced_composition/chaining_flyte_entities.md +++ b/docs/user_guide/advanced_composition/chaining_flyte_entities.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (chain_flyte_entities)= # Chaining Flyte entities @@ -28,68 +9,27 @@ kernelspec: Flytekit offers a mechanism for chaining Flyte entities using the `>>` operator. This is particularly valuable when chaining tasks and subworkflows without the need for data flow between the entities. +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. +``` + ## Tasks Let's establish a sequence where `t1()` occurs after `t0()`, and `t2()` follows `t1()`. - -```{code-cell} -from flytekit import task, workflow - - -@task -def t2(): - print("Running t2") - return - - -@task -def t1(): - print("Running t1") - return - - -@task -def t0(): - print("Running t0") - return - - -@workflow -def chain_tasks_wf(): - t2_promise = t2() - t1_promise = t1() - t0_promise = t0() - - t0_promise >> t1_promise - t1_promise >> t2_promise + +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/chain_entities.py +:caption: advanced_composition/chain_entities.py +:lines: 1-30 ``` -+++ {"lines_to_next_cell": 0} - (chain_subworkflow)= ## Subworkflows Just like tasks, you can chain {ref}`subworkflows `. 
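To build intuition for what `>>` expresses, here is a minimal plain-Python sketch. It is a toy model, not Flytekit's implementation: `ToyNode` and `execution_order` are hypothetical names introduced only for illustration. The sketch assumes that chaining records an ordering dependency between two entities without passing any data between them.

```python
class ToyNode:
    """Hypothetical stand-in for a Flyte node; not part of Flytekit."""

    def __init__(self, name: str):
        self.name = name
        self.upstream = []  # nodes that must finish before this one starts

    def __rshift__(self, other: "ToyNode") -> "ToyNode":
        # `a >> b` records that `b` runs after `a`; no data is passed.
        other.upstream.append(self)
        # Returning the right-hand node allows chains such as `a >> b >> c`.
        return other


def execution_order(nodes: list[ToyNode]) -> list[str]:
    """Return node names in an order that respects every recorded dependency."""
    ordered: list[str] = []

    def visit(node: ToyNode) -> None:
        if node.name in ordered:
            return
        for parent in node.upstream:
            visit(parent)
        ordered.append(node.name)

    for node in nodes:
        visit(node)
    return ordered


# Create the nodes in "reverse" order, then fix the order with `>>`.
t2, t1, t0 = ToyNode("t2"), ToyNode("t1"), ToyNode("t0")
t0 >> t1
t1 >> t2
print(execution_order([t2, t1, t0]))
```

The order in which the nodes are created does not matter in this sketch; only the recorded `>>` relationships determine the order.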
-```{code-cell} -:lines_to_next_cell: 2 - -@workflow -def sub_workflow_1(): - t1() - - -@workflow -def sub_workflow_0(): - t0() - - -@workflow -def chain_workflows_wf(): - sub_wf1 = sub_workflow_1() - sub_wf0 = sub_workflow_0() - - sub_wf0 >> sub_wf1 +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/chain_entities.py +:caption: advanced_composition/chain_entities.py +:lines: 34-49 ``` To run the provided workflows on the Flyte cluster, use the following commands: @@ -110,3 +50,5 @@ pyflyte run --remote \ Chaining tasks and subworkflows is not supported in local environments. Follow the progress of this issue [here](https://github.com/flyteorg/flyte/issues/4080). ::: + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/advanced_composition/ diff --git a/docs/user_guide/advanced_composition/conditionals.md b/docs/user_guide/advanced_composition/conditionals.md index 88c447a05c..e73b81e5ed 100644 --- a/docs/user_guide/advanced_composition/conditionals.md +++ b/docs/user_guide/advanced_composition/conditionals.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (conditional)= # Conditionals @@ -31,76 +12,39 @@ received as workflow inputs. While conditions are highly performant in their eva it's important to note that they are restricted to specific binary and logical operators and are applicable only to primitive values. -To begin, import the necessary libraries. +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. +``` -```{code-cell} -import random +To begin, import the necessary libraries. 
-from flytekit import conditional, task, workflow +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/conditional.py +:caption: advanced_composition/conditional.py +:lines: 1-3 ``` -+++ {"lines_to_next_cell": 0} - ## Simple branch In this example, we introduce two tasks, `calculate_circle_circumference` and `calculate_circle_area`. The workflow dynamically chooses between these tasks based on whether the input falls within the fraction range (0-1) or not. -```{code-cell} -@task -def calculate_circle_circumference(radius: float) -> float: - return 2 * 3.14 * radius # Task to calculate the circumference of a circle - - -@task -def calculate_circle_area(radius: float) -> float: - return 3.14 * radius * radius # Task to calculate the area of a circle - - -@workflow -def shape_properties(radius: float) -> float: - return ( - conditional("shape_properties") - .if_((radius >= 0.1) & (radius < 1.0)) - .then(calculate_circle_circumference(radius=radius)) - .else_() - .then(calculate_circle_area(radius=radius)) - ) - - -if __name__ == "__main__": - radius_small = 0.5 - print(f"Circumference of circle (radius={radius_small}): {shape_properties(radius=radius_small)}") - - radius_large = 3.0 - print(f"Area of circle (radius={radius_large}): {shape_properties(radius=radius_large)}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/conditional.py +:caption: advanced_composition/conditional.py +:lines: 11-37 ``` -+++ {"lines_to_next_cell": 0} - ## Multiple branches We establish an `if` condition with multiple branches, which will result in a failure if none of the conditions is met. It's important to note that any `conditional` statement in Flyte is expected to be complete, meaning that all possible branches must be accounted for. 
-```{code-cell} -@workflow -def shape_properties_with_multiple_branches(radius: float) -> float: - return ( - conditional("shape_properties_with_multiple_branches") - .if_((radius >= 0.1) & (radius < 1.0)) - .then(calculate_circle_circumference(radius=radius)) - .elif_((radius >= 1.0) & (radius <= 10.0)) - .then(calculate_circle_area(radius=radius)) - .else_() - .fail("The input must be within the range of 0 to 10.") - ) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/conditional.py +:caption: advanced_composition/conditional.py +:pyobject: shape_properties_with_multiple_branches ``` -+++ {"lines_to_next_cell": 0} - :::{note} Take note of the usage of bitwise operators (`&`). Due to Python's PEP-335, the logical `and`, `or` and `not` operators cannot be overloaded. @@ -111,69 +55,22 @@ a convention also observed in other libraries. ## Consuming the output of a conditional Here, we write a task that consumes the output returned by a `conditional`. 
-```{code-cell} -@workflow -def shape_properties_accept_conditional_output(radius: float) -> float: - result = ( - conditional("shape_properties_accept_conditional_output") - .if_((radius >= 0.1) & (radius < 1.0)) - .then(calculate_circle_circumference(radius=radius)) - .elif_((radius >= 1.0) & (radius <= 10.0)) - .then(calculate_circle_area(radius=radius)) - .else_() - .fail("The input must exist between 0 and 10.") - ) - return calculate_circle_area(radius=result) - - -if __name__ == "__main__": - print(f"Circumference of circle x Area of circle (radius={radius_small}): {shape_properties(radius=5.0)}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/conditional.py +:caption: advanced_composition/conditional.py +:lines: 66-81 ``` -+++ {"lines_to_next_cell": 0} - ## Using the output of a previous task in a conditional You can check if a boolean returned from the previous task is `True`, but unary operations are not supported directly. Instead, use the `is_true`, `is_false` and `is_none` methods on the result. 
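To see why explicit methods are needed, consider this plain-Python sketch. It is a toy model and not Flytekit's `Promise` class: `ToyPromise` and `ToyCheck` are hypothetical names. The sketch relies on one fact about Python itself: `not x` and `if x:` call `__bool__`, which must return a concrete `bool` immediately, so a value that does not exist yet cannot take part in them.

```python
class ToyCheck:
    """Hypothetical deferred check: compares a named value once it is known."""

    def __init__(self, name, expected):
        self.name = name
        self.expected = expected

    def evaluate(self, values):
        # Resolve the check against real values supplied at run time.
        return values[self.name] is self.expected


class ToyPromise:
    """Hypothetical placeholder for a task output known only at run time."""

    def __init__(self, name):
        self.name = name

    def is_true(self):
        # Describe the check instead of evaluating it now.
        return ToyCheck(self.name, True)

    def is_false(self):
        return ToyCheck(self.name, False)

    def __bool__(self):
        # `if promise:` and `not promise` would need a real bool right now.
        raise TypeError(f"'{self.name}' has no value yet; use is_true() or is_false()")


result = ToyPromise("coin_toss_output")
check = result.is_true()
print(check.evaluate({"coin_toss_output": True}))

try:
    not result
except TypeError as err:
    print(err)
```

In this sketch, `is_true()` returns a description of the comparison that can be evaluated later, whereas `not result` fails because no value is available when the workflow is being built.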
-```{code-cell} -@task -def coin_toss(seed: int) -> bool: - """ - Mimic a condition to verify the successful execution of an operation - """ - r = random.Random(seed) - if r.random() < 0.5: - return True - return False - - -@task -def failed() -> int: - """ - Mimic a task that handles failure - """ - return -1 - - -@task -def success() -> int: - """ - Mimic a task that handles success - """ - return 0 - - -@workflow -def boolean_wf(seed: int = 5) -> int: - result = coin_toss(seed=seed) - return conditional("coin_toss").if_(result.is_true()).then(success()).else_().then(failed()) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/conditional.py +:caption: advanced_composition/conditional.py +:lines: 89-119 ``` -+++ {"lines_to_next_cell": 0} - :::{note} *How do output values acquire these methods?* In a workflow, direct access to outputs is not permitted. Inputs and outputs are automatically encapsulated in a special object known as {py:class}`flytekit.extend.Promise`. @@ -182,14 +79,11 @@ Inputs and outputs are automatically encapsulated in a special object known as { ## Using boolean workflow inputs in a conditional You can directly pass a boolean to a workflow. -```{code-cell} -@workflow -def boolean_input_wf(boolean_input: bool) -> int: - return conditional("boolean_input_conditional").if_(boolean_input.is_true()).then(success()).else_().then(failed()) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/conditional.py +:caption: advanced_composition/conditional.py +:pyobject: boolean_input_wf ``` -+++ {"lines_to_next_cell": 0} - :::{note} Observe that the passed boolean possesses a method called `is_true`. This boolean resides within the workflow context and is encapsulated in a specialized Flytekit object. @@ -198,82 +92,36 @@ This special object enables it to exhibit additional behavior. 
You can run the workflows locally as follows: -```{code-cell} -if __name__ == "__main__": - print("Running boolean_wf a few times...") - for index in range(0, 5): - print(f"The output generated by boolean_wf = {boolean_wf(seed=index)}") - print( - f"Boolean input: {True if index < 2 else False}; workflow output: {boolean_input_wf(boolean_input=True if index < 2 else False)}" - ) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/conditional.py +:caption: advanced_composition/conditional.py +:lines: 129-135 ``` -+++ {"lines_to_next_cell": 0} - ## Nested conditionals You can nest conditional sections arbitrarily inside other conditional sections. However, these nested sections can only be in the `then` part of a `conditional` block. -```{code-cell} -@workflow -def nested_conditions(radius: float) -> float: - return ( - conditional("nested_conditions") - .if_((radius >= 0.1) & (radius < 1.0)) - .then( - conditional("inner_nested_conditions") - .if_(radius < 0.5) - .then(calculate_circle_circumference(radius=radius)) - .elif_((radius >= 0.5) & (radius < 0.9)) - .then(calculate_circle_area(radius=radius)) - .else_() - .fail("0.9 is an outlier.") - ) - .elif_((radius >= 1.0) & (radius <= 10.0)) - .then(calculate_circle_area(radius=radius)) - .else_() - .fail("The input must be within the range of 0 to 10.") - ) - - -if __name__ == "__main__": - print(f"nested_conditions(0.4): {nested_conditions(radius=0.4)}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/conditional.py +:caption: advanced_composition/conditional.py +:lines: 142-164 ``` -+++ {"lines_to_next_cell": 0} - ## Using the output of a task in a conditional Let's write a fun workflow that triggers the `calculate_circle_circumference` task in the event of a "heads" outcome, and alternatively, runs the `calculate_circle_area` task in the event of a "tail" outcome. 
-```{code-cell} -@workflow -def consume_task_output(radius: float, seed: int = 5) -> float: - is_heads = coin_toss(seed=seed) - return ( - conditional("double_or_square") - .if_(is_heads.is_true()) - .then(calculate_circle_circumference(radius=radius)) - .else_() - .then(calculate_circle_area(radius=radius)) - ) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/conditional.py +:caption: advanced_composition/conditional.py +:pyobject: consume_task_output ``` -+++ {"lines_to_next_cell": 0} - You can run the workflow locally as follows: -```{code-cell} -if __name__ == "__main__": - default_seed_output = consume_task_output(radius=0.4) - print( - f"Executing consume_task_output(0.4) with default seed=5. Expected output: calculate_circle_circumference => {default_seed_output}" - ) - - custom_seed_output = consume_task_output(radius=0.4, seed=7) - print(f"Executing consume_task_output(0.4, seed=7). Expected output: calculate_circle_area => {custom_seed_output}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/conditional.py +:caption: advanced_composition/conditional.py +:lines: 181-188 ``` ## Run the example on the Flyte cluster @@ -321,3 +169,5 @@ pyflyte run --remote \ https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/conditional.py \ consume_task_output --radius 0.4 --seed 7 ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/advanced_composition diff --git a/docs/user_guide/advanced_composition/decorating_tasks.md b/docs/user_guide/advanced_composition/decorating_tasks.md index 50135ee8ab..87141d4cc0 100644 --- a/docs/user_guide/advanced_composition/decorating_tasks.md +++ b/docs/user_guide/advanced_composition/decorating_tasks.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: 
python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (decorating_tasks)= # Decorating tasks @@ -30,58 +11,44 @@ You can easily change how tasks behave by using decorators to wrap your task fun In order to make sure that your decorated function contains all the type annotation and docstring information that Flyte needs, you will need to use the built-in {py:func}`~functools.wraps` decorator. -To begin, import the required dependencies. +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. +``` -```{code-cell} -import logging -from functools import partial, wraps +To begin, import the required dependencies. -from flytekit import task, workflow +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/decorating_tasks.py +:caption: advanced_composition/decorating_tasks.py +:lines: 1-4 ``` -+++ {"lines_to_next_cell": 0} - Create a logger to monitor the execution's progress. -```{code-cell} -logger = logging.getLogger(__file__) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/decorating_tasks.py +:caption: advanced_composition/decorating_tasks.py +:lines: 7 ``` -+++ {"lines_to_next_cell": 0} - ## Using a single decorator We define a decorator that logs the input and output details for a decorated task. 
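Before looking at the decorator itself, the following plain-Python sketch shows what {py:func}`~functools.wraps` preserves. It uses only the standard library and no Flytekit; `without_wraps`, `with_wraps` and `add_one` are illustrative names, not part of the example file.

```python
import inspect
from functools import wraps


def without_wraps(fn):
    # The wrapper replaces the function's name, docstring and signature.
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)

    return wrapper


def with_wraps(fn):
    # `wraps` copies the metadata of `fn` onto the wrapper.
    @wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)

    return wrapper


def add_one(x: int) -> int:
    """Add one to x."""
    return x + 1


bare = without_wraps(add_one)
kept = with_wraps(add_one)

print(bare.__name__, inspect.signature(bare))
print(kept.__name__, kept.__doc__, inspect.signature(kept))
```

Only the `with_wraps` version still exposes the original name, docstring and typed signature, which is the information the paragraph above says Flyte needs from a decorated task function.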
-```{code-cell} -def log_io(fn): - @wraps(fn) - def wrapper(*args, **kwargs): - logger.info(f"task {fn.__name__} called with args: {args}, kwargs: {kwargs}") - out = fn(*args, **kwargs) - logger.info(f"task {fn.__name__} output: {out}") - return out - - return wrapper +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/decorating_tasks.py +:caption: advanced_composition/decorating_tasks.py +:pyobject: log_io ``` -+++ {"lines_to_next_cell": 0} - We create a task named `t1` that is decorated with `log_io`. :::{note} The order of invoking the decorators is important. `@task` should always be the outer-most decorator. ::: -```{code-cell} -@task -@log_io -def t1(x: int) -> int: - return x + 1 +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/decorating_tasks.py +:caption: advanced_composition/decorating_tasks.py +:pyobject: t1 ``` -+++ {"lines_to_next_cell": 0} - (stacking_decorators)= ## Stacking multiple decorators @@ -91,49 +58,27 @@ You can also stack multiple decorators on top of each other as long as `@task` i We define a decorator that verifies if the output from the decorated function is a positive number before it's returned. If this assumption is violated, it raises a `ValueError` exception. 
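As a rough illustration of the pattern, here is a plain-Python sketch of a decorator that works both with and without arguments. It runs without Flytekit; `require_above`, `decrement` and `add_ten` are hypothetical names chosen for this sketch, and the check mirrors the one described above under the assumption that outputs are numbers.

```python
from functools import partial, wraps


def require_above(fn=None, *, floor=0):
    # Used as `@require_above(floor=...)`: there is no function yet, so
    # return this same decorator with `floor` already bound.
    if fn is None:
        return partial(require_above, floor=floor)

    @wraps(fn)
    def wrapper(*args, **kwargs):
        out = fn(*args, **kwargs)
        if out <= floor:
            raise ValueError(f"output of {fn.__name__} must be greater than {floor}, found {out}")
        return out

    return wrapper


@require_above  # bare form: floor defaults to 0
def decrement(x: int) -> int:
    return x - 1


@require_above(floor=10)  # parameterized form
def add_ten(x: int) -> int:
    return x + 10


print(add_ten(1))

try:
    decrement(1)
except ValueError as err:
    print(err)
```

Checking `fn is None` before defining the wrapper keeps the two call forms separate: the parameterized form returns a partially applied decorator, and the bare form wraps the function directly.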
-```{code-cell} -def validate_output(fn=None, *, floor=0): - @wraps(fn) - def wrapper(*args, **kwargs): - out = fn(*args, **kwargs) - if out <= floor: - raise ValueError(f"output of task {fn.__name__} must be a positive number, found {out}") - return out - - if fn is None: - return partial(validate_output, floor=floor) - - return wrapper +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/decorating_tasks.py +:caption: advanced_composition/decorating_tasks.py +:pyobject: validate_output ``` -+++ {"lines_to_next_cell": 0} - :::{note} The output of the `validate_output` task uses {py:func}`~functools.partial` to implement parameterized decorators. ::: We define a function that uses both the logging and validator decorators. -```{code-cell} -@task -@log_io -@validate_output(floor=10) -def t2(x: int) -> int: - return x + 10 +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/decorating_tasks.py +:caption: advanced_composition/decorating_tasks.py +:pyobject: t2 ``` -+++ {"lines_to_next_cell": 0} - Finally, we compose a workflow that calls `t1` and `t2`. -```{code-cell} -@workflow -def decorating_task_wf(x: int) -> int: - return t2(x=t1(x=x)) - - -if __name__ == "__main__": - print(f"Running decorating_task_wf(x=10) {decorating_task_wf(x=10)}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/decorating_tasks.py +:caption: advanced_composition/decorating_tasks.py +:lines: 53-59 ``` ## Run the example on the Flyte cluster @@ -150,3 +95,5 @@ In this example, you learned how to modify the behavior of tasks via function de {py:func}`~functools.wraps` decorator pattern. To learn more about how to extend Flyte at a deeper level, for example creating custom types, custom tasks or backend plugins, see {ref}`Extending Flyte `. 
+ +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/advanced_composition/ diff --git a/docs/user_guide/advanced_composition/decorating_workflows.md b/docs/user_guide/advanced_composition/decorating_workflows.md index 3a369cc433..3cd4fda0a2 100644 --- a/docs/user_guide/advanced_composition/decorating_workflows.md +++ b/docs/user_guide/advanced_composition/decorating_workflows.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (decorating_workflows)= # Decorating workflows @@ -28,51 +9,34 @@ kernelspec: The behavior of workflows can be modified in a light-weight fashion by using the built-in {py:func}`~functools.wraps` decorator pattern, similar to using decorators to {ref}`customize task behavior `. However, unlike in the case of -tasks, we need to do a little extra work to make sure that the DAG underlying the workflow executes tasks in the -correct order. +tasks, we need to do a little extra work to make sure that the DAG underlying the workflow executes tasks in the correct order. ## Setup-teardown pattern The main use case of decorating `@workflow`-decorated functions is to establish a setup-teardown pattern to execute task before and after your main workflow logic. This is useful when integrating with other external services -like [wandb](https://wandb.ai/site) or [clearml](https://clear.ml/), which enable you to track metrics of model -training runs. +like [wandb](https://wandb.ai/site) or [clearml](https://clear.ml/), which enable you to track metrics of model training runs. -To begin, import the necessary libraries. +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. 
+``` -```{code-cell} -from functools import partial, wraps -from unittest.mock import MagicMock +To begin, import the necessary libraries. -import flytekit -from flytekit import FlyteContextManager, task, workflow -from flytekit.core.node_creation import create_node +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/decorating_workflows.py +:caption: advanced_composition/decorating_workflows.py +:lines: 1-6 ``` -+++ {"lines_to_next_cell": 0} - Let's define the tasks we need for setup and teardown. In this example, we use the {py:class}`unittest.mock.MagicMock` class to create a fake external service that we want to initialize at the beginning of our workflow and finish at the end. -```{code-cell} -external_service = MagicMock() - - -@task -def setup(): - print("initializing external service") - external_service.initialize(id=flytekit.current_context().execution_id) - - -@task -def teardown(): - print("finish external service") - external_service.complete(id=flytekit.current_context().execution_id) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/decorating_workflows.py +:caption: advanced_composition/decorating_workflows.py +:lines: 9-21 ``` -+++ {"lines_to_next_cell": 0} - As you can see, you can even use Flytekit's current context to access the `execution_id` of the current workflow if you need to link Flyte with the external service so that you reference the same unique identifier in both the external service and Flyte. @@ -81,46 +45,11 @@ external service and Flyte. We create a decorator that we want to use to wrap our workflow function. -```{code-cell} -def setup_teardown(fn=None, *, before, after): - @wraps(fn) - def wrapper(*args, **kwargs): - # get the current flyte context to obtain access to the compilation state of the workflow DAG. 
- ctx = FlyteContextManager.current_context() - - # defines before node - before_node = create_node(before) - # ctx.compilation_state.nodes == [before_node] - - # under the hood, flytekit compiler defines and threads - # together nodes within the `my_workflow` function body - outputs = fn(*args, **kwargs) - # ctx.compilation_state.nodes == [before_node, *nodes_created_by_fn] - - # defines the after node - after_node = create_node(after) - # ctx.compilation_state.nodes == [before_node, *nodes_created_by_fn, after_node] - - # compile the workflow correctly by making sure `before_node` - # runs before the first workflow node and `after_node` - # runs after the last workflow node. - if ctx.compilation_state is not None: - # ctx.compilation_state.nodes is a list of nodes defined in the - # order of execution above - workflow_node0 = ctx.compilation_state.nodes[1] - workflow_node1 = ctx.compilation_state.nodes[-2] - before_node >> workflow_node0 - workflow_node1 >> after_node - return outputs - - if fn is None: - return partial(setup_teardown, before=before, after=after) - - return wrapper +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/decorating_workflows.py +:caption: advanced_composition/decorating_workflows.py +:pyobject: setup_teardown ``` -+++ {"lines_to_next_cell": 0} - There are a few key pieces to note in the `setup_teardown` decorator above: 1. It takes a `before` and `after` argument, both of which need to be `@task`-decorated functions. These @@ -137,32 +66,16 @@ There are a few key pieces to note in the `setup_teardown` decorator above: We define two tasks that will constitute the workflow. 
-```{code-cell} -@task -def t1(x: float) -> float: - return x - 1 - - -@task -def t2(x: float) -> float: - return x**2 +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/decorating_workflows.py +:caption: advanced_composition/decorating_workflows.py +:lines: 63-70 ``` -+++ {"lines_to_next_cell": 0} - And then create our decorated workflow: -```{code-cell} -:lines_to_next_cell: 2 - -@workflow -@setup_teardown(before=setup, after=teardown) -def decorating_workflow(x: float) -> float: - return t2(x=t1(x=x)) - - -if __name__ == "__main__": - print(decorating_workflow(x=10.0)) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/decorating_workflows.py +:caption: advanced_composition/decorating_workflows.py +:lines: 74-82 ``` ## Run the example on the Flyte cluster @@ -178,3 +91,5 @@ pyflyte run --remote \ To define workflows imperatively, refer to {ref}`this example `, and to learn more about how to extend Flyte at a deeper level, for example creating custom types, custom tasks or backend plugins, see {ref}`Extending Flyte `. 
+ +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/advanced_composition/ diff --git a/docs/user_guide/advanced_composition/dynamic_workflows.md b/docs/user_guide/advanced_composition/dynamic_workflows.md index 99bc88a372..c39272dfdf 100644 --- a/docs/user_guide/advanced_composition/dynamic_workflows.md +++ b/docs/user_guide/advanced_composition/dynamic_workflows.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (dynamic_workflow)= # Dynamic workflows @@ -53,51 +34,38 @@ Dynamic workflows become essential when you require: This example utilizes dynamic workflow to count the common characters between any two strings. +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. +``` + To begin, we import the required libraries. -```{code-cell} -from flytekit import dynamic, task, workflow +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/dynamic_workflow.py +:caption: advanced_composition/dynamic_workflow.py +:lines: 1 ``` -+++ {"lines_to_next_cell": 0} - We define a task that returns the index of a character, where A-Z/a-z is equivalent to 0-25. 
-```{code-cell} -@task -def return_index(character: str) -> int: - if character.islower(): - return ord(character) - ord("a") - else: - return ord(character) - ord("A") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/dynamic_workflow.py +:caption: advanced_composition/dynamic_workflow.py +:pyobject: return_index ``` -+++ {"lines_to_next_cell": 0} - We also create a task that prepares a list of 26 characters by populating the frequency of each character. -```{code-cell} -@task -def update_list(freq_list: list[int], list_index: int) -> list[int]: - freq_list[list_index] += 1 - return freq_list +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/dynamic_workflow.py +:caption: advanced_composition/dynamic_workflow.py +:pyobject: update_list ``` -+++ {"lines_to_next_cell": 0} - We define a task to calculate the number of common characters between the two strings. -```{code-cell} -@task -def derive_count(freq1: list[int], freq2: list[int]) -> int: - count = 0 - for i in range(26): - count += min(freq1[i], freq2[i]) - return count +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/dynamic_workflow.py +:caption: advanced_composition/dynamic_workflow.py +:pyobject: derive_count ``` -+++ {"lines_to_next_cell": 0} - We define a dynamic workflow to accomplish the following: 1. Initialize an empty 26-character list to be passed to the `update_list` task @@ -107,37 +75,11 @@ We define a dynamic workflow to accomplish the following: The looping process is contingent on the number of characters in both strings, which is unknown until runtime. 
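The steps above can be sketched in plain Python, without Flytekit, to make the underlying algorithm easier to follow. This is only an illustration: it assumes both strings contain nothing but the letters A-Z and a-z, and `count_common_characters` is a hypothetical name. In the dynamic workflow, each step is a task call that returns a promise, whereas here every value is available immediately.

```python
def return_index(character: str) -> int:
    # Map A-Z/a-z to 0-25, ignoring case.
    return ord(character.lower()) - ord("a")


def count_common_characters(s1: str, s2: str) -> int:
    # One counter per letter of the alphabet for each string.
    freq1 = [0] * 26
    freq2 = [0] * 26

    # The number of loop iterations depends on the input lengths,
    # which are only known once the function is called.
    for character in s1:
        freq1[return_index(character)] += 1
    for character in s2:
        freq2[return_index(character)] += 1

    # A letter is shared as many times as it occurs in both strings.
    return sum(min(a, b) for a, b in zip(freq1, freq2))


print(count_common_characters("Pear", "Earth"))
```

For the inputs "Pear" and "Earth", the shared letters are e, a and r, so the sketch returns 3.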
-```{code-cell} -@dynamic -def count_characters(s1: str, s2: str) -> int: - # s1 and s2 should be accessible - - # Initialize empty lists with 26 slots each, corresponding to every alphabet (lower and upper case) - freq1 = [0] * 26 - freq2 = [0] * 26 - - # Loop through characters in s1 - for i in range(len(s1)): - # Calculate the index for the current character in the alphabet - index = return_index(character=s1[i]) - # Update the frequency list for s1 - freq1 = update_list(freq_list=freq1, list_index=index) - # index and freq1 are not accessible as they are promises - - # looping through the string s2 - for i in range(len(s2)): - # Calculate the index for the current character in the alphabet - index = return_index(character=s2[i]) - # Update the frequency list for s2 - freq2 = update_list(freq_list=freq2, list_index=index) - # index and freq2 are not accessible as they are promises - - # Count the common characters between s1 and s2 - return derive_count(freq1=freq1, freq2=freq2) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/dynamic_workflow.py +:caption: advanced_composition/dynamic_workflow.py +:pyobject: count_characters ``` -+++ {"lines_to_next_cell": 0} - A dynamic workflow is modeled as a task in the backend, but the body of the function is executed to produce a workflow at run-time. In both dynamic and static workflows, the output of tasks are promise objects. @@ -155,25 +97,18 @@ Local execution works when a `@dynamic` decorator is used because Flytekit treat Define a workflow that triggers the dynamic workflow. 
-```{code-cell} -@workflow -def dynamic_wf(s1: str, s2: str) -> int: - return count_characters(s1=s1, s2=s2) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/dynamic_workflow.py +:caption: advanced_composition/dynamic_workflow.py +:pyobject: dynamic_wf ``` -+++ {"lines_to_next_cell": 0} - You can run the workflow locally as follows: -```{code-cell} -:lines_to_next_cell: 2 - -if __name__ == "__main__": - print(dynamic_wf(s1="Pear", s2="Earth")) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/dynamic_workflow.py +:caption: advanced_composition/dynamic_workflow.py +:lines: 78-79 ``` -+++ {"lines_to_next_cell": 0} - ## Why use dynamic workflows? ### Flexibility @@ -206,9 +141,7 @@ resulting in less noticeable overhead. Merge sort is a perfect example to showcase how to seamlessly achieve recursion using dynamic workflows. Flyte imposes limitations on the depth of recursion to prevent misuse and potential impacts on the overall stability of the system. 
-```{code-cell} -:lines_to_next_cell: 2 - +```python from typing import Tuple from flytekit import conditional, dynamic, task, workflow @@ -290,3 +223,5 @@ pyflyte run --remote \ https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/dynamic_workflow.py \ merge_sort --numbers '[1813, 3105, 3260, 2634, 383, 7037, 3291, 2403, 315, 7164]' --numbers_count 10 ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/advanced_composition/ diff --git a/docs/user_guide/advanced_composition/eager_workflows.md b/docs/user_guide/advanced_composition/eager_workflows.md index c2cc1dc542..480374413b 100644 --- a/docs/user_guide/advanced_composition/eager_workflows.md +++ b/docs/user_guide/advanced_composition/eager_workflows.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (eager_workflows)= # Eager workflows @@ -60,32 +41,14 @@ the python constructs that you're familiar with via the `asyncio` API. To understand what this looks like, let's define a very basic eager workflow using the `@eager` decorator. -```{code-cell} -:lines_to_next_cell: 2 - -from flytekit import task, workflow -from flytekit.experimental import eager - - -@task -def add_one(x: int) -> int: - return x + 1 - - -@task -def double(x: int) -> int: - return x * 2 - - -@eager -async def simple_eager_workflow(x: int) -> int: - out = await add_one(x=x) - if out < 0: - return -1 - return await double(x=out) +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. 
``` -+++ {"lines_to_next_cell": 2} +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/eager_workflows.py +:caption: advanced_composition/eager_workflows.py +:lines: 1-21 +``` As we can see in the code above, we're defining an `async` function called `simple_eager_workflow` that takes an integer as input and returns an integer. @@ -153,19 +116,11 @@ One of the biggest benefits of eager workflows is that you can now materialize task and subworkflow outputs as Python values and do operations on them just like you would in any other Python function. Let's look at another example: -```{code-cell} -@eager -async def another_eager_workflow(x: int) -> int: - out = await add_one(x=x) - - # out is a Python integer - out = out - 1 - - return await double(x=out) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/eager_workflows.py +:caption: advanced_composition/eager_workflows.py +:pyobject: another_eager_workflow ``` -+++ {"lines_to_next_cell": 0} - Since out is an actual Python integer and not a promise, we can do operations on it at runtime, inside the eager workflow function body. This is not possible with static or dynamic workflows. @@ -176,27 +131,9 @@ As you saw in the `simple_eager_workflow` workflow above, you can use regular Python conditionals in your eager workflows. 
Let's look at a more complicated example: -```{code-cell} -:lines_to_next_cell: 2 - -@task -def gt_100(x: int) -> bool: - return x > 100 - - -@eager -async def eager_workflow_with_conditionals(x: int) -> int: - out = await add_one(x=x) - - if out < 0: - return -1 - elif await gt_100(x=out): - return 100 - else: - out = await double(x=out) - - assert out >= -1 - return out +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/eager_workflows.py +:caption: advanced_composition/eager_workflows.py +:lines: 36-53 ``` In the above example, we're using the eager workflow's Python runtime @@ -207,88 +144,36 @@ to check if `out` is negative, but we're also using the `gt_100` task in the You can also gather the outputs of multiple tasks or subworkflows into a list: -```{code-cell} -import asyncio - - -@eager -async def eager_workflow_with_for_loop(x: int) -> int: - outputs = [] - - for i in range(x): - outputs.append(add_one(x=i)) - - outputs = await asyncio.gather(*outputs) - return await double(x=sum(outputs)) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/eager_workflows.py +:caption: advanced_composition/eager_workflows.py +:lines: 58-69 ``` -+++ {"lines_to_next_cell": 0} - ### Static subworkflows You can also invoke static workflows from within an eager workflow: -```{code-cell} -:lines_to_next_cell: 2 - -@workflow -def subworkflow(x: int) -> int: - out = add_one(x=x) - return double(x=out) - - -@eager -async def eager_workflow_with_static_subworkflow(x: int) -> int: - out = await subworkflow(x=x) - assert out == (x + 1) * 2 - return out +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/eager_workflows.py +:caption: advanced_composition/eager_workflows.py +:lines: 74-84 ``` -+++ {"lines_to_next_cell": 0} - ### Eager subworkflows You can nest eager
subworkflows inside a parent eager workflow: -```{code-cell} -:lines_to_next_cell: 2 - -@eager -async def eager_subworkflow(x: int) -> int: - return await add_one(x=x) - - -@eager -async def nested_eager_workflow(x: int) -> int: - out = await eager_subworkflow(x=x) - return await double(x=out) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/eager_workflows.py +:caption: advanced_composition/eager_workflows.py +:lines: 89-97 ``` -+++ {"lines_to_next_cell": 0} - ### Catching exceptions You can also catch exceptions in eager workflows through `EagerException`: -```{code-cell} -:lines_to_next_cell: 2 - -from flytekit.experimental import EagerException - - -@task -def raises_exc(x: int) -> int: - if x <= 0: - raise TypeError - return x - - -@eager -async def eager_workflow_with_exception(x: int) -> int: - try: - return await raises_exc(x=x) - except EagerException: - return -1 +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/eager_workflows.py +:caption: advanced_composition/eager_workflows.py +:lines: 102-117 ``` Even though the `raises_exc` exception task raises a `TypeError`, the @@ -310,10 +195,9 @@ and remotely. You can execute eager workflows locally by simply calling them like a regular `async` function: -```{code-cell} -if __name__ == "__main__": - result = asyncio.run(simple_eager_workflow(x=5)) - print(f"Result: {result}") # "Result: 12" +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/eager_workflows.py +:caption: advanced_composition/eager_workflows.py +:lines: 123-125 ``` This just uses the `asyncio.run` function to execute the eager workflow just @@ -329,7 +213,7 @@ object to kick off task, static workflow, and eager workflow executions. 
In order to actually execute them on a Flyte cluster, you'll need to configure eager workflows with a `FlyteRemote` object and secrets configuration that -allows you to authenticate into the cluster via a client secret key. +allows you to authenticate into the cluster via a client secret key: ```{code-block} python from flytekit.remote import FlyteRemote @@ -348,41 +232,21 @@ async def eager_workflow_remote(x: int) -> int: ... ``` -+++ - Where `config.yaml` contains a [flytectl](https://docs.flyte.org/projects/flytectl/en/latest/#configuration)-compatible config file and `my_client_secret_group` and `my_client_secret_key` are the {ref}`secret group and key ` that you've configured for your Flyte cluster to authenticate via a client key. -+++ - ### Sandbox Flyte cluster execution When using a sandbox cluster started with `flytectl demo start`, however, the `client_secret_group` and `client_secret_key` are not required, since the default sandbox configuration does not require key-based authentication. 
-```{code-cell} -:lines_to_next_cell: 2 - -from flytekit.configuration import Config -from flytekit.remote import FlyteRemote - - -@eager( - remote=FlyteRemote( - config=Config.for_sandbox(), - default_project="flytesnacks", - default_domain="development", - ) -) -async def eager_workflow_sandbox(x: int) -> int: - out = await add_one(x=x) - if out < 0: - return -1 - return await double(x=out) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/eager_workflows.py +:caption: advanced_composition/eager_workflows.py +:lines: 130-145 ``` ```{important} @@ -493,3 +357,5 @@ of promises and materialized values: | `@workflow` | Compiled at compile-time | All inputs and intermediary outputs are promises | Type errors caught at compile-time | Constrained by Flyte DSL | | `@dynamic` | Compiled at run-time | Inputs are materialized, but outputs of all Flyte entities are Promises | More flexible than `@workflow`, e.g. can do Python operations on inputs | Can't use a lot of Python constructs (e.g. try/except) | | `@eager` | Never compiled | Everything is materialized! 
| Can effectively use all Python constructs via `asyncio` syntax | No compile-time benefits, this is the wild west 🏜 | + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/advanced_composition/ diff --git a/docs/user_guide/advanced_composition/intratask_checkpoints.md b/docs/user_guide/advanced_composition/intratask_checkpoints.md index 703279abcb..8c83eb154b 100644 --- a/docs/user_guide/advanced_composition/intratask_checkpoints.md +++ b/docs/user_guide/advanced_composition/intratask_checkpoints.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - # Intratask checkpoints ```{eval-rst} @@ -64,66 +45,40 @@ It's important to note that Flyte currently offers the low-level API for checkpo Future integrations aim to incorporate higher-level checkpointing APIs from popular training frameworks like Keras, PyTorch, Scikit-learn, and big-data frameworks such as Spark and Flink, enhancing their fault-tolerance capabilities. -To begin, import the necessary libraries and set the number of task retries to `3`. +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. 
+``` -```{code-cell} -from flytekit import current_context, task, workflow -from flytekit.exceptions.user import FlyteRecoverableException +To begin, import the necessary libraries and set the number of task retries to `3`: -RETRIES = 3 +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/checkpoint.py +:caption: advanced_composition/checkpoint.py +:lines: 1-4 ``` -+++ {"lines_to_next_cell": 0} - -We define a task to iterate precisely `n_iterations`, checkpoint its state, and recover from simulated failures. - -```{code-cell} -@task(retries=RETRIES) -def use_checkpoint(n_iterations: int) -> int: - cp = current_context().checkpoint - prev = cp.read() - - start = 0 - if prev: - start = int(prev.decode()) - - # Create a failure interval to simulate failures across 'n' iterations and then succeed after configured retries - failure_interval = n_iterations // RETRIES - index = 0 - for index in range(start, n_iterations): - # Simulate a deterministic failure for demonstration. Showcasing how it eventually completes within the given retries - if index > start and index % failure_interval == 0: - raise FlyteRecoverableException(f"Failed at iteration {index}, failure_interval {failure_interval}.") - # Save progress state. It is also entirely possible to save state every few intervals - cp.write(f"{index + 1}".encode()) - return index -``` +We define a task to iterate precisely `n_iterations`, checkpoint its state, and recover from simulated failures: -+++ {"lines_to_next_cell": 0} +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/checkpoint.py +:caption: advanced_composition/checkpoint.py +:pyobject: use_checkpoint +``` The checkpoint system offers additional APIs, documented in the code accessible at [checkpointer code](https://github.com/flyteorg/flytekit/blob/master/flytekit/core/checkpointer.py). 
-Create a workflow that invokes the task. +Create a workflow that invokes the task: The task will automatically undergo retries in the event of a {ref}`FlyteRecoverableException `. -```{code-cell} -@workflow -def checkpointing_example(n_iterations: int) -> int: - return use_checkpoint(n_iterations=n_iterations) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/checkpoint.py +:caption: advanced_composition/checkpoint.py +:pyobject: checkpointing_example ``` -+++ {"lines_to_next_cell": 0} - -The local checkpoint is not utilized here because retries are not supported. +The local checkpoint is not utilized here because retries are not supported: -```{code-cell} -if __name__ == "__main__": - try: - checkpointing_example(n_iterations=10) - except RuntimeError as e: # noqa : F841 - # Since no retries are performed, an exception is expected when run locally - pass +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/checkpoint.py +:caption: advanced_composition/checkpoint.py +:lines: 37-42 ``` ## Run the example on the Flyte cluster @@ -135,3 +90,5 @@ pyflyte run --remote \ https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/checkpoint.py \ checkpointing_example --n_iterations 10 ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/advanced_composition/ diff --git a/docs/user_guide/advanced_composition/map_tasks.md b/docs/user_guide/advanced_composition/map_tasks.md index 6449b6d124..8c3127fc4e 100644 --- a/docs/user_guide/advanced_composition/map_tasks.md +++ b/docs/user_guide/advanced_composition/map_tasks.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - 
jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (map_task)= # Map tasks @@ -36,37 +17,24 @@ Map tasks find utility in diverse scenarios, such as: The following examples demonstrate how to use map tasks with both single and multiple inputs. -To begin, import the required libraries. - -```{code-cell} -from flytekit import map_task, task, workflow +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. ``` -+++ {"lines_to_next_cell": 0} - -Here's a simple workflow that uses {py:func}`map_task `. - -```{code-cell} -threshold = 11 - - -@task -def detect_anomalies(data_point: int) -> bool: - return data_point > threshold - +To begin, import the required libraries: -@workflow -def map_workflow(data: list[int] = [10, 12, 11, 10, 13, 12, 100, 11, 12, 10]) -> list[bool]: - # Use the map task to apply the anomaly detection function to each data point - return map_task(detect_anomalies)(data_point=data) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/map_task.py +:caption: advanced_composition/map_task.py +:lines: 1 +``` +Here's a simple workflow that uses {py:func}`map_task `: -if __name__ == "__main__": - print(f"Anomalies Detected: {map_workflow()}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/map_task.py +:caption: advanced_composition/map_task.py +:lines: 4-19 ``` -+++ {"lines_to_next_cell": 0} - To customize resource allocations, such as memory usage for individual map tasks, you can leverage `with_overrides`. 
Here's an example using the `detect_anomalies` map task within a workflow: @@ -79,7 +47,7 @@ def map_workflow_with_resource_overrides(data: list[int] = [10, 12, 11, 10, 13, return map_task(detect_anomalies)(data_point=data).with_overrides(requests=Resources(mem="2Gi")) ``` -You can use {py:class}`~flytekit.TaskMetadata` to set attributes such as `cache`, `cache_version`, `interruptible`, `retries` and `timeout`. +You can use {py:class}`~flytekit.TaskMetadata` to set attributes such as `cache`, `cache_version`, `interruptible`, `retries` and `timeout`: ```python from flytekit import TaskMetadata @@ -93,10 +61,8 @@ def map_workflow_with_metadata(data: list[int] = [10, 12, 11, 10, 13, 12, 100, 1 You can also configure `concurrency` and `min_success_ratio` for a map task: - `concurrency` limits the number of mapped tasks that can run in parallel to the specified batch size. -If the input size exceeds the concurrency value, multiple batches will run serially until all inputs are processed. -If left unspecified, it implies unbounded concurrency. -- `min_success_ratio` determines the minimum fraction of total jobs that must complete successfully before terminating -the map task and marking it as successful. +If the input size exceeds the concurrency value, multiple batches will run serially until all inputs are processed. If left unspecified, it implies unbounded concurrency. +- `min_success_ratio` determines the minimum fraction of total jobs that must complete successfully before terminating the map task and marking it as successful. ```python @workflow @@ -107,94 +73,52 @@ def map_workflow_with_additional_params(data: list[int] = [10, 12, 11, 10, 13, 1 A map task internally uses a compression algorithm (bitsets) to handle every Flyte workflow node’s metadata, which would have otherwise been in the order of 100s of bytes. -When defining a map task, avoid calling other tasks in it. Flyte -can't accurately register tasks that call other tasks. 
While Flyte -will correctly execute a task that calls other tasks, it will not be -able to give full performance advantages. This is -especially true for map tasks. +When defining a map task, avoid calling other tasks in it. Flyte can't accurately register tasks that call other tasks. While Flyte will correctly execute a task that calls other tasks, it will not be able to give full performance advantages. This is especially true for map tasks. -In this example, the map task `suboptimal_mappable_task` would not -give you the best performance. +In this example, the map task `suboptimal_mappable_task` would not give you the best performance: -```{code-cell} -@task -def upperhalf(a: int) -> int: - return a / 2 + 1 - - -@task -def suboptimal_mappable_task(a: int) -> str: - inc = upperhalf(a=a) - stringified = str(inc) - return stringified +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/map_task.py +:caption: advanced_composition/map_task.py +:lines: 31-40 ``` -+++ {"lines_to_next_cell": 0} - By default, the map task utilizes the Kubernetes array plugin for execution. However, map tasks can also be run on alternate execution backends. For example, you can configure the map task to run on -[AWS Batch](https://docs.flyte.org/en/latest/deployment/plugin_setup/aws/batch.html#deployment-plugin-setup-aws-array), -a provisioned service that offers scalability for handling large-scale tasks. +[AWS Batch](https://docs.flyte.org/en/latest/deployment/plugin_setup/aws/batch.html#deployment-plugin-setup-aws-array), a provisioned service that offers scalability for handling large-scale tasks. ## Map a task with multiple inputs You might need to map a task with multiple inputs. -For instance, consider a task that requires three inputs. 
+For instance, consider a task that requires three inputs: -```{code-cell} -@task -def multi_input_task(quantity: int, price: float, shipping: float) -> float: - return quantity * price * shipping +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/map_task.py +:caption: advanced_composition/map_task.py +:pyobject: multi_input_task ``` -+++ {"lines_to_next_cell": 0} - You may want to map this task with only the ``quantity`` input, while keeping the other inputs unchanged. Since a map task accepts only one input, you can achieve this by partially binding values to the map task. -This can be done using the {py:func}`functools.partial` function. - -```{code-cell} -import functools +This can be done using the {py:func}`functools.partial` function: - -@workflow -def multiple_inputs_map_workflow(list_q: list[int] = [1, 2, 3, 4, 5], p: float = 6.0, s: float = 7.0) -> list[float]: - partial_task = functools.partial(multi_input_task, price=p, shipping=s) - return map_task(partial_task)(quantity=list_q) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/map_task.py +:caption: advanced_composition/map_task.py +:lines: 52-58 ``` -+++ {"lines_to_next_cell": 0} +Another possibility is to bind the outputs of a task to partials: -Another possibility is to bind the outputs of a task to partials. 
- -```{code-cell} -@task -def get_price() -> float: - return 7.0 - - -@workflow -def map_workflow_partial_with_task_output(list_q: list[int] = [1, 2, 3, 4, 5], s: float = 6.0) -> list[float]: - p = get_price() - partial_task = functools.partial(multi_input_task, price=p, shipping=s) - return map_task(partial_task)(quantity=list_q) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/map_task.py +:caption: advanced_composition/map_task.py +:lines: 63-72 ``` -+++ {"lines_to_next_cell": 0} - -You can also provide multiple lists as input to a ``map_task``. +You can also provide multiple lists as input to a `map_task`: -```{code-cell} -:lines_to_next_cell: 2 - -@workflow -def map_workflow_with_lists( - list_q: list[int] = [1, 2, 3, 4, 5], list_p: list[float] = [6.0, 9.0, 8.7, 6.5, 1.2], s: float = 6.0 -) -> list[float]: - partial_task = functools.partial(multi_input_task, shipping=s) - return map_task(partial_task)(quantity=list_q, price=list_p) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/map_task.py +:caption: advanced_composition/map_task.py +:pyobject: map_workflow_with_lists ``` ```{note} @@ -239,7 +163,7 @@ pyflyte run --remote \ :::{important} This feature is experimental and the API is subject to breaking changes. -If you encounter any issues please consider submitting a +If you encounter any issues, please submit a [bug report](https://github.com/flyteorg/flyte/issues/new?assignees=&labels=bug%2Cuntriaged&projects=&template=bug_report.yaml&title=%5BBUG%5D+). ::: @@ -276,3 +200,5 @@ In contrast to map tasks, an ArrayNode provides the following enhancements: - **Multiple input values**. Subtasks can be defined with multiple input values, enhancing their versatility. We expect the performance of ArrayNode map tasks to compare closely to standard map tasks. 
+ +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/advanced_composition/ diff --git a/docs/user_guide/advanced_composition/subworkflows.md b/docs/user_guide/advanced_composition/subworkflows.md index 59826aa491..8c4971e853 100644 --- a/docs/user_guide/advanced_composition/subworkflows.md +++ b/docs/user_guide/advanced_composition/subworkflows.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (subworkflow)= # Subworkflows @@ -37,81 +18,43 @@ Consequently, all nodes of a subworkflow adhere to the overall constraints impos Consider this scenario: when workflow `A` is integrated as a subworkflow of workflow `B`, running workflow `B` results in the entire graph of workflow `A` being duplicated into workflow `B` at the point of invocation. -Here's an example illustrating the calculation of slope, intercept and the corresponding y-value. 
- -```{code-cell} -from flytekit import task, workflow - - -@task -def slope(x: list[int], y: list[int]) -> float: - sum_xy = sum([x[i] * y[i] for i in range(len(x))]) - sum_x_squared = sum([x[i] ** 2 for i in range(len(x))]) - n = len(x) - return (n * sum_xy - sum(x) * sum(y)) / (n * sum_x_squared - sum(x) ** 2) - - -@task -def intercept(x: list[int], y: list[int], slope: float) -> float: - mean_x = sum(x) / len(x) - mean_y = sum(y) / len(y) - intercept = mean_y - slope * mean_x - return intercept - - -@workflow -def slope_intercept_wf(x: list[int], y: list[int]) -> (float, float): - slope_value = slope(x=x, y=y) - intercept_value = intercept(x=x, y=y, slope=slope_value) - return (slope_value, intercept_value) - - -@task -def regression_line(val: int, slope_value: float, intercept_value: float) -> float: - return (slope_value * val) + intercept_value # y = mx + c +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. +``` +Here's an example illustrating the calculation of slope, intercept and the corresponding y-value: -@workflow -def regression_line_wf(val: int = 5, x: list[int] = [-3, 0, 3], y: list[int] = [7, 4, -2]) -> float: - slope_value, intercept_value = slope_intercept_wf(x=x, y=y) - return regression_line(val=val, slope_value=slope_value, intercept_value=intercept_value) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/subworkflow.py +:caption: advanced_composition/subworkflow.py +:lines: 1-35 ``` -+++ {"lines_to_next_cell": 0} - The `slope_intercept_wf` computes the slope and intercept of the regression line. Subsequently, the `regression_line_wf` triggers `slope_intercept_wf` and then computes the y-value. 
To execute the workflow locally, use the following: -```{code-cell} -if __name__ == "__main__": - print(f"Executing regression_line_wf(): {regression_line_wf()}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/subworkflow.py +:caption: advanced_composition/subworkflow.py +:lines: 39-40 ``` -+++ {"lines_to_next_cell": 0} - It's possible to nest a workflow that contains a subworkflow within another workflow. Workflows can be easily constructed from other workflows, even if they function as standalone entities. -Each workflow in this module has the capability to exist and run independently. +Each workflow in this module has the capability to exist and run independently: -```{code-cell} -@workflow -def nested_regression_line_wf() -> float: - return regression_line_wf() +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/subworkflow.py +:caption: advanced_composition/subworkflow.py +:pyobject: nested_regression_line_wf ``` -+++ {"lines_to_next_cell": 0} - -You can run the nested workflow locally as well. 
+You can run the nested workflow locally as well: -```{code-cell} -if __name__ == "__main__": - print(f"Running nested_regression_line_wf(): {nested_regression_line_wf()}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/subworkflow.py +:caption: advanced_composition/subworkflow.py +:lines: 52-53 ``` -+++ {"lines_to_next_cell": 0} - ## External workflow When launch plans are employed within a workflow to initiate the execution of a pre-defined workflow, @@ -128,23 +71,11 @@ external workflows may offer a way to distribute the workload of a workflow acro Here's an example that illustrates the concept of external workflows: -```{code-cell} - -from flytekit import LaunchPlan - -launch_plan = LaunchPlan.get_or_create( - regression_line_wf, "regression_line_workflow", default_inputs={"val": 7, "x": [-3, 0, 3], "y": [7, 4, -2]} -) - - -@workflow -def nested_regression_line_lp() -> float: - # Trigger launch plan from within a workflow - return launch_plan() +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/subworkflow.py +:caption: advanced_composition/subworkflow.py +:lines: 61-71 ``` -+++ {"lines_to_next_cell": 0} - :::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/flyte_external_workflow_execution.png :alt: External workflow execution :class: with-shadow @@ -154,9 +85,9 @@ In the console screenshot above, note that the launch plan execution ID differs You can run a workflow containing an external workflow locally as follows: -```{code-cell} -if __name__ == "__main__": - print(f"Running nested_regression_line_lp(): {nested_regression_line_lp}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/subworkflow.py +:caption: advanced_composition/subworkflow.py +:lines: 75-76 ``` ## Run the example on a Flyte 
cluster @@ -180,3 +111,5 @@ pyflyte run --remote \ https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/subworkflow.py \ nested_regression_line_lp ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/advanced_composition/ diff --git a/docs/user_guide/advanced_composition/waiting_for_external_inputs.md b/docs/user_guide/advanced_composition/waiting_for_external_inputs.md index d694b62443..6af1782c12 100644 --- a/docs/user_guide/advanced_composition/waiting_for_external_inputs.md +++ b/docs/user_guide/advanced_composition/waiting_for_external_inputs.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - # Waiting for external inputs *New in Flyte 1.3.0* @@ -58,30 +39,14 @@ Though this type of node may not be used often in a production setting, you might want to use it, for example, if you want to simulate a delay in your workflow to mock out the behavior of some long-running computation. -```{code-cell} -from datetime import timedelta - -from flytekit import sleep, task, workflow - - -@task -def long_running_computation(num: int) -> int: - """A mock task pretending to be a long-running computation.""" - return num - - -@workflow -def sleep_wf(num: int) -> int: - """Simulate a "long-running" computation with sleep.""" - - # increase the sleep duration to actually make it long-running - sleeping = sleep(timedelta(seconds=10)) - result = long_running_computation(num=num) - sleeping >> result - return result +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. 
``` -+++ {"lines_to_next_cell": 0} +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/waiting_for_external_inputs.py +:caption: advanced_composition/waiting_for_external_inputs.py +:lines: 1-20 +``` As you can see above, we define a simple `long_running_computation` task and a `sleep_wf` workflow. We first create a `sleeping` and `result` node, then @@ -97,6 +62,10 @@ You can learn more about the `>>` chaining operator Now that you have a general sense of how this works, let's move onto the {func}`~flytekit.wait_for_input` workflow node. +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. +``` + ## Supply external inputs with `wait_for_input` With the {py:func}`~flytekit.wait_for_input` node, you can pause a @@ -106,33 +75,9 @@ but before publishing it you want to give it a custom title. You can achieve this by defining a `wait_for_input` node that takes a `str` input and finalizes the report: -```{code-cell} -import typing - -from flytekit import wait_for_input - - -@task -def create_report(data: typing.List[float]) -> dict: # o0 - """A toy report task.""" - return { - "mean": sum(data) / len(data), - "length": len(data), - "max": max(data), - "min": min(data), - } - - -@task -def finalize_report(report: dict, title: str) -> dict: - return {"title": title, **report} - - -@workflow -def reporting_wf(data: typing.List[float]) -> dict: - report = create_report(data=data) - title_input = wait_for_input("title", timeout=timedelta(hours=1), expected_type=str) - return finalize_report(report=report, title=title_input) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/waiting_for_external_inputs.py +:caption: advanced_composition/waiting_for_external_inputs.py +:lines: 24-49 ``` Let's break down what's happening in the code above: @@ -162,23 +107,11 @@ an explicit approval signal before continuing
execution. Going back to our report-publishing use case, suppose that we want to block the publishing of a report for some reason (e.g. if it doesn't appear to be valid): -```{code-cell} -from flytekit import approve - - -@workflow -def reporting_with_approval_wf(data: typing.List[float]) -> dict: - report = create_report(data=data) - title_input = wait_for_input("title", timeout=timedelta(hours=1), expected_type=str) - final_report = finalize_report(report=report, title=title_input) - - # approve the final report, where the output of approve is the final_report - # dictionary. - return approve(final_report, "approve-final-report", timeout=timedelta(hours=2)) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/waiting_for_external_inputs.py +:caption: advanced_composition/waiting_for_external_inputs.py +:lines: 53-64 ``` -+++ {"lines_to_next_cell": 0} - The `approve` node will pass the `final_report` promise through as the output of the workflow, provided that the `approve-final-report` node gets an approval input via the Flyte UI or Flyte API. @@ -187,25 +120,11 @@ You can also use the output of the `approve` function as a promise, feeding it to a subsequent task.
Let's create a version of our report-publishing workflow where the approval happens after `create_report`: -```{code-cell} -@workflow -def approval_as_promise_wf(data: typing.List[float]) -> dict: - report = create_report(data=data) - title_input = wait_for_input("title", timeout=timedelta(hours=1), expected_type=str) - - # wait for report to run so that the user can view it before adding a custom - # title to the report - report >> title_input - - final_report = finalize_report( - report=approve(report, "raw-report-approval", timeout=timedelta(hours=2)), - title=title_input, - ) - return final_report +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/waiting_for_external_inputs.py +:caption: advanced_composition/waiting_for_external_inputs.py +:pyobject: approval_as_promise_wf ``` -+++ {"lines_to_next_cell": 0} - ## Working with conditionals The node constructs by themselves are useful, but they become even more @@ -214,36 +133,9 @@ useful when we combine them with other Flyte constructs, like {ref}`conditionals To illustrate this, let's extend the report-publishing use case so that we produce an "invalid report" output in case we don't approve the final report: -```{code-cell} -:lines_to_next_cell: 2 - -from flytekit import conditional - - -@task -def invalid_report() -> dict: - return {"invalid_report": True} - - -@workflow -def conditional_wf(data: typing.List[float]) -> dict: - report = create_report(data=data) - title_input = wait_for_input("title-input", timeout=timedelta(hours=1), expected_type=str) - - # Define a "review-passes" wait_for_input node so that a human can review - # the report before finalizing it. - review_passed = wait_for_input("review-passes", timeout=timedelta(hours=2), expected_type=bool) - report >> review_passed - - # This conditional returns the finalized report if the review passes, - # otherwise it returns an invalid report output. 
- return ( - conditional("final-report-condition") - .if_(review_passed.is_true()) - .then(finalize_report(report=report, title=title_input)) - .else_() - .then(invalid_report()) - ) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/advanced_composition/advanced_composition/waiting_for_external_inputs.py +:caption: advanced_composition/waiting_for_external_inputs.py +:lines: 88-114 ``` On top of the `approved` node, which we use in the `conditional` to @@ -312,3 +204,5 @@ remote.set_signal("title-input", execution.id.name, "my report") # node is in the `signals` list above remote.set_signal("review-passes", execution.id.name, True) ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/advanced_composition/ diff --git a/docs/user_guide/basics/documenting_workflows.md b/docs/user_guide/basics/documenting_workflows.md index d6a561c532..3a29f9e562 100644 --- a/docs/user_guide/basics/documenting_workflows.md +++ b/docs/user_guide/basics/documenting_workflows.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - # Documenting workflows ```{eval-rst} @@ -28,24 +9,24 @@ Flyte enables the use of docstrings to document your code. Docstrings are stored in [FlyteAdmin](https://docs.flyte.org/en/latest/concepts/admin.html) and displayed on the UI. -To begin, import the relevant libraries. +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. 
+``` -```{code-cell} -from typing import Tuple +To begin, import the relevant libraries: -from flytekit import workflow +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/documenting_workflows.py +:caption: basics/documenting_workflows.py +:lines: 1-3 ``` -+++ {"lines_to_next_cell": 0} - We import the `slope` and `intercept` tasks from the `workflow.py` file. -```{code-cell} -from .workflow import intercept, slope +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/documenting_workflows.py +:caption: basics/documenting_workflows.py +:lines: 6 ``` -+++ {"lines_to_next_cell": 0} - ## Sphinx-style docstring An example to demonstrate Sphinx-style docstring. @@ -54,26 +35,11 @@ The initial section of the docstring provides a concise overview of the workflow The subsequent section provides a comprehensive explanation. The last part of the docstring outlines the parameters and return type. -```{code-cell} -@workflow -def sphinx_docstring_wf(x: list[int] = [-3, 0, 3], y: list[int] = [7, 4, -2]) -> Tuple[float, float]: - """ - Slope and intercept of a regression line - - This workflow accepts a list of coefficient pairs for a regression line. - It calculates both the slope and intercept of the regression line. - - :param x: List of x-coefficients - :param y: List of y-coefficients - :return: Slope and intercept values - """ - slope_value = slope(x=x, y=y) - intercept_value = intercept(x=x, y=y, slope=slope_value) - return slope_value, intercept_value +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/documenting_workflows.py +:caption: basics/documenting_workflows.py +:pyobject: sphinx_docstring_wf ``` -+++ {"lines_to_next_cell": 0} - ## NumPy-style docstring An example to demonstrate NumPy-style docstring. @@ -83,34 +49,11 @@ The next section offers a comprehensive description. 
The third section of the docstring details all parameters along with their respective data types. The final section of the docstring explains the return type and its associated data type. -```{code-cell} -@workflow -def numpy_docstring_wf(x: list[int] = [-3, 0, 3], y: list[int] = [7, 4, -2]) -> Tuple[float, float]: - """ - Slope and intercept of a regression line - - This workflow accepts a list of coefficient pairs for a regression line. - It calculates both the slope and intercept of the regression line. - - Parameters - ---------- - x : list[int] - List of x-coefficients - y : list[int] - List of y-coefficients - - Returns - ------- - out : Tuple[float, float] - Slope and intercept values - """ - slope_value = slope(x=x, y=y) - intercept_value = intercept(x=x, y=y, slope=slope_value) - return slope_value, intercept_value +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/documenting_workflows.py +:caption: basics/documenting_workflows.py +:pyobject: numpy_docstring_wf ``` -+++ {"lines_to_next_cell": 0} - ## Google-style docstring An example to demonstrate Google-style docstring. @@ -120,27 +63,9 @@ The subsequent section of the docstring provides an extensive explanation. The third segment of the docstring outlines the parameters and return type, including their respective data types. -```{code-cell} -:lines_to_next_cell: 2 - -@workflow -def google_docstring_wf(x: list[int] = [-3, 0, 3], y: list[int] = [7, 4, -2]) -> Tuple[float, float]: - """ - Slope and intercept of a regression line - - This workflow accepts a list of coefficient pairs for a regression line. - It calculates both the slope and intercept of the regression line. 
- - Args: - x (list[int]): List of x-coefficients - y (list[int]): List of y-coefficients - - Returns: - Tuple[float, float]: Slope and intercept values - """ - slope_value = slope(x=x, y=y) - intercept_value = intercept(x=x, y=y, slope=slope_value) - return slope_value, intercept_value +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/documenting_workflows.py +:caption: basics/documenting_workflows.py +:pyobject: google_docstring_wf ``` Here are two screenshots showcasing how the description appears on the UI: @@ -155,3 +80,5 @@ Here are two screenshots showcasing how the description appears on the UI: :alt: Long description :class: with-shadow ::: + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/basics diff --git a/docs/user_guide/basics/hello_world.md b/docs/user_guide/basics/hello_world.md index 45e5e89c4d..63b830c010 100644 --- a/docs/user_guide/basics/hello_world.md +++ b/docs/user_guide/basics/hello_world.md @@ -1,23 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - - # Hello, World! ```{eval-rst} @@ -31,45 +11,41 @@ Flyte tasks are the core building blocks of larger, more complex workflows. Workflows compose multiple tasks – or other workflows – into meaningful steps of computation to produce some useful set of outputs or outcomes. -To begin, import `task` and `workflow` from the `flytekit` library. - -```{code-cell} -from flytekit import task, workflow +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. 
``` -+++ {"lines_to_next_cell": 0} +To begin, import `task` and `workflow` from the `flytekit` library: + +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/hello_world.py +:caption: basics/hello_world.py +:lines: 1 +``` Define a task that produces the string "Hello, World!". -Simply using the `@task` decorator to annotate the Python function. +Simply using the `@task` decorator to annotate the Python function: -```{code-cell} -@task -def say_hello() -> str: - return "Hello, World!" +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/hello_world.py +:caption: basics/hello_world.py +:pyobject: say_hello ``` -+++ {"lines_to_next_cell": 0} - You can handle the output of a task in the same way you would with a regular Python function. -Store the output in a variable and use it as a return value for a Flyte workflow. +Store the output in a variable and use it as a return value for a Flyte workflow: -```{code-cell} -@workflow -def hello_world_wf() -> str: - res = say_hello() - return res +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/hello_world.py +:caption: basics/hello_world.py +:pyobject: hello_world_wf ``` -+++ {"lines_to_next_cell": 0} - -Run the workflow by simply calling it like a Python function. +Run the workflow by simply calling it like a Python function: -```{code-cell} -:lines_to_next_cell: 2 - -if __name__ == "__main__": - print(f"Running hello_world_wf() {hello_world_wf()}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/hello_world.py +:caption: basics/hello_world.py +:lines: 19-20 ``` Next, let's delve into the specifics of {ref}`tasks `, {ref}`workflows ` and {ref}`launch plans `. 
+ +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/basics/ diff --git a/docs/user_guide/basics/imperative_workflows.md b/docs/user_guide/basics/imperative_workflows.md index b5da5b6336..562685f32a 100644 --- a/docs/user_guide/basics/imperative_workflows.md +++ b/docs/user_guide/basics/imperative_workflows.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (imperative_workflow)= # Imperative workflows @@ -36,73 +17,67 @@ in textual form (perhaps during a transition from a legacy system). In such scenarios, you want to orchestrate these tasks. This is where Flyte's imperative workflows come into play, allowing you to programmatically construct workflows. -To begin, import the necessary dependencies. - -```{code-cell} -from flytekit import Workflow +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. ``` -+++ {"lines_to_next_cell": 0} - -We import the `slope` and `intercept` tasks from the `workflow.py` file. +To begin, import the necessary dependencies: -```{code-cell} -from .workflow import intercept, slope +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/imperative_workflow.py +:caption: basics/imperative_workflow.py +:lines: 1 ``` -+++ {"lines_to_next_cell": 0} +We import the `slope` and `intercept` tasks from the `workflow.py` file: -Create an imperative workflow. 
- -```{code-cell} -imperative_wf = Workflow(name="imperative_workflow") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/imperative_workflow.py +:caption: basics/imperative_workflow.py +:lines: 4 ``` -+++ {"lines_to_next_cell": 0} - -Add the workflow inputs to the imperative workflow. +Create an imperative workflow: -```{code-cell} -imperative_wf.add_workflow_input("x", list[int]) -imperative_wf.add_workflow_input("y", list[int]) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/imperative_workflow.py +:caption: basics/imperative_workflow.py +:lines: 7 ``` -+++ {"lines_to_next_cell": 0} +Add the workflow inputs to the imperative workflow: + +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/imperative_workflow.py +:caption: basics/imperative_workflow.py +:lines: 11-12 +``` ::: {note} If you want to assign default values to the workflow inputs, you can create a {ref}`launch plan `. ::: -Add the tasks that need to be triggered from within the workflow. +Add the tasks that need to be triggered from within the workflow: -```{code-cell} -node_t1 = imperative_wf.add_entity(slope, x=imperative_wf.inputs["x"], y=imperative_wf.inputs["y"]) -node_t2 = imperative_wf.add_entity( - intercept, x=imperative_wf.inputs["x"], y=imperative_wf.inputs["y"], slope=node_t1.outputs["o0"] -) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/imperative_workflow.py +:caption: basics/imperative_workflow.py +:lines: 16-19 ``` -+++ {"lines_to_next_cell": 0} - -Lastly, add the workflow output. 
+Lastly, add the workflow output: -```{code-cell} -imperative_wf.add_workflow_output("wf_output", node_t2.outputs["o0"]) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/imperative_workflow.py +:caption: basics/imperative_workflow.py +:lines: 23 ``` -+++ {"lines_to_next_cell": 0} - You can execute the workflow locally as follows: -```{code-cell} -if __name__ == "__main__": - print(f"Running imperative_wf() {imperative_wf(x=[-3, 0, 3], y=[7, 4, -2])}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/imperative_workflow.py +:caption: basics/imperative_workflow.py +:lines: 27-28 ``` :::{note} You also have the option to provide a list of inputs and -retrieve a list of outputs from the workflow. +retrieve a list of outputs from the workflow: ```python wf_input_y = imperative_wf.add_workflow_input("y", list[str]) @@ -117,3 +92,5 @@ wf.add_workflow_output( ) ``` ::: + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/basics/ diff --git a/docs/user_guide/basics/launch_plans.md b/docs/user_guide/basics/launch_plans.md index 01eb9d1051..7ef61e4d2f 100644 --- a/docs/user_guide/basics/launch_plans.md +++ b/docs/user_guide/basics/launch_plans.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (launch_plan)= # Launch plans @@ -40,73 +21,67 @@ When a workflow is serialized and registered, a _default launch plan_ is generat This default launch plan can bind default workflow inputs and runtime options defined in the project's flytekit configuration (such as user role). -To begin, import the necessary libraries. 
- -```{code-cell} -from flytekit import LaunchPlan, current_context +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. ``` -+++ {"lines_to_next_cell": 0} - -We import the workflow from the `workflow.py` file for which we're going to create a launch plan. +To begin, import the necessary libraries: -```{code-cell} -from .workflow import simple_wf +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/launch_plan.py +:caption: basics/launch_plan.py +:lines: 1 ``` -+++ {"lines_to_next_cell": 0} +We import the workflow from the `workflow.py` file for which we're going to create a launch plan: -Create a default launch plan with no inputs during serialization. - -```{code-cell} -default_lp = LaunchPlan.get_default_launch_plan(current_context(), simple_wf) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/launch_plan.py +:caption: basics/launch_plan.py +:lines: 5 ``` -+++ {"lines_to_next_cell": 0} +Create a default launch plan with no inputs during serialization: + +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/launch_plan.py +:caption: basics/launch_plan.py +:lines: 8 +``` You can run the launch plan locally as follows: -```{code-cell} -default_lp(x=[-3, 0, 3], y=[7, 4, -2]) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/launch_plan.py +:caption: basics/launch_plan.py +:lines: 11 ``` -+++ {"lines_to_next_cell": 0} +Create a launch plan and specify the default inputs: -Create a launch plan and specify the default inputs. 
- -```{code-cell} -simple_wf_lp = LaunchPlan.create( - name="simple_wf_lp", workflow=simple_wf, default_inputs={"x": [-3, 0, 3], "y": [7, 4, -2]} -) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/launch_plan.py +:caption: basics/launch_plan.py +:lines: 14-16 ``` -+++ {"lines_to_next_cell": 0} - You can trigger the launch plan locally as follows: -```{code-cell} -simple_wf_lp() +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/launch_plan.py +:caption: basics/launch_plan.py +:lines: 19 ``` -+++ {"lines_to_next_cell": 0} - You can override the defaults as follows: -```{code-cell} -simple_wf_lp(x=[3, 5, 3], y=[-3, 2, -2]) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/launch_plan.py +:caption: basics/launch_plan.py +:lines: 22 ``` -+++ {"lines_to_next_cell": 0} +It's possible to lock launch plan inputs, preventing them from being overridden during execution: -It's possible to lock launch plan inputs, preventing them from being overridden during execution. - -```{code-cell} -simple_wf_lp_fixed_inputs = LaunchPlan.get_or_create( - name="fixed_inputs", workflow=simple_wf, fixed_inputs={"x": [-3, 0, 3]} -) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/launch_plan.py +:caption: basics/launch_plan.py +:lines: 25-27 ``` -Attempting to modify the inputs will result in an error being raised by Flyte. +Attempting to modify the inputs will result in an error being raised by Flyte: :::{note} You can employ default and fixed inputs in conjunction in a launch plan. @@ -114,3 +89,5 @@ You can employ default and fixed inputs in conjunction in a launch plan. Launch plans can also be used to run workflows on a specific cadence. For more information, refer to the {ref}`scheduling_launch_plan` documentation. 
+ +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/basics/ diff --git a/docs/user_guide/basics/named_outputs.md b/docs/user_guide/basics/named_outputs.md index a609cd50a9..2e02678822 100644 --- a/docs/user_guide/basics/named_outputs.md +++ b/docs/user_guide/basics/named_outputs.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (named_outputs)= # Named outputs @@ -35,48 +16,31 @@ and you wish to assign a distinct name to each of them. The following example illustrates the process of assigning names to outputs for both a task and a workflow. -To begin, import the required dependencies. - -```{code-cell} -from typing import NamedTuple - -from flytekit import task, workflow +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. ``` -+++ {"lines_to_next_cell": 0} - -Define a `NamedTuple` and assign it as an output to a task. +To begin, import the required dependencies: -```{code-cell} -slope_value = NamedTuple("slope_value", [("slope", float)]) - - -@task -def slope(x: list[int], y: list[int]) -> slope_value: - sum_xy = sum([x[i] * y[i] for i in range(len(x))]) - sum_x_squared = sum([x[i] ** 2 for i in range(len(x))]) - n = len(x) - return (n * sum_xy - sum(x) * sum(y)) / (n * sum_x_squared - sum(x) ** 2) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/named_outputs.py +:caption: basics/named_outputs.py +:lines: 1-3 ``` -+++ {"lines_to_next_cell": 0} - -Likewise, assign a `NamedTuple` to the output of `intercept` task. 
+Define a `NamedTuple` and assign it as an output to a task: -```{code-cell} -intercept_value = NamedTuple("intercept_value", [("intercept", float)]) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/named_outputs.py +:caption: basics/named_outputs.py +:lines: 6-14 +``` +Likewise, assign a `NamedTuple` to the output of `intercept` task: -@task -def intercept(x: list[int], y: list[int], slope: float) -> intercept_value: - mean_x = sum(x) / len(x) - mean_y = sum(y) / len(y) - intercept = mean_y - slope * mean_x - return intercept +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/named_outputs.py +:caption: basics/named_outputs.py +:lines: 18-26 ``` -+++ {"lines_to_next_cell": 0} - :::{note} While it's possible to create `NamedTuple`s directly within the code, it's often better to declare them explicitly. This helps prevent potential linting errors in tools like mypy. @@ -92,25 +56,19 @@ Additionally, you can also have the workflow return a `NamedTuple` as an output. :::{note} Remember that we are extracting individual task execution outputs by dereferencing them. -This is necessary because `NamedTuple`s function as tuples and require this dereferencing. 
+This is necessary because `NamedTuple`s function as tuples and require this dereferencing: ::: -```{code-cell} -slope_and_intercept_values = NamedTuple("slope_and_intercept_values", [("slope", float), ("intercept", float)]) - - -@workflow -def simple_wf_with_named_outputs(x: list[int] = [-3, 0, 3], y: list[int] = [7, 4, -2]) -> slope_and_intercept_values: - slope_value = slope(x=x, y=y) - intercept_value = intercept(x=x, y=y, slope=slope_value.slope) - return slope_and_intercept_values(slope=slope_value.slope, intercept=intercept_value.intercept) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/named_outputs.py +:caption: basics/named_outputs.py +:lines: 32-39 ``` -+++ {"lines_to_next_cell": 0} - You can run the workflow locally as follows: -```{code-cell} -if __name__ == "__main__": - print(f"Running simple_wf_with_named_outputs() {simple_wf_with_named_outputs()}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/named_outputs.py +:caption: basics/named_outputs.py +:lines: 43-44 ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/basics/ diff --git a/docs/user_guide/basics/shell_tasks.md b/docs/user_guide/basics/shell_tasks.md index 73cc5ab6b8..95df9a90d8 100644 --- a/docs/user_guide/basics/shell_tasks.md +++ b/docs/user_guide/basics/shell_tasks.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (shell_task)= # Shell tasks @@ -28,72 +9,25 @@ kernelspec: To execute bash scripts within Flyte, you can utilize the {py:class}`~flytekit.extras.tasks.shell.ShellTask` class. This example includes three shell tasks to execute bash commands. 
-First, import the necessary libraries. +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. +``` -```{code-cell} -from pathlib import Path -from typing import Tuple +First, import the necessary libraries: -import flytekit -from flytekit import kwtypes, task, workflow -from flytekit.extras.tasks.shell import OutputLocation, ShellTask -from flytekit.types.directory import FlyteDirectory -from flytekit.types.file import FlyteFile +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/shell_task.py +:caption: basics/shell_task.py +:lines: 1-8 ``` -+++ {"lines_to_next_cell": 0} - With the required imports in place, you can proceed to define a shell task. To create a shell task, provide a name for it, specify the bash script to be executed, -and define inputs and outputs if needed. - -```{code-cell} -t1 = ShellTask( - name="task_1", - debug=True, - script=""" - set -ex - echo "Hey there! Let's run some bash scripts using Flyte's ShellTask." - echo "Showcasing Flyte's Shell Task." >> {inputs.x} - if grep "Flyte" {inputs.x} - then - echo "Found it!" >> {inputs.x} - else - echo "Not found!" 
- fi - """, - inputs=kwtypes(x=FlyteFile), - output_locs=[OutputLocation(var="i", var_type=FlyteFile, location="{inputs.x}")], -) - - -t2 = ShellTask( - name="task_2", - debug=True, - script=""" - set -ex - cp {inputs.x} {inputs.y} - tar -zcvf {outputs.j} {inputs.y} - """, - inputs=kwtypes(x=FlyteFile, y=FlyteDirectory), - output_locs=[OutputLocation(var="j", var_type=FlyteFile, location="{inputs.y}.tar.gz")], -) - - -t3 = ShellTask( - name="task_3", - debug=True, - script=""" - set -ex - tar -zxvf {inputs.z} - cat {inputs.y}/$(basename {inputs.x}) | wc -m > {outputs.k} - """, - inputs=kwtypes(x=FlyteFile, y=FlyteDirectory, z=FlyteFile), - output_locs=[OutputLocation(var="k", var_type=FlyteFile, location="output.txt")], -) -``` +and define inputs and outputs if needed: -+++ {"lines_to_next_cell": 0} +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/shell_task.py +:caption: basics/shell_task.py +:lines: 13-55 +``` Here's a breakdown of the parameters of the `ShellTask`: @@ -104,42 +38,25 @@ Here's a breakdown of the parameters of the `ShellTask`: - The `debug` parameter is helpful for debugging purposes We define a task to instantiate `FlyteFile` and `FlyteDirectory`. -A `.gitkeep` file is created in the FlyteDirectory as a placeholder to ensure the directory exists. 
+A `.gitkeep` file is created in the FlyteDirectory as a placeholder to ensure the directory exists: -```{code-cell} -@task -def create_entities() -> Tuple[FlyteFile, FlyteDirectory]: - working_dir = Path(flytekit.current_context().working_directory) - flytefile = working_dir / "test.txt" - flytefile.touch() - - flytedir = working_dir / "testdata" - flytedir.mkdir(exist_ok=True) - - flytedir_file = flytedir / ".gitkeep" - flytedir_file.touch() - return flytefile, flytedir +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/shell_task.py +:caption: basics/shell_task.py +:pyobject: create_entities ``` -+++ {"lines_to_next_cell": 0} - -We create a workflow to define the dependencies between the tasks. +We create a workflow to define the dependencies between the tasks: -```{code-cell} -@workflow -def shell_task_wf() -> FlyteFile: - x, y = create_entities() - t1_out = t1(x=x) - t2_out = t2(x=t1_out, y=y) - t3_out = t3(x=x, y=y, z=t2_out) - return t3_out +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/shell_task.py +:caption: basics/shell_task.py +:pyobject: shell_task_wf ``` -+++ {"lines_to_next_cell": 0} +You can run the workflow locally: -You can run the workflow locally. 
- -```{code-cell} -if __name__ == "__main__": - print(f"Running shell_task_wf() {shell_task_wf()}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/shell_task.py +:caption: basics/shell_task.py +:lines: 85-86 ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/basics/ diff --git a/docs/user_guide/basics/tasks.md b/docs/user_guide/basics/tasks.md index 3f9fcb493d..8c059e8d02 100644 --- a/docs/user_guide/basics/tasks.md +++ b/docs/user_guide/basics/tasks.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (task)= # Tasks @@ -47,42 +28,39 @@ Flyte offers numerous plugins for tasks, including backend plugins like This example demonstrates how to write and execute a [Python function task](https://github.com/flyteorg/flytekit/blob/master/flytekit/core/python_function_task.py#L75). -To begin, import `task` from the `flytekit` library. - -```{code-cell} -from flytekit import task +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. ``` -+++ {"lines_to_next_cell": 0} +To begin, import `task` from the `flytekit` library: + +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/task.py +:caption: basics/task.py +:lines: 1 +``` The use of the {py:func}`~flytekit.task` decorator is mandatory for a ``PythonFunctionTask``. A task is essentially a regular Python function, with the exception that all inputs and outputs must be clearly annotated with their types. Learn more about the supported types in the {ref}`type-system section `. -We create a task that computes the slope of a regression line. 
+We create a task that computes the slope of a regression line: -```{code-cell} -@task -def slope(x: list[int], y: list[int]) -> float: - sum_xy = sum([x[i] * y[i] for i in range(len(x))]) - sum_x_squared = sum([x[i] ** 2 for i in range(len(x))]) - n = len(x) - return (n * sum_xy - sum(x) * sum(y)) / (n * sum_x_squared - sum(x) ** 2) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/task.py +:caption: basics/task.py +:pyobject: slope ``` -+++ {"lines_to_next_cell": 0} - :::{note} Flytekit will assign a default name to the output variable like `out0`. In case of multiple outputs, each output will be numbered in the order starting with 0, e.g., -> `out0, out1, out2, ...`. ::: -You can execute a Flyte task just like any regular Python function. +You can execute a Flyte task just like any regular Python function: -```{code-cell} -if __name__ == "__main__": - print(slope(x=[-3, 0, 3], y=[7, 4, -2])) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/task.py +:caption: basics/task.py +:lines: 14-15 ``` :::{note} @@ -106,3 +84,5 @@ pyflyte run --remote \ https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/task.py \ slope --x '[-3,0,3]' --y '[7,4,-2]' ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/basics/ diff --git a/docs/user_guide/basics/workflows.md b/docs/user_guide/basics/workflows.md index d5f46be04e..85b6db1b8e 100644 --- a/docs/user_guide/basics/workflows.md +++ b/docs/user_guide/basics/workflows.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (workflow)= # Workflows @@ -38,55 +19,34 @@ enabling the 
workflow to be triggered. For more information, see the {std:ref}`registration documentation `. -To begin, import {py:func}`~flytekit.task` and {py:func}`~flytekit.workflow` from the flytekit library. - -```{code-cell} -from flytekit import task, workflow +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. ``` -+++ {"lines_to_next_cell": 0} +To begin, import {py:func}`~flytekit.task` and {py:func}`~flytekit.workflow` from the flytekit library: -We define `slope` and `intercept` tasks to compute the slope and -intercept of the regression line, respectively. - -```{code-cell} -@task -def slope(x: list[int], y: list[int]) -> float: - sum_xy = sum([x[i] * y[i] for i in range(len(x))]) - sum_x_squared = sum([x[i] ** 2 for i in range(len(x))]) - n = len(x) - return (n * sum_xy - sum(x) * sum(y)) / (n * sum_x_squared - sum(x) ** 2) - - -@task -def intercept(x: list[int], y: list[int], slope: float) -> float: - mean_x = sum(x) / len(x) - mean_y = sum(y) / len(y) - intercept = mean_y - slope * mean_x - return intercept +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/workflow.py +:caption: basics/workflow.py +:lines: 1 ``` -+++ {"lines_to_next_cell": 0} +We define `slope` and `intercept` tasks to compute the slope and +intercept of the regression line, respectively: -Define a workflow to establish the task dependencies. -Just like a task, a workflow is also strongly typed. - -```{code-cell} -@workflow -def simple_wf(x: list[int], y: list[int]) -> float: - slope_value = slope(x=x, y=y) - intercept_value = intercept(x=x, y=y, slope=slope_value) - return intercept_value +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/workflow.py +:caption: basics/workflow.py +:lines: 6-19 ``` -+++ {"lines_to_next_cell": 0} +Define a workflow to establish the task dependencies. 
+Just like a task, a workflow is also strongly typed: + +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/workflow.py +:caption: basics/workflow.py +:pyobject: simple_wf +``` -The {py:func}`~flytekit.workflow` decorator encapsulates Flyte tasks, -essentially representing lazily evaluated promises. -During parsing, function calls are deferred until execution time. -These function calls generate {py:class}`~flytekit.extend.Promise`s that can be propagated to downstream functions, -yet remain inaccessible within the workflow itself. -The actual evaluation occurs when the workflow is executed. +The {py:func}`~flytekit.workflow` decorator encapsulates Flyte tasks, essentially representing lazily evaluated promises. During parsing, function calls are deferred until execution time. These function calls generate {py:class}`~flytekit.extend.Promise`s that can be propagated to downstream functions, yet remain inaccessible within the workflow itself. The actual evaluation occurs when the workflow is executed. Workflows can be executed locally, resulting in immediate evaluation, or through tools like [`pyflyte`](https://docs.flyte.org/projects/flytekit/en/latest/pyflyte.html), @@ -106,15 +66,13 @@ However, each task invocation within the dynamic workflow still generates a prom Bear in mind that a workflow can have tasks, other workflows and dynamic workflows. ::: -You can run a workflow by calling it as you would with a Python function and providing the necessary inputs. 
+You can run a workflow by calling it as you would with a Python function and providing the necessary inputs: -```{code-cell} -if __name__ == "__main__": - print(f"Running simple_wf() {simple_wf(x=[-3, 0, 3], y=[7, 4, -2])}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/workflow.py +:caption: basics/workflow.py +:lines: 33-34 ``` -+++ {"lines_to_next_cell": 0} - To run the workflow locally, you can use the following `pyflyte run` command: ``` @@ -143,12 +101,9 @@ without the confines of a workflow, offers a convenient approach for iterating o You can use the {py:func}`functools.partial` function to assign default or constant values to the parameters of your tasks. -```{code-cell} -import functools - - -@workflow -def simple_wf_with_partial(x: list[int], y: list[int]) -> float: - partial_task = functools.partial(slope, x=x) - return partial_task(y=y) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/workflow.py +:caption: basics/workflow.py +:lines: 39-45 ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/basics/ diff --git a/docs/user_guide/customizing_dependencies/imagespec.md b/docs/user_guide/customizing_dependencies/imagespec.md index 12d9295b1e..586bfe2580 100644 --- a/docs/user_guide/customizing_dependencies/imagespec.md +++ b/docs/user_guide/customizing_dependencies/imagespec.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (image_spec_example)= # ImageSpec @@ -39,15 +20,15 @@ the [default Docker image](https://ghcr.io/flyteorg/flytekit), to all tasks. 
To use the `container_image` parameter available in the {py:func}`flytekit.task` decorator, and pass an `ImageSpec`. -Before building the image, Flytekit checks the container registry first to see if the image already exists. By doing -so, it avoids having to rebuild the image over and over again. If the image does not exist, flytekit will build the -image before registering the workflow, and replace the image name in the task template with the newly built image name. +Before building the image, Flytekit checks the container registry first to see if the image already exists. By doing so, it avoids having to rebuild the image over and over again. If the image does not exist, flytekit will build the image before registering the workflow, and replace the image name in the task template with the newly built image name. -```{code-cell} -import typing +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. +``` -import pandas as pd -from flytekit import ImageSpec, Resources, task, workflow +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/customizing_dependencies/customizing_dependencies/image_spec.py +:caption: customizing_dependencies/image_spec.py +:lines: 1-4 ``` :::{admonition} Prerequisites @@ -58,35 +39,19 @@ from flytekit import ImageSpec, Resources, task, workflow - When using a registry in ImageSpec, `docker login` is required to push the image ::: -+++ {"lines_to_next_cell": 0} - You can specify python packages, apt packages, and environment variables in the `ImageSpec`. These specified packages will be added on top of the [default image](https://github.com/flyteorg/flytekit/blob/master/Dockerfile), which can be found in the Flytekit Dockerfile. More specifically, flytekit invokes [DefaultImages.default_image()](https://github.com/flyteorg/flytekit/blob/f2cfef0ec098d4ae8f042ab915b0b30d524092c6/flytekit/configuration/default_images.py#L26-L27) function. 
This function determines and returns the default image based on the Python version and flytekit version. For example, if you are using Python 3.8 and flytekit 1.6.0, the default image assigned will be `ghcr.io/flyteorg/flytekit:py3.8-1.6.0`. If desired, you can also override the default image by providing a custom `base_image` parameter when using the `ImageSpec`. -```{code-cell} -pandas_image_spec = ImageSpec( - base_image="ghcr.io/flyteorg/flytekit:py3.8-1.6.2", - packages=["pandas", "numpy"], - python_version="3.9", - apt_packages=["git"], - env={"Debug": "True"}, - registry="ghcr.io/flyteorg", -) - -sklearn_image_spec = ImageSpec( - base_image="ghcr.io/flyteorg/flytekit:py3.8-1.6.2", - packages=["scikit-learn"], - registry="ghcr.io/flyteorg", -) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/customizing_dependencies/customizing_dependencies/image_spec.py +:caption: customizing_dependencies/image_spec.py +:lines: 6-19 ``` -+++ {"lines_to_next_cell": 0} - :::{important} -Replace `ghcr.io/flyteorg` with a container registry you've access to publish to. +Replace `ghcr.io/flyteorg` with a container registry you can publish to. To upload the image to the local registry in the demo cluster, indicate the registry as `localhost:30000`. ::: @@ -94,45 +59,16 @@ To upload the image to the local registry in the demo cluster, indicate the regi If the task is indeed using the image built from the `ImageSpec`, it will then import scikit-learn. This approach helps minimize module loading time and prevents unnecessary dependency installation within a single image. 
-```{code-cell} -if sklearn_image_spec.is_container(): - from sklearn.linear_model import LogisticRegression +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/customizing_dependencies/customizing_dependencies/image_spec.py +:caption: customizing_dependencies/image_spec.py +:lines: 21-22 ``` -+++ {"lines_to_next_cell": 0} - To enable tasks to utilize the images built with `ImageSpec`, you can specify the `container_image` parameter for those tasks. -```{code-cell} -@task(container_image=pandas_image_spec) -def get_pandas_dataframe() -> typing.Tuple[pd.DataFrame, pd.Series]: - df = pd.read_csv("https://storage.googleapis.com/download.tensorflow.org/data/heart.csv") - print(df.head()) - return df[["age", "thalach", "trestbps", "chol", "oldpeak"]], df.pop("target") - - -@task(container_image=sklearn_image_spec, requests=Resources(cpu="1", mem="1Gi")) -def get_model(max_iter: int, multi_class: str) -> typing.Any: - return LogisticRegression(max_iter=max_iter, multi_class=multi_class) - - -# Get a basic model to train. -@task(container_image=sklearn_image_spec, requests=Resources(cpu="1", mem="1Gi")) -def train_model(model: typing.Any, feature: pd.DataFrame, target: pd.Series) -> typing.Any: - model.fit(feature, target) - return model - - -# Lastly, let's define a workflow to capture the dependencies between the tasks. -@workflow() -def wf(): - feature, target = get_pandas_dataframe() - model = get_model(max_iter=3000, multi_class="auto") - train_model(model=model, feature=feature, target=target) - - -if __name__ == "__main__": - wf() +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/customizing_dependencies/customizing_dependencies/image_spec.py +:caption: customizing_dependencies/image_spec.py +:lines: 27-56 ``` There exists an option to override the container image by providing an Image Spec YAML file to the `pyflyte run` or `pyflyte register` command. 
@@ -153,16 +89,12 @@ env: pyflyte run --remote --image image.yaml image_spec.py wf ``` -+++ - If you only want to build the image without registering the workflow, you can use the `pyflyte build` command. ``` pyflyte build --remote image_spec.py wf ``` -+++ - In some cases, you may want to force an image to rebuild, even if the image spec hasn’t changed. If you want to overwrite an existing image, you can pass the `FLYTE_FORCE_PUSH_IMAGE_SPEC=True` to `pyflyte` command or add `force_push()` to the ImageSpec. ```bash @@ -174,3 +106,4 @@ or ```python image = ImageSpec(registry="ghcr.io/flyteorg", packages=["pandas"]).force_push() ``` +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/customizing_dependencies/ diff --git a/docs/user_guide/customizing_dependencies/multiple_images_in_a_workflow.md b/docs/user_guide/customizing_dependencies/multiple_images_in_a_workflow.md index 0c323cada9..f4411fac65 100644 --- a/docs/user_guide/customizing_dependencies/multiple_images_in_a_workflow.md +++ b/docs/user_guide/customizing_dependencies/multiple_images_in_a_workflow.md @@ -1,20 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - (multi_images)= # Multiple images in a workflow @@ -31,33 +14,13 @@ To modify this behavior, use the `container_image` parameter available in the {p If the Docker image is not available publicly, refer to {ref}`Pulling Private Images`. 
::: -```{code-cell} -:lines_to_next_cell: 2 - -import numpy as np -from flytekit import task, workflow - - -@task(container_image="{{.image.mindmeld.fqn}}:{{.image.mindmeld.version}}") -def get_data() -> np.ndarray: - # here we're importing scikit learn within the Flyte task - from sklearn import datasets - - iris = datasets.load_iris() - X = iris.data[:, :2] - return X - - -@task(container_image="{{.image.borebuster.fqn}}:{{.image.borebuster.version}}") -def normalize(X: np.ndarray) -> np.ndarray: - return (X - X.mean(axis=0)) / X.std(axis=0) - +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. +``` -@workflow -def multi_images_wf() -> np.ndarray: - X = get_data() - X = normalize(X=X) - return X +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/customizing_dependencies/customizing_dependencies/multi_images.py +:caption: customizing_dependencies/multi_images.py +:lines: 1-24 ``` Observe how the `sklearn` library is imported in the context of a Flyte task. 
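The function-level import noted above can be sketched in plain Python, without flytekit or `sklearn`. This is only an illustration of the pattern: `normalize_values` is a hypothetical name, and the standard-library `statistics` module stands in for a heavy dependency that would exist only in one task's image.

```python
def normalize_values(values: list[float]) -> list[float]:
    """Standardize values to zero mean and unit (population) spread."""
    # The import statement runs when the function is called, not when the
    # file is loaded, so a process that never calls this function never
    # needs the dependency. `statistics` is a stand-in for a package that
    # is installed in only one container image (an assumption of this sketch).
    import statistics

    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return [(v - mean) / spread for v in values]


if __name__ == "__main__":
    print(normalize_values([1.0, 2.0, 3.0]))
```

The arithmetic mirrors the `normalize` task removed in this diff, which computes `(X - X.mean(axis=0)) / X.std(axis=0)` with NumPy; NumPy's `std` defaults to the population standard deviation, which is why `pstdev` is used here.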
@@ -108,3 +71,5 @@ Send the name of the configuration file to your `pyflyte run` command as follows ``` pyflyte --config $HOME/.flyte/config.yaml run --remote multi_images.py multi_images_wf ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/customizing_dependencies/ diff --git a/docs/user_guide/customizing_dependencies/raw_containers.md b/docs/user_guide/customizing_dependencies/raw_containers.md index 2ba6cfec55..7f7d25afd6 100644 --- a/docs/user_guide/customizing_dependencies/raw_containers.md +++ b/docs/user_guide/customizing_dependencies/raw_containers.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (raw_container)= # Raw containers @@ -25,22 +6,19 @@ kernelspec: .. tags:: Containerization, Advanced ``` -This example demonstrates how to use arbitrary containers in 5 different languages, all orchestrated in flytekit seamlessly. -Flyte mounts an input data volume where all the data needed by the container is available, and an output data volume -for the container to write all the data which will be stored away. +This example demonstrates how to use arbitrary containers in 5 different languages, all orchestrated in flytekit seamlessly. Flyte mounts an input data volume where all the data needed by the container is available, and an output data volume for the container to write all the data which will be stored away. The data is written as separate files, one per input variable. The format of the file is serialized strings. Refer to the raw protocol to understand how to leverage this. 
-```{code-cell} -import logging - -from flytekit import ContainerTask, kwtypes, task, workflow - -logger = logging.getLogger(__file__) +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. ``` -+++ {"lines_to_next_cell": 0} +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/customizing_dependencies/customizing_dependencies/raw_container.py +:caption: customizing_dependencies/raw_container.py +:lines: 1-5 +``` ## Container tasks @@ -53,137 +31,17 @@ is `calculate_ellipse_area_shell`. This name has to be unique in the entire proj `inputs` and `outputs` specify the interface for the task; thus it should be an ordered dictionary of typed input and output variables. -```{code-cell} -calculate_ellipse_area_shell = ContainerTask( - name="ellipse-area-metadata-shell", - input_data_dir="/var/inputs", - output_data_dir="/var/outputs", - inputs=kwtypes(a=float, b=float), - outputs=kwtypes(area=float, metadata=str), - image="ghcr.io/flyteorg/rawcontainers-shell:v2", - command=[ - "./calculate-ellipse-area.sh", - "{{.inputs.a}}", - "{{.inputs.b}}", - "/var/outputs", - ], -) - -calculate_ellipse_area_python = ContainerTask( - name="ellipse-area-metadata-python", - input_data_dir="/var/inputs", - output_data_dir="/var/outputs", - inputs=kwtypes(a=float, b=float), - outputs=kwtypes(area=float, metadata=str), - image="ghcr.io/flyteorg/rawcontainers-python:v2", - command=[ - "python", - "calculate-ellipse-area.py", - "{{.inputs.a}}", - "{{.inputs.b}}", - "/var/outputs", - ], -) - -calculate_ellipse_area_r = ContainerTask( - name="ellipse-area-metadata-r", - input_data_dir="/var/inputs", - output_data_dir="/var/outputs", - inputs=kwtypes(a=float, b=float), - outputs=kwtypes(area=float, metadata=str), - image="ghcr.io/flyteorg/rawcontainers-r:v2", - command=[ - "Rscript", - "--vanilla", - "calculate-ellipse-area.R", - "{{.inputs.a}}", - "{{.inputs.b}}", - "/var/outputs", - ], -) - 
-calculate_ellipse_area_haskell = ContainerTask( - name="ellipse-area-metadata-haskell", - input_data_dir="/var/inputs", - output_data_dir="/var/outputs", - inputs=kwtypes(a=float, b=float), - outputs=kwtypes(area=float, metadata=str), - image="ghcr.io/flyteorg/rawcontainers-haskell:v2", - command=[ - "./calculate-ellipse-area", - "{{.inputs.a}}", - "{{.inputs.b}}", - "/var/outputs", - ], -) - -calculate_ellipse_area_julia = ContainerTask( - name="ellipse-area-metadata-julia", - input_data_dir="/var/inputs", - output_data_dir="/var/outputs", - inputs=kwtypes(a=float, b=float), - outputs=kwtypes(area=float, metadata=str), - image="ghcr.io/flyteorg/rawcontainers-julia:v2", - command=[ - "julia", - "calculate-ellipse-area.jl", - "{{.inputs.a}}", - "{{.inputs.b}}", - "/var/outputs", - ], -) - - -@task -def report_all_calculated_areas( - area_shell: float, - metadata_shell: str, - area_python: float, - metadata_python: str, - area_r: float, - metadata_r: str, - area_haskell: float, - metadata_haskell: str, - area_julia: float, - metadata_julia: str, -): - logger.info(f"shell: area={area_shell}, metadata={metadata_shell}") - logger.info(f"python: area={area_python}, metadata={metadata_python}") - logger.info(f"r: area={area_r}, metadata={metadata_r}") - logger.info(f"haskell: area={area_haskell}, metadata={metadata_haskell}") - logger.info(f"julia: area={area_julia}, metadata={metadata_julia}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/customizing_dependencies/customizing_dependencies/raw_container.py +:caption: customizing_dependencies/raw_container.py +:lines: 15-112 ``` -+++ {"lines_to_next_cell": 0} - As can be seen in this example, `ContainerTask`s can be interacted with like normal Python functions, whose inputs correspond to the declared input variables. All data returned by the tasks are consumed and logged by a Flyte task. 
-```{code-cell} -:lines_to_next_cell: 2 - -@workflow -def wf(a: float, b: float): - # Calculate area in all languages - area_shell, metadata_shell = calculate_ellipse_area_shell(a=a, b=b) - area_python, metadata_python = calculate_ellipse_area_python(a=a, b=b) - area_r, metadata_r = calculate_ellipse_area_r(a=a, b=b) - area_haskell, metadata_haskell = calculate_ellipse_area_haskell(a=a, b=b) - area_julia, metadata_julia = calculate_ellipse_area_julia(a=a, b=b) - - # Report on all results in a single task to simplify comparison - report_all_calculated_areas( - area_shell=area_shell, - metadata_shell=metadata_shell, - area_python=area_python, - metadata_python=metadata_python, - area_r=area_r, - metadata_r=metadata_r, - area_haskell=area_haskell, - metadata_haskell=metadata_haskell, - area_julia=area_julia, - metadata_julia=metadata_julia, - ) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/customizing_dependencies/customizing_dependencies/raw_container.py +:caption: customizing_dependencies/raw_container.py +:pyobject: wf ``` One of the benefits of raw container tasks is that Flytekit does not need to be installed in the target container. 
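To make the container side of this contract concrete, here is a minimal sketch of what a script such as `calculate-ellipse-area.py` might do. It assumes, based on the `command` and `outputs` declared above, that the inputs arrive as command-line strings and that each output is written as a file named after the output variable (`area`, `metadata`) in the output directory. The helper names and the metadata string are illustrative, not taken from the real script.

```python
import math
import tempfile
from pathlib import Path


def calculate_area(a: float, b: float) -> float:
    """Area of an ellipse with semi-axes a and b."""
    return math.pi * a * b


def write_output(output_dir: Path, name: str, value: object) -> None:
    # Assumed protocol: one file per declared output, named after the
    # output variable, holding the value serialized as a string.
    (output_dir / name).write_text(str(value))


def main(a: str, b: str, output_dir: str) -> None:
    # Inputs are passed as strings on the command line, so convert first.
    out = Path(output_dir)
    write_output(out, "area", calculate_area(float(a), float(b)))
    write_output(out, "metadata", "[from python sketch]")  # illustrative value


if __name__ == "__main__":
    # A temporary directory stands in for /var/outputs in this sketch.
    with tempfile.TemporaryDirectory() as tmp:
        main("3.0", "4.0", tmp)
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

Nothing here imports flytekit: the container only has to read its arguments and write files into the mounted output directory.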
@@ -225,3 +83,5 @@ The contents of each script specified in the `ContainerTask` is as follows: ```{literalinclude} raw-containers-supporting-files/per-language/julia/calculate-ellipse-area.jl :language: julia ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/customizing_dependencies/ diff --git a/docs/user_guide/data_types_and_io/accessing_attributes.md b/docs/user_guide/data_types_and_io/accessing_attributes.md index 42706a3d1d..4c4a01483f 100644 --- a/docs/user_guide/data_types_and_io/accessing_attributes.md +++ b/docs/user_guide/data_types_and_io/accessing_attributes.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (attribute_access)= # Accessing attributes @@ -25,27 +6,20 @@ kernelspec: .. tags:: Basic ``` -You can directly access attributes on output promises for lists, dicts, dataclasses and combinations of these types in Flyte. -This functionality facilitates the direct passing of output attributes within workflows, +You can directly access attributes on output promises for lists, dicts, dataclasses and combinations of these types in Flyte. This functionality facilitates the direct passing of output attributes within workflows, enhancing the convenience of working with complex data structures. -To begin, import the required dependencies and define a common task for subsequent use. - -```{code-cell} -from dataclasses import dataclass - -from dataclasses_json import dataclass_json -from flytekit import task, workflow +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. 
+``` +To begin, import the required dependencies and define a common task for subsequent use: -@task -def print_message(message: str): - print(message) - return +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/attribute_access.py +:caption: data_types_and_io/attribute_access.py +:lines: 1-10 ``` -+++ {"lines_to_next_cell": 0} - ## List You can access an output list using index notation. @@ -53,103 +27,40 @@ You can access an output list using index notation. Flyte currently does not support output promise access through list slicing. ::: -```{code-cell} -@task -def list_task() -> list[str]: - return ["apple", "banana"] - - -@workflow -def list_wf(): - items = list_task() - first_item = items[0] - print_message(message=first_item) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/attribute_access.py +:caption: data_types_and_io/attribute_access.py +:lines: 14-23 ``` -+++ {"lines_to_next_cell": 0} - ## Dictionary Access the output dictionary by specifying the key. -```{code-cell} -@task -def dict_task() -> dict[str, str]: - return {"fruit": "banana"} - - -@workflow -def dict_wf(): - fruit_dict = dict_task() - print_message(message=fruit_dict["fruit"]) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/attribute_access.py +:caption: data_types_and_io/attribute_access.py +:lines: 27-35 ``` -+++ {"lines_to_next_cell": 0} - ## Data class Directly access an attribute of a dataclass. 
-```{code-cell} -@dataclass_json -@dataclass -class Fruit: - name: str - - -@task -def dataclass_task() -> Fruit: - return Fruit(name="banana") - - -@workflow -def dataclass_wf(): - fruit_instance = dataclass_task() - print_message(message=fruit_instance.name) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/attribute_access.py +:caption: data_types_and_io/attribute_access.py +:lines: 39-53 ``` -+++ {"lines_to_next_cell": 0} - ## Complex type Combinations of list, dict and dataclass also work effectively. -```{code-cell} -@task -def advance_task() -> (dict[str, list[str]], list[dict[str, str]], dict[str, Fruit]): - return {"fruits": ["banana"]}, [{"fruit": "banana"}], {"fruit": Fruit(name="banana")} - - -@task -def print_list(fruits: list[str]): - print(fruits) - - -@task -def print_dict(fruit_dict: dict[str, str]): - print(fruit_dict) - - -@workflow -def advanced_workflow(): - dictionary_list, list_dict, dict_dataclass = advance_task() - print_message(message=dictionary_list["fruits"][0]) - print_message(message=list_dict[0]["fruit"]) - print_message(message=dict_dataclass["fruit"].name) - - print_list(fruits=dictionary_list["fruits"]) - print_dict(fruit_dict=list_dict[0]) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/attribute_access.py +:caption: data_types_and_io/attribute_access.py +:lines: 57-80 ``` -+++ {"lines_to_next_cell": 0} - You can run all the workflows locally as follows: -```{code-cell} -:lines_to_next_cell: 2 - -if __name__ == "__main__": - list_wf() - dict_wf() - dataclass_wf() - advanced_workflow() +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/attribute_access.py +:caption: data_types_and_io/attribute_access.py +:lines: 84-88 ``` ## Failure scenario @@ -174,3 +85,5 @@ def failed_workflow(): print_message(message=fruit_dict["fruits"]) # 
Accessing a non-existent key print_message(message=fruit_instance.fruit) # Accessing a non-existent param ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/data_types_and_io/ diff --git a/docs/user_guide/data_types_and_io/dataclass.md b/docs/user_guide/data_types_and_io/dataclass.md index 7bdaee0385..4c7704d73a 100644 --- a/docs/user_guide/data_types_and_io/dataclass.md +++ b/docs/user_guide/data_types_and_io/dataclass.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (dataclass)= # Dataclass @@ -38,36 +19,25 @@ If you're using Flytekit version >= v1.11.1, you don't need to decorate with `@d inherit from Mashumaro's `DataClassJSONMixin`. ::: -To begin, import the necessary dependencies. +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. +``` -```{code-cell} -import os -import tempfile -from dataclasses import dataclass +To begin, import the necessary dependencies: -import pandas as pd -from flytekit import task, workflow -from flytekit.types.directory import FlyteDirectory -from flytekit.types.file import FlyteFile -from flytekit.types.structured import StructuredDataset -from mashumaro.mixins.json import DataClassJSONMixin +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/dataclass.py +:caption: data_types_and_io/dataclass.py +:lines: 1-10 ``` -+++ {"lines_to_next_cell": 0} - ## Python types We define a `dataclass` with `int`, `str` and `dict` as the data types. 
-```{code-cell} -@dataclass -class Datum(DataClassJSONMixin): - x: int - y: str - z: dict[int, str] +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/dataclass.py +:caption: data_types_and_io/dataclass.py +:pyobject: Datum ``` -+++ {"lines_to_next_cell": 0} - You can send a `dataclass` between different tasks written in various languages, and input it through the Flyte console as raw JSON. :::{note} @@ -76,95 +46,35 @@ All variables in a data class should be **annotated with their type**. Failure t Once declared, a dataclass can be returned as an output or accepted as an input. -```{code-cell} -@task -def stringify(s: int) -> Datum: - """ - A dataclass return will be treated as a single complex JSON return. - """ - return Datum(x=s, y=str(s), z={s: str(s)}) - - -@task -def add(x: Datum, y: Datum) -> Datum: - """ - Flytekit automatically converts the provided JSON into a data class. - If the structures don't match, it triggers a runtime failure. - """ - x.z.update(y.z) - return Datum(x=x.x + y.x, y=x.y + y.y, z=x.z) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/dataclass.py +:caption: data_types_and_io/dataclass.py +:lines: 28-43 ``` -+++ {"lines_to_next_cell": 0} - ## Flyte types We also define a data class that accepts {std:ref}`StructuredDataset `, {std:ref}`FlyteFile ` and {std:ref}`FlyteDirectory `. -```{code-cell} -@dataclass -class FlyteTypes(DataClassJSONMixin): - dataframe: StructuredDataset - file: FlyteFile - directory: FlyteDirectory - - -@task -def upload_data() -> FlyteTypes: - """ - Flytekit will upload FlyteFile, FlyteDirectory and StructuredDataset to the blob store, - such as GCP or S3. - """ - # 1. StructuredDataset - df = pd.DataFrame({"Name": ["Tom", "Joseph"], "Age": [20, 22]}) - - # 2. FlyteDirectory - temp_dir = tempfile.mkdtemp(prefix="flyte-") - df.to_parquet(temp_dir + "/df.parquet") - - # 3. 
FlyteFile - file_path = tempfile.NamedTemporaryFile(delete=False) - file_path.write(b"Hello, World!") - - fs = FlyteTypes( - dataframe=StructuredDataset(dataframe=df), - file=FlyteFile(file_path.name), - directory=FlyteDirectory(temp_dir), - ) - return fs - - -@task -def download_data(res: FlyteTypes): - assert pd.DataFrame({"Name": ["Tom", "Joseph"], "Age": [20, 22]}).equals(res.dataframe.open(pd.DataFrame).all()) - f = open(res.file, "r") - assert f.read() == "Hello, World!" - assert os.listdir(res.directory) == ["df.parquet"] +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/dataclass.py +:caption: data_types_and_io/dataclass.py +:lines: 47-84 ``` -+++ {"lines_to_next_cell": 0} - A data class supports the usage of data associated with Python types, data classes, flyte file, flyte directory and structured dataset. We define a workflow that calls the tasks created above. -```{code-cell} -@workflow -def dataclass_wf(x: int, y: int) -> (Datum, FlyteTypes): - o1 = add(x=stringify(s=x), y=stringify(s=y)) - o2 = upload_data() - download_data(res=o2) - return o1, o2 +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/dataclass.py +:caption: data_types_and_io/dataclass.py +:pyobject: dataclass_wf ``` -+++ {"lines_to_next_cell": 0} - You can run the workflow locally as follows: -```{code-cell} -if __name__ == "__main__": - dataclass_wf(x=10, y=20) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/dataclass.py +:caption: data_types_and_io/dataclass.py +:lines: 97-98 ``` To trigger a task that accepts a dataclass as an input with `pyflyte run`, you can provide a JSON file as an input: @@ -173,3 +83,5 @@ pyflyte run \ https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/dataclass.py \ add --x dataclass_input.json --y 
dataclass_input.json ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/data_types_and_io/ diff --git a/docs/user_guide/data_types_and_io/enum_type.md b/docs/user_guide/data_types_and_io/enum_type.md index b4727c508f..58142c93ea 100644 --- a/docs/user_guide/data_types_and_io/enum_type.md +++ b/docs/user_guide/data_types_and_io/enum_type.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - # Enum type ```{eval-rst} @@ -35,55 +16,32 @@ Flyte assumes the first value in the list as the default, and Enum types cannot Therefore, when defining enums, it's important to design them with the first value as a valid default. ::: -To begin, import the dependencies. +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. +``` -```{code-cell} -from enum import Enum +To begin, import the dependencies: -from flytekit import task, workflow +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/enum_type.py +:caption: data_types_and_io/enum_type.py +:lines: 1-3 ``` -+++ {"lines_to_next_cell": 0} - We define an enum and a simple coffee maker workflow that accepts an order and brews coffee ☕️ accordingly. -The assumption is that the coffee maker only understands enum inputs. 
- -```{code-cell} -class Coffee(Enum): - ESPRESSO = "espresso" - AMERICANO = "americano" - LATTE = "latte" - CAPPUCCINO = "cappucccino" - - -@task -def take_order(coffee: str) -> Coffee: - return Coffee(coffee) +The assumption is that the coffee maker only understands enum inputs: - -@task -def prep_order(coffee_enum: Coffee) -> str: - return f"Preparing {coffee_enum.value} ..." - - -@workflow -def coffee_maker(coffee: str) -> str: - coffee_enum = take_order(coffee=coffee) - return prep_order(coffee_enum=coffee_enum) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/enum_type.py +:caption: data_types_and_io/enum_type.py +:lines: 9-35 ``` -+++ {"lines_to_next_cell": 0} +The workflow can also accept an enum value: -The workflow can also accept an enum value. - -```{code-cell} -@workflow -def coffee_maker_enum(coffee_enum: Coffee) -> str: - return prep_order(coffee_enum=coffee_enum) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/enum_type.py +:caption: data_types_and_io/enum_type.py +:pyobject: coffee_maker_enum ``` -+++ {"lines_to_next_cell": 0} - You can send a string to the `coffee_maker_enum` workflow during its execution, like this: ``` pyflyte run \ @@ -91,10 +49,11 @@ pyflyte run \ coffee_maker_enum --coffee_enum="latte" ``` -You can run the workflows locally. 
+You can run the workflows locally: -```{code-cell} -if __name__ == "__main__": - print(coffee_maker(coffee="latte")) - print(coffee_maker_enum(coffee_enum=Coffee.LATTE)) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/enum_type.py +:caption: data_types_and_io/enum_type.py +:lines: 44-46 ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/data_types_and_io/ diff --git a/docs/user_guide/data_types_and_io/flytedirectory.md b/docs/user_guide/data_types_and_io/flytedirectory.md index 6dd75ed159..6297535269 100644 --- a/docs/user_guide/data_types_and_io/flytedirectory.md +++ b/docs/user_guide/data_types_and_io/flytedirectory.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (folder)= # FlyteDirectory @@ -29,50 +10,28 @@ In addition to files, folders are another fundamental operating system primitive Flyte supports folders in the form of [multi-part blobs](https://github.com/flyteorg/flyteidl/blob/master/protos/flyteidl/core/types.proto#L73). -To begin, import the libraries. +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. 
+``` -```{code-cell} -import csv -import os -import urllib.request -from collections import defaultdict -from pathlib import Path -from typing import List +To begin, import the libraries: -import flytekit -from flytekit import task, workflow -from flytekit.types.directory import FlyteDirectory +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/folder.py +:caption: data_types_and_io/folder.py +:lines: 1-10 ``` -+++ {"lines_to_next_cell": 0} - Building upon the previous example demonstrated in the {std:ref}`file ` section, let's continue by considering the normalization of columns in a CSV file. The following task downloads a list of URLs pointing to CSV files and returns the folder path in a `FlyteDirectory` object. -```{code-cell} -@task -def download_files(csv_urls: List[str]) -> FlyteDirectory: - working_dir = flytekit.current_context().working_directory - local_dir = Path(os.path.join(working_dir, "csv_files")) - local_dir.mkdir(exist_ok=True) - - # get the number of digits needed to preserve the order of files in the local directory - zfill_len = len(str(len(csv_urls))) - for idx, remote_location in enumerate(csv_urls): - local_image = os.path.join( - # prefix the file name with the index location of the file in the original csv_urls list - local_dir, - f"{str(idx).zfill(zfill_len)}_{os.path.basename(remote_location)}", - ) - urllib.request.urlretrieve(remote_location, local_image) - return FlyteDirectory(path=str(local_dir)) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/folder.py +:caption: data_types_and_io/folder.py +:pyobject: download_files ``` -+++ {"lines_to_next_cell": 0} - :::{note} You can annotate a `FlyteDirectory` when you want to download or upload the contents of the directory in batches. 
For example, @@ -98,102 +57,33 @@ demonstrates how Flyte tasks are simply entrypoints of execution, which can them other functions and routines that are written in pure Python. ::: -```{code-cell} -def normalize_columns( - local_csv_file: str, - column_names: List[str], - columns_to_normalize: List[str], -): - # read the data from the raw csv file - parsed_data = defaultdict(list) - with open(local_csv_file, newline="\n") as input_file: - reader = csv.DictReader(input_file, fieldnames=column_names) - for row in (x for i, x in enumerate(reader) if i > 0): - for column in columns_to_normalize: - parsed_data[column].append(float(row[column].strip())) - - # normalize the data - normalized_data = defaultdict(list) - for colname, values in parsed_data.items(): - mean = sum(values) / len(values) - std = (sum([(x - mean) ** 2 for x in values]) / len(values)) ** 0.5 - normalized_data[colname] = [(x - mean) / std for x in values] - - # overwrite the csv file with the normalized columns - with open(local_csv_file, mode="w") as output_file: - writer = csv.DictWriter(output_file, fieldnames=columns_to_normalize) - writer.writeheader() - for row in zip(*normalized_data.values()): - writer.writerow({k: row[i] for i, k in enumerate(columns_to_normalize)}) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/folder.py +:caption: data_types_and_io/folder.py +:pyobject: normalize_columns ``` -+++ {"lines_to_next_cell": 0} - We then define a task that accepts the previously downloaded folder, along with some metadata about the column names of each file in the directory and the column names that we want to normalize. 
-```{code-cell} -@task -def normalize_all_files( - csv_files_dir: FlyteDirectory, - columns_metadata: List[List[str]], - columns_to_normalize_metadata: List[List[str]], -) -> FlyteDirectory: - for local_csv_file, column_names, columns_to_normalize in zip( - # make sure we sort the files in the directory to preserve the original order of the csv urls - [os.path.join(csv_files_dir, x) for x in sorted(os.listdir(csv_files_dir))], - columns_metadata, - columns_to_normalize_metadata, - ): - normalize_columns(local_csv_file, column_names, columns_to_normalize) - return FlyteDirectory(path=csv_files_dir.path) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/folder.py +:caption: data_types_and_io/folder.py +:pyobject: normalize_all_files ``` -+++ {"lines_to_next_cell": 0} - Compose all of the above tasks into a workflow. This workflow accepts a list of URL strings pointing to a remote location containing a CSV file, a list of column names associated with each CSV file, and a list of columns that we want to normalize. 
-```{code-cell} -@workflow -def download_and_normalize_csv_files( - csv_urls: List[str], - columns_metadata: List[List[str]], - columns_to_normalize_metadata: List[List[str]], -) -> FlyteDirectory: - directory = download_files(csv_urls=csv_urls) - return normalize_all_files( - csv_files_dir=directory, - columns_metadata=columns_metadata, - columns_to_normalize_metadata=columns_to_normalize_metadata, - ) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/folder.py +:caption: data_types_and_io/folder.py +:pyobject: download_and_normalize_csv_files ``` -+++ {"lines_to_next_cell": 0} - You can run the workflow locally as follows: -```{code-cell} -if __name__ == "__main__": - csv_urls = [ - "https://people.sc.fsu.edu/~jburkardt/data/csv/biostats.csv", - "https://people.sc.fsu.edu/~jburkardt/data/csv/faithful.csv", - ] - columns_metadata = [ - ["Name", "Sex", "Age", "Heights (in)", "Weight (lbs)"], - ["Index", "Eruption length (mins)", "Eruption wait (mins)"], - ] - columns_to_normalize_metadata = [ - ["Age"], - ["Eruption length (mins)"], - ] - - print(f"Running {__file__} main...") - directory = download_and_normalize_csv_files( - csv_urls=csv_urls, - columns_metadata=columns_metadata, - columns_to_normalize_metadata=columns_to_normalize_metadata, - ) - print(f"Running download_and_normalize_csv_files on {csv_urls}: " f"{directory}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/folder.py +:caption: data_types_and_io/folder.py +:lines: 98-118 ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/data_types_and_io/ diff --git a/docs/user_guide/data_types_and_io/flytefile.md b/docs/user_guide/data_types_and_io/flytefile.md index 474cad4041..97330669a5 100644 --- a/docs/user_guide/data_types_and_io/flytefile.md +++ b/docs/user_guide/data_types_and_io/flytefile.md @@ -1,22 +1,3 @@ ---- -jupytext: - 
cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (file)= # FlyteFile @@ -36,21 +17,17 @@ links, read them with the python built-in {py:class}`csv.DictReader` function, normalize some pre-specified columns, and output the normalized columns to another csv file. -First, import the libraries. +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. +``` -```{code-cell} -import csv -import os -from collections import defaultdict -from typing import List +First, import the libraries: -import flytekit -from flytekit import task, workflow -from flytekit.types.file import FlyteFile +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/file.py +:caption: data_types_and_io/file.py +:lines: 1-8 ``` -+++ {"lines_to_next_cell": 0} - Define a task that accepts {py:class}`~flytekit.types.file.FlyteFile` as an input. The following is a task that accepts a `FlyteFile`, a list of column names, and a list of column names to normalize. The task then outputs a CSV file @@ -66,103 +43,29 @@ Predefined aliases for commonly used flyte file formats are also available. You can find them [here](https://github.com/flyteorg/flytekit/blob/master/flytekit/types/file/__init__.py). 
::: -```{code-cell} -@task -def normalize_columns( - csv_url: FlyteFile, - column_names: List[str], - columns_to_normalize: List[str], - output_location: str, -) -> FlyteFile: - # read the data from the raw csv file - parsed_data = defaultdict(list) - with open(csv_url, newline="\n") as input_file: - reader = csv.DictReader(input_file, fieldnames=column_names) - next(reader) # Skip header - for row in reader: - for column in columns_to_normalize: - parsed_data[column].append(float(row[column].strip())) - - # normalize the data - normalized_data = defaultdict(list) - for colname, values in parsed_data.items(): - mean = sum(values) / len(values) - std = (sum([(x - mean) ** 2 for x in values]) / len(values)) ** 0.5 - normalized_data[colname] = [(x - mean) / std for x in values] - - # write to local path - out_path = os.path.join( - flytekit.current_context().working_directory, - f"normalized-{os.path.basename(csv_url.path).rsplit('.')[0]}.csv", - ) - with open(out_path, mode="w") as output_file: - writer = csv.DictWriter(output_file, fieldnames=columns_to_normalize) - writer.writeheader() - for row in zip(*normalized_data.values()): - writer.writerow({k: row[i] for i, k in enumerate(columns_to_normalize)}) - - if output_location: - return FlyteFile(path=out_path, remote_path=output_location) - else: - return FlyteFile(path=out_path) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/file.py +:caption: data_types_and_io/file.py +:pyobject: normalize_columns ``` -+++ {"lines_to_next_cell": 0} - -When the image URL is sent to the task, the Flytekit engine translates it into a `FlyteFile` object on the local -drive (but doesn't download it). The act of calling the `download()` method should trigger the download, and the `path` -attribute enables to `open` the file. 
- -If the `output_location` argument is specified, it will be passed to the `remote_path` argument of `FlyteFile`, -which will use that path as the storage location instead of a random location (Flyte's object store). - -When this task finishes, the Flytekit engine returns the `FlyteFile` instance, uploads the file to the location, and -creates a blob literal pointing to it. - -Lastly, define a workflow. The `normalize_csv_files` workflow has an `output_location` argument which is passed -to the `location` input of the task. If it's not an empty string, the task attempts to -upload its file to that location. - -```{code-cell} -@workflow -def normalize_csv_file( - csv_url: FlyteFile, - column_names: List[str], - columns_to_normalize: List[str], - output_location: str = "", -) -> FlyteFile: - return normalize_columns( - csv_url=csv_url, - column_names=column_names, - columns_to_normalize=columns_to_normalize, - output_location=output_location, - ) -``` +When the CSV URL is sent to the task, the Flytekit engine translates it into a `FlyteFile` object on the local drive (but doesn't download it). Calling the `download()` method should trigger the download, and the `path` attribute lets you `open` the file. + +If the `output_location` argument is specified, it will be passed to the `remote_path` argument of `FlyteFile`, which will use that path as the storage location instead of a random location (Flyte's object store). -+++ {"lines_to_next_cell": 0} +When this task finishes, the Flytekit engine returns the `FlyteFile` instance, uploads the file to the location, and creates a blob literal pointing to it. + +Lastly, define a workflow. The `normalize_csv_file` workflow has an `output_location` argument which is passed to the `output_location` input of the task. If it's not an empty string, the task attempts to upload its file to that location.
+ +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/file.py +:caption: data_types_and_io/file.py +:pyobject: normalize_csv_file +``` You can run the workflow locally as follows: -```{code-cell} -if __name__ == "__main__": - default_files = [ - ( - "https://people.sc.fsu.edu/~jburkardt/data/csv/biostats.csv", - ["Name", "Sex", "Age", "Heights (in)", "Weight (lbs)"], - ["Age"], - ), - ( - "https://people.sc.fsu.edu/~jburkardt/data/csv/faithful.csv", - ["Index", "Eruption length (mins)", "Eruption wait (mins)"], - ["Eruption length (mins)"], - ), - ] - print(f"Running {__file__} main...") - for index, (csv_url, column_names, columns_to_normalize) in enumerate(default_files): - normalized_columns = normalize_csv_file( - csv_url=csv_url, - column_names=column_names, - columns_to_normalize=columns_to_normalize, - ) - print(f"Running normalize_csv_file workflow on {csv_url}: " f"{normalized_columns}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/file.py +:caption: data_types_and_io/file.py +:lines: 75-95 ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/data_types_and_io/ diff --git a/docs/user_guide/data_types_and_io/pickle_type.md b/docs/user_guide/data_types_and_io/pickle_type.md index b5cbb89f5a..19987d6288 100644 --- a/docs/user_guide/data_types_and_io/pickle_type.md +++ b/docs/user_guide/data_types_and_io/pickle_type.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (pickle_type)= # Pickle type @@ -42,11 +23,14 @@ or register a custom transformer, as using pickle types can result in lower 
perf This example demonstrates how you can utilize custom objects without registering a transformer. -```{code-cell} -from flytekit import task, workflow +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. ``` -+++ {"lines_to_next_cell": 0} +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/pickle_type.py +:caption: data_types_and_io/pickle_type.py +:lines: 1 +``` `Superhero` represents a user-defined complex type that can be serialized to a pickle file by Flytekit and transferred between tasks as both input and output data. @@ -56,31 +40,11 @@ Alternatively, you can {ref}`turn this object into a dataclass ` for We have used a simple object here for demonstration purposes. ::: -```{code-cell} -class Superhero: - def __init__(self, name, power): - self.name = name - self.power = power - - -@task -def welcome_superhero(name: str, power: str) -> Superhero: - return Superhero(name, power) - - -@task -def greet_superhero(superhero: Superhero) -> str: - return f"👋 Hello {superhero.name}! Your superpower is {superhero.power}." - - -@workflow -def superhero_wf(name: str = "Thor", power: str = "Flight") -> str: - superhero = welcome_superhero(name=name, power=power) - return greet_superhero(superhero=superhero) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/pickle_type.py +:caption: data_types_and_io/pickle_type.py +:lines: 7-26 ``` -+++ {"lines_to_next_cell": 0} - ## Batch size By default, if the list subtype is unrecognized, a single pickle file is generated. @@ -89,43 +53,20 @@ or significant list elements, you can specify a batch size. This feature allows for the processing of each batch as a separate pickle file. The following example demonstrates how to set the batch size. 
-```{code-cell} -from typing import Iterator - -from flytekit.types.pickle.pickle import BatchSize -from typing_extensions import Annotated - - -@task -def welcome_superheroes(names: list[str], powers: list[str]) -> Annotated[list[Superhero], BatchSize(3)]: - return [Superhero(name, power) for name, power in zip(names, powers)] - - -@task -def greet_superheroes(superheroes: list[Superhero]) -> Iterator[str]: - for superhero in superheroes: - yield f"👋 Hello {superhero.name}! Your superpower is {superhero.power}." - - -@workflow -def superheroes_wf( - names: list[str] = ["Thor", "Spiderman", "Hulk"], - powers: list[str] = ["Flight", "Surface clinger", "Shapeshifting"], -) -> Iterator[str]: - superheroes = welcome_superheroes(names=names, powers=powers) - return greet_superheroes(superheroes=superheroes) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/pickle_type.py +:caption: data_types_and_io/pickle_type.py +:lines: 35-58 ``` -+++ {"lines_to_next_cell": 0} - :::{note} The `welcome_superheroes` task will generate two pickle files: one containing two superheroes and the other containing one superhero. 
::: You can run the workflows locally as follows: -```{code-cell} -if __name__ == "__main__": - print(f"Superhero wf: {superhero_wf()}") - print(f"Superhero(es) wf: {superheroes_wf()}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/pickle_type.py +:caption: data_types_and_io/pickle_type.py +:lines: 62-64 ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/data_types_and_io/ diff --git a/docs/user_guide/data_types_and_io/pytorch_type.md b/docs/user_guide/data_types_and_io/pytorch_type.md index 4e5715d128..b224b0f7b5 100644 --- a/docs/user_guide/data_types_and_io/pytorch_type.md +++ b/docs/user_guide/data_types_and_io/pytorch_type.md @@ -1,21 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} (pytorch_type)= @@ -25,73 +7,21 @@ kernelspec: .. tags:: MachineLearning, Basic ``` -Flyte advocates for the use of strongly-typed data to simplify the development of robust and testable pipelines. -In addition to its application in data engineering, Flyte is primarily used for machine learning. -To streamline the communication between Flyte tasks, particularly when dealing with tensors and models, -we have introduced support for PyTorch types. +Flyte advocates for the use of strongly-typed data to simplify the development of robust and testable pipelines. In addition to its application in data engineering, Flyte is primarily used for machine learning. +To streamline the communication between Flyte tasks, particularly when dealing with tensors and models, we have introduced support for PyTorch types. 
## Tensors and modules -At times, you may find the need to pass tensors and modules (models) within your workflow. -Without native support for PyTorch tensors and modules, Flytekit relies on {std:ref}`pickle ` for serializing -and deserializing these entities, as well as any unknown types. -However, this approach isn't the most efficient. As a result, we've integrated PyTorch's -serialization and deserialization support into the Flyte type system. +At times, you may find the need to pass tensors and modules (models) within your workflow. Without native support for PyTorch tensors and modules, Flytekit relies on {std:ref}`pickle ` for serializing and deserializing these entities, as well as any unknown types. However, this approach isn't the most efficient. As a result, we've integrated PyTorch's serialization and deserialization support into the Flyte type system. -```{code-cell} -import torch -from flytekit import task, workflow - - -@task -def generate_tensor_2d() -> torch.Tensor: - return torch.tensor([[1.0, -1.0, 2], [1.0, -1.0, 9], [0, 7.0, 3]]) - - -@task -def reshape_tensor(tensor: torch.Tensor) -> torch.Tensor: - # convert 2D to 3D - tensor.unsqueeze_(-1) - return tensor.expand(3, 3, 2) - - -@task -def generate_module() -> torch.nn.Module: - bn = torch.nn.BatchNorm1d(3, track_running_stats=True) - return bn - - -@task -def get_model_weight(model: torch.nn.Module) -> torch.Tensor: - return model.weight - - -class MyModel(torch.nn.Module): - def __init__(self): - super(MyModel, self).__init__() - self.l0 = torch.nn.Linear(4, 2) - self.l1 = torch.nn.Linear(2, 1) - - def forward(self, input): - out0 = self.l0(input) - out0_relu = torch.nn.functional.relu(out0) - return self.l1(out0_relu) - - -@task -def get_l1() -> torch.nn.Module: - model = MyModel() - return model.l1 - - -@workflow -def pytorch_native_wf(): - reshape_tensor(tensor=generate_tensor_2d()) - get_model_weight(model=generate_module()) - get_l1() +```{note} +To clone and run the example code on this 
page, see the [Flytesnacks repo][flytesnacks]. ``` -+++ {"lines_to_next_cell": 0} +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/pytorch_type.py +:caption: data_types_and_io/pytorch_type.py +:lines: 5-50 +``` Passing around tensors and modules is no more a hassle! @@ -106,64 +36,9 @@ According to the PyTorch [docs](https://pytorch.org/tutorials/beginner/saving_lo it's recommended to store the module's `state_dict` rather than the module itself, although the serialization should work in either case. -```{code-cell} -:lines_to_next_cell: 2 - -from dataclasses import dataclass - -import torch.nn as nn -import torch.nn.functional as F -import torch.optim as optim -from dataclasses_json import dataclass_json -from flytekit.extras.pytorch import PyTorchCheckpoint - - -@dataclass_json -@dataclass -class Hyperparameters: - epochs: int - loss: float - - -class Net(nn.Module): - def __init__(self): - super(Net, self).__init__() - self.conv1 = nn.Conv2d(3, 6, 5) - self.pool = nn.MaxPool2d(2, 2) - self.conv2 = nn.Conv2d(6, 16, 5) - self.fc1 = nn.Linear(16 * 5 * 5, 120) - self.fc2 = nn.Linear(120, 84) - self.fc3 = nn.Linear(84, 10) - - def forward(self, x): - x = self.pool(F.relu(self.conv1(x))) - x = self.pool(F.relu(self.conv2(x))) - x = x.view(-1, 16 * 5 * 5) - x = F.relu(self.fc1(x)) - x = F.relu(self.fc2(x)) - x = self.fc3(x) - return x - - -@task -def generate_model(hyperparameters: Hyperparameters) -> PyTorchCheckpoint: - bn = Net() - optimizer = optim.SGD(bn.parameters(), lr=0.001, momentum=0.9) - return PyTorchCheckpoint(module=bn, hyperparameters=hyperparameters, optimizer=optimizer) - - -@task -def load(checkpoint: PyTorchCheckpoint): - new_bn = Net() - new_bn.load_state_dict(checkpoint["module_state_dict"]) - optimizer = optim.SGD(new_bn.parameters(), lr=0.001, momentum=0.9) - optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) - - -@workflow -def pytorch_checkpoint_wf(): - checkpoint = 
generate_model(hyperparameters=Hyperparameters(epochs=10, loss=0.1)) - load(checkpoint=checkpoint) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/pytorch_type.py +:caption: data_types_and_io/pytorch_type.py +:lines: 63-117 ``` :::{note} @@ -217,3 +92,5 @@ def predict( The `predict` task will run on a CPU, and the device conversion from GPU to CPU will be automatically handled by Flytekit. + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/data_types_and_io/ diff --git a/docs/user_guide/data_types_and_io/structureddataset.md b/docs/user_guide/data_types_and_io/structureddataset.md index 6639d1b2a7..03ebfb5275 100644 --- a/docs/user_guide/data_types_and_io/structureddataset.md +++ b/docs/user_guide/data_types_and_io/structureddataset.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (structured_dataset)= # StructuredDataset @@ -27,10 +8,7 @@ kernelspec: ```{currentmodule} flytekit.types.structured ``` -As with most type systems, Python has primitives, container types like maps and tuples, and support for user-defined structures. -However, while there’s a rich variety of dataframe classes (Pandas, Spark, Pandera, etc.), there’s no native Python type that -represents a dataframe in the abstract. This is the gap that the {py:class}`StructuredDataset` type is meant to fill. -It offers the following benefits: +As with most type systems, Python has primitives, container types like maps and tuples, and support for user-defined structures. 
However, while there’s a rich variety of dataframe classes (Pandas, Spark, Pandera, etc.), there’s no native Python type that represents a dataframe in the abstract. This is the gap that the {py:class}`StructuredDataset` type is meant to fill. It offers the following benefits: - Eliminate boilerplate code you would otherwise need to write to serialize/deserialize from file objects into dataframe instances, - Eliminate additional inputs/outputs that convey metadata around the format of the tabular data held in those files, @@ -50,45 +28,27 @@ the {py:class}`StructuredDataset` type. This example demonstrates how to work with a structured dataset using Flyte entities. ```{note} -To use the `StructuredDataset` type, you only need to import `pandas`. -The other imports specified below are only necessary for this specific example. +To use the `StructuredDataset` type, you only need to import `pandas`. The other imports specified below are only necessary for this specific example. +``` + +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. ``` To begin, import the dependencies for the example: -```{code-cell} -import os -import typing - -import numpy as np -import pandas as pd -import pyarrow as pa -import pyarrow.parquet as pq -from flytekit import FlyteContext, StructuredDatasetType, kwtypes, task, workflow -from flytekit.models import literals -from flytekit.models.literals import StructuredDatasetMetadata -from flytekit.types.structured.structured_dataset import ( - PARQUET, - StructuredDataset, - StructuredDatasetDecoder, - StructuredDatasetEncoder, - StructuredDatasetTransformerEngine, -) -from typing_extensions import Annotated +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/structured_dataset.py +:caption: data_types_and_io/structured_dataset.py +:lines: 1-18 ``` -+++ {"lines_to_next_cell": 0} - Define a task that returns a Pandas DataFrame. 
-```{code-cell} -@task -def generate_pandas_df(a: int) -> pd.DataFrame: - return pd.DataFrame({"Name": ["Tom", "Joseph"], "Age": [a, 22], "Height": [160, 178]}) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/structured_dataset.py +:caption: data_types_and_io/structured_dataset.py +:pyobject: generate_pandas_df ``` -+++ {"lines_to_next_cell": 0} - Using this simplest form, however, the user is not able to set the additional dataframe information alluded to above, - Column type information @@ -106,34 +66,21 @@ you can just specify the column names and their types in the structured dataset First, initialize column types you want to extract from the `StructuredDataset`. -```{code-cell} -all_cols = kwtypes(Name=str, Age=int, Height=int) -col = kwtypes(Age=int) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/structured_dataset.py +:caption: data_types_and_io/structured_dataset.py +:lines: 30-31 ``` -+++ {"lines_to_next_cell": 0} - Define a task that opens a structured dataset by calling `all()`. When you invoke `all()` with ``pandas.DataFrame``, the Flyte engine downloads the parquet file on S3, and deserializes it to `pandas.DataFrame`. Keep in mind that you can invoke ``open()`` with any dataframe type that's supported or added to structured dataset. For instance, you can use ``pa.Table`` to convert the Pandas DataFrame to a PyArrow table. 
-```{code-cell} -@task -def get_subset_pandas_df(df: Annotated[StructuredDataset, all_cols]) -> Annotated[StructuredDataset, col]: - df = df.open(pd.DataFrame).all() - df = pd.concat([df, pd.DataFrame([[30]], columns=["Age"])]) - return StructuredDataset(dataframe=df) - - -@workflow -def simple_sd_wf(a: int = 19) -> Annotated[StructuredDataset, col]: - pandas_df = generate_pandas_df(a=a) - return get_subset_pandas_df(df=pandas_df) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/structured_dataset.py +:caption: data_types_and_io/structured_dataset.py +:lines: 41-51 ``` -+++ {"lines_to_next_cell": 0} - The code may result in runtime failures if the columns do not match. The input ``df`` has ``Name``, ``Age`` and ``Height`` columns, whereas the output structured dataset will only have the ``Age`` column. @@ -142,26 +89,11 @@ You can use a custom serialization format to serialize your dataframes. Here's how you can register the Pandas to CSV handler, which is already available, and enable the CSV serialization by annotating the structured dataset with the CSV format: -```{code-cell} -from flytekit.types.structured import register_csv_handlers -from flytekit.types.structured.structured_dataset import CSV - -register_csv_handlers() - - -@task -def pandas_to_csv(df: pd.DataFrame) -> Annotated[StructuredDataset, CSV]: - return StructuredDataset(dataframe=df) - - -@workflow -def pandas_to_csv_wf() -> Annotated[StructuredDataset, CSV]: - pandas_df = generate_pandas_df(a=19) - return pandas_to_csv(df=pandas_df) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/structured_dataset.py +:caption: data_types_and_io/structured_dataset.py +:lines: 57-71 ``` -+++ {"lines_to_next_cell": 0} - ## Storage driver and location By default, the data will be written to the same place that all other pointer-types (FlyteFile, FlyteDirectory, etc.) 
are written to. This is controlled by the output data prefix option in Flyte which is configurable on multiple levels. @@ -266,111 +198,57 @@ enabling the use of a 2D NumPy array as a valid type within structured datasets. Extend `StructuredDatasetEncoder` and implement the `encode` function. The `encode` function converts NumPy array to an intermediate format (parquet file format in this case). -```{code-cell} -class NumpyEncodingHandler(StructuredDatasetEncoder): - def encode( - self, - ctx: FlyteContext, - structured_dataset: StructuredDataset, - structured_dataset_type: StructuredDatasetType, - ) -> literals.StructuredDataset: - df = typing.cast(np.ndarray, structured_dataset.dataframe) - name = ["col" + str(i) for i in range(len(df))] - table = pa.Table.from_arrays(df, name) - path = ctx.file_access.get_random_remote_directory() - local_dir = ctx.file_access.get_random_local_directory() - local_path = os.path.join(local_dir, f"{0:05}") - pq.write_table(table, local_path) - ctx.file_access.upload_directory(local_dir, path) - return literals.StructuredDataset( - uri=path, - metadata=StructuredDatasetMetadata(structured_dataset_type=StructuredDatasetType(format=PARQUET)), - ) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/structured_dataset.py +:caption: data_types_and_io/structured_dataset.py +:pyobject: NumpyEncodingHandler ``` -+++ {"lines_to_next_cell": 0} - ### NumPy decoder Extend {py:class}`StructuredDatasetDecoder` and implement the {py:meth}`~StructuredDatasetDecoder.decode` function. The {py:meth}`~StructuredDatasetDecoder.decode` function converts the parquet file to a `numpy.ndarray`. 
-```{code-cell} -class NumpyDecodingHandler(StructuredDatasetDecoder): - def decode( - self, - ctx: FlyteContext, - flyte_value: literals.StructuredDataset, - current_task_metadata: StructuredDatasetMetadata, - ) -> np.ndarray: - local_dir = ctx.file_access.get_random_local_directory() - ctx.file_access.get_data(flyte_value.uri, local_dir, is_multipart=True) - table = pq.read_table(local_dir) - return table.to_pandas().to_numpy() +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/structured_dataset.py +:caption: data_types_and_io/structured_dataset.py +:pyobject: NumpyDecodingHandler ``` -+++ {"lines_to_next_cell": 0} - ### NumPy renderer Create a default renderer for numpy array, then Flytekit will use this renderer to display schema of NumPy array on the Flyte deck. -```{code-cell} -class NumpyRenderer: - def to_html(self, df: np.ndarray) -> str: - assert isinstance(df, np.ndarray) - name = ["col" + str(i) for i in range(len(df))] - table = pa.Table.from_arrays(df, name) - return pd.DataFrame(table.schema).to_html(index=False) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/structured_dataset.py +:caption: data_types_and_io/structured_dataset.py +:pyobject: NumpyRenderer ``` -+++ {"lines_to_next_cell": 0} - In the end, register the encoder, decoder and renderer with the `StructuredDatasetTransformerEngine`. Specify the Python type you want to register this encoder with (`np.ndarray`), the storage engine to register this against (if not specified, it is assumed to work for all the storage backends), and the byte format, which in this case is `PARQUET`. 
-```{code-cell} -StructuredDatasetTransformerEngine.register(NumpyEncodingHandler(np.ndarray, None, PARQUET)) -StructuredDatasetTransformerEngine.register(NumpyDecodingHandler(np.ndarray, None, PARQUET)) -StructuredDatasetTransformerEngine.register_renderer(np.ndarray, NumpyRenderer()) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/structured_dataset.py +:caption: data_types_and_io/structured_dataset.py +:lines: 127-129 ``` -+++ {"lines_to_next_cell": 0} - You can now use `numpy.ndarray` to deserialize the parquet file to NumPy and serialize a task's output (NumPy array) to a parquet file. -```{code-cell} -@task -def generate_pd_df_with_str() -> pd.DataFrame: - return pd.DataFrame({"Name": ["Tom", "Joseph"]}) - - -@task -def to_numpy(sd: StructuredDataset) -> Annotated[StructuredDataset, None, PARQUET]: - numpy_array = sd.open(np.ndarray).all() - return StructuredDataset(dataframe=numpy_array) - - -@workflow -def numpy_wf() -> Annotated[StructuredDataset, None, PARQUET]: - return to_numpy(sd=generate_pd_df_with_str()) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/structured_dataset.py +:caption: data_types_and_io/structured_dataset.py +:lines: 134-147 ``` -+++ {"lines_to_next_cell": 0} - :::{note} `pyarrow` raises an `Expected bytes, got a 'int' object` error when the dataframe contains integers. 
::: You can run the code locally as follows: -```{code-cell} -if __name__ == "__main__": - sd = simple_sd_wf() - print(f"A simple Pandas dataframe workflow: {sd.open(pd.DataFrame).all()}") - print(f"Using CSV as the serializer: {pandas_to_csv_wf().open(pd.DataFrame).all()}") - print(f"NumPy encoder and decoder: {numpy_wf().open(np.ndarray).all()}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/data_types_and_io/data_types_and_io/structured_dataset.py +:caption: data_types_and_io/structured_dataset.py +:lines: 151-155 ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/data_types_and_io/ diff --git a/docs/user_guide/development_lifecycle/cache_serializing.md b/docs/user_guide/development_lifecycle/cache_serializing.md index f570b1c351..6552b90d30 100644 --- a/docs/user_guide/development_lifecycle/cache_serializing.md +++ b/docs/user_guide/development_lifecycle/cache_serializing.md @@ -1,20 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - # Cache serializing ```{eval-rst} @@ -28,43 +11,30 @@ Ensuring serialized evaluation requires a small degree of overhead to coordinate - Periodically scheduled workflow where a single task evaluation duration may span multiple scheduled executions. - Running a commonly shared task within different workflows (which receive the same inputs). -+++ {"lines_to_next_cell": 0} +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. 
+``` For any {py:func}`flytekit.task` in Flyte, there is always one required import, which is: -```{code-cell} -from flytekit import task +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/task_cache_serialize.py +:caption: development_lifecycle/task_cache_serialize.py +:lines: 1 ``` -+++ {"lines_to_next_cell": 0} - Task cache serializing is disabled by default to avoid unexpected behavior for task executions. To enable use the `cache_serialize` parameter. `cache_serialize` is a switch to enable or disable serialization of the task This operation is only useful for cacheable tasks, where one may reuse output from a previous execution. Flyte requires implicitly enabling the `cache` parameter on all cache serializable tasks. Cache key definitions follow the same rules as non-serialized cache tasks. It is important to understand the implications of the task signature and `cache_version` parameter in defining cached results. -```{code-cell} -:lines_to_next_cell: 2 - -@task(cache=True, cache_serialize=True, cache_version="1.0") -def square(n: int) -> int: - """ - Parameters: - n (int): name of the parameter for the task will be derived from the name of the input variable. - The type will be automatically deduced to Types.Integer - - Return: - int: The label for the output will be automatically assigned, and the type will be deduced from the annotation - - """ - return n * n +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/task_cache_serialize.py +:caption: development_lifecycle/task_cache_serialize.py +:pyobject: square ``` In the above example calling `square(n=2)` multiple times concurrently (even in different executions or workflows) will only execute the multiplication operation once. 
Concurrently evaluated tasks will wait for completion of the first instance before reusing the cached results and subsequent evaluations will instantly reuse existing cache results. -+++ - ## How does serializing caches work? The cache serialize paradigm introduces a new artifact reservation system. Tasks may use this reservation system to acquire an artifact reservation, indicating that they are actively evaluating the task, and release the reservation, once the execution is completed. Flyte uses a clock-skew algorithm to define reservation timeouts. Therefore, tasks are required to periodically extend the reservation during execution. @@ -72,3 +42,5 @@ The cache serialize paradigm introduces a new artifact reservation system. Tasks The first execution of a serializable cached task will successfully acquire the artifact reservation. Execution will be performed as usual and upon completion, the results are written to the cache and reservation is released. Concurrently executed task instances (i.e. in parallel with the initial execution) will observe an active reservation, in which case the execution will wait until the next reevaluation and perform another check. Once the initial execution completes it will reuse the cached results. Subsequently executed task instances (i.e. after an execution has already completed successfully) will immediately reuse the existing cached results. Flyte handles task execution failures using a timeout on the reservation. If the task currently holding the reservation fails to extend it before it times out, another task may acquire the reservation and begin executing the task. 
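The reservation flow described above can be sketched in plain Python. This is a minimal illustration and not Flyte's implementation: `ReservationStore`, `acquire_or_extend`, and the explicit `now` timestamps are hypothetical names chosen for this example, and a simple expiry time stands in for the clock-skew algorithm.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Reservation:
    owner: str  # execution currently evaluating the task
    expires_at: float  # time after which another execution may take over


class ReservationStore:
    """Hypothetical in-memory stand-in for the artifact reservation system."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self._reservations: Dict[str, Reservation] = {}

    def acquire_or_extend(self, cache_key: str, owner: str, now: float) -> bool:
        """Return True if `owner` holds the reservation after this call."""
        current = self._reservations.get(cache_key)
        # Free, already ours (extension), or timed out: take the reservation.
        if current is None or current.owner == owner or current.expires_at <= now:
            self._reservations[cache_key] = Reservation(owner, now + self.timeout)
            return True
        # Another execution holds an active reservation: the caller should wait.
        return False

    def release(self, cache_key: str, owner: str) -> None:
        """Drop the reservation once the owner has written its results."""
        current = self._reservations.get(cache_key)
        if current is not None and current.owner == owner:
            del self._reservations[cache_key]


store = ReservationStore(timeout=10.0)
key = "square:n=2"  # illustrative cache key

first = store.acquire_or_extend(key, "exec-a", now=0.0)  # initial execution
blocked = store.acquire_or_extend(key, "exec-b", now=5.0)  # concurrent execution
extended = store.acquire_or_extend(key, "exec-a", now=8.0)  # periodic extension
still_blocked = store.acquire_or_extend(key, "exec-b", now=12.0)
takeover = store.acquire_or_extend(key, "exec-b", now=20.0)  # exec-a stopped extending
```

In this sketch the second execution only takes over once the first one has stopped extending its reservation, which mirrors the failure handling described above.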
+ +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/development_lifecycle/ diff --git a/docs/user_guide/development_lifecycle/caching.md b/docs/user_guide/development_lifecycle/caching.md index 64f772054c..8419992f89 100644 --- a/docs/user_guide/development_lifecycle/caching.md +++ b/docs/user_guide/development_lifecycle/caching.md @@ -1,20 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - # Caching ```{eval-rst} @@ -29,33 +12,29 @@ Task caching is useful when a user knows that many executions with the same inpu - Running the code multiple times when debugging workflows - Running the commonly shared tasks amongst different workflows, which receive the same inputs -Let's watch a brief explanation of caching and a demo in this video, followed by how task caching can be enabled . +Let's watch a brief explanation of caching and a demo in this video, followed by how task caching can be enabled. ```{eval-rst} .. youtube:: WNkThCp-gqo ``` -+++ {"lines_to_next_cell": 0} - -Import the necessary libraries. +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. 
+``` -```{code-cell} -import time +Import the necessary libraries: -import pandas +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/task_cache.py +:caption: development_lifecycle/task_cache.py +:lines: 1-3 ``` -+++ {"lines_to_next_cell": 0} - For any {py:func}`flytekit.task` in Flyte, there is always one required import, which is: -```{code-cell} -:lines_to_next_cell: 1 - -from flytekit import HashMethod, task, workflow -from flytekit.core.node_creation import create_node -from typing_extensions import Annotated +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/task_cache.py +:caption: development_lifecycle/task_cache.py +:lines: 8-10 ``` Task caching is disabled by default to avoid unintended consequences of caching tasks with side effects. To enable caching and control its behavior, use the `cache` and `cache_version` parameters when constructing a task. @@ -64,26 +43,14 @@ Task caching is disabled by default to avoid unintended consequences of caching Bumping the `cache_version` is akin to invalidating the cache. You can manually update this version and Flyte caches the next execution instead of relying on the old cache. -```{code-cell} -@task(cache=True, cache_version="1.0") # noqa: F841 -def square(n: int) -> int: - """ - Parameters: - n (int): name of the parameter for the task will be derived from the name of the input variable. - The type will be automatically deduced to ``Types.Integer``. - - Return: - int: The label for the output will be automatically assigned, and the type will be deduced from the annotation. 
- - """ - return n * n +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/task_cache.py +:caption: development_lifecycle/task_cache.py +:pyobject: square ``` In the above example, calling `square(n=2)` twice (even if it's across different executions or different workflows) will only execute the multiplication operation once. The next time, the output will be made available immediately since it is captured from the previous execution with the same inputs. -+++ - If in a subsequent code update, you update the signature of the task to return the original number along with the result, it'll automatically invalidate the cache (even though the cache version remains the same). ```python @@ -92,8 +59,6 @@ def square(n: int) -> Tuple[int, int]: ... ``` -+++ - :::{note} If the user changes the task interface in any way (such as adding, removing, or editing inputs/outputs), Flyte treats that as a task functionality change. In the subsequent execution, Flyte runs the task and stores the outputs as newly cached values. ::: @@ -138,52 +103,18 @@ The format used by the store is opaque and not meant to be inspectable. The default behavior displayed by Flyte's memoization feature might not match the user intuition. For example, this code makes use of pandas dataframes: -```{code-cell} -@task -def foo(a: int, b: str) -> pandas.DataFrame: - df = pandas.DataFrame(...) - ... - return df - - -@task(cache=True, cache_version="1.0") -def bar(df: pandas.DataFrame) -> int: - ... 
- - -@workflow -def wf(a: int, b: str): - df = foo(a=a, b=b) - v = bar(df=df) # noqa: F841 +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/task_cache.py +:caption: development_lifecycle/task_cache.py +:lines: 39-54 ``` If run twice with the same inputs, one would expect that `bar` would trigger a cache hit, but it turns out that's not the case because of how dataframes are represented in Flyte. However, with release 1.2.0, Flyte provides a new way to control memoization behavior of literals. This is done via a `typing.Annotated` call on the task signature. For example, in order to cache the result of calls to `bar`, you can rewrite the code above like this: -```{code-cell} -def hash_pandas_dataframe(df: pandas.DataFrame) -> str: - return str(pandas.util.hash_pandas_object(df)) - - -@task -def foo_1( # noqa: F811 - a: int, b: str # noqa: F821 -) -> Annotated[pandas.DataFrame, HashMethod(hash_pandas_dataframe)]: # noqa: F821 # noqa: F821 - df = pandas.DataFrame(...) # noqa: F821 - ... - return df - - -@task(cache=True, cache_version="1.0") # noqa: F811 -def bar_1(df: pandas.DataFrame) -> int: # noqa: F811 - ... # noqa: F811 - - -@workflow -def wf_1(a: int, b: str): # noqa: F811 - df = foo(a=a, b=b) # noqa: F811 - v = bar(df=df) # noqa: F841 +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/task_cache.py +:caption: development_lifecycle/task_cache.py +:lines: 64-85 ``` Note how the output of task `foo` is annotated with an object of type `HashMethod`. Essentially, it represents a function that produces a hash that is used as part of the cache key calculation in calling the task `bar`. @@ -195,47 +126,11 @@ This is done by turning the literal representation into a string and using that This feature also works in local execution. 
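To make the idea concrete, here is a rough, self-contained sketch of how a content hash can stabilize a cache key. It is not Flyte's actual key computation: `cache_key`, `hash_rows`, and the `s3://` paths are hypothetical, and the sketch assumes the default representation of a dataframe includes a storage location that differs between runs.

```python
import hashlib
from typing import Callable, List, Optional

Rows = List[List[int]]


def hash_rows(rows: Rows) -> str:
    """Hypothetical content hash, playing the role of the function given to HashMethod."""
    return hashlib.sha256(repr(rows).encode()).hexdigest()


def cache_key(
    task_name: str,
    cache_version: str,
    literal_repr: str,
    rows: Rows,
    hash_method: Optional[Callable[[Rows], str]] = None,
) -> str:
    """Illustrative key: task name, cache version, and an identifier for the input."""
    # With a hash method, the input is identified by its content;
    # without one, by its literal representation (assumed here to be a location).
    value_id = hash_method(rows) if hash_method else literal_repr
    payload = f"{task_name}|{cache_version}|{value_id}"
    return hashlib.sha256(payload.encode()).hexdigest()


rows = [[1, 2, 3]]

# Same data written to two different locations by two runs.
run1_default = cache_key("bar", "1.0", "s3://bucket/run-1/df.parquet", rows)
run2_default = cache_key("bar", "1.0", "s3://bucket/run-2/df.parquet", rows)

run1_hashed = cache_key("bar", "1.0", "s3://bucket/run-1/df.parquet", rows, hash_rows)
run2_hashed = cache_key("bar", "1.0", "s3://bucket/run-2/df.parquet", rows, hash_rows)

# Bumping the cache version changes the key even for identical content.
run2_bumped = cache_key("bar", "2.0", "s3://bucket/run-2/df.parquet", rows, hash_rows)
```

Under these assumptions the default keys differ between runs (a cache miss), while the content-hashed keys match (a cache hit).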
-+++ - Here's a complete example of the feature: -```{code-cell} -def hash_pandas_dataframe(df: pandas.DataFrame) -> str: - return str(pandas.util.hash_pandas_object(df)) - - -@task -def uncached_data_reading_task() -> Annotated[pandas.DataFrame, HashMethod(hash_pandas_dataframe)]: - return pandas.DataFrame({"column_1": [1, 2, 3]}) - - -@task(cache=True, cache_version="1.0") -def cached_data_processing_task(df: pandas.DataFrame) -> pandas.DataFrame: - time.sleep(1) - return df * 2 - - -@task -def compare_dataframes(df1: pandas.DataFrame, df2: pandas.DataFrame): - assert df1.equals(df2) - - -@workflow -def cached_dataframe_wf(): - raw_data = uncached_data_reading_task() - - # Execute `cached_data_processing_task` twice, but force those - # two executions to happen serially to demonstrate how the second run - # hits the cache. - t1_node = create_node(cached_data_processing_task, df=raw_data) - t2_node = create_node(cached_data_processing_task, df=raw_data) - t1_node >> t2_node - - # Confirm that the dataframes actually match - compare_dataframes(df1=t1_node.o0, df2=t2_node.o0) - - -if __name__ == "__main__": - df1 = cached_dataframe_wf() - print(f"Running cached_dataframe_wf once : {df1}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/task_cache.py +:caption: development_lifecycle/task_cache.py +:lines: 97-134 ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/development_lifecycle/ diff --git a/docs/user_guide/development_lifecycle/creating_a_new_project.md b/docs/user_guide/development_lifecycle/creating_a_new_project.md index 7741b810d2..32477eb8c0 100644 --- a/docs/user_guide/development_lifecycle/creating_a_new_project.md +++ b/docs/user_guide/development_lifecycle/creating_a_new_project.md @@ -1,20 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - 
extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - # Creating a new project Creates project to be used as a home for the flyte resources of tasks and workflows. diff --git a/docs/user_guide/development_lifecycle/debugging_executions.md b/docs/user_guide/development_lifecycle/debugging_executions.md index 7c8f9562d3..641a9e505b 100644 --- a/docs/user_guide/development_lifecycle/debugging_executions.md +++ b/docs/user_guide/development_lifecycle/debugging_executions.md @@ -1,20 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - # Debugging executions The inspection of task and workflow execution would provide you log links to debug things further diff --git a/docs/user_guide/development_lifecycle/decks.md b/docs/user_guide/development_lifecycle/decks.md index 5aae4955b1..8aeb2436c2 100644 --- a/docs/user_guide/development_lifecycle/decks.md +++ b/docs/user_guide/development_lifecycle/decks.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (decks)= # Decks @@ -41,16 +22,16 @@ Additionally, you can create new decks to render your data using custom renderer Flyte Decks is an opt-in feature; to enable it, set `enable_deck` to `True` in the task parameters. ::: -To begin, import the dependencies. 
- -```{code-cell} -import flytekit -from flytekit import ImageSpec, task -from flytekitplugins.deck.renderer import MarkdownRenderer -from sklearn.decomposition import PCA +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. ``` -+++ {"lines_to_next_cell": 0} +To begin, import the dependencies: + +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/decks.py +:caption: development_lifecycle/decks.py +:lines: 1-4 +``` We create a new deck named `pca` and render Markdown content along with a [PCA](https://en.wikipedia.org/wiki/Principal_component_analysis) plot. @@ -58,16 +39,11 @@ We create a new deck named `pca` and render Markdown content along with a You can begin by initializing an {ref}`ImageSpec ` object to encompass all the necessary dependencies. This approach automatically triggers a Docker build, alleviating the need for you to manually create a Docker image. -```{code-cell} -custom_image = ImageSpec(name="flyte-decks-example", packages=["plotly"], registry="ghcr.io/flyteorg") - -if custom_image.is_container(): - import plotly - import plotly.express as px +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/decks.py +:caption: development_lifecycle/decks.py +:lines: 15-19 ``` -+++ {"lines_to_next_cell": 0} - :::{important} Replace `ghcr.io/flyteorg` with a container registry you've access to publish to. To upload the image to the local registry in the demo cluster, indicate the registry as `localhost:30000`. @@ -75,29 +51,11 @@ To upload the image to the local registry in the demo cluster, indicate the regi Note the usage of `append` to append the Plotly deck to the Markdown deck. 
-```{code-cell} -@task(enable_deck=True, container_image=custom_image) -def pca_plot(): - iris_df = px.data.iris() - X = iris_df[["sepal_length", "sepal_width", "petal_length", "petal_width"]] - pca = PCA(n_components=3) - components = pca.fit_transform(X) - total_var = pca.explained_variance_ratio_.sum() * 100 - fig = px.scatter_3d( - components, - x=0, - y=1, - z=2, - color=iris_df["species"], - title=f"Total Explained Variance: {total_var:.2f}%", - labels={"0": "PC 1", "1": "PC 2", "2": "PC 3"}, - ) - main_deck = flytekit.Deck("pca", MarkdownRenderer().to_html("### Principal Component Analysis")) - main_deck.append(plotly.io.to_html(fig)) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/decks.py +:caption: development_lifecycle/decks.py +:pyobject: pca_plot ``` -+++ {"lines_to_next_cell": 0} - :::{Important} To view the log output locally, the `FLYTE_SDK_LOGGING_LEVEL` environment variable should be set to 20. ::: @@ -138,44 +96,28 @@ When the task connected with a deck object is executed, these objects employ ren Creates a profile report from a Pandas DataFrame. -```{code-cell} -import pandas as pd -from flytekitplugins.deck.renderer import FrameProfilingRenderer - - -@task(enable_deck=True) -def frame_renderer() -> None: - df = pd.DataFrame(data={"col1": [1, 2], "col2": [3, 4]}) - flytekit.Deck("Frame Renderer", FrameProfilingRenderer().to_html(df=df)) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/decks.py +:caption: development_lifecycle/decks.py +:lines: 44-51 ``` -+++ {"lines_to_next_cell": 0} - :::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/flyte_decks_frame_renderer.png :alt: Frame renderer :class: with-shadow ::: -+++ {"lines_to_next_cell": 0} + #### Top-frame renderer Renders DataFrame as an HTML table. 
This renderer doesn't necessitate plugin installation since it's accessible within the flytekit library. -```{code-cell} -from typing import Annotated - -from flytekit.deck import TopFrameRenderer - - -@task(enable_deck=True) -def top_frame_renderer() -> Annotated[pd.DataFrame, TopFrameRenderer(1)]: - return pd.DataFrame(data={"col1": [1, 2], "col2": [3, 4]}) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/decks.py +:caption: development_lifecycle/decks.py +:lines: 57-64 ``` -+++ {"lines_to_next_cell": 0} - :::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/flyte_decks_top_frame_renderer.png :alt: Top frame renderer :class: with-shadow @@ -185,16 +127,11 @@ def top_frame_renderer() -> Annotated[pd.DataFrame, TopFrameRenderer(1)]: Converts a Markdown string into HTML, producing HTML as a Unicode string. -```{code-cell} -@task(enable_deck=True) -def markdown_renderer() -> None: - flytekit.current_context().default_deck.append( - MarkdownRenderer().to_html("You can install flytekit using this command: ```import flytekit```") - ) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/decks.py +:caption: development_lifecycle/decks.py +:pyobject: markdown_renderer ``` -+++ {"lines_to_next_cell": 0} - :::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/flyte_decks_markdown_renderer.png :alt: Markdown renderer :class: with-shadow @@ -210,18 +147,11 @@ The median (Q2) is indicated by a line within the box. Typically, the whiskers extend to the edges of the box, plus or minus 1.5 times the interquartile range (IQR: Q3-Q1). 
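As a worked illustration of the quartile and whisker arithmetic, the following sketch computes the values with the Python standard library. The sample data is made up, and the choice of the `inclusive` quantile method is an assumption; plotting libraries may use a different quartile convention.

```python
import statistics

# Made-up sample with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]

# Quartiles: Q1 and Q3 bound the box, Q2 is the median line inside it.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Whisker limits: box edges plus or minus 1.5 times the IQR.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Points beyond the limits are typically drawn as individual outliers.
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
print(f"whisker limits: [{lower_limit}, {upper_limit}], outliers: {outliers}")
```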
-```{code-cell} -from flytekitplugins.deck.renderer import BoxRenderer - - -@task(enable_deck=True) -def box_renderer() -> None: - iris_df = px.data.iris() - flytekit.Deck("Box Plot", BoxRenderer("sepal_length").to_html(iris_df)) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/decks.py +:caption: development_lifecycle/decks.py +:lines: 85-91 ``` -+++ {"lines_to_next_cell": 0} - :::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/flyte_decks_box_renderer.png :alt: Box renderer :class: with-shadow @@ -232,26 +162,11 @@ def box_renderer() -> None: Converts a {ref}`FlyteFile ` or `PIL.Image.Image` object into an HTML string, where the image data is encoded as a base64 string. -```{code-cell} -from flytekit import workflow -from flytekit.types.file import FlyteFile -from flytekitplugins.deck.renderer import ImageRenderer - - -@task(enable_deck=True) -def image_renderer(image: FlyteFile) -> None: - flytekit.Deck("Image Renderer", ImageRenderer().to_html(image_src=image)) - - -@workflow -def image_renderer_wf( - image: FlyteFile = "https://bit.ly/3KZ95q4", -) -> None: - image_renderer(image=image) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/decks.py +:caption: development_lifecycle/decks.py +:lines: 97-111 ``` -+++ {"lines_to_next_cell": 0} - :::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/flyte_decks_image_renderer.png :alt: Image renderer :class: with-shadow @@ -261,20 +176,11 @@ def image_renderer_wf( Converts a Pandas dataframe into an HTML table. 
-```{code-cell} -from flytekitplugins.deck.renderer import TableRenderer - - -@task(enable_deck=True) -def table_renderer() -> None: - flytekit.Deck( - "Table Renderer", - TableRenderer().to_html(df=pd.DataFrame(data={"col1": [1, 2], "col2": [3, 4]}), table_width=50), - ) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/decks.py +:caption: development_lifecycle/decks.py +:lines: 115-123 ``` -+++ {"lines_to_next_cell": 0} - :::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/flyte_decks_table_renderer.png :alt: Table renderer :class: with-shadow @@ -284,23 +190,9 @@ def table_renderer() -> None: Converts source code to HTML and renders it as a Unicode string on the deck. -```{code-cell} -:lines_to_next_cell: 2 - -import inspect - -from flytekitplugins.deck.renderer import SourceCodeRenderer - - -@task(enable_deck=True) -def source_code_renderer() -> None: - file_path = inspect.getsourcefile(frame_renderer.__wrapped__) - with open(file_path, "r") as f: - source_code = f.read() - flytekit.Deck( - "Source Code Renderer", - SourceCodeRenderer().to_html(source_code), - ) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/decks.py +:caption: development_lifecycle/decks.py +:lines: 128-141 ``` :::{figure} https://raw.githubusercontent.com/flyteorg/static-resources/main/flytesnacks/user_guide/flyte_decks_source_code_renderer.png @@ -314,3 +206,5 @@ Don't hesitate to integrate a new renderer into [renderer.py](https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-deck-standard/flytekitplugins/deck/renderer.py) if your deck renderers can enhance data visibility. Feel encouraged to open a pull request and play a part in enhancing the Flyte deck renderer ecosystem! 
+ +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/development_lifecycle/ diff --git a/docs/user_guide/development_lifecycle/failure_node.md b/docs/user_guide/development_lifecycle/failure_node.md index 61756e4a8a..a3bd1ae328 100644 --- a/docs/user_guide/development_lifecycle/failure_node.md +++ b/docs/user_guide/development_lifecycle/failure_node.md @@ -9,81 +9,54 @@ The failure node feature enables you to designate a specific node to execute in For example, a workflow involves creating a cluster at the beginning, followed by the execution of tasks, and concludes with the deletion of the cluster once all tasks are completed. However, if any task within the workflow encounters an error, flyte will abort the entire workflow and won’t delete the cluster. This poses a challenge if you still need to clean up the cluster even in a task failure. -To address this issue, you can add a failure node into your workflow. This ensures that critical actions, such as deleting the cluster, are executed even in the event of failures occurring throughout the workflow execution: +To address this issue, you can add a failure node into your workflow. This ensures that critical actions, such as deleting the cluster, are executed even in the event of failures occurring throughout the workflow execution -```python -from flytekit import WorkflowFailurePolicy, task, workflow - - -@task -def create_cluster(name: str): - print(f"Creating cluster: {name}") +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. 
+``` +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/failure_node.py +:caption: development_lifecycle/failure_node.py +:lines: 1-6 ``` Create a task that will fail during execution: -```python -@task -def t1(a: int, b: str): - print(f"{a} {b}") - raise ValueError("Fail!") - - -@task -def delete_cluster(name: str): - print(f"Deleting cluster {name}") +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/failure_node.py +:caption: development_lifecycle/failure_node.py +:lines: 10-18 ``` Create a task that will be executed if any of the tasks in the workflow fail: -```python -@task -def clean_up(name: str): - print(f"Cleaning up cluster {name}") - +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/failure_node.py +:caption: development_lifecycle/failure_node.py +:pyobject: clean_up ``` Specify the `on_failure` to a cleanup task. This task will be executed if any of the tasks in the workflow fail: - :::{note} The input of `clean_up` should be the exact same as the input of the workflow. ::: -```python -@workflow(on_failure=clean_up) -def subwf(name: str): - c = create_cluster(name=name) - t = t1(a=1, b="2") - d = delete_cluster(name=name) - c >> t >> d +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/failure_node.py +:caption: development_lifecycle/failure_node.py +:pyobject: subwf ``` By setting the failure policy to `FAIL_AFTER_EXECUTABLE_NODES_COMPLETE` to ensure that the `wf1` is executed even if the subworkflow fails. 
In this case, both parent and child workflows will fail, resulting in the `clean_up` task being executed twice: -```python -@workflow(on_failure=clean_up, failure_policy=WorkflowFailurePolicy.FAIL_AFTER_EXECUTABLE_NODES_COMPLETE) -def wf1(name: str = "my_cluster"): - c = create_cluster(name=name) - subwf(name="another_cluster") - t = t1(a=1, b="2") - d = delete_cluster(name=name) - c >> t >> d - - -@workflow -def clean_up_wf(name: str): - return clean_up(name=name) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/failure_node.py +:caption: development_lifecycle/failure_node.py +:lines: 42-53 ``` You can also set the `on_failure` to a workflow. This workflow will be executed if any of the tasks in the workflow fail: -```python -@workflow(on_failure=clean_up_wf) -def wf2(name: str = "my_cluster"): - c = create_cluster(name=name) - t = t1(a=1, b="2") - d = delete_cluster(name=name) - c >> t >> d +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/development_lifecycle/development_lifecycle/failure_node.py +:caption: development_lifecycle/failure_node.py +:pyobject: wf2 ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/development_lifecycle/ diff --git a/docs/user_guide/development_lifecycle/inspecting_executions.md b/docs/user_guide/development_lifecycle/inspecting_executions.md index 1ce09ae155..73fd024c8d 100644 --- a/docs/user_guide/development_lifecycle/inspecting_executions.md +++ b/docs/user_guide/development_lifecycle/inspecting_executions.md @@ -1,20 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - # Inspecting executions ## Flytectl diff --git 
a/docs/user_guide/development_lifecycle/private_images.md b/docs/user_guide/development_lifecycle/private_images.md index 5ebd41ea73..3783f10fe8 100644 --- a/docs/user_guide/development_lifecycle/private_images.md +++ b/docs/user_guide/development_lifecycle/private_images.md @@ -1,20 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - (private_images)= # Private images diff --git a/docs/user_guide/development_lifecycle/running_launch_plans.md b/docs/user_guide/development_lifecycle/running_launch_plans.md index 1fb8bb4c2c..73179046f3 100644 --- a/docs/user_guide/development_lifecycle/running_launch_plans.md +++ b/docs/user_guide/development_lifecycle/running_launch_plans.md @@ -1,20 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - (remote_launchplan)= # Running launch plans diff --git a/docs/user_guide/development_lifecycle/running_tasks.md b/docs/user_guide/development_lifecycle/running_tasks.md index 882380109d..26c31744d0 100644 --- a/docs/user_guide/development_lifecycle/running_tasks.md +++ b/docs/user_guide/development_lifecycle/running_tasks.md @@ -1,20 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - (remote_task)= # Running tasks diff --git 
a/docs/user_guide/development_lifecycle/running_workflows.md b/docs/user_guide/development_lifecycle/running_workflows.md index 2e04714adc..631cc6d4ae 100644 --- a/docs/user_guide/development_lifecycle/running_workflows.md +++ b/docs/user_guide/development_lifecycle/running_workflows.md @@ -1,20 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - # Running workflows Workflows on their own are not runnable directly. However, a launchplan is always bound to a workflow and you can use diff --git a/docs/user_guide/environment_setup.md b/docs/user_guide/environment_setup.md index f23d32219c..1d6a8740c7 100644 --- a/docs/user_guide/environment_setup.md +++ b/docs/user_guide/environment_setup.md @@ -98,7 +98,7 @@ You can also run the code directly from a remote source: ``` pyflyte run --remote \ - https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/hello_world.py \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/example_code/basics/basics/hello_world.py \ hello_world_wf ``` @@ -118,7 +118,7 @@ Finally, run a workflow that takes some inputs, for example the `workflow.py` ex ```{prompt} bash pyflyte run --remote \ - https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/basics/basics/workflow.py \ + https://raw.githubusercontent.com/flyteorg/flytesnacks/master/example_code/basics/basics/workflow.py \ simple_wf --x '[-3,0,3]' --y '[7,4,-2]' ``` diff --git a/docs/user_guide/extending/backend_plugins.md b/docs/user_guide/extending/backend_plugins.md index bbee56ea46..446c225f2d 100644 --- a/docs/user_guide/extending/backend_plugins.md +++ b/docs/user_guide/extending/backend_plugins.md @@ -1,21 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - 
formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -%% [markdown] (extend-plugin-flyte-backend)= # Backend plugins diff --git a/docs/user_guide/extending/container_interface.md b/docs/user_guide/extending/container_interface.md index c0be559d76..56eec884ba 100644 --- a/docs/user_guide/extending/container_interface.md +++ b/docs/user_guide/extending/container_interface.md @@ -1,20 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - (core-extend-flyte-container-interface)= # Container interface diff --git a/docs/user_guide/extending/custom_types.md b/docs/user_guide/extending/custom_types.md index af82d1a6ec..d0c68b59e4 100644 --- a/docs/user_guide/extending/custom_types.md +++ b/docs/user_guide/extending/custom_types.md @@ -1,20 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - (advanced_custom_types)= # Custom types @@ -23,8 +6,7 @@ kernelspec: .. tags:: Extensibility, Contribute, Intermediate ``` -Flyte is a strongly-typed framework for authoring tasks and workflows. But there are situations when the existing -types do not directly work. This is true with any programming language! +Flyte is a strongly-typed framework for authoring tasks and workflows. But there are situations when the existing types do not directly work. 
This is true with any programming language! Similar to a programming language enabling higher-level concepts to describe user-specific objects such as classes in Python/Java/C++, struct in C/Golang, etc., Flytekit allows modeling user classes. The idea is to make an interface that is more productive for the @@ -37,63 +19,30 @@ The example is demonstrated in the video below: ```{eval-rst} .. youtube:: 1xExpRzz8Tw - ``` -+++ {"lines_to_next_cell": 0} - -First, we import the dependencies. +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. +``` -```{code-cell} -import os -import tempfile -import typing -from typing import Type +First, we import the dependencies: -from flytekit import Blob, BlobMetadata, BlobType, FlyteContext, Literal, LiteralType, Scalar, task, workflow -from flytekit.extend import TypeEngine, TypeTransformer +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/extending/extending/custom_types.py +:caption: extending/custom_types.py +:lines: 1-7 ``` -+++ {"lines_to_next_cell": 0} - :::{note} `FlyteContext` is used to access a random local directory. ::: Defined type here represents a list of files on the disk. We will refer to it as `MyDataset`. -```{code-cell} -class MyDataset(object): - """ - ``MyDataset`` is a collection of files. In Flyte, this maps to a multi-part blob or directory. 
- """ - - def __init__(self, base_dir: str = None): - if base_dir is None: - self._tmp_dir = tempfile.TemporaryDirectory() - self._base_dir = self._tmp_dir.name - self._files = [] - else: - self._base_dir = base_dir - files = os.listdir(base_dir) - self._files = [os.path.join(base_dir, f) for f in files] - - @property - def base_dir(self) -> str: - return self._base_dir - - @property - def files(self) -> typing.List[str]: - return self._files - - def new_file(self, name: str) -> str: - new_file = os.path.join(self._base_dir, name) - self._files.append(new_file) - return new_file +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/extending/extending/custom_types.py +:caption: extending/custom_types.py +:pyobject: MyDataset ``` -+++ {"lines_to_next_cell": 0} - `MyDataset` represents a set of files locally. However, when a workflow consists of multiple steps, we want the data to flow between different steps. To achieve this, it is necessary to explain how the data will be transformed to Flyte's remote references. To do this, we create a new instance of @@ -104,92 +53,31 @@ The `TypeTransformer` is a Generic abstract base class. The `Generic` type argum that we want to work with. In this case, it is the `MyDataset` object. ::: -```{code-cell} -class MyDatasetTransformer(TypeTransformer[MyDataset]): - _TYPE_INFO = BlobType(format="binary", dimensionality=BlobType.BlobDimensionality.MULTIPART) - - def __init__(self): - super(MyDatasetTransformer, self).__init__(name="mydataset-transform", t=MyDataset) - - def get_literal_type(self, t: Type[MyDataset]) -> LiteralType: - """ - This is useful to tell the Flytekit type system that ``MyDataset`` actually refers to what corresponding type. - In this example, we say its of format binary (do not try to introspect) and there is more than one file in it. 
- """ - return LiteralType(blob=self._TYPE_INFO) - - def to_literal( - self, - ctx: FlyteContext, - python_val: MyDataset, - python_type: Type[MyDataset], - expected: LiteralType, - ) -> Literal: - """ - This method is used to convert from the given python type object ``MyDataset`` to the Literal representation. - """ - # Step 1: Upload all the data into a remote place recommended by Flyte - remote_dir = ctx.file_access.get_random_remote_directory() - ctx.file_access.upload_directory(python_val.base_dir, remote_dir) - # Step 2: Return a pointer to this remote_dir in the form of a Literal - return Literal(scalar=Scalar(blob=Blob(uri=remote_dir, metadata=BlobMetadata(type=self._TYPE_INFO)))) - - def to_python_value(self, ctx: FlyteContext, lv: Literal, expected_python_type: Type[MyDataset]) -> MyDataset: - """ - In this method, we want to be able to re-hydrate the custom object from Flyte Literal value. - """ - # Step 1: Download remote data locally - local_dir = ctx.file_access.get_random_local_directory() - ctx.file_access.download_directory(lv.scalar.blob.uri, local_dir) - # Step 2: Create the ``MyDataset`` object - return MyDataset(base_dir=local_dir) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/extending/extending/custom_types.py +:caption: extending/custom_types.py +:pyobject: MyDatasetTransformer ``` -+++ {"lines_to_next_cell": 0} - Before we can use MyDataset in our tasks, we need to let Flytekit know that `MyDataset` should be considered as a valid type. This is done using {py:class}`~flytekit:flytekit.extend.TypeEngine`'s `register` method. -```{code-cell} -TypeEngine.register(MyDatasetTransformer()) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/extending/extending/custom_types.py +:caption: extending/custom_types.py +:lines: 87 ``` -+++ {"lines_to_next_cell": 0} - The new type should be ready to use! Let us write an example generator and consumer for this new datatype. 
-```{code-cell} -@task -def generate() -> MyDataset: - d = MyDataset() - for i in range(3): - fp = d.new_file(f"x{i}") - with open(fp, "w") as f: - f.write(f"Contents of file{i}") - - return d - - -@task -def consume(d: MyDataset) -> str: - s = "" - for f in d.files: - with open(f) as fp: - s += fp.read() - s += "\n" - return s - - -@workflow -def wf() -> str: - return consume(d=generate()) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/extending/extending/custom_types.py +:caption: extending/custom_types.py +:lines: 91-114 ``` -+++ {"lines_to_next_cell": 0} - This workflow can be executed and tested locally. Flytekit will exercise the entire path even if you run it locally. -```{code-cell} -if __name__ == "__main__": - print(wf()) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/extending/extending/custom_types.py +:caption: extending/custom_types.py +:lines: 119-120 ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/extending/ diff --git a/docs/user_guide/extending/prebuilt_container_task_plugins.md b/docs/user_guide/extending/prebuilt_container_task_plugins.md index ed51f1b7a6..bf03de5529 100644 --- a/docs/user_guide/extending/prebuilt_container_task_plugins.md +++ b/docs/user_guide/extending/prebuilt_container_task_plugins.md @@ -1,20 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - (prebuilt_container)= # Prebuilt container task plugins diff --git a/docs/user_guide/extending/user_container_task_plugins.md b/docs/user_guide/extending/user_container_task_plugins.md index 68cc1a859d..96ed6fb310 100644 --- a/docs/user_guide/extending/user_container_task_plugins.md +++ 
b/docs/user_guide/extending/user_container_task_plugins.md @@ -1,22 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - (user_container)= # User container task plugins @@ -27,17 +8,18 @@ kernelspec: A user container task plugin runs a user-defined container that has the user code. -This tutorial will walk you through writing your own sensor-style plugin that allows users to wait for a file to land -in the object store. Remember that if you follow the flyte/flytekit constructs, you will automatically make your plugin portable -across all cloud platforms that Flyte supports. +This tutorial will walk you through writing your own sensor-style plugin that allows users to wait for a file to land in the object store. Remember that if you follow the flyte/flytekit constructs, you will automatically make your plugin portable across all cloud platforms that Flyte supports. ## Sensor plugin -A sensor plugin waits for some event to happen before marking the task as success. You need not worry about the -timeout as that will be handled by the flyte engine itself when running in production. +A sensor plugin waits for some event to happen before marking the task as success. You need not worry about the timeout as that will be handled by the flyte engine itself when running in production. ### Plugin API +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. 
+``` + ```python sensor = WaitForObjectStoreFile(metadata=metadata(timeout="1H", retries=10)) @@ -50,58 +32,19 @@ def wait_and_run(path: str) -> int: return do_next(path=path) ``` -```{code-cell} -import typing -from datetime import timedelta -from time import sleep - -from flytekit import TaskMetadata, task, workflow -from flytekit.extend import Interface, PythonTask, context_manager +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/extending/extending/user_container.py +:caption: extending/user_container.py +:lines: 1-6 ``` -+++ {"lines_to_next_cell": 0} - ### Plugin structure As illustrated above, to achieve this structure we need to create a class named `WaitForObjectStoreFile`, which derives from {py:class}`flytekit.PythonFunctionTask` as follows. -```{code-cell} -class WaitForObjectStoreFile(PythonTask): - """ - Add documentation here for your plugin. - This plugin creates an object store file sensor that waits and exits only when the file exists. - """ - - _VAR_NAME: str = "path" - - def __init__( - self, - name: str, - poll_interval: timedelta = timedelta(seconds=10), - **kwargs, - ): - super(WaitForObjectStoreFile, self).__init__( - task_type="object-store-sensor", - name=name, - task_config=None, - interface=Interface(inputs={self._VAR_NAME: str}, outputs={self._VAR_NAME: str}), - **kwargs, - ) - self._poll_interval = poll_interval - - def execute(self, **kwargs) -> typing.Any: - # No need to check for existence, as that is guaranteed. 
- path = kwargs[self._VAR_NAME] - ctx = context_manager.FlyteContext.current_context() - user_context = ctx.user_space_params - while True: - user_context.logging.info(f"Sensing file in path {path}...") - if ctx.file_access.exists(path): - user_context.logging.info(f"file in path {path} exists!") - return path - user_context.logging.warning(f"file in path {path} does not exists!") - sleep(self._poll_interval.seconds) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/extending/extending/user_container.py +:caption: extending/user_container.py +:pyobject: WaitForObjectStoreFile ``` #### Config objects @@ -122,43 +65,24 @@ In this example, we are creating a named class plugin, and hence, this construct Refer to the [spark plugin](https://github.com/flyteorg/flytekit/tree/master/plugins/flytekit-spark) for an example of a config object. -+++ ### Actual usage -```{code-cell} -sensor = WaitForObjectStoreFile( - name="my-objectstore-sensor", - metadata=TaskMetadata(retries=10, timeout=timedelta(minutes=20)), - poll_interval=timedelta(seconds=1), -) - - -@task -def print_file(path: str) -> str: - print(path) - return path - - -@workflow -def my_workflow(path: str) -> str: - return print_file(path=sensor(path=path)) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/extending/extending/user_container.py +:caption: extending/user_container.py +:lines: 54-69 ``` -+++ {"lines_to_next_cell": 0} - And of course, you can run the workflow locally using your own new shiny plugin! 
-```{code-cell} -if __name__ == "__main__": - f = "/tmp/some-file" - with open(f, "w") as w: - w.write("Hello World!") - - print(my_workflow(path=f)) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/extending/extending/user_container.py +:caption: extending/user_container.py +:lines: 73-78 ``` The key takeaways of a user container task plugin are: - The task object that gets serialized at compile-time is recreated using the user's code at run time. - At platform-run-time, the user-decorated function is executed. + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/extending/ diff --git a/docs/user_guide/productionizing/configuring_access_to_gpus.md b/docs/user_guide/productionizing/configuring_access_to_gpus.md index f2575b5adb..60e4a35ced 100644 --- a/docs/user_guide/productionizing/configuring_access_to_gpus.md +++ b/docs/user_guide/productionizing/configuring_access_to_gpus.md @@ -1,20 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - (configure-gpus)= # Configuring access to GPUs diff --git a/docs/user_guide/productionizing/configuring_logging_links_in_the_ui.md b/docs/user_guide/productionizing/configuring_logging_links_in_the_ui.md index 3091e1bc82..67726a23ce 100644 --- a/docs/user_guide/productionizing/configuring_logging_links_in_the_ui.md +++ b/docs/user_guide/productionizing/configuring_logging_links_in_the_ui.md @@ -1,20 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - 
(configure-logging)= # Configuring logging links in the UI @@ -24,20 +7,16 @@ kernelspec: ``` To debug your workflows in production, you want to access logs from your tasks as they run. -These logs are different from the core Flyte platform logs, are specific to execution, and may vary from plugin to plugin; -for example, Spark may have driver and executor logs. +These logs are different from the core Flyte platform logs, are specific to execution, and may vary from plugin to plugin; for example, Spark may have driver and executor logs. -Every organization potentially uses different log aggregators, making it hard to create a one-size-fits-all solution. -Some examples of the log aggregators include cloud-hosted solutions like AWS CloudWatch, GCP Stackdriver, Splunk, Datadog, etc. +Every organization potentially uses different log aggregators, making it hard to create a one-size-fits-all solution. Some examples of the log aggregators include cloud-hosted solutions like AWS CloudWatch, GCP Stackdriver, Splunk, Datadog, etc. Flyte provides a simplified interface to configure your log provider. Flyte-sandbox -ships with the Kubernetes dashboard to visualize the logs. This may not be safe for production, hence we recommend users -explore other log aggregators. +ships with the Kubernetes dashboard to visualize the logs. This may not be safe for production, hence we recommend users explore other log aggregators. ## How to configure? -To configure your log provider, the provider needs to support `URL` links that are shareable and can be templatized. -The templating engine has access to [these](https://github.com/flyteorg/flyteplugins/blob/b0684d97a1cf240f1a44f310f4a79cc21844caa9/go/tasks/pluginmachinery/tasklog/plugin.go#L7-L16) parameters. +To configure your log provider, the provider needs to support `URL` links that are shareable and can be templatized. 
The templating engine has access to [these](https://github.com/flyteorg/flyteplugins/blob/b0684d97a1cf240f1a44f310f4a79cc21844caa9/go/tasks/pluginmachinery/tasklog/plugin.go#L7-L16) parameters. The parameters can be used to generate a unique URL to the logs using a templated URI that pertain to a specific task. The templated URI has access to the following parameters: diff --git a/docs/user_guide/productionizing/customizing_task_resources.md b/docs/user_guide/productionizing/customizing_task_resources.md index 39fee64d55..1f95d26219 100644 --- a/docs/user_guide/productionizing/customizing_task_resources.md +++ b/docs/user_guide/productionizing/customizing_task_resources.md @@ -1,20 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - # Customizing task resources ```{eval-rst} @@ -24,8 +7,6 @@ kernelspec: One of the reasons to use a hosted Flyte environment is the potential of leveraging CPU, memory and storage resources, far greater than what's available locally. Flytekit makes it possible to specify these requirements declaratively and close to where the task itself is declared. -+++ - In this example, the memory required by the function increases as the dataset size increases. Large datasets may not be able to run locally, so we would want to provide hints to the Flyte backend to request for more memory. This is done by decorating the task with the hints as shown in the following code sample. @@ -47,68 +28,51 @@ To ensure that regular tasks that don't require GPUs are not scheduled on GPU no To ensure that tasks that require GPUs get the needed tolerations on their pods, set up FlytePropeller using the following [configuration](https://github.com/flyteorg/flytepropeller/blob/v0.10.5/config.yaml#L51,L56). 
Ensure that this toleration config matches the taint config you have configured to protect your GPU providing nodes from dealing with regular non-GPU workloads (pods). -The actual values follow the [Kubernetes convention](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#resource-units-in-kubernetes). -Let's look at an example to understand how to customize resources. - -+++ {"lines_to_next_cell": 0} +The actual values follow the [Kubernetes convention](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#resource-units-in-kubernetes). Let's look at an example to understand how to customize resources. -Import the dependencies. +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. +``` -```{code-cell} -import typing +Import the dependencies: -from flytekit import Resources, task, workflow +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/customizing_resources.py +:caption: productionizing/customizing_resources.py +:lines: 1-3 ``` -+++ {"lines_to_next_cell": 0} +Define a task and configure the resources to be allocated to it: -Define a task and configure the resources to be allocated to it. - -```{code-cell} -@task(requests=Resources(cpu="1", mem="100Mi"), limits=Resources(cpu="2", mem="150Mi")) -def count_unique_numbers(x: typing.List[int]) -> int: - s = set() - for i in x: - s.add(i) - return len(s) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/customizing_resources.py +:caption: productionizing/customizing_resources.py +:pyobject: count_unique_numbers ``` -+++ {"lines_to_next_cell": 0} - -Define a task that computes the square of a number. 
+Define a task that computes the square of a number: -```{code-cell} -@task -def square(x: int) -> int: - return x * x +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/customizing_resources.py +:caption: productionizing/customizing_resources.py +:pyobject: square ``` -+++ {"lines_to_next_cell": 0} - You can use the tasks decorated with memory and storage hints like regular tasks in a workflow. -```{code-cell} -@workflow -def my_workflow(x: typing.List[int]) -> int: - return square(x=count_unique_numbers(x=x)) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/customizing_resources.py +:caption: productionizing/customizing_resources.py +:pyobject: my_workflow ``` -+++ {"lines_to_next_cell": 0} - You can execute the workflow locally. -```{code-cell} -if __name__ == "__main__": - print(count_unique_numbers(x=[1, 1, 2])) - print(my_workflow(x=[1, 1, 2])) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/customizing_resources.py +:caption: productionizing/customizing_resources.py +:lines: 32-34 ``` :::{note} To alter the limits of the default platform configuration, change the [admin config](https://github.com/flyteorg/flyte/blob/b16ffd76934d690068db1265ac9907a278fba2ee/deployment/eks/flyte_helm_generated.yaml#L203-L213) and [namespace level quota](https://github.com/flyteorg/flyte/blob/b16ffd76934d690068db1265ac9907a278fba2ee/deployment/eks/flyte_helm_generated.yaml#L214-L240) on the cluster. ::: -+++ - (resource_with_overrides)= ## Using `with_overrides` @@ -116,58 +80,40 @@ To alter the limits of the default platform configuration, change the [admin con You can use the `with_overrides` method to override the resources allocated to the tasks dynamically. Let's understand how the resources can be initialized with an example. -+++ {"lines_to_next_cell": 0} - Import the dependencies. 
-```{code-cell} -import typing # noqa: E402 - -from flytekit import Resources, task, workflow # noqa: E402 +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/customizing_resources.py +:caption: productionizing/customizing_resources.py +:lines: 38-40 ``` -+++ {"lines_to_next_cell": 0} - Define a task and configure the resources to be allocated to it. You can use tasks decorated with memory and storage hints like regular tasks in a workflow. -```{code-cell} -@task(requests=Resources(cpu="1", mem="200Mi"), limits=Resources(cpu="2", mem="350Mi")) -def count_unique_numbers_1(x: typing.List[int]) -> int: - s = set() - for i in x: - s.add(i) - return len(s) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/customizing_resources.py +:caption: productionizing/customizing_resources.py +:pyobject: count_unique_numbers_1 ``` -+++ {"lines_to_next_cell": 0} - -Define a task that computes the square of a number. +Define a task that computes the square of a number: -```{code-cell} -@task -def square_1(x: int) -> int: - return x * x +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/customizing_resources.py +:caption: productionizing/customizing_resources.py +:pyobject: square_1 ``` -+++ {"lines_to_next_cell": 0} +The `with_overrides` method overrides the old resource allocations: -The `with_overrides` method overrides the old resource allocations.
- -```{code-cell} -@workflow -def my_pipeline(x: typing.List[int]) -> int: - return square_1(x=count_unique_numbers_1(x=x)).with_overrides(limits=Resources(cpu="6", mem="500Mi")) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/customizing_resources.py +:caption: productionizing/customizing_resources.py +:pyobject: my_pipeline ``` -+++ {"lines_to_next_cell": 0} - -You can execute the workflow locally. +You can execute the workflow locally: -```{code-cell} -if __name__ == "__main__": - print(count_unique_numbers_1(x=[1, 1, 2])) - print(my_pipeline(x=[1, 1, 2])) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/customizing_resources.py +:caption: productionizing/customizing_resources.py +:lines: 65-67 ``` You can see the memory allocation below. The memory limit is `500Mi` rather than `350Mi`, and the @@ -179,3 +125,5 @@ This is because the default platform CPU quota for every pod is 4. Resource allocated using "with_overrides" method ::: + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/productionizing/ diff --git a/docs/user_guide/productionizing/notifications.md b/docs/user_guide/productionizing/notifications.md index 133a402c43..6837a7b632 100644 --- a/docs/user_guide/productionizing/notifications.md +++ b/docs/user_guide/productionizing/notifications.md @@ -1,20 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - # Notifications ```{eval-rst} @@ -22,8 +5,6 @@ kernelspec: ``` -+++ - When a workflow is completed, users can be notified by: - Email @@ -34,100 +15,45 @@ The content of these notifications is configurable at the platform level. 
## Code example -When a workflow reaches a specified [terminal workflow execution phase](https://github.com/flyteorg/flytekit/blob/v0.16.0b7/flytekit/core/notification.py#L10,L15), -the {py:class}`flytekit:flytekit.Email`, {py:class}`flytekit:flytekit.PagerDuty`, or {py:class}`flytekit:flytekit.Slack` -objects can be used in the construction of a {py:class}`flytekit:flytekit.LaunchPlan`. +When a workflow reaches a specified [terminal workflow execution phase](https://github.com/flyteorg/flytekit/blob/v0.16.0b7/flytekit/core/notification.py#L10,L15), the {py:class}`flytekit:flytekit.Email`, {py:class}`flytekit:flytekit.PagerDuty`, or {py:class}`flytekit:flytekit.Slack` objects can be used in the construction of a {py:class}`flytekit:flytekit.LaunchPlan`. -```{code-cell} -from datetime import timedelta +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. ``` -+++ {"lines_to_next_cell": 0} +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/lp_notifications.py +:caption: productionizing/lp_notifications.py +:lines: 1 +``` Consider the following example workflow: -```{code-cell} -from flytekit import Email, FixedRate, LaunchPlan, PagerDuty, Slack, WorkflowExecutionPhase, task, workflow - - -@task -def double_int_and_print(a: int) -> str: - return str(a * 2) - - -@workflow -def int_doubler_wf(a: int) -> str: - doubled = double_int_and_print(a=a) - return doubled +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/lp_notifications.py +:caption: productionizing/lp_notifications.py +:lines: 3-14 ``` -+++ {"lines_to_next_cell": 0} - Here are three scenarios that can help deepen your understanding of how notifications work: 1. Launch Plan triggers email notifications when the workflow execution reaches the `SUCCEEDED` phase. 
-```{code-cell} -int_doubler_wf_lp = LaunchPlan.get_or_create( - name="email_notifications_lp", - workflow=int_doubler_wf, - default_inputs={"a": 4}, - notifications=[ - Email( - phases=[WorkflowExecutionPhase.SUCCEEDED], - recipients_email=["admin@example.com"], - ) - ], -) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/lp_notifications.py +:caption: productionizing/lp_notifications.py +:lines: 20-30 ``` -+++ {"lines_to_next_cell": 0} - 2. Notifications shine when used for scheduled workflows to alert for failures. -```{code-cell} -:lines_to_next_cell: 2 - -int_doubler_wf_scheduled_lp = LaunchPlan.get_or_create( - name="int_doubler_wf_scheduled", - workflow=int_doubler_wf, - default_inputs={"a": 4}, - notifications=[ - PagerDuty( - phases=[WorkflowExecutionPhase.FAILED, WorkflowExecutionPhase.TIMED_OUT], - recipients_email=["abc@pagerduty.com"], - ) - ], - schedule=FixedRate(duration=timedelta(days=1)), -) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/lp_notifications.py +:caption: productionizing/lp_notifications.py +:lines: 33-44 ``` 3. Notifications can be combined with different permutations of terminal phases and recipient targets. 
-```{code-cell} -wacky_int_doubler_lp = LaunchPlan.get_or_create( - name="wacky_int_doubler", - workflow=int_doubler_wf, - default_inputs={"a": 4}, - notifications=[ - Email( - phases=[WorkflowExecutionPhase.FAILED], - recipients_email=["me@example.com", "you@example.com"], - ), - Email( - phases=[WorkflowExecutionPhase.SUCCEEDED], - recipients_email=["myboss@example.com"], - ), - Slack( - phases=[ - WorkflowExecutionPhase.SUCCEEDED, - WorkflowExecutionPhase.ABORTED, - WorkflowExecutionPhase.TIMED_OUT, - ], - recipients_email=["myteam@slack.com"], - ), - ], -) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/lp_notifications.py +:caption: productionizing/lp_notifications.py +:lines: 48-70 ``` 4. You can use pyflyte register to register the launch plan and launch it in the web console to get the notifications. @@ -142,13 +68,10 @@ Choose the launch plan with notifications config :class: with-shadow ::: -+++ ### Future work -Work is ongoing to support a generic event egress system that can be used to publish events for tasks, workflows, and -workflow nodes. When this is complete, generic event subscribers can asynchronously process these events for a rich -and fully customizable experience. +Work is ongoing to support a generic event egress system that can be used to publish events for tasks, workflows, and workflow nodes. When this is complete, generic event subscribers can asynchronously process these events for a rich and fully customizable experience. ## Platform configuration changes @@ -160,8 +83,7 @@ This is only supported for Flyte instances running on AWS. ### Config #### For Sandbox -To publish notifications, you'll need to register a sendgrid api key from [sendgrid](https://sendgrid.com/), it's free for 100 emails per day. -You have to add notifications config in your sandbox config file. 
+To publish notifications, you'll need to register a Sendgrid api key from [Sendgrid](https://sendgrid.com/), it's free for 100 emails per day. You have to add notifications config in your sandbox config file. ```yaml # config-sandbox.yaml @@ -223,3 +145,6 @@ notifications: - **body**: Configurable email body used in notifications. The complete set of parameters that can be used for email templating are checked in [here](https://github.com/flyteorg/flyteadmin/blob/a84223dab00dfa52d8ba1ed2d057e77b6c6ab6a7/pkg/async/notifications/email.go#L18,L30). + + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/productionizing/ diff --git a/docs/user_guide/productionizing/reference_launch_plans.md b/docs/user_guide/productionizing/reference_launch_plans.md index 8ea476ef3e..b089e1838e 100644 --- a/docs/user_guide/productionizing/reference_launch_plans.md +++ b/docs/user_guide/productionizing/reference_launch_plans.md @@ -1,28 +1,10 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - # Reference launch plans ```{eval-rst} .. tags:: Intermediate ``` -A {py:func}`flytekit.reference_launch_plan` references previously defined, serialized, and registered Flyte launch plans. -You can reference launch plans from other projects and create workflows that use launch plans declared by others. +A {py:func}`flytekit.reference_launch_plan` references previously defined, serialized, and registered Flyte launch plans. You can reference launch plans from other projects and create workflows that use launch plans declared by others. The following example illustrates how to use reference launch plans. @@ -30,38 +12,13 @@ The following example illustrates how to use reference launch plans. 
Reference launch plans cannot be run locally. You must mock them out. ::: -```{code-cell} -:lines_to_next_cell: 2 - -from typing import List - -from flytekit import reference_launch_plan, workflow -from flytekit.types.file import FlyteFile - - -@reference_launch_plan( - project="flytesnacks", - domain="development", - name="data_types_and_io.file.normalize_csv_file", - version="{{ registration.version }}", -) -def normalize_csv_file( - csv_url: FlyteFile, - column_names: List[str], - columns_to_normalize: List[str], - output_location: str, -) -> FlyteFile: - ... - +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. +``` -@workflow -def reference_lp_wf() -> FlyteFile: - return normalize_csv_file( - csv_url="https://people.sc.fsu.edu/~jburkardt/data/csv/biostats.csv", - column_names=["Name", "Sex", "Age", "Heights (in)", "Weight (lbs)"], - columns_to_normalize=["Age"], - output_location="", - ) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/reference_launch_plan.py +:caption: productionizing/reference_launch_plan.py +:lines: 1-36 ``` It's important to verify that the workflow interface corresponds to that of the referenced workflow. @@ -86,3 +43,5 @@ def normalize_csv_file(...): ... 
``` ::: + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/productionizing/ diff --git a/docs/user_guide/productionizing/reference_tasks.md b/docs/user_guide/productionizing/reference_tasks.md index 057986d74a..83dfe9e2dc 100644 --- a/docs/user_guide/productionizing/reference_tasks.md +++ b/docs/user_guide/productionizing/reference_tasks.md @@ -1,31 +1,10 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - # Reference tasks ```{eval-rst} .. tags:: Intermediate ``` -A {py:func}`flytekit.reference_task` references the Flyte tasks that have already been defined, serialized, and registered. -You can reference tasks from other projects and create workflows that use tasks declared by others. -These tasks can be in their own containers, python runtimes, flytekit versions, and even different languages. +A {py:func}`flytekit.reference_task` references the Flyte tasks that have already been defined, serialized, and registered. You can reference tasks from other projects and create workflows that use tasks declared by others. These tasks can be in their own containers, python runtimes, flytekit versions, and even different languages. The following example illustrates how to use reference tasks. @@ -33,38 +12,13 @@ The following example illustrates how to use reference tasks. Reference tasks cannot be run locally. You must mock them out. 
::: -```{code-cell} -:lines_to_next_cell: 2 - -from typing import List - -from flytekit import reference_task, workflow -from flytekit.types.file import FlyteFile - - -@reference_task( - project="flytesnacks", - domain="development", - name="data_types_and_io.file.normalize_columns", - version="{{ registration.version }}", -) -def normalize_columns( - csv_url: FlyteFile, - column_names: List[str], - columns_to_normalize: List[str], - output_location: str, -) -> FlyteFile: - ... - +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. +``` -@workflow -def wf() -> FlyteFile: - return normalize_columns( - csv_url="https://people.sc.fsu.edu/~jburkardt/data/csv/biostats.csv", - column_names=["Name", "Sex", "Age", "Heights (in)", "Weight (lbs)"], - columns_to_normalize=["Age"], - output_location="", - ) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/reference_task.py +:caption: productionizing/reference_task.py +:lines: 1-36 ``` :::{note} @@ -87,3 +41,5 @@ A typical reference task would resemble the following: ... 
``` ::: + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/productionizing/ diff --git a/docs/user_guide/productionizing/schedules.md b/docs/user_guide/productionizing/schedules.md index 0deda30d04..f361bb7a57 100644 --- a/docs/user_guide/productionizing/schedules.md +++ b/docs/user_guide/productionizing/schedules.md @@ -1,20 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - (scheduling_launch_plan)= # Schedules @@ -36,29 +19,17 @@ Check out a demo of how the Native Scheduler works: Native scheduler doesn't support [AWS syntax](http://docs.aws.amazon.com/AmazonCloudWatch/latest/events/ScheduledEvents.html#CronExpressions). ::: -+++ {"lines_to_next_cell": 0} +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. +``` Consider the following example workflow: -```{code-cell} -from datetime import datetime - -from flytekit import task, workflow - - -@task -def format_date(run_date: datetime) -> str: - return run_date.strftime("%Y-%m-%d %H:%M") - - -@workflow -def date_formatter_wf(kickoff_time: datetime): - formatted_kickoff_time = format_date(run_date=kickoff_time) - print(formatted_kickoff_time) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/lp_schedules.py +:caption: productionizing/lp_schedules.py +:lines: 1-14 ``` -+++ {"lines_to_next_cell": 0} - The `date_formatter_wf` workflow can be scheduled using either the `CronSchedule` or the `FixedRate` object. (cron-schedules)= @@ -68,60 +39,24 @@ The `date_formatter_wf` workflow can be scheduled using either the `CronSchedule [Cron](https://en.wikipedia.org/wiki/Cron) expression strings use this {ref}`syntax `. 
An incorrect cron schedule expression would lead to failure in triggering the schedule. -```{code-cell} -from flytekit import CronSchedule, LaunchPlan # noqa: E402 - -# creates a launch plan that runs every minute. -cron_lp = LaunchPlan.get_or_create( - name="my_cron_scheduled_lp", - workflow=date_formatter_wf, - schedule=CronSchedule( - # Note that the ``kickoff_time_input_arg`` matches the workflow input we defined above: kickoff_time - # But in case you are using the AWS scheme of schedules and not using the native scheduler then switch over the schedule parameter with cron_expression - schedule="*/1 * * * *", # Following schedule runs every min - kickoff_time_input_arg="kickoff_time", - ), -) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/lp_schedules.py +:caption: productionizing/lp_schedules.py +:lines: 17-29 ``` The `kickoff_time_input_arg` corresponds to the workflow input `kickoff_time`. Specifying this argument means that Flyte will pass in the kick-off time of the cron schedule into the `kickoff_time` argument of the `date_formatter_wf` workflow. -+++ - ## Fixed rate intervals -If you prefer to use an interval rather than a cron scheduler to schedule your workflows, you can use the fixed-rate scheduler. -A fixed-rate scheduler runs at the specified interval. +If you prefer to use an interval rather than a cron scheduler to schedule your workflows, you can use the fixed-rate scheduler. A fixed-rate scheduler runs at the specified interval. 
Here's an example: -```{code-cell} -from datetime import timedelta # noqa: E402 - -from flytekit import FixedRate, LaunchPlan # noqa: E402 - - -@task -def be_positive(name: str) -> str: - return f"You're awesome, {name}" - - -@workflow -def positive_wf(name: str): - reminder = be_positive(name=name) - print(f"{reminder}") - - -fixed_rate_lp = LaunchPlan.get_or_create( - name="my_fixed_rate_lp", - workflow=positive_wf, - # Note that the workflow above doesn't accept any kickoff time arguments. - # We just omit the ``kickoff_time_input_arg`` from the FixedRate schedule invocation - schedule=FixedRate(duration=timedelta(minutes=10)), - fixed_inputs={"name": "you"}, -) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/lp_schedules.py +:caption: productionizing/lp_schedules.py +:lines: 34-57 ``` This fixed-rate scheduler runs every ten minutes. Similar to a cron scheduler, a fixed-rate scheduler also accepts `kickoff_time_input_arg` (which is omitted in this example). @@ -136,16 +71,12 @@ After initializing your launch plan, [activate the specific version of the launc flytectl update launchplan -p flyteexamples -d development {{ name_of_lp }} --version --activate ``` -+++ - Verify if your launch plan was activated: ```bash flytectl get launchplan -p flytesnacks -d development ``` -+++ - ## Deactivating a schedule You can [archive/deactivate the launch plan](https://docs.flyte.org/projects/flytectl/en/latest/gen/flytectl_update_launchplan.html) to deschedule any scheduled job associated with it. @@ -154,8 +85,6 @@ You can [archive/deactivate the launch plan](https://docs.flyte.org/projects/fly flytectl update launchplan -p flyteexamples -d development {{ name_of_lp }} --version --archive ``` -+++ - ## Platform configuration changes for AWS scheduler The Scheduling feature can be run using the Flyte native scheduler which comes with Flyte. 
If you intend to use the AWS scheduler then it requires additional infrastructure to run, so these will have to be created and configured. The following sections are only required if you use the AWS scheme for the scheduler. You can still run the Flyte native scheduler on AWS. @@ -186,8 +115,6 @@ scheduler: scheduleNamePrefix: "flyte" ``` -+++ - - **scheme**: in this case because AWS is the only cloud back-end supported for scheduling workflows, only `"aws"` is a valid value. By default, the no-op scheduler is used. - **region**: this specifies which region initialized AWS clients should use when creating CloudWatch rules. - **scheduleRole** This is the IAM role ARN with permissions set to `Allow` @@ -219,9 +146,9 @@ scheduler: accountId: "{{ YOUR ACCOUNT ID }}" ``` -+++ - - **scheme**: in this case because AWS is the only cloud back-end supported for executing scheduled workflows, only `"aws"` is a valid value. By default, the no-op executor is used and in case of sandbox we use `"local"` scheme which uses the Flyte native scheduler. - **region**: this specifies which region AWS clients should use when creating an SQS subscriber client. - **scheduleQueueName**: this is the name of the SQS Queue you've allocated to scheduling workflows. - **accountId**: Your AWS [account id](https://docs.aws.amazon.com/IAM/latest/UserGuide/console_account-alias.html#FindingYourAWSId). 
+ +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/productionizing/ diff --git a/docs/user_guide/productionizing/secrets.md b/docs/user_guide/productionizing/secrets.md index d631594cc7..9957f5cdaf 100644 --- a/docs/user_guide/productionizing/secrets.md +++ b/docs/user_guide/productionizing/secrets.md @@ -1,20 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - (secrets)= # Secrets @@ -32,8 +15,6 @@ different types of secrets, but for users writing Python tasks, you can only acc secure secrets either as environment variables or as a file injected into the running container. -+++ - ## Creating secrets with a secrets manager :::{admonition} Prerequisites @@ -71,27 +52,19 @@ define secrets using a [configuration file](https://kubernetes.io/docs/tasks/con or tools like [Kustomize](https://kubernetes.io/docs/tasks/configmap-secret/managing-secret-using-kustomize/). ::: -+++ - ## Using secrets in tasks +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. +``` + Once you've defined a secret on the Flyte backend, `flytekit` exposes a class called {py:class}`~flytekit.Secret`s, which allows you to request a secret -from the configured secret manager. 
+from the configured secret manager: -```{code-cell} -import os -from typing import Tuple - -import flytekit -from flytekit import Secret, task, workflow -from flytekit.testing import SecretsManager - -secret = Secret( - group="", - key="", - mount_requirement=Secret.MountType.ENV_VAR, -) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/use_secrets.py +:caption: productionizing/use_secrets.py +:lines: 1-6, 49-53 ``` Secrets consists of `group`, `key`, and `mounting_requirement` arguments, @@ -103,9 +76,9 @@ In the code below we specify two variables, `SECRET_GROUP` and `SECRET_NAME`, which maps onto the `user-info` secret that we created with `kubectl` above, with a key called `user_secret`. -```{code-cell} -SECRET_GROUP = "user-info" -SECRET_NAME = "user_secret" +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/use_secrets.py +:caption: productionizing/use_secrets.py +:lines: 66-67 ``` Now we declare the secret in the `secret_requests` argument of the @@ -119,13 +92,9 @@ invoking the {py:func}`flytekit.current_context` function, as shown below. At runtime, flytekit looks inside the task pod for an environment variable or a mounted file with a predefined name/path and loads the value. -```{code-cell} -@task(secret_requests=[Secret(group=SECRET_GROUP, key=SECRET_NAME)]) -def secret_task() -> str: - context = flytekit.current_context() - secret_val = context.secrets.get(SECRET_GROUP, SECRET_NAME) - print(secret_val) - return secret_val +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/use_secrets.py +:caption: productionizing/use_secrets.py +:pyobject: secret_task ``` :::{warning} @@ -156,30 +125,18 @@ the same secret: ``` In this case, the secret group will be `user-info`, with three available -secret keys: `user_secret`, `username`, and `password`. 
+secret keys: `user_secret`, `username`, and `password`: -```{code-cell} -USERNAME_SECRET = "username" -PASSWORD_SECRET = "password" +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/use_secrets.py +:caption: productionizing/use_secrets.py +:lines: 107-108 ``` -+++ {"lines_to_next_cell": 0} - The Secret structure allows passing two fields, matching the key and the group, as previously described: -```{code-cell} -@task( - secret_requests=[ - Secret(key=USERNAME_SECRET, group=SECRET_GROUP), - Secret(key=PASSWORD_SECRET, group=SECRET_GROUP), - ] -) -def user_info_task() -> Tuple[str, str]: - context = flytekit.current_context() - secret_username = context.secrets.get(SECRET_GROUP, USERNAME_SECRET) - secret_pwd = context.secrets.get(SECRET_GROUP, PASSWORD_SECRET) - print(f"{secret_username}={secret_pwd}") - return secret_username, secret_pwd +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/use_secrets.py +:caption: productionizing/use_secrets.py +:lines: 113-124 ``` :::{warning} @@ -198,40 +155,16 @@ In these scenarios you can specify the `mount_requirement=Secret.MountType.FILE` In the following example we force the mounting to be an environment variable: -```{code-cell} -@task( - secret_requests=[ - Secret( - group=SECRET_GROUP, - key=SECRET_NAME, - mount_requirement=Secret.MountType.ENV_VAR, - ) - ] -) -def secret_file_task() -> Tuple[str, str]: - secret_manager = flytekit.current_context().secrets - - # get the secrets filename - f = secret_manager.get_secrets_file(SECRET_GROUP, SECRET_NAME) - - # get secret value from an environment variable - secret_val = secret_manager.get(SECRET_GROUP, SECRET_NAME) - - # returning the filename and the secret_val - return f, secret_val +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/use_secrets.py +:caption: productionizing/use_secrets.py 
+:lines: 139-158 ``` -+++ {"lines_to_next_cell": 0} - These tasks can be used in your workflow as usual -```{code-cell} -@workflow -def my_secret_workflow() -> Tuple[str, str, str, str, str]: - x = secret_task() - y, z = user_info_task() - f, s = secret_file_task() - return x, y, z, f, s +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/use_secrets.py +:caption: productionizing/use_secrets.py +:pyobject: my_secret_workflow ``` ### Testing with mock secrets @@ -239,18 +172,9 @@ def my_secret_workflow() -> Tuple[str, str, str, str, str]: The simplest way to test secret accessibility is to export the secret as an environment variable. There are some helper methods available to do so: -```{code-cell} -if __name__ == "__main__": - sec = SecretsManager() - os.environ[sec.get_secrets_env_var(SECRET_GROUP, SECRET_NAME)] = "value" - os.environ[sec.get_secrets_env_var(SECRET_GROUP, USERNAME_SECRET)] = "username_value" - os.environ[sec.get_secrets_env_var(SECRET_GROUP, PASSWORD_SECRET)] = "password_value" - x, y, z, f, s = my_secret_workflow() - assert x == "value" - assert y == "username_value" - assert z == "password_value" - assert f == sec.get_secrets_file(SECRET_GROUP, SECRET_NAME) - assert s == "value" +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/productionizing/productionizing/use_secrets.py +:caption: productionizing/use_secrets.py +:lines: 172-182 ``` ## Using secrets in task templates @@ -291,8 +215,6 @@ sql_query = SQLAlchemyTask( ) ``` -+++ - :::{note} Here the `secret_connect_args` map to the [SQLAlchemy engine configuration](https://docs.sqlalchemy.org/en/20/core/engines.html) @@ -302,7 +224,6 @@ argument names for the username and password. You can then use the `sql_query` task inside a workflow to grab data and perform downstream transformations on it. 
-+++ ## How secrets injection works @@ -386,18 +307,15 @@ When using the AWS secret management plugin, secrets need to be specified by nam ### Vault secrets manager -When using the Vault secret manager, make sure you have Vault Agent deployed on your cluster as described in this -[step-by-step tutorial](https://learn.hashicorp.com/tutorials/vault/kubernetes-sidecar). +When using the Vault secret manager, make sure you have Vault Agent deployed on your cluster as described in this [step-by-step tutorial](https://learn.hashicorp.com/tutorials/vault/kubernetes-sidecar). Vault secrets can only be mounted as files and will become available under `"/etc/flyte/secrets/SECRET_GROUP/SECRET_NAME"`. Vault comes with various secrets engines. Currently Flyte supports working with both version 1 and 2 of the `Key Vault engine ` as well as the `databases secrets engine `. You can use use the `group_version` parameter to specify which secret backend engine to use. Available choices are: "kv1", "kv2", "db": -+++ {"lines_to_next_cell": 0} - -How to request secrets with the Vault secret manager +#### Requesting secrets with the Vault secret manager -```{code-cell} +```python secret = Secret( group="", key="", @@ -434,12 +352,11 @@ If Flyte administrator wants to set up annotations for the entire system, they c ### Vertical scaling -To scale the Webhook to be able to process the number/rate of pods you need, you may need to configure a vertical [pod -autoscaler](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler). +To scale the Webhook to be able to process the number/rate of pods you need, you may need to configure a vertical [pod autoscaler](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler). ### Horizontal scaling -The Webhook does not make any external API Requests in response to Pod mutation requests. It should be able to handle traffic -quickly. 
For horizontal scaling, adding additional replicas for the Pod in the -deployment should be sufficient. A single `MutatingWebhookConfiguration` object will be used, the same TLS certificate -will be shared across the pods and the Service created will automatically load balance traffic across the available pods. +The Webhook does not make any external API Requests in response to Pod mutation requests. It should be able to handle traffic quickly. For horizontal scaling, adding additional replicas for the Pod in the +deployment should be sufficient. A single `MutatingWebhookConfiguration` object will be used, the same TLS certificate will be shared across the pods and the Service created will automatically load balance traffic across the available pods. + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/productionizing/ diff --git a/docs/user_guide/productionizing/spot_instances.md b/docs/user_guide/productionizing/spot_instances.md index 85024f6996..864cfbbd3c 100644 --- a/docs/user_guide/productionizing/spot_instances.md +++ b/docs/user_guide/productionizing/spot_instances.md @@ -1,31 +1,10 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - -+++ {"lines_to_next_cell": 0} - -# Spot instances + # Spot instances ```{eval-rst} .. tags:: AWS, GCP, Intermediate ``` -+++ - ## What are spot instances? Spot instances are unused EC2 capacity in AWS. [Spot instances](https://aws.amazon.com/ec2/spot/?cards.sort-by=item.additionalFields.startDateTime&cards.sort-order=asc) can result in up to 90% savings on on-demand prices. The caveat is that these instances can be preempted at any point and no longer be available for use. 
This can happen due to: @@ -52,7 +31,6 @@ This can be done by setting taints and tolerations using the [config](https://gi When your spot/preemptible instance is terminated, ASG attempts to launch a replacement instance to maintain the desired capacity for the group. ::: -+++ ## What are interruptible tasks? @@ -68,8 +46,6 @@ def add_one_and_print(value_to_print: int) -> int: return value_to_print + 1 ``` -+++ - By setting this value, Flyte will schedule your task on an auto-scaling group (ASG) with only spot instances. :::{note} @@ -87,6 +63,4 @@ If your task does NOT exhibit the following properties, you can set `interruptib In a nutshell, you should use spot/preemptible instances when you want to reduce the total cost of running jobs at the expense of potential delays in execution due to restarts. -+++ - % TODO: Write "How to Recover From Interruptions?" section diff --git a/docs/user_guide/productionizing/workflow_labels_and_annotations.md b/docs/user_guide/productionizing/workflow_labels_and_annotations.md index 5b9fe6d4c6..a290b2afa5 100644 --- a/docs/user_guide/productionizing/workflow_labels_and_annotations.md +++ b/docs/user_guide/productionizing/workflow_labels_and_annotations.md @@ -1,20 +1,3 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - # Workflow labels and annotations ```{eval-rst} diff --git a/docs/user_guide/testing/mocking_tasks.md b/docs/user_guide/testing/mocking_tasks.md index 07b4f0239e..b22db0eb51 100644 --- a/docs/user_guide/testing/mocking_tasks.md +++ b/docs/user_guide/testing/mocking_tasks.md @@ -1,101 +1,49 @@ ---- -jupytext: - cell_metadata_filter: all - formats: md:myst - main_language: python - notebook_metadata_filter: all - text_representation: - extension: .md - 
format_name: myst - format_version: 0.13 - jupytext_version: 1.16.1 -kernelspec: - display_name: Python 3 - language: python - name: python3 ---- - # Mocking tasks -A lot of the tasks that you write you can run locally, but some of them you will not be able to, usually because they -are tasks that depend on a third-party only available on the backend. Hive tasks are a common example, as most users -will not have access to the service that executes Hive queries from their development environment. However, it's still -useful to be able to locally run a workflow that calls such a task. In these instances, flytekit provides a couple -of utilities to help navigate this. - -```{code-cell} -import datetime +A lot of the tasks that you write you can run locally, but some of them you will not be able to, usually because they are tasks that depend on a third-party only available on the backend. Hive tasks are a common example, as most users will not have access to the service that executes Hive queries from their development environment. However, it's still useful to be able to locally run a workflow that calls such a task. In these instances, flytekit provides a couple of utilities to help navigate this. -import pandas -from flytekit import SQLTask, TaskMetadata, kwtypes, task, workflow -from flytekit.testing import patch, task_mock -from flytekit.types.schema import FlyteSchema +```{note} +To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks]. ``` -+++ {"lines_to_next_cell": 0} +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/testing/testing/mocking.py +:caption: testing/mocking.py +:lines: 1-6 +``` -This is a generic SQL task (and is by default not hooked up to any datastore nor handled by any plugin), and must -be mocked. 
+This is a generic SQL task (and is by default not hooked up to any datastore nor handled by any plugin), and must be mocked: -```{code-cell} -sql = SQLTask( - "my-query", - query_template="SELECT * FROM hive.city.fact_airport_sessions WHERE ds = '{{ .Inputs.ds }}' LIMIT 10", - inputs=kwtypes(ds=datetime.datetime), - outputs=kwtypes(results=FlyteSchema), - metadata=TaskMetadata(retries=2), -) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/testing/testing/mocking.py +:caption: testing/mocking.py +:lines: 10-16 ``` -+++ {"lines_to_next_cell": 0} +This is a task that can run locally: -This is a task that can run locally - -```{code-cell} -@task -def t1() -> datetime.datetime: - return datetime.datetime.now() +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/testing/testing/mocking.py +:caption: testing/mocking.py +:pyobject: t1 ``` -+++ {"lines_to_next_cell": 0} - Declare a workflow that chains these two tasks together. -```{code-cell} -@workflow -def my_wf() -> FlyteSchema: - dt = t1() - return sql(ds=dt) +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/testing/testing/mocking.py +:caption: testing/mocking.py +:pyobject: my_wf ``` -+++ {"lines_to_next_cell": 0} - -Without a mock, calling the workflow would typically raise an exception, but with the `task_mock` construct, which -returns a `MagicMock` object, we can override the return value. +Without a mock, calling the workflow would typically raise an exception, but with the `task_mock` construct, which returns a `MagicMock` object, we can override the return value. 
-```{code-cell} -def main_1(): - with task_mock(sql) as mock: - mock.return_value = pandas.DataFrame(data={"x": [1, 2], "y": ["3", "4"]}) - assert (my_wf().open().all() == pandas.DataFrame(data={"x": [1, 2], "y": ["3", "4"]})).all().all() +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/testing/testing/mocking.py +:caption: testing/mocking.py +:pyobject: main_1 ``` -+++ {"lines_to_next_cell": 0} +There is another utility as well called `patch` which offers the same functionality, but in the traditional Python patching style, where the first argument is the `MagicMock` object. -There is another utility as well called `patch` which offers the same functionality, but in the traditional Python -patching style, where the first argument is the `MagicMock` object. - -```{code-cell} -def main_2(): - @patch(sql) - def test_user_demo_test(mock_sql): - mock_sql.return_value = pandas.DataFrame(data={"x": [1, 2], "y": ["3", "4"]}) - assert (my_wf().open().all() == pandas.DataFrame(data={"x": [1, 2], "y": ["3", "4"]})).all().all() - - test_user_demo_test() - - -if __name__ == "__main__": - main_1() - main_2() +```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/master/examples/testing/testing/mocking.py +:caption: testing/mocking.py +:lines: 45-56 ``` + +[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/testing/
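
The mocking page above describes two styles: `task_mock`, a context manager that yields a `MagicMock` whose return value can be overridden, and `patch`, a decorator that passes the `MagicMock` in as an argument. The sketch below illustrates the same two styles using only the standard library's `unittest.mock`, so it runs without a Flyte installation. It is not flytekit's implementation: `Backend`, `run_query`, `my_pipeline`, and the sample rows are hypothetical stand-ins for the backend-only SQL task and the workflow that calls it.

```python
from unittest.mock import MagicMock, patch

# Hypothetical sample rows standing in for the mocked query result.
FAKE_ROWS = [{"x": 1, "y": "3"}, {"x": 2, "y": "4"}]


class Backend:
    """Hypothetical stand-in for a service that is only reachable on the backend."""

    @staticmethod
    def run_query(ds: str) -> list:
        # Simulates a task that cannot execute in a local environment.
        raise RuntimeError("no backend available in this environment")


def my_pipeline() -> list:
    """Hypothetical stand-in for a workflow that calls the backend-only task."""
    return Backend.run_query(ds="2024-01-01")


def context_manager_style() -> list:
    # Context-manager style: the mock replaces run_query only inside the block.
    with patch.object(Backend, "run_query") as mock:
        mock.return_value = FAKE_ROWS
        return my_pipeline()


@patch.object(Backend, "run_query")
def decorator_style(mock_query: MagicMock) -> list:
    # Decorator style: the MagicMock is passed in as an argument.
    mock_query.return_value = FAKE_ROWS
    return my_pipeline()


if __name__ == "__main__":
    print(context_manager_style())
    print(decorator_style())
```

In both styles the original function is restored once the mock goes out of scope, so calling `my_pipeline()` afterwards raises the `RuntimeError` again.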