Commit

Apply suggestions from code review
Co-authored-by: Ingo Müller <[email protected]>
westonpace and ingomueller-net authored Aug 1, 2024
1 parent 56ba6b2 commit a674434
Showing 2 changed files with 5 additions and 9 deletions.
7 changes: 2 additions & 5 deletions site/docs/relations/basics.md
@@ -1,15 +1,12 @@
# Basics

Substrait is designed to allow a user to describe arbitrarily complex data transformations. These transformations are composed of one
or more relational operations. Relational operations are well-defined transformation operations that work by taking zero or more input datasets and transforming them into zero or more output transformations. Substrait defines a core set of transformations, but users are also able to extend the operations with their own specialized operations.
Substrait is designed to allow a user to describe arbitrarily complex data transformations. These transformations are composed of one or more relational operations. Relational operations are well-defined transformation operations that work by taking zero or more input datasets and transforming them into zero or more output transformations. Substrait defines a core set of transformations, but users are also able to extend the operations with their own specialized operations.

## Plans

A plan is a tree of relations. The root of the tree is the final output of the plan. Each node in the tree is a relational operation. The children of a node are the inputs to the operation. The leaves of the tree are the input datasets to the plan.

Plans can be composed together using reference relations. This allows for the construction of common plans that can be reused in multiple
places. If a plan has no cycles (there is only one plan or each reference relation only references later plans) then the plan will form a
DAG (Directed Acyclic Graph).
Plans can be composed together using reference relations. This allows for the construction of common plans that can be reused in multiple places. If a plan has no cycles (there is only one plan or each reference relation only references later plans) then the plan will form a DAG (Directed Acyclic Graph).

## Relational Operators

7 changes: 3 additions & 4 deletions site/docs/serialization/basics.md
@@ -1,8 +1,7 @@
# Basics

Substrait is designed to be serialized into various different formats. Currently we support a binary serialization for
transmission of plans between programs (e.g. IPC or network communication) and a text serialization for debugging and
human readability. Other formats may be added in the future.
transmission of plans between programs (e.g. IPC or network communication) and a text serialization for debugging and human readability. Other formats may be added in the future.

These formats serialize a collection of plans. Substrait does not define how a collection of plans is to be interpreted.
For example, the following scenarios are all valid uses of a collection of plans:
@@ -11,7 +10,7 @@ For example, the following scenarios are all valid uses of a collection of plans
top-level node of the root plan defines the output of the query. Non-root plans may be included as common subplans
which are referenced from the root plan.
- A transpiler may convert plans from one dialect to another. It could take, as input, a single root plan. Then
it could output a serialized binary containing mulitple root plans. Each root plan is a representation of the
it could output a serialized binary containing multiple root plans. Each root plan is a representation of the
input plan in a different dialect.
- A distributed scheduler might expect 1+ root plans. Each root plan describes a different stage of computation.

@@ -21,6 +20,6 @@ Libraries should make sure to thoroughly describe the way plan collections will

We often refer to query plans as a graph of nodes (typically a DAG unless the query is recursive). However, we
encode this graph as a collection of trees with a single root tree that references other trees (which may also
transtively reference other trees). Plan serializations all have some way to indicate which plan(s) are "root"
transitively reference other trees). Plan serializations all have some way to indicate which plan(s) are "root"
plans. Any plan that is not a root plan and is not referenced (directly or transitively) by some root plan
can safely be ignored.
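
The rule in the last hunk (a plan that is neither a root nor referenced, directly or transitively, by a root can be ignored) amounts to a reachability check over plan references. A minimal sketch, assuming plans are identified by integer ids and references are held in a plain dictionary; `reachable_plans`, `references`, and the ids are hypothetical and do not reflect Substrait's actual serialized layout:

```python
from collections import deque


def reachable_plans(roots, references):
    """Return the ids of all plans reachable from the given root plans.

    `references` maps a plan id to the ids of the plans it references
    directly. This representation is an assumption made for illustration.
    """
    seen = set(roots)
    queue = deque(roots)
    while queue:
        plan = queue.popleft()
        for ref in references.get(plan, ()):
            if ref not in seen:
                # First time this plan is reached: record it and follow its references.
                seen.add(ref)
                queue.append(ref)
    return seen


# Hypothetical collection: plan 0 is the only root; plan 3 is never referenced.
references = {0: [1], 1: [2], 2: [], 3: [2]}
live = reachable_plans([0], references)
ignorable = set(references) - live
```

Here `ignorable` holds the plans a consumer could skip. Tracking visited ids also keeps the walk finite if a recursive query introduces a reference cycle.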