Skip to content

Commit

Permalink
update pkg, add proposal doc
Browse files Browse the repository at this point in the history
Signed-off-by: tuannh982 <[email protected]>
  • Loading branch information
tuannh982 committed Aug 22, 2023
1 parent 987ec9a commit d8d7365
Show file tree
Hide file tree
Showing 20 changed files with 492 additions and 66 deletions.
7 changes: 0 additions & 7 deletions cascades/memo/implementation_rule.go

This file was deleted.

11 changes: 0 additions & 11 deletions cascades/memo/transformation_rule.go

This file was deleted.

13 changes: 0 additions & 13 deletions cascades/physicalplan/plan.go

This file was deleted.

184 changes: 184 additions & 0 deletions planner/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,184 @@
Proposal: separate between AST & execution, refactor query planner
===

## Background

Quotes from https://github.com/thanos-io/promql-engine/issues/5

```
We currently translate the AST directly to a physical plan. Having an in-between logical plan will allow us to run optimizers before the query is executed.
The logical plan would have a one to one mapping with the AST and will contain the parameters of each AST node.
Query optimizers can then transform the logical plan based on predefined heuristics. One example would be optimizing series selects in binary operations so that we do as few network calls as possible.
Finally, we would build the physical plan from the optimized logical plan instead doing it from the AST directly.
```

Here's our current query lifecycle

```mermaid
flowchart TD
Query["query string"]
AST["parser.Expr"]
Plan1["logicalplan.Plan"]
Plan2["logicalplan.Plan"]
Operator["model.VectorOperator"]
Query -->|parsring| AST
AST -->|logicalplan.New| Plan1
Plan1 -->|optimize| Plan2
Plan2 -->|execution.New| Operator
```

The `logicalplan.Plan` is just a wrapper of `parser.Expr`, and the conversion from `logicalplan.Plan` to `model.VectorOperator` is actually direct conversion from `parser.Expr` to `model.VectorOperator`.

Another point, is our optimizers are heuristic optimizers, hence it's could not optimize for some complex queries, and could not use the data statistic to perform the optimization

## Proposal

We will implement the 2-stages planner according to Volcano planner

```mermaid
flowchart TD
Query["query string"]
AST["parser.Expr"]
LogicalPlan1["LogicalPlan"]
Operator["model.VectorOperator"]
subgraph plan["Planner"]
subgraph explore["Exploration Phase"]
LogicalPlan2["LogicalPlan"]
end
subgraph implement["Implementation Phase"]
PhysicalPlan
end
end
Query -->|parsring| AST
AST -->|ast2plan| LogicalPlan1
LogicalPlan1 --> LogicalPlan2
LogicalPlan2 --> PhysicalPlan
PhysicalPlan --> Operator
```

### Exploration phase

The exploration phase, used to explore all possible transformation of the original logical plan

```mermaid
flowchart TD
LogicalPlan1["LogicalPlan"]
LogicalPlan2["LogicalPlan"]
LogicalPlan1 -->|fire transformation rules| LogicalPlan2
```

**Define**:
- `Group`: The `Equivalent Group`, or the `Equivalent Set`, is a group of multiple equivalent logical plans
- `GroupExpr`: representing a logical plan node (basically it's just wrap around the logical plan, with some additional information)

```go
type Group struct {
// logical
Equivalents map[ID]*GroupExpr // The equivalent expressions.
ExplorationMark
}

type GroupExpr struct {
Expr logicalplan.LogicalPlan // The logical plan bind to the expression.
Children []*Group // The children group of the expression, noted that it must be in the same order with LogicalPlan.Children().
AppliedTransformations utils.Set[TransformationRule]
ExplorationMark
}
```

Here is the interface of transformation rule

```go
type TransformationRule interface {
Match(expr *GroupExpr) bool // Check if the transformation can be applied to the expression
Transform(expr *GroupExpr) *GroupExpr // Transform the expression
}
```

```go
for _, rule := range rules {
if rule.Match(equivalentExpr) {
if !equivalentExpr.AppliedTransformations.Contains(rule) {
transformedExpr := rule.Transform(o.memo, equivalentExpr)
// add new equivalent expr to group
group.Equivalents[transformedExpr.ID] = transformedExpr
equivalentExpr.AppliedTransformations.Add(rule)
// reset group exploration state
transformedExpr.SetExplore(round, false)
group.SetExplore(round, false)
}
}
}
```

### Implementation phase

After exploration phase, we have the expanded logical plan (including the original plan and the transformed plans)

Then we will find the implementation which has the lowest implementation cost


```mermaid
flowchart TD
LogicalPlan2["LogicalPlan"]
PhysicalPlan
Operator["model.VectorOperator"]
LogicalPlan2 -->|find best implementation| PhysicalPlan
PhysicalPlan -->|get the actual implementation| Operator
```

The physical plan represent the actual implementation of a logical plan (the `Children` property in `PhysicalPlan` is used for cost calculation)

```go
type PhysicalPlan interface {
SetChildren(children []PhysicalPlan) // set child implementations, also update the operator and cost.
Children() []PhysicalPlan // Return the saved child implementations from the last CalculateCost call.
Operator() model.VectorOperator // Return the saved physical operator set from the last CalculateCost.
Cost() cost.Cost // Return the saved cost from the last CalculateCost call.
}
```

For each logical plan, we will have several implementations

```go
type ImplementationRule interface {
ListImplementations(expr *GroupExpr) []physicalplan.PhysicalPlan // List all implementation for the expression
}
```

And we will find the best implementation, via simple dynamic programming

```go
var possibleImpls []physicalplan.PhysicalPlan
for _, rule := range rules {
possibleImpls = append(possibleImpls, rule.ListImplementations(expr)...)
}
```

```go
var currentBest *memo.GroupImplementation
for _, impl := range possibleImpls {
impl.SetChildren(childImpls)
calculatedCost := impl.Cost()
if groupImpl != nil {
if costModel.IsBetter(currentBest.Cost, calculatedCost) {
currentBest.SelectedExpr = expr
currentBest.Implementation = impl
currentBest.Cost = calculatedCost
}
} else {
currentBest = &memo.GroupImplementation{
SelectedExpr: expr,
Cost: calculatedCost,
Implementation: impl,
}
}
}
```
5 changes: 4 additions & 1 deletion cascades/cost/cost.go → planner/cost/cost.go
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
package cost

type Cost interface{}
type Cost struct {
CpuCost float64
MemoryCost float64
}

type CostModel interface {
IsBetter(currentCost Cost, newCost Cost) bool
Expand Down
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
10 changes: 5 additions & 5 deletions cascades/memo/group.go → planner/memo/group.go
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
package memo

import (
"github.com/thanos-io/promql-engine/cascades/cost"
"github.com/thanos-io/promql-engine/cascades/logicalplan"
"github.com/thanos-io/promql-engine/cascades/physicalplan"
"github.com/thanos-io/promql-engine/cascades/utils"
"github.com/thanos-io/promql-engine/planner/cost"
"github.com/thanos-io/promql-engine/planner/logicalplan"
"github.com/thanos-io/promql-engine/planner/physicalplan"
"github.com/thanos-io/promql-engine/planner/utils"
"sync/atomic"
)

Expand Down Expand Up @@ -38,7 +38,7 @@ type Group struct {
type GroupImplementation struct {
SelectedExpr *GroupExpr
Cost cost.Cost
Implementation physicalplan.Implementation
Implementation physicalplan.PhysicalPlan
}

type GroupExpr struct {
Expand Down
7 changes: 7 additions & 0 deletions planner/memo/implementation_rule.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
package memo

import "github.com/thanos-io/promql-engine/planner/physicalplan"

type ImplementationRule interface {
ListImplementations(expr *GroupExpr) []physicalplan.PhysicalPlan // List all implementation for the expression
}
11 changes: 6 additions & 5 deletions cascades/memo/memo.go → planner/memo/memo.go
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
package memo

import (
"github.com/thanos-io/promql-engine/cascades/logicalplan"
"github.com/thanos-io/promql-engine/cascades/utils"
"github.com/thanos-io/promql-engine/planner/logicalplan"
"github.com/thanos-io/promql-engine/planner/utils"
)

type Memo interface {
Expand Down Expand Up @@ -40,9 +40,10 @@ func (m *memo) GetOrCreateGroupExpr(node logicalplan.LogicalPlan) *GroupExpr {
} else {
id := m.groupExprIDGenerator.Generate()
expr := &GroupExpr{
ID: id,
Expr: node,
Children: childGroups,
ID: id,
Expr: node,
Children: childGroups,
AppliedTransformations: make(utils.Set[TransformationRule]),
}
m.GroupExprs[node] = expr
return expr
Expand Down
2 changes: 1 addition & 1 deletion cascades/memo/memo_test.go → planner/memo/memo_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,8 @@ package memo

import (
"github.com/stretchr/testify/assert"
"github.com/thanos-io/promql-engine/cascades/logicalplan"
"github.com/thanos-io/promql-engine/parser"
"github.com/thanos-io/promql-engine/planner/logicalplan"
"golang.org/x/exp/maps"
"reflect"
"testing"
Expand Down
11 changes: 11 additions & 0 deletions planner/memo/transformation_rule.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
package memo

import (
"github.com/thanos-io/promql-engine/planner/utils"
)

type TransformationRule interface {
utils.Hashable
Match(expr *GroupExpr) bool // Check if the transformation can be applied to the expression
Transform(memo Memo, expr *GroupExpr) *GroupExpr // Transform the expression
}
13 changes: 13 additions & 0 deletions planner/physicalplan/plan.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
package physicalplan

import (
"github.com/thanos-io/promql-engine/execution/model"
"github.com/thanos-io/promql-engine/planner/cost"
)

type PhysicalPlan interface {
SetChildren(children []PhysicalPlan) // set child implementations, also update the operator and cost.
Children() []PhysicalPlan // Return the saved child implementations
Operator() model.VectorOperator // Return the saved physical operator
Cost() cost.Cost // Return the saved cost
}
Loading

0 comments on commit d8d7365

Please sign in to comment.