update pkg, add proposal doc

Signed-off-by: tuannh982 <[email protected]>
thanos-io · Aug 22, 2023 · d8d7365 · d8d7365
1 parent 987ec9a
commit d8d7365
Show file tree

Hide file tree

Showing 20 changed files with 492 additions and 66 deletions.
diff --git a/cascades/memo/implementation_rule.go b/cascades/memo/implementation_rule.go
diff --git a/cascades/memo/transformation_rule.go b/cascades/memo/transformation_rule.go
diff --git a/cascades/physicalplan/plan.go b/cascades/physicalplan/plan.go
diff --git a/planner/README.md b/planner/README.md
@@ -0,0 +1,184 @@
+Proposal: separate between AST & execution, refactor query planner
+===
+
+## Background
+
+Quotes from https://github.com/thanos-io/promql-engine/issues/5
+
+```
+We currently translate the AST directly to a physical plan. Having an in-between logical plan will allow us to run optimizers before the query is executed.
+
+The logical plan would have a one to one mapping with the AST and will contain the parameters of each AST node.
+Query optimizers can then transform the logical plan based on predefined heuristics. One example would be optimizing series selects in binary operations so that we do as few network calls as possible.
+
+Finally, we would build the physical plan from the optimized logical plan instead doing it from the AST directly.
+```
+
+Here's our current query lifecycle
+
+```mermaid
+flowchart TD
+Query["query string"] 
+AST["parser.Expr"]
+Plan1["logicalplan.Plan"]
+Plan2["logicalplan.Plan"]
+Operator["model.VectorOperator"]
+
+Query -->|parsring| AST
+AST -->|logicalplan.New| Plan1
+Plan1 -->|optimize| Plan2
+Plan2 -->|execution.New| Operator
+```
+
+The `logicalplan.Plan` is just a wrapper of `parser.Expr`, and the conversion from `logicalplan.Plan` to `model.VectorOperator` is actually direct conversion from `parser.Expr` to `model.VectorOperator`.
+
+Another point, is our optimizers are heuristic optimizers, hence it's could not optimize for some complex queries, and could not use the data statistic to perform the optimization
+
+## Proposal
+
+We will implement the 2-stages planner according to Volcano planner
+
+```mermaid
+flowchart TD
+Query["query string"] 
+AST["parser.Expr"]
+LogicalPlan1["LogicalPlan"]
+Operator["model.VectorOperator"]
+
+subgraph plan["Planner"]
+    subgraph explore["Exploration Phase"]
+        LogicalPlan2["LogicalPlan"]
+    end
+    subgraph implement["Implementation Phase"]
+        PhysicalPlan
+    end
+end
+
+Query -->|parsring| AST
+AST -->|ast2plan| LogicalPlan1
+LogicalPlan1 --> LogicalPlan2
+LogicalPlan2 --> PhysicalPlan
+PhysicalPlan --> Operator
+```
+
+### Exploration phase
+
+The exploration phase, used to explore all possible transformation of the original logical plan
+
+```mermaid
+flowchart TD
+LogicalPlan1["LogicalPlan"]
+LogicalPlan2["LogicalPlan"]
+
+LogicalPlan1 -->|fire transformation rules| LogicalPlan2
+```
+
+**Define**:
+- `Group`: The `Equivalent Group`, or the `Equivalent Set`, is a group of multiple equivalent logical plans
+- `GroupExpr`: representing a logical plan node (basically it's just wrap around the logical plan, with some additional information)
+
+```go
+type Group struct {
+	// logical
+	Equivalents map[ID]*GroupExpr // The equivalent expressions.
+	ExplorationMark
+}
+
+type GroupExpr struct {
+	Expr                   logicalplan.LogicalPlan // The logical plan bind to the expression.
+	Children               []*Group                // The children group of the expression, noted that it must be in the same order with LogicalPlan.Children().
+	AppliedTransformations utils.Set[TransformationRule]
+	ExplorationMark
+}
+```
+
+Here is the interface of transformation rule
+
+```go
+type TransformationRule interface {
+	Match(expr *GroupExpr) bool                      // Check if the transformation can be applied to the expression
+	Transform(expr *GroupExpr) *GroupExpr // Transform the expression
+}
+```
+
+```go
+for _, rule := range rules {
+    if rule.Match(equivalentExpr) {
+        if !equivalentExpr.AppliedTransformations.Contains(rule) {
+            transformedExpr := rule.Transform(o.memo, equivalentExpr)
+            // add new equivalent expr to group
+            group.Equivalents[transformedExpr.ID] = transformedExpr
+            equivalentExpr.AppliedTransformations.Add(rule)
+            // reset group exploration state
+            transformedExpr.SetExplore(round, false)
+            group.SetExplore(round, false)
+        }
+    }
+}
+```
+
+### Implementation phase
+
+After exploration phase, we have the expanded logical plan (including the original plan and the transformed plans)
+
+Then we will find the implementation which has the lowest implementation cost
+
+
+```mermaid
+flowchart TD
+LogicalPlan2["LogicalPlan"]
+PhysicalPlan
+Operator["model.VectorOperator"]
+
+LogicalPlan2 -->|find best implementation| PhysicalPlan
+PhysicalPlan -->|get the actual implementation| Operator
+```
+
+The physical plan represent the actual implementation of a logical plan (the `Children` property in `PhysicalPlan` is used for cost calculation)
+
+```go
+type PhysicalPlan interface {
+	SetChildren(children []PhysicalPlan) // set child implementations, also update the operator and cost.
+	Children() []PhysicalPlan            // Return the saved child implementations from the last CalculateCost call.
+	Operator() model.VectorOperator        // Return the saved physical operator set from the last CalculateCost.
+	Cost() cost.Cost                       // Return the saved cost from the last CalculateCost call.
+}
+```
+
+For each logical plan, we will have several implementations
+
+```go
+type ImplementationRule interface {
+	ListImplementations(expr *GroupExpr) []physicalplan.PhysicalPlan // List all implementation for the expression
+}
+```
+
+And we will find the best implementation, via simple dynamic programming
+
+```go
+var possibleImpls []physicalplan.PhysicalPlan
+for _, rule := range rules {
+    possibleImpls = append(possibleImpls, rule.ListImplementations(expr)...)
+}
+```
+
+```go
+var currentBest *memo.GroupImplementation
+for _, impl := range possibleImpls {
+    impl.SetChildren(childImpls)
+    calculatedCost := impl.Cost()
+    if groupImpl != nil {
+        if costModel.IsBetter(currentBest.Cost, calculatedCost) {
+            currentBest.SelectedExpr = expr
+            currentBest.Implementation = impl
+            currentBest.Cost = calculatedCost
+        }
+    } else {
+        currentBest = &memo.GroupImplementation{
+            SelectedExpr:   expr,
+            Cost:           calculatedCost,
+            Implementation: impl,
+        }
+    }
+}
+```
diff --git a/cascades/cost/cost.go → planner/cost/cost.go b/cascades/cost/cost.go → planner/cost/cost.go
@@ -1,6 +1,9 @@
 package cost
 
-type Cost interface{}
+type Cost struct {
+	CpuCost    float64
+	MemoryCost float64
+}
 
 type CostModel interface {
 	IsBetter(currentCost Cost, newCost Cost) bool

diff --git a/cascades/logicalplan/ast2plan.go → planner/logicalplan/ast2plan.go b/cascades/logicalplan/ast2plan.go → planner/logicalplan/ast2plan.go
diff --git a/cascades/logicalplan/ast2plan_test.go → planner/logicalplan/ast2plan_test.go b/cascades/logicalplan/ast2plan_test.go → planner/logicalplan/ast2plan_test.go
diff --git a/cascades/logicalplan/plan.go → planner/logicalplan/plan.go b/cascades/logicalplan/plan.go → planner/logicalplan/plan.go
diff --git a/cascades/memo/explore.go → planner/memo/explore.go b/cascades/memo/explore.go → planner/memo/explore.go
diff --git a/cascades/memo/explore_test.go → planner/memo/explore_test.go b/cascades/memo/explore_test.go → planner/memo/explore_test.go
diff --git a/cascades/memo/group.go → planner/memo/group.go b/cascades/memo/group.go → planner/memo/group.go
@@ -1,10 +1,10 @@
 package memo
 
 import (
-	"github.com/thanos-io/promql-engine/cascades/cost"
-	"github.com/thanos-io/promql-engine/cascades/logicalplan"
-	"github.com/thanos-io/promql-engine/cascades/physicalplan"
-	"github.com/thanos-io/promql-engine/cascades/utils"
+	"github.com/thanos-io/promql-engine/planner/cost"
+	"github.com/thanos-io/promql-engine/planner/logicalplan"
+	"github.com/thanos-io/promql-engine/planner/physicalplan"
+	"github.com/thanos-io/promql-engine/planner/utils"
 	"sync/atomic"
 )
 
@@ -38,7 +38,7 @@ type Group struct {
 type GroupImplementation struct {
 	SelectedExpr   *GroupExpr
 	Cost           cost.Cost
-	Implementation physicalplan.Implementation
+	Implementation physicalplan.PhysicalPlan
 }
 
 type GroupExpr struct {

diff --git a/planner/memo/implementation_rule.go b/planner/memo/implementation_rule.go
@@ -0,0 +1,7 @@
+package memo
+
+import "github.com/thanos-io/promql-engine/planner/physicalplan"
+
+type ImplementationRule interface {
+	ListImplementations(expr *GroupExpr) []physicalplan.PhysicalPlan // List all implementation for the expression
+}
diff --git a/cascades/memo/memo.go → planner/memo/memo.go b/cascades/memo/memo.go → planner/memo/memo.go
@@ -1,8 +1,8 @@
 package memo
 
 import (
-	"github.com/thanos-io/promql-engine/cascades/logicalplan"
-	"github.com/thanos-io/promql-engine/cascades/utils"
+	"github.com/thanos-io/promql-engine/planner/logicalplan"
+	"github.com/thanos-io/promql-engine/planner/utils"
 )
 
 type Memo interface {
@@ -40,9 +40,10 @@ func (m *memo) GetOrCreateGroupExpr(node logicalplan.LogicalPlan) *GroupExpr {
 	} else {
 		id := m.groupExprIDGenerator.Generate()
 		expr := &GroupExpr{
-			ID:       id,
-			Expr:     node,
-			Children: childGroups,
+			ID:                     id,
+			Expr:                   node,
+			Children:               childGroups,
+			AppliedTransformations: make(utils.Set[TransformationRule]),
 		}
 		m.GroupExprs[node] = expr
 		return expr

diff --git a/cascades/memo/memo_test.go → planner/memo/memo_test.go b/cascades/memo/memo_test.go → planner/memo/memo_test.go
@@ -2,8 +2,8 @@ package memo
 
 import (
 	"github.com/stretchr/testify/assert"
-	"github.com/thanos-io/promql-engine/cascades/logicalplan"
 	"github.com/thanos-io/promql-engine/parser"
+	"github.com/thanos-io/promql-engine/planner/logicalplan"
 	"golang.org/x/exp/maps"
 	"reflect"
 	"testing"

diff --git a/planner/memo/transformation_rule.go b/planner/memo/transformation_rule.go
@@ -0,0 +1,11 @@
+package memo
+
+import (
+	"github.com/thanos-io/promql-engine/planner/utils"
+)
+
+type TransformationRule interface {
+	utils.Hashable
+	Match(expr *GroupExpr) bool                      // Check if the transformation can be applied to the expression
+	Transform(memo Memo, expr *GroupExpr) *GroupExpr // Transform the expression
+}
diff --git a/planner/physicalplan/plan.go b/planner/physicalplan/plan.go
@@ -0,0 +1,13 @@
+package physicalplan
+
+import (
+	"github.com/thanos-io/promql-engine/execution/model"
+	"github.com/thanos-io/promql-engine/planner/cost"
+)
+
+type PhysicalPlan interface {
+	SetChildren(children []PhysicalPlan) // set child implementations, also update the operator and cost.
+	Children() []PhysicalPlan            // Return the saved child implementations
+	Operator() model.VectorOperator      // Return the saved physical operator
+	Cost() cost.Cost                     // Return the saved cost
+}