LimitPushDown Logical Optimization

LimitPushDown is a LogicalPlan optimization rule that transforms the following logical plans:

  • LocalLimit with Union

  • LocalLimit with Join (see the sketch after Case 1 below)

LimitPushDown is part of the Operator Optimizations batch in the base Optimizer.

// test datasets
scala> val ds1 = spark.range(4)
ds1: org.apache.spark.sql.Dataset[Long] = [id: bigint]

scala> val ds2 = spark.range(2)
ds2: org.apache.spark.sql.Dataset[Long] = [id: bigint]

// Case 1. Rather than `LocalLimit` of `Union` do `Union` of `LocalLimit`
scala> ds1.union(ds2).limit(2).explain(true)
== Parsed Logical Plan ==
GlobalLimit 2
+- LocalLimit 2
   +- Union
      :- Range (0, 4, step=1, splits=Some(8))
      +- Range (0, 2, step=1, splits=Some(8))

== Analyzed Logical Plan ==
id: bigint
GlobalLimit 2
+- LocalLimit 2
   +- Union
      :- Range (0, 4, step=1, splits=Some(8))
      +- Range (0, 2, step=1, splits=Some(8))

== Optimized Logical Plan ==
GlobalLimit 2
+- LocalLimit 2
   +- Union
      :- LocalLimit 2
      :  +- Range (0, 4, step=1, splits=Some(8))
      +- LocalLimit 2
         +- Range (0, 2, step=1, splits=Some(8))

== Physical Plan ==
CollectLimit 2
+- Union
   :- *LocalLimit 2
   :  +- *Range (0, 4, step=1, splits=Some(8))
   +- *LocalLimit 2
      +- *Range (0, 2, step=1, splits=Some(8))
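
LimitPushDown also rewrites LocalLimit over Join (Case 2). The snippet below is a sketch rather than captured output: with a left outer join, the optimized logical plan should show the LocalLimit pushed down to the left (outer) side of the join, while the right side stays as is.

// Case 2. Push LocalLimit below the outer side of a Join
// (sketch only; review the optimized logical plan in the output)
scala> ds1.join(ds2, Seq("id"), "left_outer").limit(2).explain(true)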

apply Method

Caution
FIXME
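
Until the internals are described here, the following is a simplified, illustrative sketch of the Union case only. SimplifiedLimitPushDown is a made-up name; the actual rule in Catalyst also handles Join and avoids pushing a limit below an existing, smaller limit.

// Simplified sketch of the Union case; not the actual Spark implementation
import org.apache.spark.sql.catalyst.plans.logical.{LocalLimit, LogicalPlan, Union}
import org.apache.spark.sql.catalyst.rules.Rule

object SimplifiedLimitPushDown extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
    // Rather than LocalLimit of Union, do Union of LocalLimit (Case 1 above)
    case LocalLimit(exp, Union(children)) =>
      LocalLimit(exp, Union(children.map(child => LocalLimit(exp, child))))
  }
}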

Creating LimitPushDown Instance

LimitPushDown takes the following when created:

LimitPushDown initializes the internal registries and counters.
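
As an illustration only, the rule could be created and applied to an analyzed logical plan directly. The constructor parameters are not listed above, so the sketch below assumes a Spark 2.x-style case class parameterized by a SQLConf (other Spark versions define the rule as a parameterless object).

// Hedged sketch: assumes LimitPushDown(conf: SQLConf); adjust for your Spark version
import org.apache.spark.sql.catalyst.optimizer.LimitPushDown
import org.apache.spark.sql.internal.SQLConf

val limitPushDown = LimitPushDown(new SQLConf)
val analyzedPlan = spark.range(4).union(spark.range(2)).limit(2).queryExecution.analyzed

// The transformed plan should show LocalLimit pushed below Union
println(limitPushDown(analyzedPlan).numberedTreeString)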

Note
LimitPushDown is created when