Skip to content

Latest commit

 

History

History
58 lines (41 loc) · 2.92 KB

spark-sql-Optimizer-OptimizeIn.adoc

File metadata and controls

58 lines (41 loc) · 2.92 KB

OptimizeIn Logical Optimization

OptimizeIn is a logical optimization that transforms logical plans with In predicate expressions as follows:

  1. Replaces an In expression that has an empty list and the value expression not nullable to false

  2. Eliminates duplicates of Literal expressions in an In predicate expression that is inSetConvertible

  3. Replaces an In predicate expression that is inSetConvertible with InSet expressions when the number of literal expressions in the list expression is greater than spark.sql.optimizer.inSetConversionThreshold internal configuration property (default: 10)

OptimizeIn is part of Operator Optimizations batch in the standard batches of the base Spark Optimizer.

Technically, OptimizeIn is a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan].

// Use Catalyst DSL to define a logical plan

// HACK: Disable symbolToColumn implicit conversion
// It is imported automatically in spark-shell (and makes demos impossible)
// implicit def symbolToColumn(s: Symbol): org.apache.spark.sql.ColumnName
trait ThatWasABadIdea
implicit def symbolToColumn(ack: ThatWasABadIdea) = ack

import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._

import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
val rel = LocalRelation('a.int, 'b.int, 'c.int)

import org.apache.spark.sql.catalyst.expressions.{In, Literal}
val plan = rel
  .where(In('a, Seq[Literal](1, 2, 3)))
  .analyze
scala> println(plan.numberedTreeString)
00 Filter a#6 IN (1,2,3)
01 +- LocalRelation <empty>, [a#6, b#7, c#8]

// In --> InSet
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", 0)

import org.apache.spark.sql.catalyst.optimizer.OptimizeIn
val optimizedPlan = OptimizeIn(plan)
scala> println(optimizedPlan.numberedTreeString)
00 Filter a#6 INSET (1,2,3)
01 +- LocalRelation <empty>, [a#6, b#7, c#8]

Applying OptimizeIn To Logical Plan (Executing OptimizeIn) — apply Method

apply(plan: LogicalPlan): LogicalPlan
Note
apply is part of Rule Contract to apply a rule to a TreeNode, e.g. logical plan.

apply…​FIXME