Expression
is an executable node (in a Catalyst tree) that can evaluate a result value given input values, i.e. it can produce a JVM object per InternalRow.
Note
|
Expression is often called a Catalyst expression even though it is merely built using (not being part of) the Catalyst — Tree Manipulation Framework.
|
// evaluating an expression
// Use Literal expression to create an expression from a Scala object
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.expressions.Literal
val e: Expression = Literal("hello")
import org.apache.spark.sql.catalyst.expressions.EmptyRow
val v: Any = e.eval(EmptyRow)
// Convert to Scala's String
import org.apache.spark.unsafe.types.UTF8String
scala> val s = v.asInstanceOf[UTF8String].toString
s: String = hello
Expression
can generate Java source code that is then used in evaluation.
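The following is a minimal sketch of requesting that code through genCode; the Add and Literal expressions and the fresh CodegenContext are used purely for illustration.
// generate the Java source code that evaluates Add(1, 2)
import org.apache.spark.sql.catalyst.expressions.{Add, Literal}
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
val ctx = new CodegenContext
// genCode gives an ExprCode with the Java code and the names of the result variables
val ec = Add(Literal(1), Literal(2)).genCode(ctx)
println(ec.code)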
Expression
is deterministic when it always evaluates the same result for the same inputs. By default, an expression is deterministic if all its child expressions are (which is trivially true for leaf expressions, as they have no child expressions).
Note
|
A deterministic expression is like a pure function in functional programming languages. |
scala> spark.version
res0: String = 2.3.0
val e = $"a".expr
scala> :type e
org.apache.spark.sql.catalyst.expressions.Expression
scala> println(e.deterministic)
true
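For contrast, here is a sketch with a non-deterministic expression (the rand standard function is backed by the Rand expression, which mixes in Nondeterministic):
import org.apache.spark.sql.functions.rand
val ne = rand().expr
scala> println(ne.deterministic)
false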
verboseString
is…FIXME
Name | Scala Kind | Behaviour | Examples |
---|---|---|---|
BinaryExpression | abstract class | | |
CodegenFallback | trait | Does not support code generation and falls back to interpreted mode | |
ExpectsInputTypes | trait | | |
ExtractValue | trait | Marks… | |
LeafExpression | abstract class | Has no child expressions (and hence "terminates" the expression tree). | |
NamedExpression | trait | Can later be referenced in a dataflow graph. | |
Nondeterministic | trait | | |
NonSQLExpression | trait | Expression with no SQL representation. Gives the only custom sql method that is non-overridable (i.e. final). When requested the SQL representation, NonSQLExpression replaces Attributes with PrettyAttributes to build the text representation | |
Predicate | trait | | |
TernaryExpression | abstract class | | |
TimeZoneAwareExpression | trait | Timezone-aware expressions | |
UnaryExpression | abstract class | | |
Unevaluable | trait | Cannot be evaluated, i.e. eval and doGenCode are not supported and report an UnsupportedOperationException | |
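As a small sketch of the Unevaluable behaviour above, UnresolvedAttribute (an Unevaluable expression, chosen purely for illustration) refuses interpreted evaluation:
// UnresolvedAttribute mixes in Unevaluable, so eval is not supported
import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
val ua = UnresolvedAttribute("a")
// ua.eval()  // throws java.lang.UnsupportedOperationException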
package org.apache.spark.sql.catalyst.expressions

abstract class Expression extends TreeNode[Expression] {
  // only required methods that have no implementation
  def dataType: DataType
  def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode
  def eval(input: InternalRow = EmptyRow): Any
  def nullable: Boolean
}
Method | Description |
---|---|
dataType | Data type of the result of evaluating an expression |
doGenCode | Code-generated expression evaluation that generates Java source code (that is used to evaluate the expression in a more optimized way, not directly using eval). Used when Expression is requested to genCode |
eval | Interpreted (non-code-generated) expression evaluation that evaluates an expression to a JVM object for a given internal binary row (without generating corresponding Java code) |
genCode | Generates the Java source code for code-generated (non-interpreted) expression evaluation (on an input internal row, in a more optimized way, not directly using eval). Similar to doGenCode but supports expression reuse (aka subexpression elimination) |
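A short sketch of eval against a concrete InternalRow follows; BoundReference, Add and Literal are used purely for illustration.
// evaluate `input[0] + 1` for an internal row with a single int column
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Add, BoundReference, Literal}
import org.apache.spark.sql.types.IntegerType
val expr = Add(BoundReference(0, IntegerType, nullable = false), Literal(1))
scala> expr.eval(InternalRow(41))
res1: Any = 42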
reduceCodeSize(ctx: CodegenContext, eval: ExprCode): Unit

reduceCodeSize does its work only when all of the following are met:
- Length of the generated code is above 1024
- INPUT_ROW of the input CodegenContext is defined
- currentVars of the input CodegenContext is not defined
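The two CodegenContext properties can be inspected on a freshly-created context (a sketch for illustration only):
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
val ctx = new CodegenContext
// a fresh CodegenContext has INPUT_ROW defined (defaults to "i")
// while currentVars is not defined (defaults to null)
println(ctx.INPUT_ROW)
println(ctx.currentVars)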
Caution
|
FIXME When would the above not be met? What’s so special about such an expression? |
reduceCodeSize sets the value of the input ExprCode to the fresh term name for the value name.
In the end, reduceCodeSize sets the code of the input ExprCode to the following:
[javaType] [newValue] = [funcFullName]([INPUT_ROW]);
The funcFullName is the fresh term name for the name of the current expression node.
Tip
|
Use the expression node name to search for the function that corresponds to the expression in the generated code. |
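One way to get at the generated code to search through is the debugCodegen facility (a sketch; it assumes a SparkSession available as spark, e.g. in spark-shell):
// print the whole-stage-generated Java code of a simple query
import org.apache.spark.sql.execution.debug._
spark.range(1).selectExpr("id + 1").debugCodegen()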
Note
|
reduceCodeSize is used exclusively when Expression is requested to generate the Java source code for code-generated expression evaluation.
|
flatArguments: Iterator[Any]
flatArguments
…FIXME
Note
|
flatArguments is used when…FIXME
|
sql: String
sql gives a SQL representation.
Internally, sql gives a text representation with prettyName followed by the sql of the children in round brackets, concatenated using a comma (,).
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.expressions.Sentences
val sentences = Sentences("Hi there! Good morning.", "en", "US")
import org.apache.spark.sql.catalyst.expressions.Expression
val expr: Expression = count("*") === 5 && count(sentences) === 5
scala> expr.sql
res0: String = ((count('*') = 5) AND (count(sentences('Hi there! Good morning.', 'en', 'US')) = 5))
Note
|
sql is used when…FIXME
|