Skip to content

Latest commit

 

History

History
404 lines (271 loc) · 15.8 KB

spark-sql-SessionCatalog.adoc

File metadata and controls

404 lines (271 loc) · 15.8 KB

SessionCatalog — Session-Scoped Catalog of Relational Entities

SessionCatalog is the catalog of (the metadata of) session-scoped relational temporary and permanent relational entities, i.e. databases, tables, views, partitions, and functions.

For the metadata of permanent entities (i.e. tables) SessionCatalog requests ExternalCatalog.

Note
SessionCatalog is a layer over ExternalCatalog in a SparkSession which allows for different metastores (i.e. in-memory or hive) to be used.

SessionCatalog is available through SessionState (of a SparkSession).

scala> spark.version
res0: String = 2.3.0

scala> :type spark.sessionState.catalog
org.apache.spark.sql.catalyst.catalog.SessionCatalog

SessionCatalog is created when SessionState sets catalog.

Table 1. SessionCatalog’s Internal Properties (e.g. Registries, Counters and Flags)
Name Description

currentDb

FIXME

Used when…​FIXME

tableRelationCache

A cache of fully-qualified table names to table relation plans (i.e. LogicalPlan).

Used when SessionCatalog refreshes a table

tempTables

FIXME

Used when…​FIXME

tempViews

Registry of temporary views (i.e. non-global temporary tables)

requireTableExists Internal Method

requireTableExists(name: TableIdentifier): Unit

requireTableExists…​FIXME

Note
requireTableExists is used when…​FIXME

databaseExists Method

databaseExists(db: String): Boolean

databaseExists…​FIXME

Note
databaseExists is used when…​FIXME

listTables Method

listTables(db: String): Seq[TableIdentifier]
listTables(db: String, pattern: String): Seq[TableIdentifier]

listTables…​FIXME

Note
listTables is used when…​FIXME

isTemporaryTable Method

isTemporaryTable(name: TableIdentifier): Boolean

isTemporaryTable…​FIXME

Note
isTemporaryTable is used when…​FIXME

alterPartitions Method

alterPartitions(tableName: TableIdentifier, parts: Seq[CatalogTablePartition]): Unit

alterPartitions…​FIXME

Note
alterPartitions is used when…​FIXME

listPartitions Method

listPartitions(
  tableName: TableIdentifier,
  partialSpec: Option[TablePartitionSpec] = None): Seq[CatalogTablePartition]

listPartitions…​FIXME

Note
listPartitions is used when…​FIXME

alterTable Method

alterTable(tableDefinition: CatalogTable): Unit

alterTable…​FIXME

Note

alterTable is used when the following logical commands are executed:

  • AlterTableSetPropertiesCommand, AlterTableUnsetPropertiesCommand, AlterTableChangeColumnCommand, AlterTableSerDePropertiesCommand, AlterTableRecoverPartitionsCommand, AlterTableSetLocationCommand, AlterViewAsCommand (for permanent views)

Altering Table Statistics in Metastore (and Invalidating Internal Cache) — alterTableStats Method

alterTableStats(identifier: TableIdentifier, newStats: Option[CatalogStatistics]): Unit

alterTableStats requests ExternalCatalog to alter the statistics of the table (per identifier) followed by invalidating the table relation cache.

alterTableStats reports a NoSuchDatabaseException if the database does not exist.

alterTableStats reports a NoSuchTableException if the table does not exist.

Note

alterTableStats is used when the following logical commands are executed:

tableExists Method

tableExists(name: TableIdentifier): Boolean

tableExists…​FIXME

Note
tableExists is used when…​FIXME

functionExists Method

Caution
FIXME
Note

functionExists is used in:

listFunctions Method

Caution
FIXME

Invalidating Table Relation Cache (aka Refreshing Table) — refreshTable Method

refreshTable(name: TableIdentifier): Unit

refreshTable…​FIXME

Note
refreshTable is used when…​FIXME

createTempFunction Method

Caution
FIXME

loadFunctionResources Method

Caution
FIXME

alterTempViewDefinition Method

alterTempViewDefinition(name: TableIdentifier, viewDefinition: LogicalPlan): Boolean

alterTempViewDefinition alters the temporary view by updating an in-memory temporary table (when a database is not specified and the table has already been registered) or a global temporary table (when a database is specified and it is for global temporary tables).

Note
"Temporary table" and "temporary view" are synonyms.

alterTempViewDefinition returns true when an update could be executed and finished successfully.

createTempView Method

Caution
FIXME

createGlobalTempView Method

Caution
FIXME

createTable Method

Caution
FIXME

Creating SessionCatalog Instance

SessionCatalog takes the following when created:

SessionCatalog initializes the internal registries and counters.

Finding Function by Name (Using FunctionRegistry) — lookupFunction Method

lookupFunction(
  name: FunctionIdentifier,
  children: Seq[Expression]): Expression

lookupFunction finds a function by name.

For a function with no database defined that exists in FunctionRegistry, lookupFunction requests FunctionRegistry to find the function (by its unqualified name, i.e. with no database).

If the name function has the database defined or does not exist in FunctionRegistry, lookupFunction uses the fully-qualified function name to check if the function exists in FunctionRegistry (by its fully-qualified name, i.e. with a database).

For other cases, lookupFunction requests ExternalCatalog to find the function and loads its resources. It then creates a corresponding temporary function and looks up the function again.

Note
lookupFunction is used exclusively when Analyzer resolves functions.

Finding Relation in Catalogs (and Creating SubqueryAlias per Table Type) — lookupRelation Method

lookupRelation(name: TableIdentifier): LogicalPlan

lookupRelation finds the name table in the catalogs (i.e. GlobalTempViewManager, ExternalCatalog or registry of temporary views) and gives a SubqueryAlias per table type.

scala> spark.version
res0: String = 2.3.0

scala> :type spark.sessionState.catalog
org.apache.spark.sql.catalyst.catalog.SessionCatalog

import spark.sessionState.{catalog => c}
import org.apache.spark.sql.catalyst.TableIdentifier

// Global temp view
val db = spark.sharedState.globalTempViewManager.database
// Make the example reproducible (and so "replace")
spark.range(1).createOrReplaceGlobalTempView("gv1")
val gv1 = TableIdentifier(table = "gv1", database = Some(db))
val plan = c.lookupRelation(gv1)
scala> println(plan.numberedTreeString)
00 SubqueryAlias gv1
01 +- Range (0, 1, step=1, splits=Some(8))

val metastore = spark.sharedState.externalCatalog

// Regular table
val db = spark.catalog.currentDatabase
metastore.dropTable(db, table = "t1", ignoreIfNotExists = true, purge = true)
sql("CREATE TABLE t1 (id LONG) USING parquet")
val t1 = TableIdentifier(table = "t1", database = Some(db))
val plan = c.lookupRelation(t1)
scala> println(plan.numberedTreeString)
00 'SubqueryAlias t1
01 +- 'UnresolvedCatalogRelation `default`.`t1`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe

// Regular view (not temporary view)
// Make the example reproducible
metastore.dropTable(db, table = "v1", ignoreIfNotExists = true, purge = true)
import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
val v1 = TableIdentifier(table = "v1", database = Some(db))
import org.apache.spark.sql.types.StructType
val schema = new StructType().add($"id".long)
val storage = CatalogStorageFormat(locationUri = None, inputFormat = None, outputFormat = None, serde = None, compressed = false, properties = Map())
val tableDef = CatalogTable(
  identifier = v1,
  tableType = CatalogTableType.VIEW,
  storage,
  schema,
  viewText = Some("SELECT 1") /** Required or RuntimeException reported */)
metastore.createTable(tableDef, ignoreIfExists = false)
val plan = c.lookupRelation(v1)
scala> println(plan.numberedTreeString)
00 'SubqueryAlias v1
01 +- View (`default`.`v1`, [id#77L])
02    +- 'Project [unresolvedalias(1, None)]
03       +- OneRowRelation

// Temporary view
spark.range(1).createOrReplaceTempView("v2")
val v2 = TableIdentifier(table = "v2", database = None)
val plan = c.lookupRelation(v2)
scala> println(plan.numberedTreeString)
00 SubqueryAlias v2
01 +- Range (0, 1, step=1, splits=Some(8))

Internally, lookupRelation looks up the name table using:

  1. GlobalTempViewManager when the database name of the table matches the name of GlobalTempViewManager

    1. Gives SubqueryAlias or reports a NoSuchTableException

  2. ExternalCatalog when the database name of the table is specified explicitly or the registry of temporary views does not contain the table

    1. Gives SubqueryAlias with View when the table is a view (aka temporary table)

    2. Gives SubqueryAlias with UnresolvedCatalogRelation otherwise

  3. The registry of temporary views

    1. Gives SubqueryAlias with the logical plan per the table as registered in the registry of temporary views

Note
lookupRelation considers default to be the name of the database if the name table does not specify the database explicitly.
Note

lookupRelation is used when:

Retrieving Table Metadata from External Catalog (Metastore) — getTableMetadata Method

getTableMetadata(name: TableIdentifier): CatalogTable

getTableMetadata simply requests external catalog (metastore) for the table metadata.

Before requesting the external metastore, getTableMetadata makes sure that the database and table (of the input TableIdentifier) both exist. If either does not exist, getTableMetadata reports a NoSuchDatabaseException or NoSuchTableException, respectively.

Retrieving Table Metadata — getTempViewOrPermanentTableMetadata Method

getTempViewOrPermanentTableMetadata(name: TableIdentifier): CatalogTable

Internally, getTempViewOrPermanentTableMetadata branches off per database.

When a database name is not specified, getTempViewOrPermanentTableMetadata finds a local temporary view and creates a CatalogTable (with VIEW table type and an undefined storage) or retrieves the table metadata from an external catalog.

With the database name of the GlobalTempViewManager, getTempViewOrPermanentTableMetadata requests GlobalTempViewManager for the global view definition and creates a CatalogTable (with the name of GlobalTempViewManager in table identifier, VIEW table type and an undefined storage) or reports a NoSuchTableException.

With the database name not of GlobalTempViewManager, getTempViewOrPermanentTableMetadata simply retrieves the table metadata from an external catalog.

Note

getTempViewOrPermanentTableMetadata is used when:

Reporting NoSuchDatabaseException When Specified Database Does Not Exist — requireDbExists Internal Method

requireDbExists(db: String): Unit

requireDbExists reports a NoSuchDatabaseException if the specified database does not exist. Otherwise, requireDbExists does nothing.