SessionCatalog
is the catalog of (the metadata of) session-scoped relational temporary and permanent relational entities, i.e. databases, tables, views, partitions, and functions.
For the metadata of permanent entities (i.e. tables) SessionCatalog
requests ExternalCatalog.
Note
|
SessionCatalog is a layer over ExternalCatalog in a SparkSession which allows for different metastores (i.e. in-memory or hive ) to be used.
|
SessionCatalog
is available through SessionState (of a SparkSession).
scala> spark.version
res0: String = 2.3.0
scala> :type spark.sessionState.catalog
org.apache.spark.sql.catalyst.catalog.SessionCatalog
Name | Description |
---|---|
FIXME Used when…FIXME |
|
A cache of fully-qualified table names to table relation plans (i.e. Used when |
|
FIXME Used when…FIXME |
|
Registry of temporary views (i.e. non-global temporary tables) |
requireTableExists(name: TableIdentifier): Unit
requireTableExists
…FIXME
Note
|
requireTableExists is used when…FIXME
|
databaseExists(db: String): Boolean
databaseExists
…FIXME
Note
|
databaseExists is used when…FIXME
|
listTables(db: String): Seq[TableIdentifier]
listTables(db: String, pattern: String): Seq[TableIdentifier]
listTables
…FIXME
Note
|
listTables is used when…FIXME
|
isTemporaryTable(name: TableIdentifier): Boolean
isTemporaryTable
…FIXME
Note
|
isTemporaryTable is used when…FIXME
|
alterPartitions(tableName: TableIdentifier, parts: Seq[CatalogTablePartition]): Unit
alterPartitions
…FIXME
Note
|
alterPartitions is used when…FIXME
|
listPartitions(
tableName: TableIdentifier,
partialSpec: Option[TablePartitionSpec] = None): Seq[CatalogTablePartition]
listPartitions
…FIXME
Note
|
listPartitions is used when…FIXME
|
alterTable(tableDefinition: CatalogTable): Unit
alterTable
…FIXME
Note
|
|
alterTableStats(identifier: TableIdentifier, newStats: Option[CatalogStatistics]): Unit
alterTableStats
requests ExternalCatalog to alter the statistics of the table (per identifier
) followed by invalidating the table relation cache.
alterTableStats
reports a NoSuchDatabaseException
if the database does not exist.
alterTableStats
reports a NoSuchTableException
if the table does not exist.
Note
|
|
tableExists(name: TableIdentifier): Boolean
tableExists
…FIXME
Note
|
tableExists is used when…FIXME
|
Caution
|
FIXME |
Note
|
|
refreshTable(name: TableIdentifier): Unit
refreshTable
…FIXME
Note
|
refreshTable is used when…FIXME
|
alterTempViewDefinition(name: TableIdentifier, viewDefinition: LogicalPlan): Boolean
alterTempViewDefinition
alters the temporary view by updating an in-memory temporary table (when a database is not specified and the table has already been registered) or a global temporary table (when a database is specified and it is for global temporary tables).
Note
|
"Temporary table" and "temporary view" are synonyms. |
alterTempViewDefinition
returns true
when an update could be executed and finished successfully.
SessionCatalog
takes the following when created:
-
Hadoop’s Configuration
SessionCatalog
initializes the internal registries and counters.
lookupFunction(
name: FunctionIdentifier,
children: Seq[Expression]): Expression
lookupFunction
finds a function by name
.
For a function with no database defined that exists in FunctionRegistry, lookupFunction
requests FunctionRegistry
to find the function (by its unqualified name, i.e. with no database).
If the name
function has the database defined or does not exist in FunctionRegistry
, lookupFunction
uses the fully-qualified function name
to check if the function exists in FunctionRegistry (by its fully-qualified name, i.e. with a database).
For other cases, lookupFunction
requests ExternalCatalog to find the function and loads its resources. It then creates a corresponding temporary function and looks up the function again.
Note
|
lookupFunction is used exclusively when Analyzer resolves functions.
|
lookupRelation(name: TableIdentifier): LogicalPlan
lookupRelation
finds the name
table in the catalogs (i.e. GlobalTempViewManager, ExternalCatalog or registry of temporary views) and gives a SubqueryAlias
per table type.
scala> spark.version
res0: String = 2.3.0
scala> :type spark.sessionState.catalog
org.apache.spark.sql.catalyst.catalog.SessionCatalog
import spark.sessionState.{catalog => c}
import org.apache.spark.sql.catalyst.TableIdentifier
// Global temp view
val db = spark.sharedState.globalTempViewManager.database
// Make the example reproducible (and so "replace")
spark.range(1).createOrReplaceGlobalTempView("gv1")
val gv1 = TableIdentifier(table = "gv1", database = Some(db))
val plan = c.lookupRelation(gv1)
scala> println(plan.numberedTreeString)
00 SubqueryAlias gv1
01 +- Range (0, 1, step=1, splits=Some(8))
val metastore = spark.sharedState.externalCatalog
// Regular table
val db = spark.catalog.currentDatabase
metastore.dropTable(db, table = "t1", ignoreIfNotExists = true, purge = true)
sql("CREATE TABLE t1 (id LONG) USING parquet")
val t1 = TableIdentifier(table = "t1", database = Some(db))
val plan = c.lookupRelation(t1)
scala> println(plan.numberedTreeString)
00 'SubqueryAlias t1
01 +- 'UnresolvedCatalogRelation `default`.`t1`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
// Regular view (not temporary view)
// Make the example reproducible
metastore.dropTable(db, table = "v1", ignoreIfNotExists = true, purge = true)
import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
val v1 = TableIdentifier(table = "v1", database = Some(db))
import org.apache.spark.sql.types.StructType
val schema = new StructType().add($"id".long)
val storage = CatalogStorageFormat(locationUri = None, inputFormat = None, outputFormat = None, serde = None, compressed = false, properties = Map())
val tableDef = CatalogTable(
identifier = v1,
tableType = CatalogTableType.VIEW,
storage,
schema,
viewText = Some("SELECT 1") /** Required or RuntimeException reported */)
metastore.createTable(tableDef, ignoreIfExists = false)
val plan = c.lookupRelation(v1)
scala> println(plan.numberedTreeString)
00 'SubqueryAlias v1
01 +- View (`default`.`v1`, [id#77L])
02 +- 'Project [unresolvedalias(1, None)]
03 +- OneRowRelation
// Temporary view
spark.range(1).createOrReplaceTempView("v2")
val v2 = TableIdentifier(table = "v2", database = None)
val plan = c.lookupRelation(v2)
scala> println(plan.numberedTreeString)
00 SubqueryAlias v2
01 +- Range (0, 1, step=1, splits=Some(8))
Internally, lookupRelation
looks up the name
table using:
-
GlobalTempViewManager when the database name of the table matches the name of
GlobalTempViewManager
-
Gives
SubqueryAlias
or reports aNoSuchTableException
-
-
ExternalCatalog when the database name of the table is specified explicitly or the registry of temporary views does not contain the table
-
Gives
SubqueryAlias
withView
when the table is a view (aka temporary table) -
Gives
SubqueryAlias
withUnresolvedCatalogRelation
otherwise
-
-
The registry of temporary views
-
Gives
SubqueryAlias
with the logical plan per the table as registered in the registry of temporary views
-
Note
|
lookupRelation considers default to be the name of the database if the name table does not specify the database explicitly.
|
Note
|
|
getTableMetadata(name: TableIdentifier): CatalogTable
getTableMetadata
simply requests external catalog (metastore) for the table metadata.
getTempViewOrPermanentTableMetadata(name: TableIdentifier): CatalogTable
Internally, getTempViewOrPermanentTableMetadata
branches off per database.
When a database name is not specified, getTempViewOrPermanentTableMetadata
finds a local temporary view and creates a CatalogTable (with VIEW
table type and an undefined storage) or retrieves the table metadata from an external catalog.
With the database name of the GlobalTempViewManager, getTempViewOrPermanentTableMetadata
requests GlobalTempViewManager for the global view definition and creates a CatalogTable (with the name of GlobalTempViewManager
in table identifier, VIEW
table type and an undefined storage) or reports a NoSuchTableException
.
With the database name not of GlobalTempViewManager
, getTempViewOrPermanentTableMetadata
simply retrieves the table metadata from an external catalog.
Note
|
|
Reporting NoSuchDatabaseException When Specified Database Does Not Exist — requireDbExists
Internal Method
requireDbExists(db: String): Unit
requireDbExists
reports a NoSuchDatabaseException
if the specified database does not exist. Otherwise, requireDbExists
does nothing.