BucketSpec is the metadata of the bucketing of a table.
Note: The number of buckets has to be between 0 and 100000 (exclusive).
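The bucket-count constraint above can be sketched as a construction-time precondition. The following minimal case class is a hypothetical stand-in (not Spark's actual BucketSpec implementation) that illustrates the check:

```scala
// Minimal sketch of the numBuckets validation (hypothetical stand-in,
// not Spark's actual BucketSpec).
case class SimpleBucketSpec(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String]) {
  // Reject bucket counts outside (0, 100000) at construction time
  require(
    numBuckets > 0 && numBuckets < 100000,
    s"Number of buckets should be greater than 0 but less than 100000. Got `$numBuckets`")
}

// Valid: 8 buckets
val ok = SimpleBucketSpec(8, Seq("col1"), Seq("col2"))

// Invalid: 0 buckets makes require throw IllegalArgumentException
val failed = scala.util.Try(SimpleBucketSpec(0, Seq("col1"), Seq.empty))
println(failed.isFailure)
```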
BucketSpec is created when:

- DataFrameWriter is requested to saveAsTable (and does getBucketSpec)
- HiveExternalCatalog is requested to getBucketSpecFromTableProperties and tableMetaToTableProps
- HiveClientImpl is requested to retrieve a table metadata
- SparkSqlAstBuilder is requested to visitBucketSpec (for a CREATE TABLE SQL statement with CLUSTERED BY and INTO n BUCKETS with optional SORTED BY clauses)
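For reference, a CREATE TABLE statement of the shape that visitBucketSpec handles could look as follows (the table and column names are made up for illustration):

```sql
CREATE TABLE bucketed_table (col1 INT, col2 STRING)
USING parquet
CLUSTERED BY (col1)
SORTED BY (col2)
INTO 8 BUCKETS
```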
BucketSpec uses the following text representation (i.e. toString):

[numBuckets] buckets, bucket columns: [[bucketColumnNames]], sort columns: [[sortColumnNames]]
import org.apache.spark.sql.catalyst.catalog.BucketSpec
val bucketSpec = BucketSpec(
numBuckets = 8,
bucketColumnNames = Seq("col1"),
sortColumnNames = Seq("col2"))
scala> println(bucketSpec)
8 buckets, bucket columns: [col1], sort columns: [col2]
toLinkedHashMap: mutable.LinkedHashMap[String, String]

toLinkedHashMap gives a collection of the following pairs:

- Num Buckets
- Bucket Columns
- Sort Columns
scala> println(bucketSpec.toLinkedHashMap)
Map(Num Buckets -> 8, Bucket Columns -> [`col1`], Sort Columns -> [`col2`])
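Both text representations can be mimicked without a Spark dependency. This self-contained sketch (a hypothetical stand-in, not Spark's actual code) reproduces the toString and toLinkedHashMap formats shown above:

```scala
import scala.collection.mutable

// Hypothetical stand-in for BucketSpec that reproduces the two text
// representations shown above (not Spark's actual implementation).
case class SketchBucketSpec(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String]) {

  // e.g. "8 buckets, bucket columns: [col1], sort columns: [col2]"
  override def toString: String = {
    val bucketString = s"bucket columns: [${bucketColumnNames.mkString(", ")}]"
    val sortString = s"sort columns: [${sortColumnNames.mkString(", ")}]"
    s"$numBuckets buckets, $bucketString, $sortString"
  }

  // Pairs in insertion order: Num Buckets, Bucket Columns, Sort Columns;
  // column names are backquoted, as in the output above
  def toLinkedHashMap: mutable.LinkedHashMap[String, String] =
    mutable.LinkedHashMap(
      "Num Buckets" -> numBuckets.toString,
      "Bucket Columns" -> bucketColumnNames.map(c => s"`$c`").mkString("[", ", ", "]"),
      "Sort Columns" -> sortColumnNames.map(c => s"`$c`").mkString("[", ", ", "]"))
}

val spec = SketchBucketSpec(8, Seq("col1"), Seq("col2"))
println(spec)                  // 8 buckets, bucket columns: [col1], sort columns: [col2]
println(spec.toLinkedHashMap)
```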