BucketSpec — Bucketing Specification

BucketSpec is the bucketing specification of a table, i.e. the metadata that describes how a table is bucketed:

  • Number of buckets

  • Bucket column names

  • Column names for sorting

Note
The number of buckets has to be greater than 0 and less than 100000.
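The constraint is enforced right when a BucketSpec is created, so an out-of-range value fails immediately (a minimal sketch; the exact exception message varies across Spark versions):

import org.apache.spark.sql.catalyst.catalog.BucketSpec

// Throws an org.apache.spark.sql.AnalysisException at construction time
// since numBuckets = 0 is out of the allowed range
val invalid = BucketSpec(
  numBuckets = 0,
  bucketColumnNames = Seq("col1"),
  sortColumnNames = Seq.empty)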

BucketSpec is created when:

  1. DataFrameWriter is requested to saveAsTable (and does getBucketSpec), as in the first example after this list

  2. HiveExternalCatalog is requested to getBucketSpecFromTableProperties and tableMetaToTableProps

  3. HiveClientImpl is requested to retrieve the metadata of a table

  4. SparkSqlAstBuilder is requested to visitBucketSpec (for a CREATE TABLE SQL statement with a CLUSTERED BY ... INTO n BUCKETS clause and an optional SORTED BY clause), as in the second example after this list
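The first and the last entries are the two user-facing ways to define bucketing. A minimal sketch of both (the table and column names are made up for illustration; spark is a SparkSession, e.g. the one predefined in spark-shell):

// 1. DataFrameWriter.bucketBy (and optionally sortBy) followed by saveAsTable
spark.range(10)
  .selectExpr("id AS col1", "CAST(id AS STRING) AS col2")
  .write
  .bucketBy(8, "col1")
  .sortBy("col2")
  .saveAsTable("bucketed_demo")

// 4. CREATE TABLE with a bucketing specification
//    (parsed by SparkSqlAstBuilder.visitBucketSpec)
spark.sql("""
  CREATE TABLE bucketed_sql_demo (col1 INT, col2 STRING)
  USING parquet
  CLUSTERED BY (col1)
  SORTED BY (col2)
  INTO 8 BUCKETS
""")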

BucketSpec uses the following text representation (i.e. toString):

[numBuckets] buckets, bucket columns: [[bucketColumnNames]], sort columns: [[sortColumnNames]]

import org.apache.spark.sql.catalyst.catalog.BucketSpec

val bucketSpec = BucketSpec(
  numBuckets = 8,
  bucketColumnNames = Seq("col1"),
  sortColumnNames = Seq("col2"))

scala> println(bucketSpec)
8 buckets, bucket columns: [col1], sort columns: [col2]

toLinkedHashMap Method

toLinkedHashMap: mutable.LinkedHashMap[String, String]

toLinkedHashMap gives an ordered collection of key-value pairs:

  • Num Buckets

  • Bucket Columns

  • Sort Columns
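The column names come out backtick-quoted, as the demo below shows. A sketch of a standalone helper that reproduces the same output (hypothetical; not necessarily Spark's own implementation):

import scala.collection.mutable
import org.apache.spark.sql.catalyst.catalog.BucketSpec

// Hypothetical helper mirroring toLinkedHashMap's output
def bucketSpecToMap(spec: BucketSpec): mutable.LinkedHashMap[String, String] = {
  def quote(name: String) = s"`$name`"  // column names are backtick-quoted
  mutable.LinkedHashMap(
    "Num Buckets" -> spec.numBuckets.toString,
    "Bucket Columns" -> spec.bucketColumnNames.map(quote).mkString("[", ", ", "]"),
    "Sort Columns" -> spec.sortColumnNames.map(quote).mkString("[", ", ", "]"))
}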

scala> println(bucketSpec.toLinkedHashMap)
Map(Num Buckets -> 8, Bucket Columns -> [`col1`], Sort Columns -> [`col2`])
Note

toLinkedHashMap is used when:

  • CatalogTable is requested to toLinkedHashMap