BucketSpec — Bucketing Specification

BucketSpec is the bucketing specification of a table, i.e. the metadata that describes how a table is bucketed:

  • Number of buckets

  • Bucket column names

  • Column names for sorting

Note
The number of buckets has to be greater than 0 and less than 100000.
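The constraint is enforced right when a BucketSpec is created, so an out-of-range value fails immediately (a minimal sketch; the exact exception message varies across Spark versions):

import org.apache.spark.sql.catalyst.catalog.BucketSpec

// Throws an org.apache.spark.sql.AnalysisException at construction time
// since numBuckets = 0 is out of the allowed range
val invalid = BucketSpec(
  numBuckets = 0,
  bucketColumnNames = Seq("col1"),
  sortColumnNames = Seq.empty)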

BucketSpec is created when:

  1. DataFrameWriter is requested to saveAsTable (and does getBucketSpec), as in the first example after this list

  2. HiveExternalCatalog is requested to getBucketSpecFromTableProperties and tableMetaToTableProps

  3. HiveClientImpl is requested to retrieve the metadata of a table

  4. SparkSqlAstBuilder is requested to visitBucketSpec (for a CREATE TABLE SQL statement with a CLUSTERED BY ... INTO n BUCKETS clause and an optional SORTED BY clause), as in the second example after this list
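The first and the last entries are the two user-facing ways to define bucketing. A minimal sketch of both (the table and column names are made up for illustration; spark is a SparkSession, e.g. the one predefined in spark-shell):

// 1. DataFrameWriter.bucketBy (and optionally sortBy) followed by saveAsTable
spark.range(10)
  .selectExpr("id AS col1", "CAST(id AS STRING) AS col2")
  .write
  .bucketBy(8, "col1")
  .sortBy("col2")
  .saveAsTable("bucketed_demo")

// 4. CREATE TABLE with a bucketing specification
//    (parsed by SparkSqlAstBuilder.visitBucketSpec)
spark.sql("""
  CREATE TABLE bucketed_sql_demo (col1 INT, col2 STRING)
  USING parquet
  CLUSTERED BY (col1)
  SORTED BY (col2)
  INTO 8 BUCKETS
""")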

BucketSpec uses the following text representation (i.e. toString):

[numBuckets] buckets, bucket columns: [[bucketColumnNames]], sort columns: [[sortColumnNames]]

import org.apache.spark.sql.catalyst.catalog.BucketSpec

val bucketSpec = BucketSpec(
  numBuckets = 8,
  bucketColumnNames = Seq("col1"),
  sortColumnNames = Seq("col2"))

scala> println(bucketSpec)
8 buckets, bucket columns: [col1], sort columns: [col2]

toLinkedHashMap Method

toLinkedHashMap: mutable.LinkedHashMap[String, String]

toLinkedHashMap gives an ordered collection of key-value pairs:

  • Num Buckets

  • Bucket Columns

  • Sort Columns
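The column names come out backtick-quoted, as the demo below shows. A sketch of a standalone helper that reproduces the same output (hypothetical; not necessarily Spark's own implementation):

import scala.collection.mutable
import org.apache.spark.sql.catalyst.catalog.BucketSpec

// Hypothetical helper mirroring toLinkedHashMap's output
def bucketSpecToMap(spec: BucketSpec): mutable.LinkedHashMap[String, String] = {
  def quote(name: String) = s"`$name`"  // column names are backtick-quoted
  mutable.LinkedHashMap(
    "Num Buckets" -> spec.numBuckets.toString,
    "Bucket Columns" -> spec.bucketColumnNames.map(quote).mkString("[", ", ", "]"),
    "Sort Columns" -> spec.sortColumnNames.map(quote).mkString("[", ", ", "]"))
}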

scala> println(bucketSpec.toLinkedHashMap)
Map(Num Buckets -> 8, Bucket Columns -> [`col1`], Sort Columns -> [`col2`])
Note

toLinkedHashMap is used when:

  • CatalogTable is requested to toLinkedHashMap