JDBCRDD

JDBCRDD is an RDD of internal binary rows that represents a structured query over a table in a database accessed via JDBC.

Note
JDBCRDD represents a "SELECT requiredColumns FROM table" query.

JDBCRDD is created exclusively when JDBCRDD is requested to scanTable (when JDBCRelation is requested to build a scan).

Table 1. JDBCRDD’s Internal Properties (e.g. Registries, Counters and Flags)
Name                 Description

columnList           Column names. Used when…FIXME

filterWhereClause    Filters as a SQL WHERE clause. Used when…FIXME
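
As an illustration only, the sketch below shows how a column list and a WHERE clause of this kind could be combined into the "SELECT requiredColumns FROM table" query mentioned earlier. It is plain Scala with no Spark dependency. SimpleFilter, its three case classes, SelectQuerySketch and the "1" placeholder for an empty column list are all hypothetical names and assumptions made for this example, not JDBCRDD's actual code.

```scala
// Hypothetical, simplified stand-ins for data source filters (not Spark's Filter classes).
sealed trait SimpleFilter
final case class EqualTo(attribute: String, value: Any) extends SimpleFilter
final case class GreaterThan(attribute: String, value: Any) extends SimpleFilter
final case class IsNotNull(attribute: String) extends SimpleFilter

object SelectQuerySketch {
  // Renders a value as a SQL literal; strings are single-quoted with quotes doubled.
  private def literal(value: Any): String = value match {
    case null      => "NULL"
    case s: String => "'" + s.replace("'", "''") + "'"
    case other     => other.toString
  }

  // Translates a single filter into a SQL predicate.
  def compile(filter: SimpleFilter): String = filter match {
    case EqualTo(attr, v)     => s"$attr = ${literal(v)}"
    case GreaterThan(attr, v) => s"$attr > ${literal(v)}"
    case IsNotNull(attr)      => s"$attr IS NOT NULL"
  }

  // Column names joined with commas; "1" is an assumed placeholder for an empty list.
  def columnList(columns: Seq[String]): String =
    if (columns.isEmpty) "1" else columns.mkString(",")

  // Every predicate is parenthesized and the predicates are joined with AND.
  def filterWhereClause(filters: Seq[SimpleFilter]): String =
    filters.map(f => s"(${compile(f)})").mkString(" AND ")

  // Assembles the final SELECT statement; WHERE is omitted when there are no filters.
  def selectQuery(table: String, columns: Seq[String], filters: Seq[SimpleFilter]): String = {
    val where = filterWhereClause(filters)
    val base = s"SELECT ${columnList(columns)} FROM $table"
    if (where.isEmpty) base else s"$base WHERE $where"
  }
}
```

With these assumptions, SelectQuerySketch.selectQuery("people", Seq("id", "name"), Seq(GreaterThan("age", 21), IsNotNull("name"))) gives "SELECT id,name FROM people WHERE (age > 21) AND (name IS NOT NULL)".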

Computing Partition (in TaskContext) — compute Method

compute(thePart: Partition, context: TaskContext): Iterator[InternalRow]

Note
compute is part of Spark Core’s RDD Contract to compute a partition (in a TaskContext).

compute…​FIXME

resolveTable Method

resolveTable(options: JDBCOptions): StructType

resolveTable…​FIXME

Note
resolveTable is used exclusively when JDBCRelation is requested for the schema.

scanTable Method

scanTable(
  sc: SparkContext,
  schema: StructType,
  requiredColumns: Array[String],
  filters: Array[Filter],
  parts: Array[Partition],
  options: JDBCOptions): RDD[InternalRow]

scanTable…​FIXME

Note
scanTable is used when…​FIXME

Creating JDBCRDD Instance

JDBCRDD takes the following when created:

  • SparkContext

  • Function to create a Connection (() ⇒ Connection)

  • Schema (StructType)

  • Array of column names

  • Array of Filter predicates

  • Array of Spark Core’s Partitions

  • Connection URL

  • JDBCOptions

JDBCRDD initializes the internal registries and counters.

getPartitions Method

getPartitions: Array[Partition]

Note
getPartitions is part of Spark Core’s RDD Contract to…​FIXME

getPartitions simply returns the partitions that this JDBCRDD was created with.
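
The pass-through behaviour described above can be modelled with a toy class that needs neither Spark nor a database. ToyPartition, ToyJdbcRdd, ToyJdbcRddDemo and the connection URL are hypothetical names invented for this sketch. The constructor takes only a subset of the arguments listed under Creating JDBCRDD Instance and does not reproduce the real class.

```scala
import java.sql.Connection

// Hypothetical stand-in for Spark Core's Partition: just an index.
final case class ToyPartition(index: Int)

// Toy model: stores its constructor arguments and hands the partitions back unchanged.
class ToyJdbcRdd(
    getConnection: () => Connection, // function to create a Connection (never called here)
    columns: Array[String],          // array of column names
    partitions: Array[ToyPartition], // partitions supplied by the caller
    url: String) {                   // connection URL

  // No computation: the partitions are exactly the ones given at creation time.
  def getPartitions: Array[ToyPartition] = partitions
}

object ToyJdbcRddDemo {
  val parts: Array[ToyPartition] = Array(ToyPartition(0), ToyPartition(1))

  // The connection factory only throws, since this sketch never opens a connection.
  val rdd = new ToyJdbcRdd(
    () => throw new UnsupportedOperationException("no database in this sketch"),
    Array("id", "name"),
    parts,
    "jdbc:hypothetical://host/db")
}
```

In this model ToyJdbcRddDemo.rdd.getPartitions returns the very array that was passed to the constructor, so the caller decides up front how the table is split.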