CSVFileFormat
CSVFileFormat is a TextBasedFileFormat for the CSV format, i.e. it registers itself to handle files in CSV format and converts them to Spark SQL rows.
```scala
spark.read.format("csv").load("csv-datasets")

// or the same as above using a shortcut
spark.read.csv("csv-datasets")
```
CSVFileFormat uses CSV options (which in turn are used to configure the underlying CSV parser from the uniVocity-parsers project).
| Option | Default Value | Description |
|---|---|---|
| `charset` | `UTF-8` | Alias of `encoding` |
| `charToEscapeQuoteEscaping` | | One character to…FIXME |
| `codec` | | Compression codec that can be either one of the known aliases or a fully-qualified class name. Alias of `compression` |
| `compression` | | Compression codec that can be either one of the known aliases or a fully-qualified class name. Alias of `codec` |
| `dateFormat` | `yyyy-MM-dd` | Uses `en_US` locale |
| `delimiter` | `,` (comma) | Alias of `sep` |
| `encoding` | `UTF-8` | Alias of `charset` |
| `mode` | `PERMISSIVE` | Possible values: `DROPMALFORMED`, `PERMISSIVE` (default), `FAILFAST` |
| `nullValue` | (empty string) | |
| `sep` | `,` (comma) | Alias of `delimiter` |
| `timestampFormat` | `yyyy-MM-dd'T'HH:mm:ss.SSSXXX` | Uses `timeZone` and `en_US` locale |
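The options are set per read (or write) using `DataFrameReader.option` (or `options` for a whole map at once). A short sketch that sets a few of them (the path and option values are only an illustration):

```scala
// Read ;-separated CSV files that start with a header line,
// dropping malformed records instead of failing the query
spark.read
  .option("sep", ";")
  .option("header", true)
  .option("mode", "DROPMALFORMED")
  .csv("csv-datasets")
```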
```scala
prepareWrite(
  sparkSession: SparkSession,
  job: Job,
  options: Map[String, String],
  dataSchema: StructType): OutputWriterFactory
```

Note: prepareWrite is part of FileFormat Contract that is used when FileFormatWriter is requested to write the result of a structured query.
prepareWrite…FIXME
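Per the note above, prepareWrite is involved whenever the result of a structured query is written out in CSV format. A minimal sketch that exercises it (the output path is made up):

```scala
// Writing a Dataset in CSV format goes through CSVFileFormat.prepareWrite
// (via FileFormatWriter); gzip is one of the known compression codec aliases
spark.range(5)
  .write
  .option("compression", "gzip")
  .csv("csv-datasets-output")
```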
```scala
buildReader(
  sparkSession: SparkSession,
  dataSchema: StructType,
  partitionSchema: StructType,
  requiredSchema: StructType,
  filters: Seq[Filter],
  options: Map[String, String],
  hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]
```

Note: buildReader is part of FileFormat Contract to…FIXME
buildReader…FIXME
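Going by the signature, buildReader produces a function that turns a single PartitionedFile into InternalRows for the required schema. A sketch of a read that ends up using it (the schema and path are made up for the example):

```scala
import org.apache.spark.sql.types._

// A user-specified read schema contributes to the requiredSchema
// that the function returned by buildReader produces rows for
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType)))
spark.read.schema(schema).csv("csv-datasets").show
```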