CSVFileFormat
CSVFileFormat is a TextBasedFileFormat for the CSV format, i.e. it registers itself to handle files in CSV format and converts them to Spark SQL rows.
```scala
spark.read.format("csv").load("csv-datasets")

// or the same as above using a shortcut
spark.read.csv("csv-datasets")
```
CSVFileFormat uses CSV options (which in turn are used to configure the underlying CSV parser from the uniVocity-parsers project).
| Option | Default Value | Description |
|---|---|---|
| `charset` | `UTF-8` | Alias of `encoding` |
| `charToEscapeQuoteEscaping` | | One character to…FIXME |
| `codec` | | Compression codec that can be either one of the known aliases or a fully-qualified class name. Alias of `compression` |
| `compression` | | Compression codec that can be either one of the known aliases or a fully-qualified class name. Alias of `codec` |
| `dateFormat` | `yyyy-MM-dd` | Uses `en_US` locale |
| `delimiter` | `,` (comma) | Alias of `sep` |
| `encoding` | `UTF-8` | Alias of `charset` |
| `mode` | `PERMISSIVE` | Possible values: `DROPMALFORMED`, `PERMISSIVE` (default), `FAILFAST` |
| `nullValue` | (empty string) | |
| `sep` | `,` (comma) | Alias of `delimiter` |
| `timestampFormat` | `yyyy-MM-dd'T'HH:mm:ss.SSSXXX` | Uses `timeZone` and `en_US` locale |
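The options are set per read (or write) using `DataFrameReader.option` (or `options` for a whole map at once). A short sketch that sets a few of them (the path and option values are only an illustration):

```scala
// Read ;-separated CSV files that start with a header line,
// dropping malformed records instead of failing the query
spark.read
  .option("sep", ";")
  .option("header", true)
  .option("mode", "DROPMALFORMED")
  .csv("csv-datasets")
```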
```scala
prepareWrite(
  sparkSession: SparkSession,
  job: Job,
  options: Map[String, String],
  dataSchema: StructType): OutputWriterFactory
```

Note: prepareWrite is part of FileFormat Contract that is used when FileFormatWriter is requested to write the result of a structured query.
prepareWrite…FIXME
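Per the note above, prepareWrite is involved whenever the result of a structured query is written out in CSV format. A minimal sketch that exercises it (the output path is made up):

```scala
// Writing a Dataset in CSV format goes through CSVFileFormat.prepareWrite
// (via FileFormatWriter); gzip is one of the known compression codec aliases
spark.range(5)
  .write
  .option("compression", "gzip")
  .csv("csv-datasets-output")
```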
```scala
buildReader(
  sparkSession: SparkSession,
  dataSchema: StructType,
  partitionSchema: StructType,
  requiredSchema: StructType,
  filters: Seq[Filter],
  options: Map[String, String],
  hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]
```

Note: buildReader is part of FileFormat Contract to…FIXME
buildReader…FIXME
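Going by the signature, buildReader produces a function that turns a single PartitionedFile into InternalRows for the required schema. A sketch of a read that ends up using it (the schema and path are made up for the example):

```scala
import org.apache.spark.sql.types._

// A user-specified read schema contributes to the requiredSchema
// that the function returned by buildReader produces rows for
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType)))
spark.read.schema(schema).csv("csv-datasets").show
```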