feat(json): Support more streaming JSON formats (WIP) (#2016)
ibgreen authored Mar 1, 2022
1 parent 314f752 commit 786ec6b
Showing 12 changed files with 3,717 additions and 4,053 deletions.
2 changes: 2 additions & 0 deletions docs/table-of-contents.json
@@ -71,6 +71,8 @@
{"entry": "modules/images/docs/api-reference/image-writer"},
{"entry": "modules/json/docs/api-reference/json-loader"},
{"entry": "modules/json/docs/api-reference/ndjson-loader"},
{"entry": "modules/json/docs/api-reference/geojson-loader"},
{"entry": "modules/json/docs/api-reference/ndgeojson-loader"},
{"entry": "modules/kml/docs/api-reference/gpx-loader"},
{"entry": "modules/kml/docs/api-reference/kml-loader"},
{"entry": "modules/kml/docs/api-reference/tcx-loader"},
14 changes: 14 additions & 0 deletions docs/whats-new.md
@@ -30,6 +30,20 @@ Target Release Date: Q1 2022.

- ArrowLoader now recognizes recommended Arrow MIME types `application/vnd.apache.arrow.file`, `application/vnd.apache.arrow.stream`.

**@loaders.gl/json**

- [`GeoJSONLoader`](modules/json/docs/api-reference/geojson-loader)
- [`NDJSONLoader`](modules/json/docs/api-reference/ndjson-loader) - Now supports JSONL, JSONSeq etc.
- [`NDGeoJSONLoader`](modules/json/docs/api-reference/ndgeojson-loader) - Now supports JSONL, JSONSeq etc.

**@loaders.gl/terrain**

- Fix winding order of "skirt" geometries, to prevent them from being incorrectly hidden by GPU face culling during rendering.

**@loaders.gl/worker-utils**

- Experimental support for Node.js workers.

## v3.1

Release Date: Dec 7, 2021.
89 changes: 50 additions & 39 deletions modules/json/docs/README.md
@@ -1,6 +1,13 @@
# Overview

The `@loaders.gl/json` module handles tabular data stored in the [JSON file format](https://www.json.org/json-en.html).
The `@loaders.gl/json` module parses JSON. It can parse arbitrary JSON data but is optimized for:

- loading tabular data stored in JSON arrays.
- loading tabular geospatial data stored in GeoJSON.
- loading tabular data from various streaming JSON and GeoJSON formats, such as new-line delimited JSON.

The JSON loaders also support batched parsing which can be useful when loading very large tabular JSON files
to avoid blocking for tens of seconds.

## Installation

@@ -10,44 +17,48 @@ npm install @loaders.gl/core @loaders.gl/json

## Loaders and Writers

| Loader |
| --------------------------------------------------------------- |
| [`JSONLoader`](modules/json/docs/api-reference/json-loader) |
| [`NDJSONLoader`](modules/json/docs/api-reference/ndjson-loader) |
| Loader |
| --------------------------------------------------------------------- |
| [`JSONLoader`](modules/json/docs/api-reference/json-loader) |
| [`NDJSONLoader`](modules/json/docs/api-reference/ndjson-loader) |
| [`GeoJSONLoader`](modules/json/docs/api-reference/geojson-loader) |
| [`NDGeoJSONLoader`](modules/json/docs/api-reference/ndgeojson-loader) |

## Additional APIs

See table category.

## Module Roadmap

### General Improvements

Error messages: `JSON.parse` tends to have unhelpful error messages

### Support More Streaming JSON Formats

- Overview of [JSON Streaming Formats](https://en.wikipedia.org/wiki/JSON_streaming) (Wikipedia).

- [Line-delimited JSON](http://jsonlines.org/) (LDJSON) (aka JSON lines) (JSONL).

### Autodetection of streaming JSON

A number of hints can be used to determine if the data is formatted using a streaming JSON format

- if the filename extension is `.jsonl`
- if the MIMETYPE is `application/json-seq`
- if the first value in the file is a number, assume the file is length prefixed.

For data in non-streaming JSON format, the presence of a top-level array will start streaming of objects.

For embedded arrays, a path specifier may need to be supplied (or could look for first array).

### MIME Types and File Extensions

| Format | Extension | MIME Media Type [RFC4288](https://www.ietf.org/rfc/rfc4288.txt) |
| ------------------------------- | --------- | --------------------------------------------------------------- |
| Standard JSON | `.json` | `application/json` |
| Line-delimited JSON | `.jsonl` | - |
| NewLine delimited JSON | `.ndjson` | `application/x-ndjson` |
| Record separator-delimited JSON | - | `application/json-seq` |
- See [table category](/docs/specifications/category-table).
- See [GIS category](/docs/specifications/category-gis).

## JSON Format Notes

The classic JSON format was designed for simplicity and is supported by standard libraries in many programming languages.

Several [JSON Streaming Formats](https://en.wikipedia.org/wiki/JSON_streaming) (Wikipedia) have emerged, that typically
place one JSON object on each line of a file. These are convenient to use when streaming data and are
supported via the `NDJSONLoader` and `NDGeoJSONLoader`.

At the moment, auto-detection between streaming and classic JSON based on file contents
is not implemented, so two separate loaders are provided.
The two loaders look for different file extensions or MIME types as specified in the table below,
allowing correct distinctions to be made in usage.

| Format                                            | Extension    | MIME Media Type            | Support                                                      |
| ------------------------------------------------- | ------------ | -------------------------- | ------------------------------------------------------------ |
| [JSON][format_json]                               | `.json`      | `application/json`         | `JSONLoader`                                                 |
| [NewLine Delimited JSON][format_ndjson]           | `.ndjson`    | `application/x-ndjson`     | `NDJSONLoader`                                               |
| [JSON Lines][format_jsonlines]                    | `.jsonl`     | `application/x-ldjson`     | `NDJSONLoader`                                               |
| [JSON Text Sequences][format_json_seq]            |              | `application/json-seq`     | `NDJSONLoader`. Partial records must not span multiple lines. |
| [GeoJSON][format_geojson]                         | `.json`      | `application/geo+json`     | `JSONLoader`                                                 |
| [Newline Delimited GeoJSON][format_ndgeojson]     | `.ndgeojson` |                            | `NDJSONLoader`                                               |
| [GeoJSON Lines][format_geojsonl]                  | `.geojsonl`  |                            | `NDJSONLoader`                                               |
| [GeoJSON Text Sequences][format_geojson_text_seq] |              | `application/geo+json-seq` | `NDJSONLoader`                                               |

[format_json]: https://www.json.org/json-en.html
[format_ndjson]: http://ndjson.org/
[format_jsonlines]: http://jsonlines.org/
[format_json_seq]: https://datatracker.ietf.org/doc/html/rfc7464
[format_geojson]: https://geojson.org/
[format_ndgeojson]: https://stevage.github.io/ndgeojson/
[format_geojsonl]: https://www.placemark.io/documentation/geojsonl
[format_geojson_text_seq]: https://datatracker.ietf.org/doc/html/rfc8142
[rfc4288]: https://www.ietf.org/rfc/rfc4288.txt
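
Since auto-detection from file contents is not implemented, choosing a loader comes down to the file extension or MIME type. The lookup can be sketched roughly as below. The `pickLoaderName` helper and both lookup maps are hypothetical illustrations that simply mirror the table above; they are not part of `@loaders.gl/json`.

```js
// Hypothetical lookup tables mirroring the format table above (not library code).
const LOADER_BY_EXTENSION = new Map([
  ['json', 'JSONLoader'],
  ['ndjson', 'NDJSONLoader'],
  ['jsonl', 'NDJSONLoader'],
  ['ndgeojson', 'NDJSONLoader'],
  ['geojsonl', 'NDJSONLoader']
]);

const LOADER_BY_MIME_TYPE = new Map([
  ['application/json', 'JSONLoader'],
  ['application/geo+json', 'JSONLoader'],
  ['application/x-ndjson', 'NDJSONLoader'],
  ['application/x-ldjson', 'NDJSONLoader'],
  ['application/json-seq', 'NDJSONLoader'],
  ['application/geo+json-seq', 'NDJSONLoader']
]);

// Returns the name of the loader to use, or null when nothing matches.
// An explicit MIME type takes precedence over the file extension.
function pickLoaderName(url, mimeType) {
  if (mimeType && LOADER_BY_MIME_TYPE.has(mimeType)) {
    return LOADER_BY_MIME_TYPE.get(mimeType);
  }
  // Drop any query string or fragment before looking at the extension.
  const path = url.split(/[?#]/)[0];
  const match = /\.([a-z0-9]+)$/i.exec(path);
  const extension = match ? match[1].toLowerCase() : '';
  return LOADER_BY_EXTENSION.get(extension) || null;
}
```

For example, `pickLoaderName('cities.jsonl')` selects the line-based loader, while a plain `.json` URL falls back to `JSONLoader` unless a streaming MIME type is supplied.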
114 changes: 114 additions & 0 deletions modules/json/docs/api-reference/geojson-loader.md
@@ -0,0 +1,114 @@
# GeoJSONLoader

Streaming loader for GeoJSON encoded files.

| Loader | Characteristic |
| -------------- | ---------------------------------------------------- |
| File Extension | `.geojson` |
| Media Type | `application/geo+json` |
| File Type | Text |
| File Format | [GeoJSON][format_geojson] |
| Data Format | [Classic Table](/docs/specifications/category-table) |
| Supported APIs | `load`, `parse`, `parseSync`, `parseInBatches` |

[format_geojson]: https://geojson.org

## Usage

For simple usage, you can load and parse a JSON file atomically:

```js
import {GeoJSONLoader} from '@loaders.gl/json';
import {load} from '@loaders.gl/core';

const data = await load(url, GeoJSONLoader, {json: options});
```

For larger files, GeoJSONLoader supports streaming JSON parsing, in which case it will yield "batches" of rows from one array.
To parse a stream of GeoJSON, the user can specify the `options.json.jsonpaths` to stream the `features` array.

```js
import {GeoJSONLoader} from '@loaders.gl/json';
import {loadInBatches} from '@loaders.gl/core';

const batches = await loadInBatches('geojson.json', GeoJSONLoader, {json: {jsonpaths: ['$.features']}});

for await (const batch of batches) {
  // batch.data will contain a number of rows
  for (const feature of batch.data) {
    switch (feature.geometry.type) {
      case 'Polygon':
        // ... handle polygon features
        break;
    }
  }
}
```

If no JSONPath is specified the loader will stream the first array it encounters in the JSON payload.

When batch parsing an embedded JSON array as a table, it is possible to get access to the containing object by supplying the `{metadata: true}` option.

The loader will yield an initial and a final batch with `batch.container` providing the container object and `batch.batchType` set to `partial-result` and `final-result` respectively.

```js
import {GeoJSONLoader} from '@loaders.gl/json';
import {loadInBatches} from '@loaders.gl/core';

const batches = await loadInBatches('geojson.json', GeoJSONLoader, {metadata: true});

for await (const batch of batches) {
  switch (batch.batchType) {
    case 'partial-result': // contains fields seen so far
    case 'final-result': // contains all fields except the streamed array
      console.log(batch.container);
      break;
    case 'data':
      // batch.data will contain a number of rows
      for (const feature of batch.data) {
        switch (feature.geometry.type) {
          case 'Polygon':
            // ... handle polygon features
            break;
        }
      }
      break;
  }
}
```

## Data Format

Parsed batches are of the format

```ts
{
batchType: 'metadata' | 'partial-result' | 'final-result' | undefined;
jsonpath: string;
// standard batch payload
data: any[] | any;
bytesUsed: number;
batchCount: number;
}
```
## Options

Supports table category options such as `batchType` and `batchSize`.

| Option                 | From                                                                                  | Type       | Default | Description                                                                                                                                      |
| ---------------------- | ------------------------------------------------------------------------------------- | ---------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| `json.table`           | [![Website shields.io](https://img.shields.io/badge/v2.0-blue.svg?style=flat-square)] | `boolean`  | `false` | Parses non-streaming JSON as table, i.e. return the first embedded array in the JSON. Always `true` during batched/streaming parsing.            |
| `json.jsonpaths`       | [![Website shields.io](https://img.shields.io/badge/v2.2-blue.svg?style=flat-square)] | `string[]` | `[]`    | A list of JSON paths (see below) indicating the array that can be streamed.                                                                      |
| `metadata` (top level) | [![Website shields.io](https://img.shields.io/badge/v2.2-blue.svg?style=flat-square)] | `boolean`  |         | If `true`, yields an initial and final batch containing the partial and final result (i.e. the root object, excluding the array being streamed). |

## JSONPaths

A minimal subset of the JSONPath syntax is supported, to specify which array in a JSON object should be streamed as batches.

`$.component1.component2.component3`

- No support for wildcards, brackets etc. Only paths starting with `$` (JSON root) are supported.
- Regardless of the paths provided, only arrays will be streamed.
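
To make the two rules above concrete, here is a minimal sketch of how such a dotted path could be resolved. It works on an already parsed object, whereas the loader matches paths while streaming, so treat it as an illustration of the path semantics only. The helper names `resolveSimpleJSONPath` and `findStreamableArray` are hypothetical and not part of the loaders.gl API.

```js
// Hypothetical helper: resolves a dotted path such as `$.a.b` against a parsed JSON value.
// Wildcards and brackets are rejected, matching the restrictions listed above.
function resolveSimpleJSONPath(root, jsonpath) {
  if (/[*\[\]]/.test(jsonpath)) {
    throw new Error(`Unsupported JSONPath syntax: ${jsonpath}`);
  }
  const components = jsonpath.split('.');
  if (components[0] !== '$') {
    throw new Error(`JSONPath must start with $: ${jsonpath}`);
  }
  let value = root;
  for (const key of components.slice(1)) {
    const isObject = value !== null && typeof value === 'object';
    if (!isObject || !Object.prototype.hasOwnProperty.call(value, key)) {
      return undefined; // path does not exist in this document
    }
    value = value[key];
  }
  return value;
}

// Returns the first path that points at an array, since only arrays are streamed.
function findStreamableArray(root, jsonpaths) {
  for (const jsonpath of jsonpaths) {
    const value = resolveSimpleJSONPath(root, jsonpath);
    if (Array.isArray(value)) {
      return {jsonpath, rows: value};
    }
  }
  return null;
}
```

With a GeoJSON document, `findStreamableArray(doc, ['$.features'])` picks out the `features` array, while a path that resolves to a non-array value is skipped.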
## Attribution

This loader is based on a fork of dscape's [`clarinet`](https://github.com/dscape/clarinet) under the BSD 2-clause license.
8 changes: 6 additions & 2 deletions modules/json/docs/api-reference/json-loader.md
@@ -4,22 +4,26 @@ Streaming loader for JSON encoded files.

| Loader | Characteristic |
| -------------- | ---------------------------------------------------- |
| File Extension | `.json`, |
| File Extension | `.json` |
| Media Type     | `application/json`                                   |
| File Type | Text |
| File Format | [JSON](https://www.json.org/json-en.html) |
| Data Format | [Classic Table](/docs/specifications/category-table) |
| Supported APIs | `load`, `parse`, `parseSync`, `parseInBatches` |

## Usage

For simple usage, you can load and parse a JSON file atomically:

```js
import {JSONLoader} from '@loaders.gl/json';
import {load} from '@loaders.gl/core';

const data = await load(url, JSONLoader, {json: options});
```

The JSONLoader supports streaming JSON parsing, in which case it will yield "batches" of rows from one array. To e.g. parse a stream of GeoJSON, the user can specify the `options.json.jsonpaths` to stream the `features` array.
For larger files, JSONLoader supports streaming JSON parsing, in which case it will yield "batches" of rows from one array.
To parse a stream of GeoJSON, the user can specify the `options.json.jsonpaths` to stream the `features` array.

```js
import {JSONLoader} from '@loaders.gl/json';
19 changes: 19 additions & 0 deletions modules/json/docs/api-reference/ndgeojson-loader.md
@@ -0,0 +1,19 @@
# NDGeoJSONLoader

For GeoJSON, the root-level FeatureCollection object is replaced by a simple sequence of features, one per line.

Streaming loader for NDJSON encoded files and related formats (LDJSON and JSONL).

| Loader | Characteristic |
| -------------- | ---------------------------------------------------- |
| File Extension | `.ndgeojson`, `.geojsonl`, `.ldgeojson` |
| Media Type | `application/geo+x-ndjson`, `application/geo+x-ldjson`, `application/geo+json-seq` |
| File Type | Text |
| File Format | [NDJSON][format_ndjson], [LDJSON][format_], [][format_] |
| Data Format | [Classic Table](/docs/specifications/category-table) |
| Supported APIs | `load`, `parse`, `parseSync`, `parseInBatches` |

[format_geojsonl]: https://www.placemark.io/documentation/geojsonl
[format_geojsonseq]:
[format_ldjson]: http://ndjson.org/
[format_jsonjson]: http://ndjson.org/
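
The relationship between a classic GeoJSON `FeatureCollection` and its newline-delimited form can be sketched as follows. This is a minimal illustration of the file layout only, not how `NDGeoJSONLoader` is implemented; the helper names `featureCollectionToNDGeoJSON` and `ndgeojsonToFeatures` are hypothetical.

```js
// Hypothetical helper: writes each feature of a FeatureCollection on its own line.
// JSON.stringify escapes newlines inside strings, so every feature stays on one line.
function featureCollectionToNDGeoJSON(featureCollection) {
  return featureCollection.features.map((feature) => JSON.stringify(feature)).join('\n') + '\n';
}

// Hypothetical helper: parses newline-delimited GeoJSON text back into an array of features.
function ndgeojsonToFeatures(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // ignore blank lines
    .map((line) => JSON.parse(line));
}
```

Because each line is a complete JSON value, a reader can emit features as soon as a line break arrives, without waiting for the end of the file.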
13 changes: 9 additions & 4 deletions modules/json/docs/api-reference/ndjson-loader.md
@@ -1,15 +1,21 @@
# NDJSONLoader

Streaming loader for NDJSON encoded files.
Streaming loader for NDJSON encoded files and related formats (LDJSON and JSONL).


| Loader | Characteristic |
| -------------- | ---------------------------------------------------- |
| File Extension | `.ndjson`, |
| File Extension | `.ndjson`, `.jsonl`, `.ldjson` |
| Media Type | `application/x-ndjson`, `application/x-ldjson`, `application/json-seq` |
| File Type | Text |
| File Format | [NDJSON](http://ndjson.org/) |
| File Format | [NDJSON][format_ndjson], [LDJSON][format_], [][format_] |
| Data Format | [Classic Table](/docs/specifications/category-table) |
| Supported APIs | `load`, `parse`, `parseSync`, `parseInBatches` |

[format_ndjson]: http://ndjson.org/
[format_ldjson]: http://ndjson.org/
[format_jsonjson]: http://ndjson.org/

## Usage

```js
@@ -54,4 +60,3 @@ Each element in the `data` array corresponds to a line (Object) in the NDJSON da
## Options

Supports the table category options such as `batchSize`.
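
As a rough illustration of how a batch size groups parsed lines into batches, consider the sketch below. It is not the library's implementation: the `parseNDJSONInBatches` generator is hypothetical, it works on a complete string rather than a stream, and the handling of a leading record separator (as used by JSON text sequences) is an assumption about how such input could be normalized.

```js
// Record separator character (0x1E) that prefixes each record in JSON text sequences.
const RECORD_SEPARATOR = '\u001e';

// Hypothetical generator: yields batches of at most `batchSize` parsed rows.
function* parseNDJSONInBatches(text, batchSize = 100) {
  let rows = [];
  for (const rawLine of text.split('\n')) {
    let line = rawLine.trim();
    // Assumption: tolerate a leading record separator on each line.
    if (line.startsWith(RECORD_SEPARATOR)) {
      line = line.slice(1).trim();
    }
    if (line.length === 0) {
      continue; // skip blank lines
    }
    rows.push(JSON.parse(line));
    if (rows.length >= batchSize) {
      yield {data: rows};
      rows = [];
    }
  }
  // Emit any remaining rows as a final, possibly smaller batch.
  if (rows.length > 0) {
    yield {data: rows};
  }
}
```

Each yielded `data` array holds one entry per input line, so a smaller batch size trades throughput for earlier access to the first rows.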

13 changes: 0 additions & 13 deletions modules/json/src/json-loader.ts
@@ -35,19 +35,6 @@ export const JSONLoader: LoaderWithParser = {
version: VERSION,
extensions: ['json', 'geojson'],
mimeTypes: ['application/json'],
// TODO - support various line based JSON formats
/*
extensions: {
json: null,
jsonl: {stream: true},
ndjson: {stream: true}
},
mimeTypes: {
'application/json': null,
'application/json-seq': {stream: true},
'application/x-ndjson': {stream: true}
},
*/
category: 'table',
text: true,
parse,
53 changes: 0 additions & 53 deletions modules/json/src/jsonl-loader.ts

This file was deleted.
