Commit

Add support for DuckDB database files to sql and DuckDBClient.of (#1065)
* Add support for DuckDB database files to sql and DuckDBClient.of

closes #1057

* let DuckDB handle any other file as a database file

* document

* a bit more doc

* clarify attach, append an example database and the associated (but inert) data loader.

* doc edits

* .{db,ddb,duckdb}

---------

Co-authored-by: Mike Bostock <[email protected]>
Fil and mbostock authored Mar 18, 2024
1 parent a631835 commit a020a7b
Showing 5 changed files with 21 additions and 3 deletions.
18 changes: 16 additions & 2 deletions docs/lib/duckdb.md
@@ -1,5 +1,7 @@
# DuckDB

<div class="tip">The most convenient way to use DuckDB in Observable is the built-in <a href="../sql">SQL code blocks</a> and <a href="../sql#sql-literals"><code>sql</code> tagged template literal</a>. Use <code>DuckDBClient</code> or DuckDB-Wasm directly, as shown here, if you need greater control.</div>

DuckDB is “an in-process SQL OLAP Database Management System. [DuckDB-Wasm](https://github.com/duckdb/duckdb-wasm) brings DuckDB to every browser thanks to WebAssembly.” DuckDB-Wasm is available by default as `duckdb` in Markdown, but you can explicitly import it as:

```js echo
@@ -12,7 +14,7 @@ For convenience, we provide a [`DatabaseClient`](https://observablehq.com/@obser
import {DuckDBClient} from "npm:@observablehq/duckdb";
```

To get a DuckDB client, pass zero or more named tables to `DuckDBClient.of`. Each table can be expressed as a [`FileAttachment`](../javascript/files), [Arquero table](./arquero), [Arrow table](./arrow), an array of objects, or a promise to the same. For example, below we load a sample of 250,000 stars from the [Gaia Star Catalog](https://observablehq.com/@cmudig/peeking-into-the-gaia-star-catalog) as a [Apache Parquet](https://parquet.apache.org/) file:
To get a DuckDB client, pass zero or more named tables to `DuckDBClient.of`. Each table can be expressed as a [`FileAttachment`](../javascript/files), [Arquero table](./arquero), [Arrow table](./arrow), an array of objects, or a promise to the same. For file attachments, the following formats are supported: [CSV](./lib/csv), [TSV](./lib/csv), [JSON](./javascript/files#json), [Apache Arrow](./lib/arrow), and [Apache Parquet](./lib/arrow#apache-parquet). For example, below we load a sample of 250,000 stars from the [Gaia Star Catalog](https://observablehq.com/@cmudig/peeking-into-the-gaia-star-catalog) as a Parquet file:

```js echo
const db = DuckDBClient.of({gaia: FileAttachment("gaia-sample.parquet")});
@@ -53,7 +55,17 @@ Plot.plot({
})
```

For externally-hosted data, you can create an empty `DuckDBClient` and load a table from a SQL query, say using [`read_parquet`](https://duckdb.org/docs/guides/import/parquet_import) or [`read_csv`](https://duckdb.org/docs/guides/import/csv_import).
You can also [attach](https://duckdb.org/docs/sql/statements/attach) a complete database saved as a DuckDB file, typically using the `.db` file extension (or `.ddb` or `.duckdb`). In this case, the associated name (`base` below) is a _schema_ name rather than a _table_ name.

```js echo
const db2 = await DuckDBClient.of({base: FileAttachment("quakes.db")});
```

```js echo
db2.queryRow(`SELECT COUNT() FROM base.events`)
```
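
The difference between a table name and a schema name can be sketched as a small dispatch on the file extension. This is an illustration only: `statementFor` is a hypothetical helper, not part of `DuckDBClient`. It mirrors the Parquet and DuckDB-file branches of `insertFile` shown in this commit and returns the SQL text instead of running it.

```js
// Hypothetical helper (an assumption for illustration, not the library API).
// Given a registered name and a file name, return the SQL statement that the
// Parquet and DuckDB-file branches of insertFile would issue.
function statementFor(name, fileName) {
  if (/\.parquet$/i.test(fileName)) {
    // Parquet files become a view: `name` is a table-like name.
    return `CREATE VIEW '${name}' AS SELECT * FROM parquet_scan('${fileName}')`;
  }
  if (/\.(db|ddb|duckdb)$/i.test(fileName)) {
    // DuckDB database files are attached read-only: `name` is a schema name.
    return `ATTACH '${fileName}' AS ${name} (READ_ONLY)`;
  }
  // Other formats (CSV, JSON, Arrow, ...) are handled elsewhere; not sketched here.
  return null;
}

statementFor("base", "quakes.db");
// → "ATTACH 'quakes.db' AS base (READ_ONLY)"
```

Because the name becomes a schema, the tables inside the file are addressed with a prefix, such as `base.events`, rather than by the registered name alone.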

For externally-hosted data, you can create an empty `DuckDBClient` and load a table from a SQL query, say using [`read_parquet`](https://duckdb.org/docs/guides/import/parquet_import) or [`read_csv`](https://duckdb.org/docs/guides/import/csv_import). DuckDB offers many affordances to make this easier (in many cases it detects the file format and uses the correct loader automatically).

```js run=false
const db = await DuckDBClient.of();
@@ -70,6 +82,8 @@ As an alternative to `db.sql`, there’s also `db.query`:
db.query("SELECT * FROM gaia LIMIT 10")
```

<div class="note">The <code>db.sql</code> and <code>db.query</code> methods return a promise to an <a href="./arrow">Arrow table</a>. This columnar representation is much more efficient than an array-of-objects. You can inspect the contents of an Arrow table using <a href="../inputs/table"><code>Inputs.table</code></a> and pass the data to <a href="./plot">Plot</a>.</div>

And `db.queryRow`:

```js echo
Binary file added docs/lib/quakes.db
Binary file not shown.
1 change: 1 addition & 0 deletions docs/lib/quakes.db.sh
@@ -0,0 +1 @@
duckdb docs/lib/quakes.db -c "CREATE TABLE events AS (FROM 'https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv');"
2 changes: 1 addition & 1 deletion docs/sql.md
@@ -5,7 +5,7 @@ sql:

# SQL <a href="https://github.com/observablehq/framework/releases/tag/v1.2.0" target="_blank" class="observablehq-version-badge" data-version="^1.2.0" title="Added in v1.2.0"></a>

Observable Framework includes built-in support for client-side SQL powered by [DuckDB](./lib/duckdb). You can use SQL to query data from [CSV](./lib/csv), [TSV](./lib/csv), [JSON](./javascript/files#json), [Apache Arrow](./lib/arrow), and [Apache Parquet](./lib/arrow#apache-parquet) files, which can either be static or generated by [data loaders](./loaders).
Observable Framework includes built-in support for client-side SQL powered by [DuckDB](./lib/duckdb). You can use SQL to query data from [CSV](./lib/csv), [TSV](./lib/csv), [JSON](./javascript/files#json), [Apache Arrow](./lib/arrow), [Apache Parquet](./lib/arrow#apache-parquet), and DuckDB database files, which can either be static or generated by [data loaders](./loaders).

To use SQL, first register the desired tables in the page’s [front matter](./markdown#front-matter) using the **sql** option. Each key is a table name, and each value is the path to the corresponding data file. For example, to register a table named `gaia` from a Parquet file:

3 changes: 3 additions & 0 deletions src/client/stdlib/duckdb.js
@@ -255,6 +255,9 @@ async function insertFile(database, name, file, options) {
if (/\.parquet$/i.test(file.name)) {
return await connection.query(`CREATE VIEW '${name}' AS SELECT * FROM parquet_scan('${file.name}')`);
}
if (/\.(db|ddb|duckdb)$/i.test(file.name)) {
return await connection.query(`ATTACH '${file.name}' AS ${name} (READ_ONLY)`);
}
throw new Error(`unknown file type: ${file.mimeType}`);
}
} finally {
