Skip to content

Latest commit

 

History

History
215 lines (155 loc) · 6.43 KB

duckdbex_sandbox.livemd

File metadata and controls

215 lines (155 loc) · 6.43 KB

Duckdbex sandbox

Mix.install([{:duckdbex, "~> 0.2.5"}])

Starting with DuckDB

Let's open DuckDB database in the memory. To do this, you do not need to specify any parameters in the Duckdbex.open/1 function.

{:ok, db} = Duckdbex.open()

Now we need create a coonection to the opened database

{:ok, conn} = Duckdbex.connection(db)

So, now we can use this connection conn to make Queries. Lets create the simple users table. At now, we are not interested in the results of this query, and just ignoring the the result _r.

{:ok, _r} =
  Duckdbex.query(conn, "CREATE OR REPLACE TABLE users(id INTEGER PRIMARY KEY, name VARCHAR);")

and add two users to it

{:ok, _r} = Duckdbex.query(conn, "INSERT INTO users VALUES (1, 'Alex'), (2, 'Sofia');")

Let's get everything we just put in. Duckdbex.query will return the reference to the result, so to get real data we must call Duckdbex.fetch_all and pass the reference into.

# Making query
{:ok, res} = Duckdbex.query(conn, "SELECT * FROM users;")

# Fetching result
# here we pass the reference from previous call
Duckdbex.fetch_all(res)

In principle, this is all :), but embedded DuckDB is interesting not only for this, but for the ability to support analytical query workloads, handy import/export using CSV and Parquet files format, nested/composite data types

Nested / Composite Types

DuckDB supports three nested data types: LIST, STRUCT and MAP. Each supports different use cases and has a different structure. Let's try some of them.

MAP and Appender

Create a table with MAP column type

sql = "CREATE TABLE books(id INTEGER PRIMARY KEY, details MAP(STRING, STRING));"
{:ok, _r} = Duckdbex.query(conn, sql)

Suppose we want to insert a lot of data into this table. Folowing the DuckDB recommendations for bulk data insertions we will use Appender for task. The Appender is tied to a connection, and will use the transaction context of that connection when appending. An Appender always appends to a single table in the database file. So, let's create the Appender for books table.

{:ok, appender} = Duckdbex.appender(conn, "books")

Now we are preparing the huge amount of books data. MAPs must have a single type for all keys, and a single type for all values. In contrast, STRUCTs must have string keys, but each key may have a value of a different type.

a_looot_of_books_list = [
  [
    1,
    %{
      "title" => "Computer Architecture: A Quantitative Approach",
      "paperback pages" => "856",
      "dimensions" => "7.5 x 1.75 x 9 inches"
    }
  ],
  [
    2,
    %{
      "title" => "The Pragmatic Programmer",
      "publicher" => "David Thomas"
    }
  ]
]

Ok, now we are inserting this data into table using our Appender

Duckdbex.appender_add_rows(appender, a_looot_of_books_list)

Any values added to the appender are cached prior to being inserted into the database system for performance reasons. That means that, while appending, the rows might not be immediately visible in the system. The cache can be manually flushed using the Duckdbex.appender_flush/1 method. Let's flush them.

Duckdbex.appender_flush(appender)

It's time to get our data back.

{:ok, res} = Duckdbex.query(conn, "SELECT * FROM books")
Duckdbex.fetch_all(res)

UNION

A UNION type (not to be confused with the SQL UNION operator) is a nested type capable of holding one of multiple “alternative” values, much like the union in C. The main difference being that these UNION types are tagged unions and thus always carry a discriminator “tag” which signals which alternative it is currently holding, even if the inner value itself is null. Create a table with a union column.

sql = """
  CREATE OR REPLACE TABLE secret_question(
    user_id INT,
    question UNION(age INT, name VARCHAR, sequence INT[]));
"""

{:ok, _r} = Duckdbex.query(conn, sql)

Insert data into table

sql = """
  INSERT INTO secret_question
  VALUES (1, 42) , (2, 'Ring of Power'), (3, [42, 24, 22, 44]);
"""

{:ok, _r} = Duckdbex.query(conn, sql)

and get it back

{:ok, res} = Duckdbex.query(conn, "SELECT * from secret_question;")
Duckdbex.fetch_all(res)

CSV

Let's assume we have a CSV file with this data

FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-01|AA|New York, NY|Los Angeles, CA
1988-01-02|AA|New York, NY|Los Angeles, CA
1988-01-03|AA|New York, NY|Los Angeles, CA

We can load and query this data easily. At first we will create the file with the data.

File.write!("test.csv", """
FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-01|AA|New York, NY|Los Angeles, CA
1988-01-02|AA|New York, NY|Los Angeles, CA
1988-01-03|AA|New York, NY|Los Angeles, CA
""")

Read a CSV file from disk, using auto-infer option

{:ok, res} = Duckdbex.query(conn, "SELECT * FROM 'test.csv';")
Duckdbex.fetch_all(res)

The DuckDB CSV reader can automatically infer which configuration flags to use by analyzing the CSV file. This will work correctly in most situations, and should be the first option attempted. In rare situations where the CSV reader cannot figure out the correct configuration it is possible to manually configure the CSV reader to correctly parse the CSV file.

The same way we can write this data back into file

{:ok, _r} =
  Duckdbex.query(conn, """
    CREATE OR REPLACE TABLE ontime(
      FlightDate DATE,
      UniqueCarrier VARCHAR,
      OriginCityName VARCHAR,
      DestCityName VARCHAR);
  """)

{:ok, _r} = Duckdbex.query(conn, "COPY ontime FROM 'test.csv' (AUTO_DETECT TRUE);")

{:ok, res} = Duckdbex.query(conn, "SELECT * FROM ontime;")
Duckdbex.fetch_all(res)
Duckdbex.query(
  conn,
  "COPY (SELECT * FROM ontime) TO 'test_2.csv' WITH (HEADER 1, DELIMITER '|');"
)

File.read!("test_2.csv")

Extensions

DuckDB has a number of extensions available for use. Not all of them are included by default in every distribution, but DuckDB has a mechanism that allows for remote installation.

So, let's try the parquet extension. At first we must install it:

{:ok, _} = Duckdbex.query(conn, "INSTALL 'parquet';")

Extension will be downloaded to the local host. Next, we must load it into our app:

{:ok, _} = Duckdbex.query(conn, "LOAD 'parquet';")