Skip to content

Latest commit

 

History

History
72 lines (46 loc) · 3.84 KB

README.md

File metadata and controls

72 lines (46 loc) · 3.84 KB

opendosm-clickhouse

Access OpenDOSM (Department of Statistics Malaysia) data from ClickHouse.

How it works

OpenDOSM provides public GCS buckets where you can download the dataset as CSV or Parquet files. This project uses dbt to pull these files into ClickHouse so that you can interact with it using SQL.

Why?

Mostly for fun. I work as a data engineer, and this is the kind of things I do at work. Besides that, the data can be used for various purposes, e.g. as training set for quantitative economics models.

Usage

You need to have Docker installed.

Run:

scripts/start.sh

Then go to http://localhost:8123/play?user=default#c2VsZWN0ICogZnJvbSBkb3NtLmNwaV9jb3Jl.

Or use clickhouse-client:

$ docker compose exec clickhouse-server clickhouse-client

ClickHouse client version 23.1.3.5 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 23.1.3 revision 54461.

Warnings:
 * Linux is not using a fast clock source. Performance can be degraded. Check /sys/devices/system/clocksource/clocksource0/current_clocksource

a077a96c66c7 :) select * from dosm.cpi_core

SELECT *
FROM dosm.cpi_core

Query id: d882e59d-335f-44f5-b5a8-c607d36aee40

┌───────date─┬────────────overall─┬──────food_beverage─┬─clothing_footwear─┬──housing_utilities─┬────────furnishings─┬─────────────health─┬──────────transport─┬─────communication─┬─recreation_culture─┬──────────education─┬────────hospitality─┬───────────────misc─┐
│ 2015-01-01 │              109.3 │              114.8 │              98.6 │                111 │              106.8 │              111.7 │              104.8 │              97.2 │              105.3 │              111.1 │              118.6 │              106.2 │
│ 2015-02-01 │              109.7 │              115.1 │              98.2 │              112.2 │              106.6 │              112.1 │              104.9 │              97.2 │              105.7 │              111.7 │              118.8 │              106.7 │
...
│ 2022-12-01 │              127.2 │              146.1 │              93.4 │              130.1 │              124.4 │              126.3 │              119.8 │              97.4 │              115.6 │              122.9 │              143.9 │              119.9 │
└────────────┴────────────────────┴────────────────────┴───────────────────┴────────────────────┴────────────────────┴────────────────────┴────────────────────┴───────────────────┴────────────────────┴────────────────────┴────────────────────┴────────────────────┘

96 rows in set. Elapsed: 0.007 sec.

You can also use Grafana at http://localhost:3000 to visualize the data.

To list all tables:

SHOW TABLES FROM dosm

TODO

  • Use something like Lightdash instead of Grafana
  • Add price checker dataset
  • Use incremental materialization for daily data