Merge branch 'main' into 202407-notebook-on-system-fields
petermarshallio authored Nov 4, 2024
2 parents a711db3 + 750c1bb commit 5f79b4a
Showing 57 changed files with 4,153 additions and 966 deletions.
17 changes: 17 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,17 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v2.4.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- id: check-json
- repo: https://github.com/codespell-project/codespell
rev: v2.3.0
hooks:
- id: codespell
name: codespell
description: Checks for common misspellings in text files.
entry: codespell --ignore-words=ignore-spelling-words.txt
language: python
types: [text]
14 changes: 11 additions & 3 deletions README.md
@@ -23,11 +23,13 @@ The "Learn Druid" repository contains all manner of resources to help you learn

It contains:

* Jupyter Notebooks that guide you through query, ingestion, and data management with Apache Druid.
* [Jupyter Notebooks](notebooks) that guide you through query, ingestion, and data management with Apache Druid.
* A Docker Compose file to get you up and running with a learning lab.

Suggestions or comments? Call into the [discussions](https://github.com/implydata/learn-druid/discussions). Found a problem or want to request a notebook? Raise an [issue](https://github.com/implydata/learn-druid/issues). Want to contribute? Raise a [PR](https://github.com/implydata/learn-druid/pulls).


[Contributions](contributing.md) to this community resource are welcome! Contribute your own notebook on a topic that's not listed here, and check out the [issue](https://github.com/implydata/learn-druid/issues) list, where you'll find bugs and enhancement requests.

Come meet your friendly Apache Druid [community](https://druid.apache.org/community) if you have any questions about the functionality you see here.

## Pre-requisites
@@ -46,7 +48,7 @@ To use the "Learn Druid" Docker Compose, you need:
To get started quickly:

1. Clone the repository:

```bash
git clone https://github.com/implydata/learn-druid
```
@@ -98,6 +100,12 @@ To stop all services:
docker compose --profile all-services down
```

To stop all services and delete their data volumes:

```bash
docker compose --profile all-services down -v
```

Run the notebooks against an existing Apache Druid database using the `DRUID_HOST` parameter and the `jupyter` profile.

```bash
60 changes: 46 additions & 14 deletions contributing.md
@@ -1,28 +1,60 @@
# Contributing

You may want to update the Jupyter image to access new or updated tutorial notebooks,
include new Python packages, or update configuration files.
This repository is a resource for developers working with Apache Druid, and its committers welcome contributions from across the world!

To build the custom Jupyter image locally:
## Build a notebook

1. Clone the Druid repo if you haven't already.
2. Navigate to `examples/quickstart/jupyter-notebooks` in your Druid source repo.
3. Edit the image definition in `Dockerfile`.
4. Navigate to the `docker-jupyter` directory.
5. Generate the new build using the following command:
Here are some general guidelines on making a notebook.

### Use the standard template

The [contributing](https://github.com/implydata/learn-druid/tree/main/notebooks/99-contributing) folder contains a notebook template as a starting point. You'll find boilerplate elements including:

* Setting the connection to Druid, Kafka, and the data generator.
* Starter elements for ingesting from example data sets or the data generator.
* Clean-up elements, like dropping tables, stopping streaming ingestion, and halting data generator jobs.
* Reusable code elements that other contributors have found useful.

And don't forget that the template itself is open to contribution!

### Raise a PR

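Please install and run the [pre-commit](https://pre-commit.com/) hooks before raising PRs.

```bash
pip install pre-commit
pre-commit install
# Optionally check every file now instead of waiting for the next commit
pre-commit run --all-files
```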

When you have a notebook and you're ready for feedback, it's a good idea to raise a draft PR first. Feel free to use the comments section to ask for initial feedback, or drop into the docs channel in the official Apache Druid Slack workspace.

And when it's ready to go, finalize your PR. Add reviewers, get formal feedback, make any necessary changes, etc. in the usual way.

## Good things to know...

### Test with a specific version of Apache Druid

Use the `DRUID_VERSION` environment variable to set the specific version of Druid that you would like to test against.

The version is pulled from Imply's [Docker Hub](https://hub.docker.com/r/imply/druid/tags) repository, where multi-architecture builds of Apache Druid with necessary extensions and configurations are published.

```shell
DRUID_VERSION=27.0.0 docker compose --profile all-services -f docker-compose-local.yaml up -d --build
DRUID_VERSION=27.0.0 docker compose --profile all-services up -d
```

You can change the value of `DRUID_VERSION` or the profile used from the Docker Compose file.
6. To test all notebooks,
make sure that docker compose is down and all volumes have been deleted, then start tests with:
Use the same approach to run a locally built Docker image: set `DRUID_VERSION` to that image's tag.
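
As a rough illustration of how the version is resolved: `docker-compose.yaml` references the image as `imply/druid:${DRUID_VERSION:-30.0.0}`, and the `:-` form falls back to the default when the variable is unset or empty, much like shell parameter expansion. The sketch below mirrors that behavior in plain shell. The `druid_image` helper is hypothetical and is not part of this repository.

```shell
# Hypothetical helper (not part of the repository) that mirrors the
# image reference used in docker-compose.yaml.
druid_image() {
  echo "imply/druid:${DRUID_VERSION:-30.0.0}"
}

# With the variable unset, the default tag is used.
unset DRUID_VERSION
druid_image

# With the variable set, the supplied version is used instead.
DRUID_VERSION=27.0.0
druid_image

# An empty value also falls back to the default.
DRUID_VERSION=
druid_image
```

This is why the variable can be left out entirely for day-to-day use and only needs setting when you want to test a notebook against a particular release.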

### Run automated tests on notebooks

Make sure that docker compose is down and all volumes have been deleted, then start tests with:

```shell
cd tests
./test-notebooks.sh
```
7. To test a single notebook:
```

To test a single notebook:

```shell
cd tests
./test-notebooks.sh ../notebooks/<path to test notebook>
10 changes: 5 additions & 5 deletions docker-compose-local.yaml
@@ -50,7 +50,7 @@ services:
environment:
- ZOO_MY_ID=1
- ALLOW_ANONYMOUS_LOGIN=yes

kafka:
image: bitnami/kafka:3.6.2
container_name: kafka-broker
@@ -78,7 +78,7 @@ services:
volumes:
- druid_shared:/opt/shared
- coordinator_var:/opt/druid/var
depends_on:
depends_on:
- zookeeper
- postgres
ports:
@@ -95,7 +95,7 @@ services:
volumes:
- broker_var:/opt/druid/var
- druid_shared:/opt/shared
depends_on:
depends_on:
- zookeeper
- postgres
- coordinator
@@ -113,7 +113,7 @@ services:
volumes:
- druid_shared:/opt/shared
- historical_var:/opt/druid/var
depends_on:
depends_on:
- zookeeper
- postgres
- coordinator
@@ -131,7 +131,7 @@ services:
volumes:
- druid_shared:/opt/shared
- middle_var:/opt/druid/var
depends_on:
depends_on:
- zookeeper
- postgres
- coordinator
52 changes: 38 additions & 14 deletions docker-compose.yaml
@@ -22,6 +22,7 @@ volumes:
metadata_data: {}
middle_var: {}
historical_var: {}
historical_2_var: {}
broker_var: {}
coordinator_var: {}
router_var: {}
@@ -33,7 +34,7 @@ services:
postgres:
image: postgres:alpine3.20
container_name: postgres
profiles: ["druid-jupyter", "all-services"]
profiles: ["druid-jupyter", "all-services", "tiered-druid-jupyter"]
volumes:
- metadata_data:/var/lib/postgresql/data
environment:
@@ -45,11 +46,11 @@ services:
zookeeper:
image: zookeeper:3.9.2
container_name: zookeeper
profiles: ["druid-jupyter", "kafka-jupyter", "all-services"]
profiles: ["druid-jupyter", "kafka-jupyter", "all-services", "tiered-druid-jupyter"]
environment:
- ZOO_MY_ID=1
- ALLOW_ANONYMOUS_LOGIN=yes

kafka:
image: bitnami/kafka:3.6.2
container_name: kafka-broker
@@ -73,11 +74,11 @@ services:
coordinator:
image: imply/druid:${DRUID_VERSION:-30.0.0}
container_name: coordinator
profiles: ["druid-jupyter", "all-services"]
profiles: ["druid-jupyter", "all-services", "tiered-druid-jupyter"]
volumes:
- druid_shared:/opt/shared
- coordinator_var:/opt/druid/var
depends_on:
depends_on:
- zookeeper
- postgres
ports:
@@ -90,11 +91,11 @@ services:
broker:
image: imply/druid:${DRUID_VERSION:-30.0.0}
container_name: broker
profiles: ["druid-jupyter", "all-services"]
profiles: ["druid-jupyter", "all-services", "tiered-druid-jupyter"]
volumes:
- broker_var:/opt/druid/var
- druid_shared:/opt/shared
depends_on:
depends_on:
- zookeeper
- postgres
- coordinator
@@ -108,11 +109,11 @@ services:
historical:
image: imply/druid:${DRUID_VERSION:-30.0.0}
container_name: historical
profiles: ["druid-jupyter", "all-services"]
profiles: ["druid-jupyter", "all-services", "tiered-druid-jupyter"]
volumes:
- druid_shared:/opt/shared
- historical_var:/opt/druid/var
depends_on:
depends_on:
- zookeeper
- postgres
- coordinator
@@ -123,14 +124,37 @@ services:
env_file:
- environment

historical_slow:
image: imply/druid:${DRUID_VERSION:-30.0.0}
container_name: historical_slow
profiles: ["tiered-druid-jupyter"]
volumes:
- druid_shared:/opt/shared
- historical_2_var:/opt/druid/var
depends_on:
- zookeeper
- postgres
- coordinator
- historical
ports:
- "8084:8084"
command:
- historical
env_file:
- environment
environment:
DRUID_SINGLE_NODE_CONF: "nano-quickstart"
druid_plaintextport: "8084"
druid_server_tier: "slow"

middlemanager:
image: imply/druid:${DRUID_VERSION:-30.0.0}
container_name: middlemanager
profiles: ["druid-jupyter", "all-services"]
profiles: ["druid-jupyter", "all-services", "tiered-druid-jupyter"]
volumes:
- druid_shared:/opt/shared
- middle_var:/opt/druid/var
depends_on:
depends_on:
- zookeeper
- postgres
- coordinator
@@ -145,7 +169,7 @@ services:
router:
image: imply/druid:${DRUID_VERSION:-30.0.0}
container_name: router
profiles: ["druid-jupyter", "all-services"]
profiles: ["druid-jupyter", "all-services", "tiered-druid-jupyter"]
volumes:
- router_var:/opt/druid/var
depends_on:
@@ -162,7 +186,7 @@ services:
jupyter:
image: imply/druid-notebook:latest
container_name: jupyter
profiles: ["jupyter", "kafka-jupyter", "druid-jupyter", "all-services"]
profiles: ["jupyter", "kafka-jupyter", "druid-jupyter", "all-services", "tiered-druid-jupyter"]
environment:
JUPYTER_ENABLE_LAB: "yes"
JUPYTER_TOKEN: "docker"
@@ -179,7 +203,7 @@ services:
datagen:
image: imply/datagen:latest
container_name: datagen
profiles: ["jupyter", "kafka-jupyter", "druid-jupyter", "all-services"]
profiles: ["jupyter", "kafka-jupyter", "druid-jupyter", "all-services", "tiered-druid-jupyter"]
ports:
- "${DATAGEN_PORT:-9999}:9999"
volumes:
1 change: 0 additions & 1 deletion environment
@@ -70,4 +70,3 @@ druid_export_storage_baseDir=/opt/shared/exports


DRUID_LOG4J=<?xml version="1.0" encoding="UTF-8" ?><Configuration status="WARN"><Appenders><Console name="Console" target="SYSTEM_OUT"><PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/></Console></Appenders><Loggers><Root level="info"><AppenderRef ref="Console"/></Root><Logger name="org.apache.druid.jetty.RequestLog" additivity="false" level="DEBUG"><AppenderRef ref="Console"/></Logger></Loggers></Configuration>

8 changes: 8 additions & 0 deletions ignore-spelling-words.txt
@@ -0,0 +1,8 @@
EGE
MKE
MOT
SAV
AGS

Rouge
Nome
5 changes: 2 additions & 3 deletions jupyter-img/Dockerfile
@@ -41,10 +41,10 @@ RUN pip install requests \
pip install bokeh \
pip install kafka-python \
pip install sortedcontainers \
pip install tqdm
pip install tqdm

# Install druidapi client from apache/druid
# Local install requires sudo privileges
# Local install requires sudo privileges
USER root
ADD druidapi /home/jovyan/druidapi
WORKDIR /home/jovyan/druidapi
@@ -56,4 +56,3 @@ RUN mkdir -p /home/jovyan/notebooks

WORKDIR /home/jovyan/notebooks
USER jovyan
