Add new planner, JDBC driver, and DDL machinery
ryannedolan committed Nov 26, 2024
1 parent 67fda04 commit 7d2a3f1
Showing 160 changed files with 9,055 additions and 1,471 deletions.
2 changes: 0 additions & 2 deletions Dockerfile
@@ -1,8 +1,6 @@
FROM eclipse-temurin:18
WORKDIR /home/
ADD ./hoptimator-cli-integration/build/distributions/hoptimator-cli-integration.tar ./
ADD ./hoptimator-operator-integration/build/distributions/hoptimator-operator-integration.tar ./
ADD ./etc/* ./
ENTRYPOINT ["/bin/sh", "-c"]
CMD ["./hoptimator-cli-integration/bin/hoptimator-cli-integration -n '' -p '' -u jdbc:calcite:model=model.yaml"]

12 changes: 9 additions & 3 deletions Makefile
@@ -1,19 +1,25 @@

install:
	./gradlew installDist

build:
	./gradlew build
	docker build . -t hoptimator
	docker build hoptimator-flink-runner -t hoptimator-flink-runner

bounce: build undeploy deploy deploy-samples deploy-config
bounce: build undeploy deploy deploy-samples deploy-config deploy-demo

integration-tests:
	./bin/hoptimator --run=./integration-tests.sql
	echo "\nPASS"
	echo "\nNOTHING TO DO FOR NOW"

clean:
	./gradlew clean

deploy-demo:
	kubectl apply -f ./deploy/samples/demodb.yaml

deploy: deploy-config
	kubectl apply -f ./hoptimator-k8s/src/main/resources/
	kubectl apply -f ./deploy

undeploy:
93 changes: 37 additions & 56 deletions README.md
@@ -1,87 +1,68 @@
# Hoptimator

Multi-hop declarative data pipelines
## Intro

Hoptimator gives you a SQL interface to a Kubernetes cluster. You can install databases, query tables, create views, and deploy data pipelines using just SQL.

## What is Hoptimator?

Hoptimator is an SQL-based control plane for complex data pipelines.

Hoptimator turns high-level SQL _subscriptions_ into multi-hop data pipelines. Pipelines may involve an auto-generated Flink job (or similar) and any arbitrary resources required for the job to run.

## How does it work?

Hoptimator has a pluggable _adapter_ framework, which lets you wire up arbitrary data sources. Adapters loosely correspond to connectors in the underlying compute engine (e.g. Flink Connectors), but they may include custom control plane logic. For example, an adapter may create a cache or a CDC stream as part of a pipeline. This enables a single pipeline to span multiple "hops" across different systems (as opposed to, say, a single Flink job).

Hoptimator's pipelines tend to have the following general shape:

                                   _________
    topic1 ----------------------> |         |
    table2 --> CDC ---> topic2 --> | SQL job | --> topic4
    table3 --> rETL --> topic3 --> |_________|

The three data sources on the left correspond to three different adapters:

1. `topic1` can be read directly from a Flink job, so the first adapter simply configures a Flink connector.
2. `table2` is inefficient for bulk access, so the second adapter creates a CDC stream (`topic2`) and configures a Flink connector to read from _that_.
3. `table3` is in cold storage, so the third adapter creates a reverse-ETL job to re-ingest the data into Kafka.

In order to deploy such a pipeline, you only need to write one SQL query, called a _subscription_. Pipelines are constructed automatically based on subscriptions.

To install a database, use `kubectl`:

```
$ kubectl apply -f my-database.yaml
```

(`create database` is coming soon!)

Then use Hoptimator DDL to create a materialized view:

```
> create materialized view my.foo as select * from ads.page_views;
```
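
The same DDL can be issued programmatically. Below is a minimal sketch, assuming the Hoptimator JDBC driver is on the classpath and self-registers for `jdbc:hoptimator://` URLs:

```
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class CreateViewExample {
  public static void main(String[] args) throws SQLException {
    // Blank credentials, mirroring the ./hoptimator launcher script.
    try (Connection conn = DriverManager.getConnection("jdbc:hoptimator://", "", "");
        Statement stmt = conn.createStatement()) {
      // Same DDL as the sqlline session above.
      stmt.execute("CREATE MATERIALIZED VIEW my.foo AS SELECT * FROM ads.page_views");
    }
  }
}
```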

Views created via DDL show up in Kubernetes as `views`:

```
$ kubectl get views
NAME     SCHEMA    VIEW   SQL
my-foo   DEFAULT   FOO    SELECT *...
```

Materialized views result in `pipelines`:

```
$ kubectl get pipelines
NAME     SQL             STATUS
my-foo   INSERT INTO...  Ready.
```

## Quick Start

### Prerequisites

1. `docker` is installed and the Docker daemon is running.
2. `kubectl` is installed and a cluster is running.
   1. `minikube` can be used for a local cluster.
3. `helm` for Kubernetes is installed.

### Run

```
$ make quickstart
... wait a while ...
$ ./bin/hoptimator
> !intro
> !q
```

## Quickstart

Hoptimator requires a Kubernetes cluster. To connect from outside a Kubernetes cluster, make sure your `kubectl` is properly configured.

```
$ make install              # build and install SQL CLI
$ make deploy deploy-demo   # install CRDs and K8s objects
$ ./hoptimator
> !intro
```

## Subscriptions

Subscriptions are SQL views that are automatically materialized by a pipeline.

```
$ kubectl apply -f deploy/samples/subscriptions.yaml
```

In response, the operator will deploy a Flink job and other resources:

```
$ kubectl get subscriptions
$ kubectl get flinkdeployments
$ kubectl get kafkatopics
```

You can verify the job is running by inspecting the output:

```
$ ./bin/hoptimator
> !tables
> SELECT * FROM RAWKAFKA."products" LIMIT 5;
> !q
```

## The SQL CLI

The `./hoptimator` script launches the [sqlline](https://github.com/julianhyde/sqlline) SQL CLI pre-configured to connect to `jdbc:hoptimator://`. The CLI includes some additional commands. See `!intro`.

## The JDBC Driver

To use Hoptimator from Java code, or from anything that supports JDBC, use the `jdbc:hoptimator://` JDBC driver.
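
For example, a query against the demo `ads.page_views` table might look like this (a sketch; it assumes the driver is on the classpath and the demo database is deployed):

```
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class QueryExample {
  public static void main(String[] args) throws SQLException {
    try (Connection conn = DriverManager.getConnection("jdbc:hoptimator://", "", "");
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT * FROM ads.page_views")) {
      int columns = rs.getMetaData().getColumnCount();
      while (rs.next()) {
        // Print each row tab-separated.
        StringBuilder row = new StringBuilder();
        for (int i = 1; i <= columns; i++) {
          row.append(rs.getString(i));
          if (i < columns) {
            row.append('\t');
          }
        }
        System.out.println(row);
      }
    }
  }
}
```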

## The Operator

`hoptimator-operator` turns materialized views into real data pipelines.

Hoptimator-operator is a Kubernetes operator that orchestrates multi-hop data pipelines based on Subscriptions (a custom resource). When a Subscription is deployed, the operator:

1. creates a _plan_ based on the Subscription SQL. The plan includes a set of _resources_ that make up a _pipeline_.
2. deploys each resource in the pipeline. This may involve creating Kafka topics, Flink jobs, etc.
3. reports Subscription status, which depends on the status of each resource in the pipeline.

The operator is extensible via _adapters_. Among other responsibilities, adapters can implement custom control plane logic (see `ControllerProvider`), or they can depend on external operators. For example, the Kafka adapter actively manages Kafka topics using a custom controller. The Flink adapter defers to [flink-kubernetes-operator](https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/) to manage Flink jobs.

## Extending Hoptimator

Hoptimator can be extended via `TableTemplates`:

```
$ kubectl apply -f my-table-template.yaml
```

## The CLI

Hoptimator includes a SQL CLI based on [sqlline](https://github.com/julianhyde/sqlline). This is primarily for testing and debugging purposes, but it can also be useful for running ad-hoc queries. The CLI leverages the same adapters as the operator, but it doesn't deploy anything. Instead, queries run as local, in-process Flink jobs.
21 changes: 0 additions & 21 deletions deploy/hoptimator-pod.yaml

This file was deleted.

35 changes: 35 additions & 0 deletions deploy/samples/demodb.yaml
@@ -0,0 +1,35 @@
apiVersion: hoptimator.linkedin.com/v1alpha1
kind: Database
metadata:
  name: ads-database
spec:
  schema: ADS
  url: jdbc:demodb://ads
  dialect: Calcite

---

apiVersion: hoptimator.linkedin.com/v1alpha1
kind: Database
metadata:
  name: profile-database
spec:
  schema: PROFILE
  url: jdbc:demodb://profile
  dialect: Calcite

---

apiVersion: hoptimator.linkedin.com/v1alpha1
kind: TableTemplate
metadata:
  name: demodb-template
spec:
  databases:
    - profile-database
    - ads-database
  connector: |
    connector = demo
    database = {{database}}
    table = {{table}}
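
The `connector` block above is a template: the `{{database}}` and `{{table}}` placeholders are evidently filled in per table when a pipeline is planned. A minimal sketch of that substitution — the `render` helper and the `members` table name are illustrative, not part of this commit:

```
import java.util.LinkedHashMap;
import java.util.Map;

public class TemplateExample {

  /** Replace {{var}} placeholders with values; illustrative only. */
  static String render(String template, Map<String, String> values) {
    String result = template;
    for (Map.Entry<String, String> e : values.entrySet()) {
      result = result.replace("{{" + e.getKey() + "}}", e.getValue());
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> values = new LinkedHashMap<>();
    values.put("database", "profile");
    values.put("table", "members");  // hypothetical table name
    String template = "connector = demo\ndatabase = {{database}}\ntable = {{table}}\n";
    System.out.print(render(template, values));
    // connector = demo
    // database = profile
    // table = members
  }
}
```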
20 changes: 0 additions & 20 deletions etc/integration-tests.sql

This file was deleted.

18 changes: 18 additions & 0 deletions generate-models.sh
@@ -0,0 +1,18 @@
#!/bin/sh

docker pull ghcr.io/kubernetes-client/java/crd-model-gen:v1.0.6

docker run \
--rm \
--mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
--mount type=bind,src="$(pwd)",dst="$(pwd)" \
-ti \
--network host \
ghcr.io/kubernetes-client/java/crd-model-gen:v1.0.6 \
/generate.sh -o "$(pwd)/hoptimator-k8s" -n "" -p "com.linkedin.hoptimator.k8s" \
-u "$(pwd)/hoptimator-k8s/src/main/resources/databases.crd.yaml" \
-u "$(pwd)/hoptimator-k8s/src/main/resources/pipelines.crd.yaml" \
-u "$(pwd)/hoptimator-k8s/src/main/resources/tabletemplates.crd.yaml" \
-u "$(pwd)/hoptimator-k8s/src/main/resources/views.crd.yaml" \
-u "$(pwd)/hoptimator-k8s/src/main/resources/subscriptions.crd.yaml" \
&& echo "done."
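
The generated model classes can then be used with the Kubernetes Java client already declared in `libs.versions.toml`. A sketch, assuming crd-model-gen's usual naming convention (`V1alpha1Database` and `V1alpha1DatabaseList` under `com.linkedin.hoptimator.k8s.models`):

```
import com.linkedin.hoptimator.k8s.models.V1alpha1Database;
import com.linkedin.hoptimator.k8s.models.V1alpha1DatabaseList;
import io.kubernetes.client.openapi.ApiClient;
import io.kubernetes.client.util.Config;
import io.kubernetes.client.util.generic.GenericKubernetesApi;

public class ListDatabases {
  public static void main(String[] args) throws Exception {
    ApiClient client = Config.defaultClient();
    // Group/version/plural must match databases.crd.yaml.
    GenericKubernetesApi<V1alpha1Database, V1alpha1DatabaseList> api =
        new GenericKubernetesApi<>(V1alpha1Database.class, V1alpha1DatabaseList.class,
            "hoptimator.linkedin.com", "v1alpha1", "databases", client);
    for (V1alpha1Database db : api.list().throwsApiException().getObject().getItems()) {
      System.out.println(db.getMetadata().getName());
    }
  }
}
```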
48 changes: 24 additions & 24 deletions gradle/libs.versions.toml
@@ -1,31 +1,31 @@
[libraries]
assertj = "org.assertj:assertj-core:3.12.0"
avro = "org.apache.avro:avro:1.10.2"
calciteAvatica = "org.apache.calcite.avatica:avatica:1.23.0"
calciteCore = "org.apache.calcite:calcite-core:1.34.0"
flinkClients = "org.apache.flink:flink-clients:1.16.2"
flinkConnectorBase = "org.apache.flink:flink-connector-base:1.16.2"
flinkCore = "org.apache.flink:flink-core:1.16.2"
flinkCsv = "org.apache.flink:flink-csv:1.16.2"
flinkStreamingJava = "org.apache.flink:flink-streaming-java:1.16.2"
flinkTableApiJava = "org.apache.flink:flink-table-api-java:1.16.2"
flinkTableApiJavaBridge = "org.apache.flink:flink-table-api-java-bridge:1.16.2"
flinkTableCommon = "org.apache.flink:flink-table-common:1.16.2"
flinkTablePlanner = "org.apache.flink:flink-table-planner_2.12:1.16.2"
flinkTableRuntime = "org.apache.flink:flink-table-runtime:1.16.2"
flinkMetricsDropwizard = "org.apache.flink:flink-metrics-dropwizard:1.16.2"
flinkConnectorKafka = "org.apache.flink:flink-sql-connector-kafka:1.16.2"
flinkConnectorMySqlCdc = "com.ververica:flink-sql-connector-mysql-cdc:2.3.0"
calcite-avatica = "org.apache.calcite.avatica:avatica:1.23.0"
calcite-core = "org.apache.calcite:calcite-core:1.34.0"
calcite-server = "org.apache.calcite:calcite-server:1.34.0"
flink-clients = "org.apache.flink:flink-clients:1.16.2"
flink-connector-base = "org.apache.flink:flink-connector-base:1.16.2"
flink-core = "org.apache.flink:flink-core:1.16.2"
flink-csv = "org.apache.flink:flink-csv:1.16.2"
flink-streaming-java = "org.apache.flink:flink-streaming-java:1.16.2"
flink-table-api-java = "org.apache.flink:flink-table-api-java:1.16.2"
flink-table-api-java-bridge = "org.apache.flink:flink-table-api-java-bridge:1.16.2"
flink-table-common = "org.apache.flink:flink-table-common:1.16.2"
flink-table-planner = "org.apache.flink:flink-table-planner_2.12:1.16.2"
flink-table-runtime = "org.apache.flink:flink-table-runtime:1.16.2"
flink-connector-kafka = "org.apache.flink:flink-sql-connector-kafka:1.16.2"
flink-connector-mysql-cdc = "com.ververica:flink-sql-connector-mysql-cdc:2.3.0"
gson = "com.google.code.gson:gson:2.9.0"
jackson = "com.fasterxml.jackson.core:jackson-core:2.14.1"
jacksonYaml = "com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.14.1"
javaxAnnotationApi = "javax.annotation:javax.annotation-api:1.3.2"
jackson-dataformat-yaml = "com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.14.1"
javax-annotation-api = "javax.annotation:javax.annotation-api:1.3.2"
junit = "junit:junit:4.12"
kafkaClients = "org.apache.kafka:kafka-clients:2.7.1"
kubernetesClient = "io.kubernetes:client-java:16.0.2"
kubernetesExtendedClient = "io.kubernetes:client-java-extended:16.0.2"
slf4jSimple = "org.slf4j:slf4j-simple:1.7.30"
slf4jApi = "org.slf4j:slf4j-api:1.7.30"
kafka-clients = "org.apache.kafka:kafka-clients:2.7.1"
kubernetes-client = "io.kubernetes:client-java:16.0.2"
kubernetes-extended-client = "io.kubernetes:client-java-extended:16.0.2"
slf4j-simple = "org.slf4j:slf4j-simple:1.7.30"
slf4j-api = "org.slf4j:slf4j-api:1.7.30"
sqlline = "sqlline:sqlline:1.12.0"
commonsCli = 'commons-cli:commons-cli:1.4'

commons-cli = "commons-cli:commons-cli:1.4"
quidem = "net.hydromatic:quidem:0.11"
8 changes: 8 additions & 0 deletions hoptimator
@@ -0,0 +1,8 @@
#!/bin/sh

BASEDIR="$( cd "$( dirname "$0" )" && pwd )"

$BASEDIR/hoptimator-cli/build/install/hoptimator-cli/bin/hoptimator-cli sqlline.SqlLine \
-ac sqlline.HoptimatorAppConfig \
-u jdbc:hoptimator:// -n "" -p "" -nn "Hoptimator" $@

7 changes: 7 additions & 0 deletions hoptimator-api/build.gradle
@@ -0,0 +1,7 @@
plugins {
  id 'java'
}

dependencies {
  // plz keep it this way
}
12 changes: 12 additions & 0 deletions hoptimator-api/src/main/java/com/linkedin/hoptimator/Catalog.java
@@ -0,0 +1,12 @@
package com.linkedin.hoptimator;

import java.sql.Wrapper;
import java.sql.SQLException;

/** Registers a set of tables, possibly within schemas and sub-schemas. */
public interface Catalog {

  String name();
  String description();
  void register(Wrapper parentSchema) throws SQLException;
}
8 changes: 8 additions & 0 deletions hoptimator-api/src/main/java/com/linkedin/hoptimator/CatalogProvider.java
@@ -0,0 +1,8 @@
package com.linkedin.hoptimator;

import java.util.Collection;

public interface CatalogProvider {

  Collection<Catalog> catalogs();
}
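
As a sketch of how these two interfaces might be implemented — the `DemoCatalog` class is hypothetical, and unwrapping the parent schema to a Calcite `SchemaPlus` is an assumption based on the Calcite dependencies in this commit:

```
package com.linkedin.hoptimator.demo;  // hypothetical package

import java.sql.SQLException;
import java.sql.Wrapper;
import java.util.Collection;
import java.util.Collections;

import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.schema.impl.AbstractSchema;

import com.linkedin.hoptimator.Catalog;
import com.linkedin.hoptimator.CatalogProvider;

/** Registers an empty ADS schema; a real catalog would add tables. */
class DemoCatalog implements Catalog {

  @Override
  public String name() {
    return "ads";
  }

  @Override
  public String description() {
    return "Demo ads database";
  }

  @Override
  public void register(Wrapper parentSchema) throws SQLException {
    // Assumption: the parent can be unwrapped to a Calcite SchemaPlus.
    SchemaPlus parent = parentSchema.unwrap(SchemaPlus.class);
    parent.add("ADS", new AbstractSchema());
  }
}

/** Presumably discovered via java.util.ServiceLoader. */
public class DemoCatalogProvider implements CatalogProvider {

  @Override
  public Collection<Catalog> catalogs() {
    return Collections.singletonList(new DemoCatalog());
  }
}
```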
9 changes: 9 additions & 0 deletions hoptimator-api/src/main/java/com/linkedin/hoptimator/Connector.java
@@ -0,0 +1,9 @@
package com.linkedin.hoptimator;

import java.util.Map;
import java.sql.SQLException;

public interface Connector<T> {

  Map<String, String> configure(T t) throws SQLException;
}
8 changes: 8 additions & 0 deletions hoptimator-api/src/main/java/com/linkedin/hoptimator/ConnectorProvider.java
@@ -0,0 +1,8 @@
package com.linkedin.hoptimator;

import java.util.Collection;

public interface ConnectorProvider {

  <T> Collection<Connector<T>> connectors(Class<T> clazz);
}
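
A sketch of a connector pair matching the demodb `TableTemplate` above — `DemoTable` is a hypothetical stand-in for whatever type the planner actually passes:

```
package com.linkedin.hoptimator.demo;  // hypothetical package

import java.sql.SQLException;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

import com.linkedin.hoptimator.Connector;
import com.linkedin.hoptimator.ConnectorProvider;

/** Hypothetical table descriptor. */
class DemoTable {
  final String database;
  final String table;

  DemoTable(String database, String table) {
    this.database = database;
    this.table = table;
  }
}

/** Produces connector configs like those in demodb.yaml's TableTemplate. */
class DemoConnector implements Connector<DemoTable> {

  @Override
  public Map<String, String> configure(DemoTable t) throws SQLException {
    Map<String, String> config = new LinkedHashMap<>();
    config.put("connector", "demo");
    config.put("database", t.database);
    config.put("table", t.table);
    return config;
  }
}

/** Presumably discovered via java.util.ServiceLoader; the wiring isn't shown in this commit. */
public class DemoConnectorProvider implements ConnectorProvider {

  @Override
  @SuppressWarnings("unchecked")
  public <T> Collection<Connector<T>> connectors(Class<T> clazz) {
    if (DemoTable.class.isAssignableFrom(clazz)) {
      return Collections.singletonList((Connector<T>) new DemoConnector());
    }
    return Collections.emptyList();
  }
}
```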
8 changes: 8 additions & 0 deletions hoptimator-api/src/main/java/com/linkedin/hoptimator/Database.java
@@ -0,0 +1,8 @@
package com.linkedin.hoptimator;

/** A collection of tables, as populated by a Catalog. */
public interface Database {

  /** Name of the database. */
  String databaseName();
}
14 changes: 14 additions & 0 deletions hoptimator-api/src/main/java/com/linkedin/hoptimator/Deployable.java
@@ -0,0 +1,14 @@
package com.linkedin.hoptimator;

import java.util.List;
import java.sql.SQLException;

public interface Deployable {

  void create() throws SQLException;
  void delete() throws SQLException;
  void update() throws SQLException;

  /** Render a list of specs, usually YAML. */
  List<String> specify() throws SQLException;
}
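
A sketch of a `Deployable` for a view — the class and the YAML shape are assumptions modeled on the `views` resources shown in the README; a real implementation would talk to the Kubernetes API in `create`/`update`/`delete`:

```
package com.linkedin.hoptimator.demo;  // hypothetical package

import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

import com.linkedin.hoptimator.Deployable;

public class DemoViewDeployable implements Deployable {

  private final String name;
  private final String sql;

  public DemoViewDeployable(String name, String sql) {
    this.name = name;
    this.sql = sql;
  }

  @Override
  public void create() throws SQLException {
    // A real implementation would apply the rendered specs to Kubernetes.
  }

  @Override
  public void delete() throws SQLException {
    // ... and delete the corresponding objects here.
  }

  @Override
  public void update() throws SQLException {
    // ... and patch them here.
  }

  /** Render a single YAML spec, per the interface contract. */
  @Override
  public List<String> specify() throws SQLException {
    return Collections.singletonList(
        "apiVersion: hoptimator.linkedin.com/v1alpha1\n"
            + "kind: View\n"
            + "metadata:\n"
            + "  name: " + name + "\n"
            + "spec:\n"
            + "  sql: " + sql + "\n");
  }
}
```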