ETL/CDC: Updates about DynamoDB, Kinesis, MongoDB, and Rockset
amotl committed Sep 16, 2024
1 parent 7aadde8 commit 1fc7f8d
Showing 4 changed files with 90 additions and 45 deletions.
4 changes: 4 additions & 0 deletions docs/_include/links.md
@@ -1,4 +1,5 @@
[Amazon DynamoDB Streams]: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
[Amazon Kinesis Data Streams]: https://docs.aws.amazon.com/streams/latest/dev/introduction.html
[BM25]: https://en.wikipedia.org/wiki/Okapi_BM25
[cloud-datashader-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/amo/cloud-datashader/topic/timeseries/explore/cloud-datashader.ipynb
[cloud-datashader-github]: https://github.com/crate/cratedb-examples/blob/amo/cloud-datashader/topic/timeseries/explore/cloud-datashader.ipynb
@@ -7,6 +8,8 @@
[Datashader]: https://datashader.org/
[Dynamic Database Schemas]: https://cratedb.com/product/features/dynamic-schemas
[DynamoDB CDC Relay]: https://cratedb-toolkit.readthedocs.io/io/dynamodb/cdc.html
[DynamoDB CDC Relay with AWS Lambda]: https://cratedb-toolkit.readthedocs.io/io/dynamodb/cdc-lambda.html
[DynamoDB Table Loader]: https://cratedb-toolkit.readthedocs.io/io/dynamodb/loader.html
[Geospatial Data Model]: https://cratedb.com/data-model/geospatial
[Geospatial Database]: https://cratedb.com/geospatial-spatial-database
[HNSW]: https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world
@@ -26,6 +29,7 @@
[langchain-rag-sql-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/cratedb-vectorstore-rag-openai-sql.ipynb
[MongoDB CDC Relay]: https://cratedb-toolkit.readthedocs.io/io/mongodb/cdc.html
[MongoDB Change Streams]: https://www.mongodb.com/docs/manual/changeStreams/
[MongoDB Table Loader]: https://cratedb-toolkit.readthedocs.io/io/mongodb/loader.html
[Multi-model Database]: https://cratedb.com/solutions/multi-model-database
[nearest neighbor search]: https://en.wikipedia.org/wiki/Nearest_neighbor_search
[Nested Data Structure]: https://cratedb.com/product/features/nested-data-structure
35 changes: 28 additions & 7 deletions docs/integrate/cdc/index.md
@@ -17,6 +17,18 @@ to use them optimally.
Please also have a look at support for [generic ETL](#etl) solutions.
:::

## Amazon Kinesis
You can use Amazon Kinesis Data Streams to collect and process large streams of data
records in real time. A typical Kinesis Data Streams application reads data from a
data stream as data records.

As such, a common application is to relay DynamoDB table change stream events to a
Kinesis Stream, and consume that from an adapter to a consolidation database.
:::{div}
- About: [Amazon Kinesis Data Streams]
- See: [](#cdc-dynamodb)
:::

## Debezium
Debezium is an open source distributed platform for change data capture (CDC).
It is built on top of Apache Kafka, a distributed streaming platform. It allows
@@ -30,19 +42,28 @@ SQL Server, IBM DB2, Cassandra, Vitess, Spanner, JDBC, and Informix.
- Webinar: [How to replicate data from other databases to CrateDB with Debezium and Kafka]
:::

(cdc-dynamodb)=
## DynamoDB
:::{div}
Tap into [Amazon DynamoDB Streams], to replicate CDC events from DynamoDB into CrateDB,
with support for CrateDB's container data types.
- {hyper-open}`Documentation <[DynamoDB CDC Relay]>`
- {hyper-read-more}`Blog <[Replicating CDC events from DynamoDB to CrateDB]>`
Support for loading DynamoDB tables into CrateDB (full-load), as well as
[Amazon DynamoDB Streams] and [Amazon Kinesis Data Streams],
to relay CDC events from DynamoDB into CrateDB.

- [DynamoDB Table Loader]
- [DynamoDB CDC Relay]

If you are looking into serverless replication using AWS Lambda:
- [DynamoDB CDC Relay with AWS Lambda]
- Blog: [Replicating CDC events from DynamoDB to CrateDB]
:::

## MongoDB
:::{div}
Tap into [MongoDB Change Streams], to relay CDC events from MongoDB into CrateDB,
with support for CrateDB's container data types.
- {hyper-open}`Documentation <[MongoDB CDC Relay]>`
Support for loading MongoDB collections and databases into CrateDB (full-load),
and [MongoDB Change Streams], to relay CDC events from MongoDB into CrateDB.

- [MongoDB Table Loader]
- [MongoDB CDC Relay]
:::

## StreamSets
33 changes: 28 additions & 5 deletions docs/integrate/etl/index.md
@@ -17,6 +17,19 @@ to use them optimally.
Please also have a look at support for [](#cdc) solutions.


## Amazon Kinesis

Amazon Kinesis Data Streams is a serverless streaming data service that
simplifies the capture, processing, and storage of data streams at any
scale, such as application logs, website clickstreams, and IoT telemetry
data, for machine learning (ML), analytics, and other applications.
:::{div}
The [DynamoDB CDC Relay] pipeline uses Amazon Kinesis to relay a table
change stream from a DynamoDB table into a CrateDB table, see also
[DynamoDB CDC](#cdc-dynamodb).
:::


## Apache Airflow / Astronomer

A set of starter tutorials.
@@ -44,7 +57,8 @@ Tutorials and resources about configuring the managed variants, Astro and CrateD
- {ref}`kafka-connect`
- [Build a data ingestion pipeline using Kafka, Flink, and CrateDB]
- [Community Day: Stream processing with Apache Flink and CrateDB]
- [Executable stack: Apache Kafka, Apache Flink, and CrateDB]
- [Executable stack with Apache Kafka, Apache Flink, and CrateDB]



## Apache Hop
@@ -57,8 +71,8 @@ Tutorials and resources about configuring the managed variants, Astro and CrateD
## Apache Kafka
:::{div}
- {ref}`kafka-connect`
- [Executable stack with Apache Kafka, Apache Flink, and CrateDB]
- [Replicating data to CrateDB with Debezium and Kafka]
- [Executable stack with Apache Kafka, Apache Flink, and CrateDB]
:::
```{toctree}
:hidden:
@@ -88,6 +102,13 @@ azure-functions
- [Using dbt with CrateDB]


## DynamoDB
:::{div}
- [DynamoDB Table Loader]
- [DynamoDB CDC Relay]
:::


## InfluxDB

- {ref}`integrate-influxdb`
@@ -104,9 +125,11 @@ influxdb
- [Setting up data pipelines with CrateDB and Kestra]

## MongoDB

- {ref}`integrate-mongodb`

:::{div}
- Tutorial: {ref}`integrate-mongodb`
- Documentation: [MongoDB Table Loader]
- Documentation: [MongoDB CDC Relay]
:::
```{toctree}
:hidden:
63 changes: 30 additions & 33 deletions docs/migrate/rockset/index.md
@@ -57,32 +57,12 @@ replacement solution.
:class-footer: text-smaller
{material-outlined}`cast_for_education;3.7em`

Join our Webinar
Watch our Webinars
^^^
{material-outlined}`event_note;2.5em` Date
August 1st, 2024

{material-outlined}`schedule;2.5em` Time \
12:00–12:45 pm PST \
03:00–03:45 pm EST \
09:00–09:45 pm CET

- Why CrateDB is a perfect \[Rockset\] replacement for real-time analytics and hybrid search.
- How CrateDB compares to \[Rockset\] and Elasticsearch/OpenSearch for streaming ingest.
- Why CrateDB is a cost-effective alternative to \[Rockset\].
+++
Register now to learn about our migration services,
and to have a live Q&A session with our experts.
:::

:::{card}
:link: https://cratedb.com/resources/webinars/lp-wb-rockset-migration
:link-alt: "Webinar Recordings"
:class-header: sd-text-center sd-fs-5 sd-align-minor-center sd-font-weight-bold sd-text-capitalize
:class-body: text-smaller
:class-footer: text-smaller
{material-outlined}`live_tv;2.7em`

Watch recordings of previous sessions from this webinar series.
:::

@@ -258,7 +238,7 @@ mostly due to API rate-limiting measures.


## Learn
Learn how to migrate your database use cases and workloads from Rockset to CrateDB.
Learn how to use CrateDB.

:::::{grid} 1 1 2 2
:gutter: 3
@@ -270,24 +250,43 @@ CrateDB's lingua franca is SQL, ready for big data, very similar to
Rockset's SQL dialect.
- [CrateDB SQL]
- [Advanced Querying]
::::

::::{grid-item-card}
:::{rubric} Migrating queries from Rockset to CrateDB
:::
Because both Rockset and CrateDB use SQL, there is no need for your teams to
learn a new query language. There are a few differences in the SQL dialect,
where we provide relevant support information to make transitioning easier.
:::{toctree}
Migrate Queries <query>
:::
:::{rubric} Migrating workloads from Rockset to CrateDB
:::
- [Amazon DynamoDB Streams]: Replicate CDC events from DynamoDB into CrateDB. \
{hyper-open}`Documentation <[DynamoDB CDC Relay]>`
{hyper-read-more}`Blog <[Replicating CDC events from DynamoDB to CrateDB]>`
::::

:::::


- [MongoDB Change Streams]: Relay CDC events from MongoDB into CrateDB. \
{hyper-open}`Documentation <[MongoDB CDC Relay]>`
## Integrate
Learn how to migrate your database use cases and workloads from Rockset to CrateDB.

:::::{grid} 1 1 2 2
:gutter: 3

- More information about [](#cdc) with CrateDB.
::::{grid-item-card}
:::
:::{rubric} Migrating DynamoDB workloads from Rockset to CrateDB
:::
- [DynamoDB Table Loader]
- [DynamoDB CDC Relay]
- [DynamoDB CDC Relay with AWS Lambda]
- Blog: [Replicating CDC events from DynamoDB to CrateDB]
:::{rubric} Migrating MongoDB workloads from Rockset to CrateDB
:::
- [MongoDB Table Loader]
- [MongoDB CDC Relay]
:::{rubric} General I/O
:::
- [Data loading](#etl) with CrateDB.
- [](#cdc) with CrateDB.
::::

::::{grid-item-card}
@@ -319,10 +318,8 @@ and Python example programs.
:::::



[Advanced Querying]: project:#advanced-querying
[All features of CrateDB at a glance]: project:#all-features
[Amazon Kinesis Data Streams]: https://aws.amazon.com/kinesis/
[Apache/Confluent Kafka Streams]: https://kafka.apache.org/documentation/streams/
[automatically indexes all your data]: project:#hybrid-index
[clear commitment]: https://cratedb.com/blog/opensource-licensing-founder