AWS DMS: Add first working version #213

Merged
merged 1 commit into from
Sep 16, 2024
53 changes: 53 additions & 0 deletions doc/io/dms/index.md
@@ -0,0 +1,53 @@
# AWS DMS Processor

## About
[AWS Database Migration Service] (AWS DMS) is a managed migration and replication
service that helps move your database and analytics workloads quickly, securely,
and with minimal downtime and zero data loss.

The data migration pipeline supports one-shot full-load operations, and continuous
replication based on change data capture (CDC).

## Details
A full-load-and-CDC pipeline with AWS DMS and CrateDB uses [Amazon Kinesis]
Data Streams [as a DMS target], combined with a CrateDB-specific downstream
processor that relays the stream into CrateDB.
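
To illustrate what such a downstream processor has to do, here is a minimal
sketch that maps one DMS change record to a parameterized SQL statement. It is
not the CrateDB Toolkit implementation. The record layout (a `data` object plus
a `metadata` object with `record-type`, `operation`, `schema-name`, and
`table-name`) is an assumption based on the AWS documentation for the Kinesis
target, and the function name `dms_event_to_sql` and its `key` parameter are
hypothetical.

```python
import json


def dms_event_to_sql(raw, key="id"):
    """Map one DMS-style JSON record to (sql, params); return None to skip it.

    Assumption: `key` names the primary key column of the target table.
    """
    event = json.loads(raw)
    meta = event["metadata"]
    if meta.get("record-type") != "data":
        # Control records (for example create-table) are skipped in this sketch.
        return None
    # Naive identifier quoting, good enough for an illustration only.
    table = '"{}"."{}"'.format(meta["schema-name"], meta["table-name"])
    row = event.get("data", {})
    op = meta["operation"]
    if op in ("load", "insert"):
        # Full-load rows and CDC inserts both become plain INSERT statements.
        columns = ", ".join('"{}"'.format(name) for name in row)
        marks = ", ".join("?" for _ in row)
        sql = "INSERT INTO {} ({}) VALUES ({})".format(table, columns, marks)
        return sql, list(row.values())
    if op == "update":
        # Update every column except the key, and match the row by its key.
        changed = {name: value for name, value in row.items() if name != key}
        assignments = ", ".join('"{}" = ?'.format(name) for name in changed)
        sql = 'UPDATE {} SET {} WHERE "{}" = ?'.format(table, assignments, key)
        return sql, list(changed.values()) + [row[key]]
    if op == "delete":
        return 'DELETE FROM {} WHERE "{}" = ?'.format(table, key), [row[key]]
    raise ValueError("unsupported operation: {}".format(op))


sample = json.dumps({
    "data": {"id": 1, "name": "foo"},
    "metadata": {
        "record-type": "data",
        "operation": "insert",
        "schema-name": "testdrive",
        "table-name": "demo",
    },
})
print(dms_event_to_sql(sample))
```

Treating `load` and `insert` alike reflects that a full load and the CDC phase
that follows it feed the same target table.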

## Coverage
AWS DMS supports migration between more than 20 database and analytics engines,
hosted either on-premises or on EC2 instances. Supported sources include:

- Amazon Aurora
- Amazon DocumentDB
- Amazon S3
- IBM Db2 for Linux, UNIX, and Windows versions 9.7 and higher
- IBM Db2 for z/OS version 12
- MariaDB versions 10.0 and higher
- Microsoft Azure SQL Database
- Microsoft SQL Server versions 2005 and higher
- MongoDB versions 3.x and higher
- MySQL versions 5.5 and higher
- Oracle versions 10.2 and higher
- PostgreSQL versions 9.4 and higher
- SAP Adaptive Server Enterprise (ASE) versions 12.5 and higher

AWS DMS also supports the MySQL/MariaDB and PostgreSQL variants on Amazon RDS,
Microsoft Azure, and Google Cloud. [Sources for AWS DMS] lists all the
compatibility details on one page.

## Usage
Depending on your requirements, CrateDB and CrateDB Cloud support different
ways to configure AWS DMS with CrateDB as a CDC consolidation database.
```{toctree}
:maxdepth: 2

standalone
managed
```


[Amazon Kinesis]: https://aws.amazon.com/kinesis/
[as a DMS target]: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Kinesis.html
[AWS Database Migration Service]: https://aws.amazon.com/dms/
[Sources for AWS DMS]: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.Sources.html
21 changes: 21 additions & 0 deletions doc/io/dms/managed.md
@@ -0,0 +1,21 @@
# AWS DMS Managed

## About
Conduct a data migration from any source supported by AWS DMS into a database
table on [CrateDB Cloud], exclusively using managed infrastructure components.

:::{note}
This is a work in progress. Please contact our data engineers to get started.
:::

## Configuration
1. Set up a DMS instance to replicate data [to an Amazon Kinesis Data Stream].
2. Note the AWS ARN of that Kinesis Data Stream,
for example `arn:aws:kinesis:eu-central-1:831394476016:stream/testdrive`.
3. Reach out to CrateDB support to have CrateDB Cloud connect to your data
stream and converge it into your CrateDB Cloud instance.
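
Before handing the ARN over, it can help to check that it really points at a
Kinesis Data Stream. The following sketch splits an ARN according to the
general `arn:partition:service:region:account-id:resource` layout. The names
`KinesisStreamArn` and `parse_kinesis_stream_arn` are hypothetical helpers and
not part of any AWS or CrateDB package.

```python
from typing import NamedTuple


class KinesisStreamArn(NamedTuple):
    partition: str
    region: str
    account_id: str
    stream_name: str


def parse_kinesis_stream_arn(arn):
    """Split a Kinesis Data Stream ARN into its parts, or raise ValueError."""
    # The resource part may itself contain separators, so split at most 5 times.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "kinesis":
        raise ValueError("not a Kinesis ARN: {!r}".format(arn))
    resource_type, _, stream_name = parts[5].partition("/")
    if resource_type != "stream" or not stream_name:
        raise ValueError("not a stream resource: {!r}".format(arn))
    return KinesisStreamArn(parts[1], parts[3], parts[4], stream_name)


print(parse_kinesis_stream_arn(
    "arn:aws:kinesis:eu-central-1:831394476016:stream/testdrive"
))
```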
Comment on lines +7 to +16
Member Author

This would be the gist of the whole story when aiming to use a "managed only, hands-free" way of doing data migration operations, for now in an "ad hoc deployment mode"?



[to an Amazon Kinesis Data Stream]: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.Kinesis.html
[CrateDB]: https://cratedb.com/docs/guide/home/
[CrateDB Cloud]: https://cratedb.com/docs/cloud/
26 changes: 26 additions & 0 deletions doc/io/dms/standalone.md
@@ -0,0 +1,26 @@
# AWS DMS Standalone

## About
Relay an AWS DMS data stream from Amazon Kinesis into a [CrateDB] table using
a one-stop command `ctk load table kinesis+dms://...`.

Use it for convenient data transfers within data pipelines or ad hoc
operations. It is available both as a command-line interface and as a library.

## Install
Install the CrateDB Toolkit package.
```shell
pip install --upgrade 'cratedb-toolkit[kinesis]'
```

## Usage
1. Set up a DMS instance, replicating data to Amazon Kinesis.
2. Transfer data from the Kinesis Data Stream into a CrateDB database table.
```shell
export CRATEDB_SQLALCHEMY_URL=crate://crate@localhost:4200/testdrive/demo
ctk load table kinesis+dms://arn:aws:kinesis:eu-central-1:831394476016:stream/testdrive
```
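
When the transfer is part of a Python-driven pipeline, the same invocation can
be assembled programmatically. The sketch below only wraps the CLI call shown
above. It does not use the toolkit's library interface, which is not described
here, and `build_ctk_load` and `run_ctk_load` are hypothetical helper names.

```python
import os
import subprocess


def build_ctk_load(stream_arn, cratedb_url):
    """Return (argv, env) mirroring the shell commands shown above."""
    argv = ["ctk", "load", "table", "kinesis+dms://" + stream_arn]
    # Pass the target database through the same environment variable.
    env = dict(os.environ, CRATEDB_SQLALCHEMY_URL=cratedb_url)
    return argv, env


def run_ctk_load(stream_arn, cratedb_url):
    """Start the transfer; needs `ctk` on PATH and valid AWS credentials."""
    argv, env = build_ctk_load(stream_arn, cratedb_url)
    return subprocess.run(argv, env=env, check=True)


argv, env = build_ctk_load(
    "arn:aws:kinesis:eu-central-1:831394476016:stream/testdrive",
    "crate://crate@localhost:4200/testdrive/demo",
)
print(" ".join(argv))
```

Keeping command construction separate from execution makes it possible to
inspect or log the invocation without contacting AWS.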
Comment on lines +17 to +23
Member Author

Code is not slotted in yet. daq-tools/lorrystream@6bfd268d3cd needs to converge better.

Member Author

@amotl amotl Aug 16, 2024


[CrateDB]: https://cratedb.com/docs/guide/home/
1 change: 1 addition & 0 deletions doc/io/index.md
@@ -86,6 +86,7 @@ ctk shell --command="SELECT * FROM data_weather LIMIT 10;" --format=json
:maxdepth: 2
:hidden:

AWS DMS <dms/index>
DynamoDB <dynamodb/index>
InfluxDB <influxdb/index>
MongoDB <mongodb/index>