
Comparing changes

base repository: sendbird/sb-osc
base: v1.0.1
head repository: sendbird/sb-osc
compare: main
Showing with 940 additions and 401 deletions.
  1. +3 −2 .github/workflows/linters.yml
  2. +2 −9 README.md
  3. +15 −0 deploy/README.md
  4. 0 { → deploy}/charts/Chart.yaml
  5. +77 −0 deploy/charts/README.md
  6. 0 { → deploy}/charts/templates/externalsecret.yaml
  7. +0 −3 { → deploy}/charts/templates/redis.yaml
  8. 0 { → deploy}/charts/templates/sb-osc.yaml
  9. 0 { → deploy}/charts/templates/serviceaccount.yaml
  10. 0 { → deploy}/charts/values.yaml
  11. +67 −0 deploy/compose/README.md
  12. +78 −0 deploy/compose/config.yaml
  13. +60 −0 deploy/compose/docker-compose.yml
  14. +3 −0 deploy/compose/redis.conf
  15. +9 −0 deploy/compose/secret.json
  16. +10 −0 doc/add-index.md
  17. +18 −0 doc/config.md
  18. +46 −10 doc/operation-class.md
  19. +47 −1 doc/troubleshooting.md
  20. +39 −77 doc/usage.md
  21. +26 −18 src/config/config.py
  22. +3 −1 src/config/env.py
  23. +5 −5 src/modules/db.py
  24. +2 −1 src/modules/redis/schema.py
  25. +2 −0 src/sbosc/component.py
  26. +61 −42 src/sbosc/controller/controller.py
  27. +23 −5 src/sbosc/controller/initializer.py
  28. +87 −64 src/sbosc/controller/validator.py
  29. +28 −33 src/sbosc/eventhandler/eventhandler.py
  30. +5 −2 src/sbosc/monitor/monitor.py
  31. +109 −71 src/sbosc/operations/base.py
  32. +21 −15 src/sbosc/operations/operation.py
  33. +2 −2 src/sbosc/operations/utils.py
  34. +11 −13 src/sbosc/worker/worker.py
  35. +18 −0 tests/README.md
  36. +9 −6 tests/configs/config.yaml
  37. +19 −12 tests/conftest.py
  38. +1 −3 { → tests}/docker-compose.yml
  39. +31 −3 tests/test_controller.py
  40. +1 −1 tests/test_eventhandler.py
  41. +2 −2 tests/test_monitor.py
5 changes: 3 additions & 2 deletions .github/workflows/linters.yml
@@ -1,10 +1,11 @@
name: Linters

on:
pull_request:
push:
branches:
- master
- main
pull_request:
workflow_dispatch:

jobs:
flake8_py3:
11 changes: 2 additions & 9 deletions README.md
@@ -59,22 +59,15 @@ load when production traffic increases.

## Requirements

SB-OSC is designed to work with Aurora MySQL database, and it's an EKS-based tool.

It requires the following resources to run:

- Aurora MySQL database (v2, v3)
- EKS cluster
- AWS SecretsManager secret
- IAM role
SB-OSC is designed to work with Aurora MySQL database. It's a containerized application that can be run on both Kubernetes and Docker environments.

SB-OSC accepts `ROW` for binlog format. It is recommended to set `binlog-ignore-db` to `sbosc` to prevent SB-OSC from
processing its own binlog events.

- `binlog_format` set to `ROW`
- `binlog-ignore-db` set to `sbosc` (Recommended)

Detailed requirements and setup instructions can be found in the [usage guide](doc/usage.md).
Detailed requirements and setup instructions can be found in the [deployment guide](deploy/README.md).

## Performance

15 changes: 15 additions & 0 deletions deploy/README.md
@@ -0,0 +1,15 @@
# Usage Guide

SB-OSC is designed to be deployed as a containerized application.
It can be run on both Kubernetes and Docker environments.

For Kubernetes deployment, refer to the [charts](./charts) directory; for Docker deployment, refer to the [compose](./compose) directory.

### Building Docker Image
You can build the Docker image using the Dockerfile in the root directory.
```bash
docker build -t sb-osc .
```
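
If the image needs to be pulled by EKS or a remote Docker host, it also has to be pushed to a registry. A minimal sketch for Amazon ECR is shown below; `ACCOUNT_ID`, `REGION`, and the tag are placeholders, and the ECR repository is assumed to already exist.

```bash
# Authenticate Docker to ECR (ACCOUNT_ID and REGION are placeholders)
aws ecr get-login-password --region REGION \
  | docker login --username AWS --password-stdin ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com

# Tag and push the image built above
docker tag sb-osc ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/sb-osc:latest
docker push ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com/sb-osc:latest
```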

### Troubleshooting
Issues that may occur when using SB-OSC, along with their solutions, can be found in [troubleshooting.md](../doc/troubleshooting.md).
File renamed without changes.
77 changes: 77 additions & 0 deletions deploy/charts/README.md
@@ -0,0 +1,77 @@
# Deploying on EKS Cluster

## 1. Create AWS Resources

### IAM Role

Two IAM roles are required: one for `ExternalSecrets` to access the SecretsManager secret and another for the `monitor` to access CloudWatch metrics. Each role will be attached to a separate service account.


Create the IAM roles with the following policies:

**sb-osc-external-role**
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"secretsmanager:GetSecretValue",
"secretsmanager:DescribeSecret",
"secretsmanager:ListSecretVersionIds"
],
"Resource": "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:SECRET_NAME"
}
]
}
```

**sb-osc-role**
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"cloudwatch:GetMetricStatistics"
],
"Resource": "*"
}
]
}
```
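
One way to create these roles is `eksctl` with IAM Roles for Service Accounts (IRSA), sketched below. The cluster name, policy ARNs, service account names, and role names are placeholders; `--role-only` is used here because the chart ships a serviceaccount template, so only the roles need to be created.

```bash
# IAM role assumed by the ExternalSecrets service account (all names are placeholders)
eksctl create iamserviceaccount \
  --cluster my-cluster --namespace sb-osc --name sb-osc-external-sa \
  --role-name sb-osc-external-role \
  --attach-policy-arn arn:aws:iam::ACCOUNT_ID:policy/sb-osc-external-policy \
  --role-only --approve

# IAM role assumed by the monitor's service account
eksctl create iamserviceaccount \
  --cluster my-cluster --namespace sb-osc --name sb-osc-sa \
  --role-name sb-osc-role \
  --attach-policy-arn arn:aws:iam::ACCOUNT_ID:policy/sb-osc-policy \
  --role-only --approve
```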

### SecretsManager Secret
SB-OSC uses ExternalSecrets with SecretsManager for credentials. The following keys should be defined.

- `username`: Database username
- `password`: Database password
- `port`: Database port
- `redis_host`: Redis endpoint (k8s Service name)
- `redis_password`: Redis password
- `slack_channel`: Slack channel ID (Optional)
- `slack_token`: Slack app token (Optional)

You can find these keys in [secret.py](../../src/config/secret.py).
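
As a reference, the secret can be created with the AWS CLI roughly as follows; the secret name and all values are placeholders and must match whatever the `ExternalSecret` template points at.

```bash
# Create the SecretsManager secret read by ExternalSecrets (name and values are placeholders)
aws secretsmanager create-secret \
  --name SECRET_NAME \
  --secret-string '{
    "username": "root",
    "password": "DB_PASSWORD",
    "port": "3306",
    "redis_host": "REDIS_SERVICE_NAME",
    "redis_password": "REDIS_PASSWORD",
    "slack_channel": "",
    "slack_token": ""
  }'
```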

## 2. Create Destination Table
SB-OSC does not create the destination table on its own. The table should be created manually before starting the migration.
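
For example, when the destination schema is the source schema plus the change being migrated, the table could be created roughly as below. Endpoints, database, table, and column names are placeholders, and `CREATE TABLE ... LIKE` across databases assumes both schemas are reachable on the same host; otherwise copy the `SHOW CREATE TABLE` output manually.

```bash
# Clone the source schema and apply the desired DDL change (all names are placeholders)
mysql -h WRITER_ENDPOINT -u USERNAME -p -e "
  CREATE TABLE destination_db.destination_table LIKE source_db.source_table;
  ALTER TABLE destination_db.destination_table ADD COLUMN new_column BIGINT NULL;
"
```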

## 3. Enable Binlog
SB-OSC requires binlog to be enabled on the source database. Please set `binlog_format` to `ROW`; a parameter-group sketch is shown after the list below.

### Other Parameters
- Setting `binlog-ignore-db` to `sbosc` is recommended to prevent SB-OSC from processing its own binlog events.
- Set `range_optimizer_max_mem_size` to `0` or a large value to prevent bad query plans on queries with large `IN` clauses (especially on Aurora v3).
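
A minimal sketch of applying these settings with the AWS CLI is below. The parameter group name is a placeholder, `binlog_format` is a static parameter that takes effect after a reboot, and depending on the engine version `range_optimizer_max_mem_size` may live in the instance-level parameter group instead.

```bash
# Enable ROW binlog format on the source cluster's parameter group (group name is a placeholder)
aws rds modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name source-cluster-params \
  --parameters "ParameterName=binlog_format,ParameterValue=ROW,ApplyMethod=pending-reboot"

# Optionally lift the range optimizer memory limit (0 = unlimited)
aws rds modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name source-cluster-params \
  --parameters "ParameterName=range_optimizer_max_mem_size,ParameterValue=0,ApplyMethod=immediate"
```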

## 4. Run SB-OSC
When all of the above steps are completed, you can start the migration process by installing the helm chart in this directory.

```bash
# run from the deploy/ directory; "sb-osc" is the release name and ./charts is the chart path
helm install sb-osc ./charts -n sb-osc --create-namespace

# or, to install or upgrade in place
helm upgrade -i sb-osc ./charts -n sb-osc
```
File renamed without changes.
3 changes: 0 additions & 3 deletions deploy/charts/templates/redis.yaml
@@ -42,9 +42,6 @@ spec:
- name: redis-data
persistentVolumeClaim:
claimName: redis-pvc
- name: redis-config
configMap:
name: redis-config
- name: redis-secret
secret:
secretName: sb-osc-secret
File renamed without changes.
File renamed without changes.
File renamed without changes.
67 changes: 67 additions & 0 deletions deploy/compose/README.md
@@ -0,0 +1,67 @@
# Deploying with Docker Compose

## 1. Create IAM Role

### IAM Role

An IAM role is required for the `monitor` to access CloudWatch metrics.

Create an IAM role with the following policy:
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"cloudwatch:GetMetricStatistics"
],
"Resource": "*"
}
]
}
```

Attach this role to the instance where SB-OSC is running.
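
A rough sketch of creating and attaching such a role with the AWS CLI, assuming SB-OSC runs on an EC2 instance; the role, policy, and profile names and the instance ID are placeholders, and the CloudWatch policy above is assumed to be saved locally as `policy.json`.

```bash
# Role that EC2 instances can assume (names are placeholders)
aws iam create-role --role-name sb-osc-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "ec2.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }]
  }'

# Attach the CloudWatch policy shown above, saved as policy.json
aws iam put-role-policy --role-name sb-osc-role \
  --policy-name sb-osc-cloudwatch --policy-document file://policy.json

# Expose the role to the instance through an instance profile
aws iam create-instance-profile --instance-profile-name sb-osc-profile
aws iam add-role-to-instance-profile --instance-profile-name sb-osc-profile --role-name sb-osc-role
aws ec2 associate-iam-instance-profile --instance-id INSTANCE_ID \
  --iam-instance-profile Name=sb-osc-profile
```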

## 2. Write Config Files
You have to write three config files for SB-OSC to run properly.

### `config.yaml`
This file contains the configuration for SB-OSC. You can find the template in [config.yaml](config.yaml).
All values are loaded into the `Config` class in [config.py](../../src/config/config.py).

### `secret.json`
This file contains the credentials for the database, Redis, and Slack. You can find the template in [secret.json](secret.json). All values are loaded into the `Secret` class in [secret.py](../../src/config/secret.py).

- `username`: Database username
- `password`: Database password
- `port`: Database port
- `redis_host`: Redis endpoint (Docker container name)
- `redis_password`: Redis password (Optional)
- `slack_channel`: Slack channel ID (Optional)
- `slack_token`: Slack app token (Optional)

`redis_password` is optional. Keep in mind that if you set a password in `redis.conf`, you should set the same password in `secret.json`.

### `redis.conf`
This file contains the configuration for the Redis server. You can find the template in [redis.conf](redis.conf).
- `requirepass ""`: Match the `redis_password` set in `secret.json`.
  - If left as `requirepass ""`, the Redis server does not require a password. Fill in a password between the quotes to set one.
- `appendonly yes`: Enable AOF persistence
- `save ""`: Disable RDB persistence

## 3. Create Destination Table
SB-OSC does not create the destination table on its own. The table should be created manually before starting the migration.

## 4. Enable Binlog
SB-OSC requires binlog to be enabled on the source database. Please set `binlog_format` to `ROW`; a quick verification query is shown after the list below.

### Other Parameters
- Setting `binlog-ignore-db` to `sbosc` is recommended to prevent SB-OSC from processing its own binlog events.
- Set `range_optimizer_max_mem_size` to `0` or a large value to prevent bad query plans on queries with large `IN` clauses (especially on Aurora v3).
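
A quick way to confirm these settings on the source database before starting the migration; the endpoint and credentials are placeholders.

```bash
# Verify binlog and range optimizer settings on the source writer (placeholders)
mysql -h SOURCE_WRITER_ENDPOINT -u USERNAME -p -e \
  "SELECT @@binlog_format, @@range_optimizer_max_mem_size;"
```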

## 5. Run SB-OSC
When all of the above steps are completed, you can start the migration process by running docker compose.

Please double-check that the `docker-compose.yml` file is configured correctly (e.g. `image`, `AWS_REGION`) before starting.
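
A typical startup, assuming `config.yaml`, `secret.json`, `redis.conf`, and `docker-compose.yml` all live in the same directory:

```bash
# Start Redis and all SB-OSC components in the background
docker compose up -d

# Follow the controller logs to watch migration progress
docker compose logs -f controller
```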
78 changes: 78 additions & 0 deletions deploy/compose/config.yaml
@@ -0,0 +1,78 @@
####################
# Required configs #
####################

# Migration plan
source_writer_endpoint: "" # If source_cluster_id is not provided, this must be cluster writer endpoint.
source_reader_endpoint: ""
destination_writer_endpoint: "" # If destination_cluster_id is not provided, this must be cluster writer endpoint.
destination_reader_endpoint: ""
source_db: ""
source_table: ""
destination_db: ""
destination_table: ""

auto_swap: false # Whether to swap tables automatically. (Default: false)
preferred_window: "00:00-23:59" # Preferred window for swapping tables & bulk import validation. (Default: "00:00-23:59")

# Worker config
min_batch_size: 500 # Starting batch size to use. (Default: 500)
max_batch_size: 3000 # Desired batch size to use. (Default: 3000)
batch_size_step_size: 500 # Step size to increase batch size. (Default: 500)

min_thread_count: 1 # Starting thread count to use. (Default: 1)
max_thread_count: 8 # Desired thread count to use. (Default: 8)
thread_count_step_size: 1 # Step size to increase thread count. (Default: 1)

commit_interval_in_seconds: 1 # Time to wait after each query executed by the worker. (Default: 1)

# Validator
bulk_import_validation_batch_size: 10000 # Batch size for bulk import validation (Default: 10000)
apply_dml_events_validation_batch_size: 1000 # Batch size for DML event validation (Default: 1000)
validation_thread_count: 4 # Number of threads to use for validation process (Default: 4)

####################
# Optional configs #
####################

# Migration plan
# sbosc_db: "sbosc" # Database to create sb-osc tables. (Default: "sbosc")
# source_cluster_id: ~ # If not provided, cluster id will be retrieved from source_writer_endpoint (Default: ~)
# destination_cluster_id: ~ # If not provided, cluster id will be retrieved from destination_writer_endpoint (Default: ~)
# min_chunk_size: 100000 # Minimum chunk size to create. (Default: 100000)
# max_chunk_count: 200 # Maximum number of chunks to create. (Default: 200)
# wait_interval_until_auto_swap_in_seconds: 60 # Interval to wait until auto swap. (Default: 60)
# skip_bulk_import: false # Whether to skip bulk import. (Default: false)
# disable_apply_dml_events: false # Whether to disable applying dml events. (Default: false)
# operation_class: BaseOperation # Operation class to use. (Default: BaseOperation)
# indexes: [] # Indexes to create after bulk import. (Default: [])
# index_created_per_query: 4 # Number of indexes to create per iteration. (Default: 4)
# innodb_ddl_buffer_size: ~ # innodb_ddl_buffer_size for MySQL. (Default: ~)
# innodb_ddl_threads: ~ # innodb_ddl_threads for MySQL. (Default: ~)
# innodb_parallel_read_threads : ~ # innodb_parallel_read_threads for MySQL. (Default: ~)

# Worker config
# use_batch_size_multiplier: false # Whether to use batch size multiplier. (Default: false)

# EventHandler config
# eventhandler_thread_count: 4 # Number of threads for EventHandler. Max number of binlog files to read at once. (Default 4. Max 4 recommended)
# eventhandler_thread_timeout_in_seconds: 300 # Timeout for EventHandler thread. If the thread is not finished within this time, it raises exception and restarts EventHandler. (Default: 300)
# init_binlog_file: ~ # Initial binlog file to start reading. (Default: ~)
# init_binlog_position: ~ # Initial binlog position to start reading. (Default: ~)

# Monitor threshold
# cpu_soft_threshold: 40 # Soft threshold for CPU usage. If the CPU usage exceeds this value, thread count will be halved. (Default: 40)
# cpu_hard_threshold: 60 # Hard threshold for CPU usage. If the CPU usage exceeds this value, thread count will be decreased to 0. (Default: 60)
# write_latency_soft_threshold: 30 # Soft threshold for WriteLatency. If the latency exceeds this value, batch size will be halved. (Default: 30)
# write_latency_hard_threshold: 50 # Hard threshold for WriteLatency. If the latency exceeds this value, batch size will be decreased to 0. (Default: 50)

# Validation config
# apply_dml_events_validation_interval_in_seconds: 10 # Interval for DML event validation (seconds) (Default: 10)
# full_dml_event_validation_interval_in_hours: 0 # Interval for full DML event validation. 0 disables full DML event validation (hours) (Default: 0)

# EventLoader config
# pk_set_max_size: 100000 # Max number of DML PKs to load from DB at once. No more than 2 * pk_set_max_size will be kept in Redis. This is used for memory optimization. (Default: 100000)
# event_batch_duration_in_seconds: 3600 # Timestamp range of DML events to load from DB at once (seconds). (Default: 3600)

# Operation class config
# operation_class_config: ~ # Operation class specific configurations. (Default: ~)
60 changes: 60 additions & 0 deletions deploy/compose/docker-compose.yml
@@ -0,0 +1,60 @@
services:
controller: &component-base
image: "" # SB-OSC image
container_name: controller
environment: &component-env
AWS_REGION: "" # AWS region
CONFIG_FILE: "/opt/sb-osc/config.yaml"
SECRET_FILE: "/opt/sb-osc/secret.json"
volumes:
- ./config.yaml:/opt/sb-osc/config.yaml
- ./secret.json:/opt/sb-osc/secret.json
command: ["python", "-m", "sbosc.controller.main"]
restart: always
depends_on:
redis:
condition: service_healthy

eventhandler:
<<: *component-base
container_name: eventhandler
command: ["python", "-m", "sbosc.eventhandler.main"]
depends_on:
- controller

monitor:
<<: *component-base
container_name: monitor
command: ["python", "-m", "sbosc.monitor.main"]
depends_on:
- controller

worker:
<<: *component-base
container_name: worker
command: ["python", "-m", "sbosc.worker.main"]
environment:
<<: *component-env
POD_NAME: "worker"
depends_on:
- controller

redis:
image: "redis:7.0.4"
container_name: redis
command:
- redis-server
- /usr/local/etc/redis/redis.conf
ports:
- "6379:6379"
volumes:
- redis-data:/data
- ./redis.conf:/usr/local/etc/redis/redis.conf
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 5s
retries: 5

volumes:
redis-data:
3 changes: 3 additions & 0 deletions deploy/compose/redis.conf
@@ -0,0 +1,3 @@
requirepass ""
appendonly yes
save ""
9 changes: 9 additions & 0 deletions deploy/compose/secret.json
@@ -0,0 +1,9 @@
{
"username": "root",
"password": "",
"port": "3306",
"redis_host": "redis",
"redis_password": "",
"slack_channel": "",
"slack_token": ""
}
10 changes: 10 additions & 0 deletions doc/add-index.md
@@ -9,3 +9,13 @@ Before `ALTER TABLE ... ADD INDEX` command finishes, index is temporarily create
### Free Memory (Enhanced Monitoring)
Upon creating an index, the Free Memory as reported by Enhanced Monitoring will decrease. This decrease continues rapidly until it reaches a certain value. However, Aurora has the capability to immediately reclaim memory from FreeableMemory (as observed in CloudWatch), so this should not pose a significant issue. Nonetheless, it is important to monitor and ensure that neither Free Memory nor Freeable Memory reaches zero.

### Innodb Parameters (MySQL 8.0.27 and above)
In MySQL 8.0.27, new InnoDB parameters `innodb_ddl_buffer_size`, `innodb_ddl_threads`, and `innodb_parallel_read_threads` were added to improve secondary index creation performance.
SB-OSC provides options to set these parameters in the migration configuration before creating indexes.
```yaml
innodb_ddl_buffer_size: 1048576
innodb_ddl_threads: 4
innodb_parallel_read_threads: 4
```
Please refer to the [MySQL documentation](https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html) for more information on these parameters.