Text updates
shayonj committed Sep 2, 2024
1 parent f1f4ab8 commit a6626f5
Showing 4 changed files with 66 additions and 3 deletions.
12 changes: 10 additions & 2 deletions README.md
@@ -1,5 +1,9 @@
# 🌊 pg_flo

## ![](internal/demo.gif)

[![CI](https://github.com/shayonj/pg_flo/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/shayonj/pg_flo/actions/workflows/ci.yml)

`pg_flo` is the easiest way to move and transform data from PostgreSQL. It uses PostgreSQL Logical Replication to stream inserts, updates, deletes, and DDL changes to multiple destinations. With support for parallelizable bulk copy, near real-time streaming, and powerful transformation and filtering rules, `pg_flo` simplifies data sync and ETL processes.

⚠️ CURRENTLY UNDER ACTIVE DEVELOPMENT. ACCEPTING FEEDBACK/ISSUES/PULL REQUESTS 🚀
@@ -30,6 +34,7 @@
- Supports tracking DDL changes.
- Configurable via command-line flags or environment variables.
- Supports copy and stream mode to parallelize bulk copy and stream changes.
- Resumable streaming from the last `lsn` position.

I invite you to take a look through [issues](https://github.com/shayonj/pg_flo/issues) to see what's coming next 🤗.

@@ -81,7 +86,6 @@ You can configure `pg_flo` using a YAML configuration file or environment variables

### Example 1: Basic streaming of changes to STDOUT

```shell
pg_flo stream stdout \
--host localhost \
@@ -92,7 +96,7 @@
--group your_group \
--schema public \
--tables table1,table2
```

### Example 2: Using Configuration File

@@ -149,6 +153,10 @@ pg_flo stream file \
- `make test`
- `make lint`

## How it Works

You can read a brief overview of how the tool works [here](internal/how-it-works.md).

### End-to-End Tests

To run the end-to-end tests, use the provided script:
Binary file added internal/demo.gif
55 changes: 55 additions & 0 deletions internal/how-it-works.md
@@ -0,0 +1,55 @@
# How it works

`pg_flo` leverages PostgreSQL's logical replication system to capture and stream data while applying transformations and filters on the fly.

1. **Publication Creation**: It creates a PostgreSQL publication for the specified tables or all tables (per `group`).
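
   To make this step concrete, here is a minimal Go sketch of issuing the `CREATE PUBLICATION` statement for a group's tables. pg_flo is written in Go, but this is an illustration rather than its actual code; the `pg_flo_<group>_publication` naming scheme and the `lib/pq` driver are assumptions.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"strings"

	_ "github.com/lib/pq" // assumption: any PostgreSQL driver would do here
)

// createPublication creates a publication covering the given tables, or all
// tables when none are listed. The name format is a hypothetical convention.
func createPublication(db *sql.DB, group string, tables []string) error {
	forClause := "FOR ALL TABLES"
	if len(tables) > 0 {
		forClause = "FOR TABLE " + strings.Join(tables, ", ")
	}
	stmt := fmt.Sprintf("CREATE PUBLICATION pg_flo_%s_publication %s", group, forClause)
	_, err := db.Exec(stmt)
	return err
}

func main() {
	db, err := sql.Open("postgres", "postgres://localhost:5432/mydb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	if err := createPublication(db, "your_group", []string{"table1", "table2"}); err != nil {
		log.Fatal(err)
	}
}
```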

2. **Replication Slot**: A replication slot is created to ensure no data is lost between streaming sessions.
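
   Creating the slot is a single SQL call against the source database. A hedged sketch, with the same illustrative driver and naming assumptions as above, that skips creation when the slot already exists:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // assumption: driver choice is illustrative
)

// ensureReplicationSlot creates a logical replication slot using the built-in
// pgoutput plugin, unless a slot with that name is already present.
func ensureReplicationSlot(db *sql.DB, slot string) error {
	var exists bool
	err := db.QueryRow(
		"SELECT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name = $1)", slot,
	).Scan(&exists)
	if err != nil || exists {
		return err
	}
	_, err = db.Exec("SELECT pg_create_logical_replication_slot($1, 'pgoutput')", slot)
	return err
}

func main() {
	db, err := sql.Open("postgres", "postgres://localhost:5432/mydb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	// "pg_flo_your_group_slot" is a hypothetical slot-naming convention.
	if err := ensureReplicationSlot(db, "pg_flo_your_group_slot"); err != nil {
		log.Fatal(err)
	}
	fmt.Println("slot ready")
}
```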

3. **Operation Modes**:

- Users can choose between two modes of operation:
a) **Copy-and-Stream**: Performs an initial bulk copy followed by streaming changes.
b) **Stream-Only**: Starts streaming changes immediately from the last known position.
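
   Put differently, the only extra input the decision needs is whether the destination already holds a usable LSN. A tiny illustrative sketch (the function and constant names are hypothetical):

```go
package main

import "fmt"

// Mode is the user-selected operation mode.
type Mode string

const (
	CopyAndStream Mode = "copy-and-stream"
	StreamOnly    Mode = "stream-only"
)

// needsInitialCopy reports whether a bulk copy should run before streaming:
// only in copy-and-stream mode, and only when the sink has no stored LSN yet.
func needsInitialCopy(mode Mode, storedLSN uint64) bool {
	return mode == CopyAndStream && storedLSN == 0
}

func main() {
	fmt.Println(needsInitialCopy(CopyAndStream, 0)) // true: fresh sink, copy first
	fmt.Println(needsInitialCopy(CopyAndStream, 1)) // false: resume streaming
	fmt.Println(needsInitialCopy(StreamOnly, 0))    // false: stream immediately
}
```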

4. **Initial Bulk Copy** (for Copy-and-Stream mode):

- If no valid LSN (Log Sequence Number) is found in the target sink, `pg_flo` performs an initial bulk copy of existing data.
- This process is parallelized for fast data sync:
- Tables are analyzed to optimize the copy process.
- A snapshot is taken to ensure consistency.
- Each table is divided into page ranges.
- Multiple workers copy different ranges concurrently.
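
   One way to picture the parallel copy: split each table into page (block) ranges and let a pool of workers copy ranges concurrently by filtering on `ctid`. The sketch below illustrates the technique only; it omits the snapshot pinning and the sink writes that a real implementation needs, and the chunk size and worker count are arbitrary.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"sync"

	_ "github.com/lib/pq" // assumption: driver choice is illustrative
)

// copyRange copies one page range of a table by filtering on ctid. Real code
// would run inside a transaction pinned to the shared snapshot
// (SET TRANSACTION SNAPSHOT '...') and stream the rows to the sink.
func copyRange(db *sql.DB, table string, startPage, endPage int64) error {
	q := fmt.Sprintf(
		"SELECT * FROM %s WHERE ctid >= '(%d,0)'::tid AND ctid < '(%d,0)'::tid",
		table, startPage, endPage)
	rows, err := db.Query(q)
	if err != nil {
		return err
	}
	defer rows.Close()
	for rows.Next() {
		// ... scan the row and hand it to the sink ...
	}
	return rows.Err()
}

func copyTableInParallel(db *sql.DB, table string, workers, pagesPerChunk int64) error {
	var pages int64
	// relpages is an estimate refreshed by ANALYZE, which is why tables are
	// analyzed before the copy starts.
	if err := db.QueryRow(
		"SELECT relpages FROM pg_class WHERE relname = $1", table).Scan(&pages); err != nil {
		return err
	}
	chunks := make(chan [2]int64)
	var wg sync.WaitGroup
	for i := int64(0); i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range chunks {
				if err := copyRange(db, table, c[0], c[1]); err != nil {
					log.Println("copy chunk failed:", err)
				}
			}
		}()
	}
	for start := int64(0); start <= pages; start += pagesPerChunk {
		chunks <- [2]int64{start, start + pagesPerChunk}
	}
	close(chunks)
	wg.Wait()
	return nil
}

func main() {
	db, err := sql.Open("postgres", "postgres://localhost:5432/mydb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	if err := copyTableInParallel(db, "table1", 4, 1000); err != nil {
		log.Fatal(err)
	}
}
```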

5. **Resumable Streaming**:

- After the initial copy (or immediately in Stream-Only mode), streaming starts from the last known position.
- The last processed LSN is stored in the target sink/destination, allowing `pg_flo` to resume operations from where it left off in case of interruptions.
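
   Resumability needs just two operations against the destination: read the last stored LSN at startup and record a newer one after each successful flush. A simplified sketch that keeps the LSN in a local state file, as a stand-in for whatever the real sink stores:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// loadLastLSN returns the LSN recorded by a previous run, or 0 when the
// destination has never been written to (which triggers the initial copy).
func loadLastLSN(path string) uint64 {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0
	}
	lsn, err := strconv.ParseUint(string(b), 10, 64)
	if err != nil {
		return 0
	}
	return lsn
}

// storeLastLSN durably records the newest position that reached the sink.
func storeLastLSN(path string, lsn uint64) error {
	return os.WriteFile(path, []byte(strconv.FormatUint(lsn, 10)), 0o644)
}

func main() {
	const state = "/tmp/pg_flo_last_lsn" // hypothetical state location
	start := loadLastLSN(state)
	fmt.Println("resuming from LSN", start)
	// ... stream changes, then after a successful flush:
	_ = storeLastLSN(state, start+1)
}
```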

6. **Message Processing**: It processes various types of messages:

- Relation messages to understand table structures
- Insert, Update, and Delete messages containing actual data changes
- Begin and Commit messages for transaction boundaries
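
   A sketch of the dispatch over decoded `pgoutput` messages, assuming the `github.com/jackc/pglogrepl` package (a common Go library for this protocol); it is shown as a helper function to illustrate the message types, not pg_flo's own handler:

```go
package pgfloexample

import (
	"log"

	"github.com/jackc/pglogrepl"
)

// handleWALMessage decodes one pgoutput message and routes it by type.
// relations caches table structure from Relation messages so that later
// Insert/Update/Delete tuples can be interpreted column by column.
func handleWALMessage(walData []byte, relations map[uint32]*pglogrepl.RelationMessage) error {
	msg, err := pglogrepl.Parse(walData)
	if err != nil {
		return err
	}
	switch m := msg.(type) {
	case *pglogrepl.RelationMessage:
		relations[m.RelationID] = m // remember column names and type OIDs
	case *pglogrepl.BeginMessage:
		// transaction starts: note the final LSN / commit time if needed
	case *pglogrepl.InsertMessage:
		if rel, ok := relations[m.RelationID]; ok {
			log.Printf("INSERT into %s.%s", rel.Namespace, rel.RelationName)
		}
	case *pglogrepl.UpdateMessage:
		log.Printf("UPDATE on relation %d", m.RelationID)
	case *pglogrepl.DeleteMessage:
		log.Printf("DELETE on relation %d", m.RelationID)
	case *pglogrepl.CommitMessage:
		// transaction boundary: a natural point to flush buffered rows
	}
	return nil
}
```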

7. **Data Transformation**: Received data is converted into a structured format, with type-aware conversions for different PostgreSQL data types.
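
   For example, values arrive from `pgoutput` in text form and can be converted based on the column's type OID. A simplified, assumption-laden version of that idea (only a few common OIDs shown):

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// convertValue turns the text form sent by pgoutput into a Go value based on
// the column's PostgreSQL type OID.
func convertValue(raw string, typeOID uint32) (interface{}, error) {
	switch typeOID {
	case 16: // bool
		return raw == "t", nil
	case 20, 21, 23: // int8, int2, int4
		return strconv.ParseInt(raw, 10, 64)
	case 700, 701: // float4, float8
		return strconv.ParseFloat(raw, 64)
	case 1184: // timestamptz, e.g. "2024-09-02 12:34:56.123456+00"
		return time.Parse("2006-01-02 15:04:05.999999-07", raw)
	default: // text, varchar, json, etc. stay as strings
		return raw, nil
	}
}

func main() {
	v, _ := convertValue("42", 23)
	fmt.Printf("%v (%T)\n", v, v) // 42 (int64)
}
```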

8. **Rule Application**: If configured, transformation and filtering rules are applied to the data:

- **Transform Rules**:
- Regex: Apply regular expression transformations to string values.
- Mask: Mask sensitive data, keeping the first and last characters visible.
- **Filter Rules**:
- Comparison: Filter based on equality, inequality, greater than, less than, etc.
- Contains: Filter string values based on whether they contain a specific substring.
- Rules can be applied selectively to insert, update, or delete operations.
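
   A compact sketch of how the mask, regex, and contains rules might behave; the function shapes here are hypothetical, and the real rule configuration lives in pg_flo's own docs:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// maskMiddle keeps the first and last character and masks the rest,
// mirroring the "mask" transform described above (byte-based, ASCII example).
func maskMiddle(s string) string {
	if len(s) <= 2 {
		return s
	}
	return s[:1] + strings.Repeat("*", len(s)-2) + s[len(s)-1:]
}

// regexReplace applies a regular-expression transform to a string value.
func regexReplace(s, pattern, replacement string) string {
	return regexp.MustCompile(pattern).ReplaceAllString(s, replacement)
}

// containsFilter reports whether a row should be kept based on a substring;
// a rule engine would apply it only to the configured operations.
func containsFilter(value, substr string) bool {
	return strings.Contains(value, substr)
}

func main() {
	fmt.Println(maskMiddle("secret-token"))                           // s**********n
	fmt.Println(regexReplace("user@example.com", "@.*", "@redacted")) // user@redacted
	fmt.Println(containsFilter("new_york", "york"))                   // true
}
```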

9. **Buffering**: Processed data is buffered and written in batches to optimize write operations to the destination.
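
   Batching can be modeled as a buffer that flushes either when it reaches a size threshold or when a timer fires. A rough sketch of that pattern (thresholds and types are illustrative, not pg_flo's exact buffering code):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Buffer accumulates processed changes and flushes them in batches.
type Buffer struct {
	mu      sync.Mutex
	items   []string
	maxSize int
	flush   func(batch []string)
}

func (b *Buffer) Add(item string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.items = append(b.items, item)
	if len(b.items) >= b.maxSize {
		b.flushLocked()
	}
}

func (b *Buffer) flushLocked() {
	if len(b.items) == 0 {
		return
	}
	b.flush(b.items)
	b.items = nil
}

// FlushEvery flushes on an interval so small trickles of changes still reach
// the sink promptly.
func (b *Buffer) FlushEvery(d time.Duration) {
	for range time.Tick(d) {
		b.mu.Lock()
		b.flushLocked()
		b.mu.Unlock()
	}
}

func main() {
	buf := &Buffer{maxSize: 2, flush: func(batch []string) { fmt.Println("flush:", batch) }}
	go buf.FlushEvery(500 * time.Millisecond)
	buf.Add("change-1")
	buf.Add("change-2") // size threshold reached: flushes immediately
	buf.Add("change-3")
	time.Sleep(time.Second) // the timer flushes the remaining item
}
```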

10. **Writing to Sink**: Data is periodically flushed from the buffer to the configured sink (e.g., stdout, file, or other destinations).
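
    The different destinations can be thought of as implementations of one small interface, so the same flush loop can target any of them. The `Sink` interface, JSON shape, and file handling below are illustrative assumptions, not pg_flo's actual sink API:

```go
package main

import (
	"encoding/json"
	"os"
)

// Sink receives batches of changes.
type Sink interface {
	WriteBatch(batch []map[string]interface{}) error
}

// StdoutSink writes each change as one JSON line to standard output.
type StdoutSink struct{}

func (StdoutSink) WriteBatch(batch []map[string]interface{}) error {
	enc := json.NewEncoder(os.Stdout)
	for _, change := range batch {
		if err := enc.Encode(change); err != nil {
			return err
		}
	}
	return nil
}

// FileSink appends JSON lines to a file on disk.
type FileSink struct{ Path string }

func (s FileSink) WriteBatch(batch []map[string]interface{}) error {
	f, err := os.OpenFile(s.Path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	enc := json.NewEncoder(f)
	for _, change := range batch {
		if err := enc.Encode(change); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	var s Sink = StdoutSink{}
	_ = s.WriteBatch([]map[string]interface{}{
		{"operation": "INSERT", "table": "table1", "new": map[string]interface{}{"id": 1}},
	})
}
```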

11. **State Management**:

- The tool keeps track of its progress by updating the Last LSN in the target sink/destination.
- This allows for resumable operations across multiple runs.
- Periodic status updates are sent to PostgreSQL to maintain the replication connection.
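
    On the protocol side, the periodic status update is a standby status message acknowledging how far the client has written. The sketch below assumes the `github.com/jackc/pglogrepl` and `github.com/jackc/pgx/v5/pgconn` packages and shows only the keepalive loop; a real loop interleaves this with reading WAL data from the same connection, and the slot and publication names are hypothetical.

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/jackc/pglogrepl"
	"github.com/jackc/pgx/v5/pgconn"
)

func main() {
	ctx := context.Background()
	// "replication=database" puts the connection into logical replication mode.
	conn, err := pgconn.Connect(ctx, "postgres://localhost:5432/mydb?replication=database")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close(ctx)

	slot := "pg_flo_your_group_slot"                  // hypothetical slot name
	lastWritten, _ := pglogrepl.ParseLSN("0/16B3748") // would come from the sink

	err = pglogrepl.StartReplication(ctx, conn, slot, lastWritten,
		pglogrepl.StartReplicationOptions{
			Mode:       pglogrepl.LogicalReplication,
			PluginArgs: []string{"proto_version '1'", "publication_names 'pg_flo_your_group_publication'"},
		})
	if err != nil {
		log.Fatal(err)
	}

	// A real loop also calls conn.ReceiveMessage(...) to read WAL data; here we
	// only show the periodic acknowledgement that keeps the connection healthy
	// and lets the server recycle WAL up to the acknowledged position.
	for range time.Tick(10 * time.Second) {
		err := pglogrepl.SendStandbyStatusUpdate(ctx, conn, pglogrepl.StandbyStatusUpdate{
			WALWritePosition: lastWritten,
		})
		if err != nil {
			log.Fatal(err)
		}
	}
}
```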
2 changes: 1 addition & 1 deletion internal/webhook_test.sh
@@ -3,7 +3,7 @@ set -euo pipefail

source "$(dirname "$0")/e2e_common.sh"

WEBHOOK_URL="https://big-lamp-86.webhook.cool"
WEBHOOK_URL="https://deep-article-49.webhook.cool"

setup_docker() {
rm -Rf /tmp/pg*
