Sales-DataPipeline


Project Overview

This project is an overview of a Sales Data Projection pipeline that performs near-real-time data ingestion and transformation with Change Data Capture (CDC). The system uses AWS services (S3, Lambda, Glue, DynamoDB, Kinesis Data Streams, Kinesis Data Firehose, and EventBridge) to ingest and transform data, capture changes, and load the results into S3, where they are queried with Athena for analytical purposes.


Architectural Diagram

SalesDataPipeline Architecture


Key Steps

1. Create a Table "OrdersRawTable" in DynamoDB

DynamoDB_Table
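
The table can also be created programmatically. The following is a minimal sketch using boto3, not the exact setup used in this project: the partition key name `orderid`, the on-demand billing mode, and the `NEW_AND_OLD_IMAGES` stream view are assumptions. A stream view with both images is what makes the before/after values of an edited item available for CDC.

```python
def orders_table_definition(table_name="OrdersRawTable", key_name="orderid"):
    """Build keyword arguments for DynamoDB's CreateTable call.

    key_name is a hypothetical partition key; adjust it to the real schema.
    """
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": key_name, "AttributeType": "S"}
        ],
        "KeySchema": [{"AttributeName": key_name, "KeyType": "HASH"}],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
        # Enable the DynamoDB stream so item changes can be captured.
        "StreamSpecification": {
            "StreamEnabled": True,
            "StreamViewType": "NEW_AND_OLD_IMAGES",
        },
    }


def create_orders_table():
    """Create the table; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the definition can be inspected offline

    return boto3.client("dynamodb").create_table(**orders_table_definition())
```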

2. Create a Data Stream "kinesis-for-sales-data" in Kinesis.

KinesisDataStream

3. Create an EventBridge Pipe to ingest data from the DynamoDB stream into the Kinesis stream.

EventBridge

  • Note: Grant the IAM role attached to the pipe permission to access DynamoDB and Kinesis:
    • AmazonDynamoDBFullAccess
    • AmazonKinesisFullAccess
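
As an alternative to the console, the pipe could be defined with boto3. This is only a sketch under stated assumptions: the pipe name, the `LATEST` starting position, the batch size, and the use of `$.eventID` as the Kinesis partition key are illustrative choices, and the ARNs must be supplied by the caller.

```python
def sales_pipe_definition(table_stream_arn, kinesis_stream_arn, role_arn):
    """Build keyword arguments for the EventBridge Pipes CreatePipe call."""
    return {
        "Name": "orders-ddb-to-kinesis",  # hypothetical pipe name
        "RoleArn": role_arn,
        "Source": table_stream_arn,  # ARN of the DynamoDB stream
        "SourceParameters": {
            "DynamoDBStreamParameters": {
                "StartingPosition": "LATEST",  # assumption: only new changes
                "BatchSize": 1,
            }
        },
        "Target": kinesis_stream_arn,  # ARN of "kinesis-for-sales-data"
        "TargetParameters": {
            # Assumption: partition by the stream record's event ID.
            "KinesisStreamParameters": {"PartitionKey": "$.eventID"}
        },
    }


def create_sales_pipe(table_stream_arn, kinesis_stream_arn, role_arn):
    """Create the pipe; requires boto3 and AWS credentials."""
    import boto3

    definition = sales_pipe_definition(
        table_stream_arn, kinesis_stream_arn, role_arn
    )
    return boto3.client("pipes").create_pipe(**definition)
```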

4. Run the mock data generator script to load data into DynamoDB.

DataLoadCMD
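
The generator script itself is not reproduced here. The following is a minimal sketch of what such a script might look like; the field names (`orderid`, `product_name`, `quantity`, `price`) and the product list are hypothetical and may differ from the real script.

```python
import random
import uuid
from decimal import Decimal

PRODUCTS = ["Laptop", "Phone", "Tablet", "Headphones"]  # hypothetical catalogue


def generate_order(rng=random):
    """Return one mock order as a dict suitable for a DynamoDB put_item."""
    return {
        "orderid": str(uuid.uuid4()),
        "product_name": rng.choice(PRODUCTS),
        "quantity": rng.randint(1, 5),
        # The boto3 Table resource expects Decimal rather than float.
        "price": Decimal(str(round(rng.uniform(10, 500), 2))),
    }


def load_orders(count, table_name="OrdersRawTable"):
    """Write `count` mock orders; requires boto3 and AWS credentials."""
    import boto3

    table = boto3.resource("dynamodb").Table(table_name)
    with table.batch_writer() as batch:
        for _ in range(count):
            batch.put_item(Item=generate_order())
```

Calling `load_orders(10)` would write ten items, each of which produces an `INSERT` event on the table's stream.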

5. Check the Data Load

  • Data should be visible in the DynamoDB table "OrdersRawTable" DynamoDB_DataLoad

  • The EventBridge pipe should be triggered, and data should flow from the DynamoDB stream to the Kinesis stream KinesisDataLoad

6. Check the Change Data Capture (CDC) Events

  • Before editing the data

    • before1
    • before2
  • After editing the data

    • after1
    • after2
    • after3
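
To make the before/after comparison concrete, here is a small sketch that summarizes a single DynamoDB stream record. It assumes the stream is configured to include both the old and the new item image; the sample attribute names are hypothetical.

```python
def describe_change(record):
    """Summarize a DynamoDB stream record: event type and changed attributes."""
    images = record.get("dynamodb", {})
    old = images.get("OldImage", {})  # absent for INSERT events
    new = images.get("NewImage", {})  # absent for REMOVE events
    changed = sorted(
        name for name in set(old) | set(new) if old.get(name) != new.get(name)
    )
    return {"event": record["eventName"], "changed_attributes": changed}


# Hypothetical MODIFY event: the quantity of an order was edited from 2 to 5.
sample = {
    "eventName": "MODIFY",
    "dynamodb": {
        "OldImage": {"orderid": {"S": "1"}, "quantity": {"N": "2"}},
        "NewImage": {"orderid": {"S": "1"}, "quantity": {"N": "5"}},
    },
}
print(describe_change(sample))
```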

7. Create Kinesis Firehose

  • Create a Kinesis Firehose delivery stream that reads from the Kinesis stream, transforms the records with a Lambda function, and loads them into S3 in batches

  • Kinesis Firehose

    • KinesisFirehose
  • Lambda Function for Transformation

    • Lambda
  • S3 bucket "kinesis-firehose-destination-yb" as the load destination

    • S3Before
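
The project's Lambda code is not shown here. The sketch below illustrates the general shape of a Firehose transformation handler: each incoming record carries a `recordId` and base64-encoded `data`, and the handler must return the same `recordId` with a `result` and re-encoded `data`. The flattening logic, the `cdc_event_type` field, and the decision to drop records without a new image are assumptions, not the author's implementation.

```python
import base64
import json


def _plain(attr):
    """Unwrap a scalar DynamoDB-typed value such as {"S": "x"} or {"N": "3"}.

    Only numbers are converted; other types (including nested maps and
    lists) are returned as found.
    """
    (type_tag, value), = attr.items()
    return float(value) if type_tag == "N" else value


def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        new_image = payload.get("dynamodb", {}).get("NewImage")
        if new_image is None:
            # Assumption: events without a new image (e.g. REMOVE) are dropped.
            output.append({"recordId": record["recordId"],
                           "result": "Dropped",
                           "data": record["data"]})
            continue
        row = {name: _plain(attr) for name, attr in new_image.items()}
        row["cdc_event_type"] = payload.get("eventName")
        # A trailing newline keeps one JSON object per line in the S3 files.
        line = json.dumps(row) + "\n"
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(line.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```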

8. Generate some more mock data in DynamoDB

  • The mock data generator script produces new data, which flows from DynamoDB to the Kinesis stream. From the Kinesis stream, the data flows into Kinesis Firehose, is transformed by the Lambda function, and is stored in the destination S3 bucket
    • S3After

9. Create a Glue Crawler to crawl data from the destination S3 bucket

  • crawler

    • Note: Create a classifier and attach it to the crawler so that the JSON data is loaded in the correct format
    • classifier
  • Schema fetched by Crawler

    • CrawlerResult
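
For reference, the classifier and crawler could be created with boto3 roughly as follows. All names here (classifier, crawler, and database) are hypothetical, and the JSON path `$[*]` is a placeholder assumption; the right path depends on how the Lambda function shapes each object written to S3.

```python
def classifier_definition(name="sales-json-classifier", json_path="$[*]"):
    """Keyword arguments for Glue's CreateClassifier (JSON classifier)."""
    return {"JsonClassifier": {"Name": name, "JsonPath": json_path}}


def crawler_definition(role_arn,
                       bucket="kinesis-firehose-destination-yb",
                       database="sales_db",  # hypothetical Glue database
                       classifier="sales-json-classifier"):
    """Keyword arguments for Glue's CreateCrawler."""
    return {
        "Name": "sales-data-crawler",  # hypothetical crawler name
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": f"s3://{bucket}/"}]},
        "Classifiers": [classifier],  # custom classifiers run before built-ins
    }


def create_and_start_crawler(role_arn):
    """Create the classifier and crawler, then run it; requires boto3."""
    import boto3

    glue = boto3.client("glue")
    glue.create_classifier(**classifier_definition())
    crawler = crawler_definition(role_arn)
    glue.create_crawler(**crawler)
    glue.start_crawler(Name=crawler["Name"])
```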

10. Query the data in S3 using Athena and the schema crawled by Glue

Athena
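
The same query can be submitted from code. This is a sketch only: the database name, the table name (assumed here to be derived from the bucket name by the crawler), and the results location are hypothetical and should be replaced with the values shown in the Glue Data Catalog.

```python
def athena_query_request(database="sales_db",
                         table="kinesis_firehose_destination_yb",
                         output_location="s3://my-athena-results/"):
    """Keyword arguments for Athena's StartQueryExecution.

    All default values are placeholders, not names taken from the project.
    """
    return {
        "QueryString": f'SELECT * FROM "{table}" LIMIT 10',
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query():
    """Submit the query and return its execution ID; requires boto3."""
    import boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(**athena_query_request())
    return response["QueryExecutionId"]
```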