Credit card issuers frequently offer cashback incentives as a key selling point for their products. This project builds a streamlined, reliable data pipeline for analyzing credit card cashback reward data: it collects card transaction and reward data through an API, transforms it, and stores it for analysis. The goal is to uncover patterns in spending habits and reward accumulation, providing useful insights for both cardholders and issuers.
The pipeline consists of the following components:
- Data extraction from Plutus API
- Data storage in AWS S3
- Data transformation using AWS Glue and Lambda
- Data loading into AWS Redshift
- Data visualization using a Looker Studio dashboard
The entire process is orchestrated using AWS Step Functions and deployed using Terraform and Docker.
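For illustration, a minimal Step Functions state machine for this pipeline could look like the following Amazon States Language definition, expressed here as a Python dict. The state names, Glue job name, and ARNs are placeholders, not taken from the repository.

```python
# Hypothetical sketch of the orchestration flow in Amazon States Language.
# State names, the Glue job name, and the ARNs are illustrative placeholders.
import json

state_machine = {
    "Comment": "Orchestrates the cashback rewards data pipeline",
    "StartAt": "PullData",
    "States": {
        "PullData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:pull_data_glue_job_lambda",
            "Next": "TransformWithGlue",
        },
        "TransformWithGlue": {
            "Type": "Task",
            # The .sync integration waits for the Glue job to finish.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "cashback-transform"},
            "Next": "LoadToRedshift",
        },
        "LoadToRedshift": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:load_to_redshift_lambda",
            "End": True,
        },
    },
}

print(json.dumps(state_machine, indent=2))
```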
- `api.py`: Handles authentication and data retrieval from the Plutus API (a sketch follows this list)
- `pull_data_glue_job_lambda.py`: Lambda function for pulling data and triggering Glue jobs
- `glue_script.py`: Glue job for data transformation
- `load_to_redshift_lambda.py`: Lambda function for loading data into Redshift
- `infra/`: Terraform configurations for AWS infrastructure
- Dockerfiles: For containerizing Lambda functions
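As a rough sketch, the authentication and retrieval logic in `api.py` might look like the following. The base URL, endpoint path, token scheme, and environment variable name are assumptions for illustration, not the documented Plutus API.

```python
# Minimal sketch of Plutus API data retrieval; the endpoint and auth details
# below are assumptions, not the documented API.
import os

import requests

BASE_URL = "https://api.plutus.it"  # assumed base URL


def get_transactions(token: str) -> list[dict]:
    """Fetch transaction records with a bearer token (hypothetical endpoint)."""
    resp = requests.get(
        f"{BASE_URL}/transactions",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    token = os.environ["PLUTUS_API_TOKEN"]  # assumed environment variable
    print(f"Fetched {len(get_transactions(token))} transactions")
```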
- The Plutus API is queried for transaction and reward data
- Raw data is stored in S3 as CSV files
- Glue jobs process and transform the data
- Transformed data is stored back in S3 as Parquet files (see the sketch after this list)
- A Glue crawler updates the data catalog
- Data is loaded into Redshift for analysis
- Looker Studio connects to Redshift for visualization
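As a hedged illustration of the transform step, a plain-PySpark version of the CSV-to-Parquet conversion could look like this; the bucket and prefix names are placeholders, and the real `glue_script.py` may use Glue's DynamicFrame API instead.

```python
# Sketch of the CSV -> Parquet transform step; bucket/prefix names are
# placeholders and the actual job may use Glue DynamicFrames instead.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cashback-transform").getOrCreate()

# Read the raw CSV files landed in S3 by the extraction step.
raw = spark.read.option("header", True).csv("s3://example-bucket/raw/")

# ...cleaning, schema updates, and joins happen here...

# Write the transformed data back to S3 as Parquet.
raw.write.mode("overwrite").parquet("s3://example-bucket/processed/")
```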
A comprehensive dashboard is available in Looker Studio, providing insights into spending patterns and reward accumulation.
- AWS Step Functions orchestrates the workflow of the data pipeline.
- Two datasets are pulled from the Plutus API: rewards and transactions. The rewards are left-joined to the transactions so that each reward is matched with its transaction, because reward records lack key fields such as the merchant name and transaction amount.
- Performed cleaning, updated the schema, and created new variables such as `reward_amount` and `plu_price` for analysis downstream (a sketch follows this list).
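Here is an illustrative sketch of the join and the derived columns. The column names and derivation formulas are assumptions based on the description above; the actual logic lives in `glue_script.py`.

```python
# Illustrative left join of rewards onto transactions; column names and the
# derivation formulas are assumptions, not taken from glue_script.py.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("rewards-join").getOrCreate()

rewards = spark.read.parquet("s3://example-bucket/processed/rewards/")
transactions = spark.read.parquet("s3://example-bucket/processed/transactions/")

# Keep every reward and enrich it with the merchant name and amount
# from its matching transaction.
enriched = rewards.join(
    transactions.select("transaction_id", "merchant_name", "transaction_amount"),
    on="transaction_id",
    how="left",
)

# Hypothetical derivations of the new variables mentioned above.
enriched = (
    enriched
    .withColumn("plu_price", F.col("reward_fiat_value") / F.col("reward_plu"))
    .withColumn("reward_amount", F.col("reward_plu") * F.col("plu_price"))
)
```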
Redshift is used as the data warehouse for the project.
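A minimal sketch of what `load_to_redshift_lambda.py` could do, using the Redshift Data API to issue a COPY from the processed Parquet files. The cluster, database, table, bucket, and IAM role names below are placeholders.

```python
# Sketch of the Redshift load Lambda; all identifiers below are placeholders.
import boto3


def handler(event, context):
    client = boto3.client("redshift-data")
    copy_sql = """
        COPY analytics.rewards
        FROM 's3://example-bucket/processed/rewards/'
        IAM_ROLE 'arn:aws:iam::ACCOUNT_ID:role/redshift-copy-role'
        FORMAT AS PARQUET;
    """
    resp = client.execute_statement(
        ClusterIdentifier="cashback-cluster",
        Database="dev",
        DbUser="awsuser",
        Sql=copy_sql,
    )
    return {"statement_id": resp["Id"]}
```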
- AWS Account
- AWS CLI configured with appropriate credentials
- Docker
- Terraform (version ~> 1.7.5)
- Python 3.12
- Looker Studio
- Clone the repository
- Update the `Makefile` with your AWS Account ID and ECR region
- Run `make terraform/plan` to preview the infrastructure changes
- Run `make terraform/apply` to create the AWS resources
- Use the AWS Console to run the Step Functions state machine
- Get the Redshift endpoint and credentials from the AWS Console and sign into Looker Studio to create a connection to Redshift
- To tear down the infrastructure, run `make terraform/plan-destroy` followed by `make terraform/destroy`