Credit card issuers frequently offer cashback incentives as a key selling point for their products. This project builds a streamlined, reliable data pipeline for analyzing credit card cashback reward data: it collects card transaction and reward data through an API, transforms it, and stores it for analysis. The goal is to uncover patterns in spending habits and reward accumulation, providing useful insights for both cardholders and issuers.
The pipeline consists of the following components:
- Data extraction from Plutus API
- Data storage in AWS S3
- Data transformation using AWS Glue and Lambda
- Data loading into AWS Redshift
- Data visualization using a Looker Studio dashboard
The entire process is orchestrated using AWS Step Functions and deployed using Terraform and Docker.
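For illustration, a minimal Step Functions state machine for this pipeline could look like the following Amazon States Language definition, expressed here as a Python dict. The state names, Glue job name, and ARNs are placeholders, not taken from the repository.

```python
# Hypothetical sketch of the orchestration flow in Amazon States Language.
# State names, the Glue job name, and the ARNs are illustrative placeholders.
import json

state_machine = {
    "Comment": "Orchestrates the cashback rewards data pipeline",
    "StartAt": "PullData",
    "States": {
        "PullData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:pull_data_glue_job_lambda",
            "Next": "TransformWithGlue",
        },
        "TransformWithGlue": {
            "Type": "Task",
            # The .sync integration waits for the Glue job to finish.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "cashback-transform"},
            "Next": "LoadToRedshift",
        },
        "LoadToRedshift": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:load_to_redshift_lambda",
            "End": True,
        },
    },
}

print(json.dumps(state_machine, indent=2))
```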
- `api.py`: Handles authentication and data retrieval from the Plutus API (a sketch follows this list)
- `pull_data_glue_job_lambda.py`: Lambda function for pulling data and triggering Glue jobs
- `glue_script.py`: Glue job for data transformation
- `load_to_redshift_lambda.py`: Lambda function for loading data into Redshift
- `infra/`: Terraform configurations for AWS infrastructure
- Dockerfiles: For containerizing Lambda functions
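As a rough sketch, the authentication and retrieval logic in `api.py` might look like the following. The base URL, endpoint path, token scheme, and environment variable name are assumptions for illustration, not the documented Plutus API.

```python
# Minimal sketch of Plutus API data retrieval; the endpoint and auth details
# below are assumptions, not the documented API.
import os

import requests

BASE_URL = "https://api.plutus.it"  # assumed base URL


def get_transactions(token: str) -> list[dict]:
    """Fetch transaction records with a bearer token (hypothetical endpoint)."""
    resp = requests.get(
        f"{BASE_URL}/transactions",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    token = os.environ["PLUTUS_API_TOKEN"]  # assumed environment variable
    print(f"Fetched {len(get_transactions(token))} transactions")
```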
- The Plutus API is queried for transaction and reward data
- Raw data is stored in S3 as CSV files
- Glue jobs process and transform the data
- Transformed data is stored back in S3 as Parquet files (see the sketch after this list)
- A Glue crawler updates the data catalog
- Data is loaded into Redshift for analysis
- Looker Studio connects to Redshift for visualization
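As a hedged illustration of the transform step, a plain-PySpark version of the CSV-to-Parquet conversion could look like this; the bucket and prefix names are placeholders, and the real `glue_script.py` may use Glue's DynamicFrame API instead.

```python
# Sketch of the CSV -> Parquet transform step; bucket/prefix names are
# placeholders and the actual job may use Glue DynamicFrames instead.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cashback-transform").getOrCreate()

# Read the raw CSV files landed in S3 by the extraction step.
raw = spark.read.option("header", True).csv("s3://example-bucket/raw/")

# ...cleaning, schema updates, and joins happen here...

# Write the transformed data back to S3 as Parquet.
raw.write.mode("overwrite").parquet("s3://example-bucket/processed/")
```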
A comprehensive dashboard is available in Looker Studio, providing insights into spending patterns and reward accumulation.
- AWS Step Functions orchestrates the workflow of the data pipeline.
- Two datasets are pulled from the Plutus API: rewards and transactions. The rewards are left-joined to the transactions so that each reward is matched with its transaction, because reward records lack key fields such as the merchant name and transaction amount.
- Performed cleaning, updated the schema, and created new variables such as `reward_amount` and `plu_price` for analysis downstream (a sketch follows this list).
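Here is an illustrative sketch of the join and the derived columns. The column names and derivation formulas are assumptions based on the description above; the actual logic lives in `glue_script.py`.

```python
# Illustrative left join of rewards onto transactions; column names and the
# derivation formulas are assumptions, not taken from glue_script.py.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("rewards-join").getOrCreate()

rewards = spark.read.parquet("s3://example-bucket/processed/rewards/")
transactions = spark.read.parquet("s3://example-bucket/processed/transactions/")

# Keep every reward and enrich it with the merchant name and amount
# from its matching transaction.
enriched = rewards.join(
    transactions.select("transaction_id", "merchant_name", "transaction_amount"),
    on="transaction_id",
    how="left",
)

# Hypothetical derivations of the new variables mentioned above.
enriched = (
    enriched
    .withColumn("plu_price", F.col("reward_fiat_value") / F.col("reward_plu"))
    .withColumn("reward_amount", F.col("reward_plu") * F.col("plu_price"))
)
```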
Redshift is used as the data warehouse for the project.
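A minimal sketch of what `load_to_redshift_lambda.py` could do, using the Redshift Data API to issue a COPY from the processed Parquet files. The cluster, database, table, bucket, and IAM role names below are placeholders.

```python
# Sketch of the Redshift load Lambda; all identifiers below are placeholders.
import boto3


def handler(event, context):
    client = boto3.client("redshift-data")
    copy_sql = """
        COPY analytics.rewards
        FROM 's3://example-bucket/processed/rewards/'
        IAM_ROLE 'arn:aws:iam::ACCOUNT_ID:role/redshift-copy-role'
        FORMAT AS PARQUET;
    """
    resp = client.execute_statement(
        ClusterIdentifier="cashback-cluster",
        Database="dev",
        DbUser="awsuser",
        Sql=copy_sql,
    )
    return {"statement_id": resp["Id"]}
```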
- AWS Account
- AWS CLI configured with appropriate credentials
- Docker
- Terraform (version ~> 1.7.5)
- Python 3.12
- Looker Studio
- Clone the repository
- Update the `Makefile` with your AWS Account ID and ECR region
- Run `make terraform/plan` to preview the infrastructure changes
- Run `make terraform/apply` to create the AWS resources
- Use the AWS Console to run the Step Functions state machine
- Get the Redshift endpoint and credentials from the AWS Console and sign into Looker Studio to create a connection to Redshift
- To tear down the infrastructure, run `make terraform/plan-destroy` followed by `make terraform/destroy`