lucasfonsecads/data-extraction


AWS ETL Project

Overview

This project extracts, transforms, and loads (ETL) data into an AWS-based data lake. It includes linting, tests, containerization, and infrastructure-as-code with Terraform.

Features

  • Extract data from multiple sources (APIs, Databases, CSV files)
  • Transform data (cleaning, normalization, type conversion)
  • Load data into Parquet format and store it in AWS S3
  • AWS Glue integration for ETL orchestration
  • Terraform for AWS infrastructure provisioning
  • CI/CD pipeline with GitHub Actions
  • Dockerized environment for consistency
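To illustrate the transform stage (cleaning, normalization, type conversion), here is a minimal sketch; the function and field names are assumptions for illustration, not the project's actual code:

```python
# Illustrative sketch of the transform step: cleaning, normalization,
# and type conversion on raw records. Field names are hypothetical.
from datetime import date


def transform_record(raw: dict) -> dict:
    """Clean and normalize one raw record from an extracted source."""
    return {
        # Normalization: strip whitespace and lowercase identifiers
        "customer_id": raw["customer_id"].strip().lower(),
        # Type conversion: numeric strings become proper numbers
        "amount": float(raw["amount"]),
        # Cleaning: ISO date strings become date objects
        "order_date": date.fromisoformat(raw["order_date"].strip()),
    }


if __name__ == "__main__":
    raw = {"customer_id": " ABC123 ", "amount": "19.90", "order_date": "2024-05-01"}
    print(transform_record(raw))
```

In practice each transformed batch would then be written out as Parquet and uploaded to S3.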

Setup

Prerequisites

  • Python 3.9+
  • AWS CLI configured with necessary permissions
  • Terraform installed for infrastructure deployment
  • Docker (optional, for containerized execution)

Installation

pip install -r requirements.txt

Deploy AWS Infrastructure

cd infrastructure
terraform init
terraform apply
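For orientation, a minimal sketch of the kind of resource this step provisions; the region, bucket name, and layout are assumptions, and the project's real infrastructure/ directory likely defines more (Glue jobs, IAM roles, etc.):

```hcl
# Hypothetical minimal sketch; not the project's actual configuration.
provider "aws" {
  region = "us-east-1" # assumed region
}

# S3 bucket backing the data lake where Parquet output lands
resource "aws_s3_bucket" "datalake" {
  bucket = "example-etl-datalake" # hypothetical bucket name
}
```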

Running ETL

make run
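`make run` presumably invokes an entrypoint that chains the three stages. A hedged sketch of such an orchestrator, with the stage functions injected so sources and sinks can be swapped (all names here are assumptions):

```python
# Hypothetical pipeline orchestrator; the project's real entrypoint
# (invoked via `make run`) may be structured differently.
from typing import Callable, Iterable, List


def run_pipeline(
    extract: Callable[[], Iterable[dict]],
    transform: Callable[[dict], dict],
    load: Callable[[List[dict]], None],
) -> int:
    """Extract records, transform each one, hand the batch to the loader.

    Returns the number of records loaded.
    """
    batch = [transform(record) for record in extract()]
    load(batch)
    return len(batch)


if __name__ == "__main__":
    # Stub stages stand in for the real API/DB/CSV sources and the S3 loader.
    loaded: List[List[dict]] = []
    count = run_pipeline(
        extract=lambda: [{"v": "1"}, {"v": "2"}],
        transform=lambda r: {"v": int(r["v"])},
        load=loaded.append,
    )
    print(count)  # 2 records pass through the stubbed pipeline
```

Injecting the stages this way also makes the pipeline easy to unit-test with in-memory stubs.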

Running Tests

make test

Linting & Code Formatting

make lint

Running in Docker

make docker-build
make docker-run
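These Makefile targets presumably wrap `docker build` and `docker run`. A minimal Dockerfile for a project laid out like this might look as follows; the base image, paths, and entrypoint are assumptions:

```dockerfile
# Hypothetical sketch; the repository's actual Dockerfile may differ.
FROM python:3.9-slim

WORKDIR /app

# Install dependencies first so they cache independently of source changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
CMD ["python", "-m", "etl"]  # assumed entrypoint module
```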
