Sparkify Data Lake | ETL Pipeline

Summary

Project Overview

In this project, data is extracted from an AWS S3 bucket, processed with Spark to create fact and dimension tables, and the resulting tables are loaded back into S3. The whole process runs inside a Spark session.

Prerequisites

  1. Install Python 3
  2. Install PySpark: pip install pyspark (the os and pyspark.sql modules used by the script ship with Python and PySpark)
    Optional:
  3. Jupyter Notebook
  4. PyCharm

ETL Pipeline

  1. Read data from S3

    • Song data: s3://udacity-dend/song_data
    • Log data: s3://udacity-dend/log_data
  2. Transform the data using Spark

    • Create five tables (a minimal PySpark sketch follows this list)

    Fact Table

    songplays - song play records derived from the log data
    Fields - songplay_id, start_time, user_id, level, song_id, artist_id, session_id, location, user_agent

    Dimension Tables

    users - users in the app
    Fields - user_id, first_name, last_name, gender, level

    songs - songs in the database
    Fields - song_id, title, artist_id, year, duration

    artists - artists in the database
    Fields - artist_id, name, location, latitude, longitude

    time - timestamps of records in songplays broken down into specific units
    Fields - start_time, hour, day, week, month, year, weekday
  3. Load the fact and dimension tables back into S3
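To make the steps concrete, here is a minimal PySpark sketch of the pipeline. It is an illustration rather than the exact contents of ETL.py: the wildcard path depth, the NextSong page filter, the output bucket (s3a://my-output-bucket/), and the partitioning columns are assumptions, and only the songs and time tables are shown.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("SparkifyDataLake").getOrCreate()

# 1. Read the raw JSON data from S3 (the path globs are assumptions)
song_data = spark.read.json("s3a://udacity-dend/song_data/*/*/*/*.json")
log_data = spark.read.json("s3a://udacity-dend/log_data/*/*/*.json")

# 2. Transform: songs dimension table from the song data
songs_table = song_data.select(
    "song_id", "title", "artist_id", "year", "duration"
).dropDuplicates(["song_id"])

# Time dimension table from the log data's millisecond timestamps
# (filtering on page == "NextSong" is an assumption about the log format)
time_table = (
    log_data
    .filter(F.col("page") == "NextSong")
    .withColumn("start_time", F.to_timestamp(F.col("ts") / 1000))
    .select(
        "start_time",
        F.hour("start_time").alias("hour"),
        F.dayofmonth("start_time").alias("day"),
        F.weekofyear("start_time").alias("week"),
        F.month("start_time").alias("month"),
        F.year("start_time").alias("year"),
        F.dayofweek("start_time").alias("weekday"),
    )
    .dropDuplicates(["start_time"])
)

# 3. Load: write the tables back to S3 as parquet files
# (the output bucket and partition columns are illustrative)
songs_table.write.mode("overwrite").partitionBy("year", "artist_id").parquet("s3a://my-output-bucket/songs/")
time_table.write.mode("overwrite").partitionBy("year", "month").parquet("s3a://my-output-bucket/time/")

The users, artists, and songplays tables follow the same select/deduplicate/write pattern on the fields listed above.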

Setup Instructions:

  1. Populate the dwh.cfg config file with your AWS credentials (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY):

[KEY]
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx

  2. Run ETL.py
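For reference, here is a minimal sketch of how ETL.py could pick up these credentials with configparser and export them as environment variables before creating the Spark session, assuming the file is named dwh.cfg and uses the [KEY] section shown above.

import configparser
import os

# Read the AWS credentials from dwh.cfg ([KEY] section shown above)
config = configparser.ConfigParser()
config.read("dwh.cfg")

# Export them so Spark's S3 connector can authenticate
os.environ["AWS_ACCESS_KEY_ID"] = config["KEY"]["AWS_ACCESS_KEY_ID"]
os.environ["AWS_SECRET_ACCESS_KEY"] = config["KEY"]["AWS_SECRET_ACCESS_KEY"]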
