sources of data: https://www.kaggle.com/martinellis/nhl-game-data
The type of transformation needed for this data (cleaning, joining, filtering, aggregating, etc). The type of final production database to load the data into (relational or non-relational). The final tables or collections that will be used in the production database.
-
Data Definition Language Statements:
- ALTER
- DROP
- CREATE VIEW
- DROP VIEW
-
Data Manipulation Language Statements:
- SELECT
- APPEND
- DELETE
- UPDATE
- JOIN
-
Aggregate Functions:
- SUM
- COUNT
- GROUP BY
- WHERE
Extract: the original data sources and how the data was formatted (CSV, JSON, pgAdmin 4, etc). Transform: what data cleaning or transformation was required. Load: the final database, tables/collections, and why this was chosen.
-
The transformed files were saved as a csv from Jupyter notebook and imported into pgAdmin for use in our queries. The files were created in pgAdmin and joined into one table. This resulting table is the LOAD portion of our ETL project.
-
This project enables queries on NHL data at the team level. Incorporating the team.csv file provides critical information so we can identify each team by something other than a randomly assigned identification number.
-
Code and result from joining tables: