Project-Based Data Engineering Learning

Welcome to our open-source project aimed at fostering practical data engineering skills! This project is inspired by the approach of breaking into data engineering at zero cost, with a focus on hands-on projects. We will guide you through setting up a real-world data pipeline using modern tools and technologies like Python, BigQuery/Snowflake, and Astronomer. Whether you're a beginner looking to dive into data engineering or an experienced professional aiming to brush up on your skills, this project is for you.

Overview

This project outlines a step-by-step approach to building a data engineering pipeline, from sourcing data to implementing quality checks. We focus on practical, project-based learning to equip you with the skills needed to excel in the field of data engineering.

What You Will Build

  • A Python script to fetch data from a REST API.
  • A process to dump this data into a CSV file initially (a minimal sketch of these first two steps follows this list).
  • A Snowflake or BigQuery setup to manage your data in the cloud.
  • An automated pipeline using Astronomer to ingest data on a scheduled basis.
  • Data quality checks to ensure the integrity of your data.
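
To make the first two items concrete, here is a minimal sketch of the fetch-and-dump step. It assumes the public PokéAPI as an example REST source; the script name, endpoint, and field names are placeholders you would swap for your chosen data source.

```python
# fetch_to_csv.py - minimal sketch: pull records from a REST API and write them to a CSV.
# Assumes the public PokeAPI (https://pokeapi.co) as an example source; adapt the URL
# and field names to your own data source.
import csv

import requests

API_URL = "https://pokeapi.co/api/v2/pokemon"  # example endpoint; replace with your source
OUTPUT_FILE = "pokemon.csv"


def fetch_records(limit: int = 100) -> list[dict]:
    """Fetch one page of records from the API and return them as a list of dicts."""
    response = requests.get(API_URL, params={"limit": limit}, timeout=30)
    response.raise_for_status()
    # PokeAPI wraps its records in a "results" key: [{"name": ..., "url": ...}, ...]
    return response.json()["results"]


def write_csv(records: list[dict], path: str = OUTPUT_FILE) -> None:
    """Dump the records to a CSV file, one row per record."""
    if not records:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=records[0].keys())
        writer.writeheader()
        writer.writerows(records)


if __name__ == "__main__":
    write_csv(fetch_records())
```

Running `python fetch_to_csv.py` should leave a `pokemon.csv` with one row per record; the file names here are illustrative, not files that ship with this repository.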

Getting Started

Before you begin, make sure you have the following prerequisites:

  • Python installed on your machine.
  • An account with Snowflake or BigQuery (free tiers are available).
  • An account with Astronomer.

Installation & Setup

  1. Find a Data Source: Choose a data source you are interested in (e.g., stock market, Pokémon, sports data). Make sure it offers a REST API.
  2. Python Script for Data Fetching: Clone this repository and navigate to the script directory. Modify the script to point to your chosen data source (the fetch sketch in the previous section shows the general shape).
  3. Snowflake/BigQuery Account: Follow the instructions on their website to set up a free trial account, then modify the script to load data into your Snowflake/BigQuery instance instead of a CSV (see the warehouse-load sketch after this list).
  4. Astronomer for Automation: Set up an Astronomer account and follow its documentation to deploy an Airflow DAG that runs your ingestion on a schedule (see the DAG sketch after this list).
  5. Data Quality Checks: Implement data quality checks using Great Expectations or your own custom checks (see the sketch after this list).
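
For step 3, one way to swap the CSV dump for a warehouse load is sketched below. It assumes BigQuery and the official `google-cloud-bigquery` client; the project, dataset, and table names are placeholders. A Snowflake setup would use the `snowflake-connector-python` package instead.

```python
# load_to_bigquery.py - sketch of loading the fetched CSV into a BigQuery table.
# Assumes `pip install google-cloud-bigquery` and that GOOGLE_APPLICATION_CREDENTIALS
# points at a service-account key. Project/dataset/table names are placeholders.
from google.cloud import bigquery

TABLE_ID = "my-project.raw.pokemon"  # placeholder: <project>.<dataset>.<table>


def load_csv(path: str = "pokemon.csv") -> None:
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # skip the header row written by the fetch script
        autodetect=True,       # let BigQuery infer the schema from the CSV
        write_disposition="WRITE_TRUNCATE",  # replace the table contents on each run
    )
    with open(path, "rb") as f:
        job = client.load_table_from_file(f, TABLE_ID, job_config=job_config)
    job.result()  # block until the load job finishes
    print(f"Loaded {client.get_table(TABLE_ID).num_rows} rows into {TABLE_ID}")


if __name__ == "__main__":
    load_csv()
```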
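
For step 4, Astronomer deployments run Apache Airflow, so the scheduled ingestion can be expressed as a DAG. The sketch below assumes an Airflow 2.x deployment with the TaskFlow API and reuses the hypothetical `fetch_to_csv` and `load_to_bigquery` helpers from the earlier sketches; file, function, and path names are placeholders.

```python
# dags/ingest_pipeline.py - sketch of an Airflow DAG that runs the fetch and load daily.
# Assumes an Airflow 2.x / Astronomer deployment; fetch_records, write_csv, and load_csv
# are the hypothetical helpers from the sketches above, importable from your project.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False, tags=["ingestion"])
def ingest_pipeline():
    @task
    def extract() -> str:
        from fetch_to_csv import fetch_records, write_csv
        path = "/tmp/pokemon.csv"
        write_csv(fetch_records(), path)
        return path

    @task
    def load(path: str) -> None:
        from load_to_bigquery import load_csv
        load_csv(path)

    load(extract())


ingest_pipeline()
```

Dropping a file like this into the `dags/` folder of an Astronomer project and deploying it is enough for the scheduler to pick it up and run the pipeline once a day.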
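
For step 5, if you start with custom checks rather than Great Expectations, a few assertions on the loaded data go a long way. The sketch below uses pandas and is only an illustration of the kinds of checks worth automating; the column names match the earlier PokéAPI example and are placeholders.

```python
# quality_checks.py - sketch of simple custom data quality checks on the fetched CSV.
# Great Expectations covers the same ground with richer reporting; these asserts are
# a lightweight stand-in. Column names ("name") match the earlier PokeAPI example.
import pandas as pd


def run_checks(path: str = "pokemon.csv") -> None:
    df = pd.read_csv(path)

    assert len(df) > 0, "dataset is empty"
    assert df["name"].notna().all(), "null values found in 'name'"
    assert not df.duplicated(subset=["name"]).any(), "duplicate records found"

    print(f"All checks passed on {len(df)} rows")


if __name__ == "__main__":
    run_checks()
```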

Contributing

We welcome contributions from the community! Whether it's adding new features, improving documentation, or reporting bugs, your contributions are greatly appreciated.

  • Fork the Repository: Start by forking this repository to your GitHub account.
  • Create a Pull Request: After making your changes, create a pull request against our repository. Please provide a clear description of your changes.
  • Code Review: Your pull request will be reviewed by our team. We may suggest some changes or improvements.

License

This project is open-source and available under the MIT License.
