
Thejipppp/polars-dataminded-exercise


DataFrame of Mind with Polars and PySpark


This repository contains the exercises for a course on DataFrames and data processing with two different analytics engines: Polars and PySpark.

The outline below will be removed from the README, but it serves as a guideline while the course is being developed.

  • Operational vs. analytical data

  • Data processing / data transformation

    • Example transformations (input/output) for each common relational operation
      • Join
      • Agg (GroupBy)
      • Window
      • Filter
      • Project
  • DataFrame abstraction (tabular data vs. unstructured or semi-structured data)

  • Engines

    • Spark vs. Polars vs. DuckDB vs. Pandas
      • Cost / Simplicity / scalability trade-off
      • SQL vs. DataFrame API
      • Roles: analyst, data scientist, data engineer
  • Polars:

    • Deep dive + architecture
    • Hands-on exercises
  • PySpark:

    • Deep dive + architecture
    • Hands-on exercises
  • Ecosystem

  • Advanced

  • Outlook: processing in the data engineering landscape
