This repository contains the exercises for a course on DataFrames and data processing with two different analytics engines: Polars and PySpark.
The outline below will be removed from the README, but it is useful as a guideline during development of the course.
- Operational vs. analytical data
- Data processing / data transformation
  - Example transformations (input/output) for each common transformation (relational model), as sketched below:
    - Join
    - Agg (GroupBy)
    - Window
    - Filter
    - Project
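A minimal sketch of these transformations in Polars, on hypothetical toy data (all table, column, and value names are illustrative only, not part of the exercises):

```python
import polars as pl

# Tiny illustrative inputs (hypothetical data).
orders = pl.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 10, 20, 30],
    "amount": [5.0, 7.5, 3.0, 9.0],
})
customers = pl.DataFrame({
    "customer_id": [10, 20],
    "country": ["NL", "DE"],
})

# Join: enrich orders with customer attributes.
joined = orders.join(customers, on="customer_id", how="left")

# Agg (GroupBy): one output row per customer.
per_customer = orders.group_by("customer_id").agg(
    pl.col("amount").sum().alias("total")
)

# Window: aggregate per group without collapsing rows.
with_share = orders.with_columns(
    (pl.col("amount") / pl.col("amount").sum().over("customer_id")).alias("share")
)

# Filter: keep a subset of rows.
large = orders.filter(pl.col("amount") > 5)

# Project: keep a subset of columns.
projected = orders.select("order_id", "amount")
```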
- DataFrame abstraction: tabular data vs. unstructured or semi-structured data (see the sketch below)
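As a rough illustration of that distinction, this sketch (hypothetical data) flattens semi-structured, nested records into a tabular DataFrame with Polars:

```python
import polars as pl

# Semi-structured input (hypothetical data): nested structs and lists.
records = [
    {"user": {"id": 1, "name": "Ada"}, "events": ["login", "click"]},
    {"user": {"id": 2, "name": "Bob"}, "events": ["login"]},
]

df = pl.DataFrame(records)  # nested fields become struct / list columns

# Flatten into a tabular shape: one row per (user, event).
flat = df.unnest("user").explode("events")
print(flat)
```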
- Engines
  - Spark vs. Polars vs. DuckDB vs. Pandas
  - Cost / simplicity / scalability trade-off
  - SQL vs. DataFrame API (see the sketch after this list)
  - Roles: analyst, data scientist, data engineer
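A small sketch of the SQL vs. DataFrame API contrast, assuming duckdb, polars, and pyarrow are installed (the table and column names are made up):

```python
import duckdb
import polars as pl

sales = pl.DataFrame({
    "region": ["EU", "EU", "US"],
    "amount": [10, 20, 30],
})

# SQL: DuckDB scans the in-scope Polars DataFrame by its variable name.
by_region_sql = duckdb.sql(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
).pl()

# DataFrame API: the same aggregation expressed in Polars.
by_region_df = sales.group_by("region").agg(
    pl.col("amount").sum().alias("total")
)
```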
- Polars:
  - Deep dive + architecture (see the lazy-API sketch below)
  - Hands-on exercises
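A minimal sketch of Polars' lazy API, which the architecture deep dive builds on (toy data, illustrative only): a query plan is built first, optimized, then executed with `collect()`.

```python
import polars as pl

# Hypothetical toy data.
lf = pl.LazyFrame({
    "shop": ["a", "a", "b"],
    "revenue": [100, 150, 90],
})

query = (
    lf.filter(pl.col("revenue") > 95)
      .group_by("shop")
      .agg(pl.col("revenue").sum().alias("total"))
)

print(query.explain())  # inspect the optimized logical plan
print(query.collect())  # execute the plan
```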
- PySpark:
  - Deep dive + architecture (see the sketch below)
  - Hands-on exercises
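A minimal PySpark counterpart, assuming a local PySpark installation (toy data, illustrative only):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Local Spark session for experimenting; a cluster would use a different master.
spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()

df = spark.createDataFrame(
    [("a", 100), ("a", 150), ("b", 90)],
    ["shop", "revenue"],
)

result = (
    df.filter(F.col("revenue") > 95)
      .groupBy("shop")
      .agg(F.sum("revenue").alias("total"))
)
result.show()
spark.stop()
```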
- Ecosystem
- Advanced
  - Arrow interoperability (see the sketch after this list)
  - Python DataFrame API
  - Substrait
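A small sketch of Arrow interoperability, assuming polars, pyarrow, and pandas are installed (toy data, illustrative only): the same columnar data moves between libraries via Arrow tables.

```python
import polars as pl
import pyarrow as pa

df = pl.DataFrame({"id": [1, 2, 3], "value": [0.1, 0.2, 0.3]})

arrow_table = df.to_arrow()          # Polars -> Arrow Table (columnar)
assert isinstance(arrow_table, pa.Table)

back_to_polars = pl.from_arrow(arrow_table)  # Arrow -> Polars
pandas_df = arrow_table.to_pandas()          # Arrow -> pandas
```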
- Outlook: processing in the data engineering landscape