Tutorial: Exploratory Data Analysis, the Polars Way
(as given at PyCon Italia 2024 and EuroPython 2024).
Please prepare a Python environment that you can use during the workshop. We will work in Jupyter Notebook. However, you can also use JupyterLab or an IDE such as Visual Studio Code or PyCharm.
git clone https://github.com/janpipek/eda-polars-way.git
or, using the gh client:
gh repo clone janpipek/eda-polars-way
Alternatively, you can download the repo as a ZIP archive from here:
https://github.com/janpipek/eda-polars-way/archive/refs/heads/main.zip
The included requirements.txt file should be enough for you to create a Python environment using the pip command.
Python version 3.10+ is required.
First, cd into the repository directory:
cd eda-polars-way
# Create the environment (once)
python -m venv .venv # (or `uv venv`)
# Activate the environment (every time you open the shell)
source .venv/bin/activate # Linux, Mac
.venv\Scripts\activate.bat # Windows
# Install the required packages (once)
python -m pip install -r requirements.txt # (or `uv pip install -r requirements.txt`)
(note that we require the new, stable 1.0 version of polars)
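To confirm that the environment meets the two requirements stated above (Python 3.10+ and polars 1.0), a quick check along these lines can be run in the activated environment. This is only a sketch: the helper names `parse_version` and `check_environment` are made up for illustration and are not part of the repository.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '1.2.3' into (1, 2, 3), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_environment():
    """Return a list of human-readable problems (empty if all is well)."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required, found " + sys.version.split()[0])
    try:
        polars_version = version("polars")
    except PackageNotFoundError:
        problems.append("polars is not installed in this environment")
    else:
        if parse_version(polars_version) < (1, 0):
            problems.append("polars 1.0 is required, found " + polars_version)
    return problems


# Print every problem found, or a short confirmation if there are none.
for message in check_environment() or ["Environment looks fine."]:
    print(message)
```

If a problem is reported, re-running the install command inside the activated environment is usually the first thing to try.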
Using Deepnote is not recommended, but it works as a fallback in case you have problems installing on your laptop.
Create an account at https://deepnote.com (for free) and launch the repo by clicking the button:
Note that you will have to install additional packages (there is a command you need to uncomment).
All contents (a bit of text + all exercises) are located in exercises.ipynb. The exercises are partly filled in and accompanied by hints. If you are still unsure, solutions.ipynb contains working code that answers the questions. To help the SQL-savvy, the solutions-sql.ipynb file contains solutions using the SQL API of polars.
All the data sources are believed to be open and publicly distributable; see data/README.md for more details.
- Python Polars: A Lightning-Fast DataFrame Library @ RealPython
- R. Vink: What polars does for you, EuroPython 2023
- M. Harrison: Getting Started with Polars, PyCon US 2023