✨ enhance `dataset.read_table(...)` method with type normalisation #3277

Marigold · 2024-09-12T10:24:33Z

Motivation

Historically, we've been using `dataset["my_table"]` to access a table from a dataset. Recently, a new helper method `dataset.read_table(reset_index: bool = False)` was added that reads the table with its index reset, which is significantly faster for large multidimensional datasets.

Concept

We could add more functionality to `read_table` and make it the de facto standard way to read tables. Candidates:

- Retype all columns to "standard" types (e.g. `uint8` -> `int64`, `float16` -> `float64`) and categoricals to the string type
- ~~underscore column names etc. (see `.format` method)~~
  - Should already be in snake case by the time we're talking tables
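The retyping idea in the first bullet could look roughly like the sketch below, assuming tables are pandas DataFrames. `normalise_dtypes` is a hypothetical name, not an existing function in the codebase. It only widens plain NumPy integer and float columns to `int64`/`float64` and turns categoricals into pandas' `"string"` dtype. `uint64` is deliberately left alone, because values above 2**63 - 1 cannot be represented as `int64`.

```python
import numpy as np
import pandas as pd


def normalise_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with columns cast to "standard" dtypes.

    Hypothetical helper sketching the proposal; not an existing API.
    """
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        if isinstance(dtype, pd.CategoricalDtype):
            # categorical -> pandas string dtype (missing values become <NA>)
            out[col] = out[col].astype("string")
        elif isinstance(dtype, np.dtype) and dtype.kind == "f":
            # float16 / float32 -> float64
            out[col] = out[col].astype("float64")
        elif isinstance(dtype, np.dtype) and (
            dtype.kind == "i" or (dtype.kind == "u" and dtype.itemsize < 8)
        ):
            # int8..int32 and uint8..uint32 -> int64; uint64 is skipped
            # because its largest values would not fit in int64
            out[col] = out[col].astype("int64")
    return out


# Usage on a small made-up table
df = pd.DataFrame(
    {
        "country": pd.Categorical(["France", "Peru"]),
        "year": np.array([2020, 2021], dtype="uint16"),
        "value": np.array([1.5, 2.5], dtype="float16"),
        "big": np.array([1, 2], dtype="uint64"),
    }
)
out = normalise_dtypes(df)
print(out.dtypes)
```

Nullable extension dtypes (`Int8`, `Float32`, ...) pass through unchanged in this version; the input frame is copied rather than modified in place.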


larsyencken · 2024-10-17T09:38:50Z

We think type standardising would remove some common footguns for data folk:

- categorical -> string
- `Float*` -> `Float64`
- `Int*` -> `Int64`

Maybe this should live in the repack module.

It would be nice to turn this on by default.
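A rough sketch of this variant, targeting pandas' nullable `Int64`/`Float64` dtypes and with normalisation on by default. To stay self-contained it assumes a plain dict of DataFrames in place of the real dataset object; `read_table` here is a stand-in, and its `normalise_types` flag and the `to_nullable_standard` helper are hypothetical, not the existing method or anything in the `repack` module.

```python
from collections.abc import Mapping

import numpy as np
import pandas as pd


def to_nullable_standard(s: pd.Series) -> pd.Series:
    """categorical -> string, Float* -> Float64, Int* -> Int64 (hypothetical)."""
    dtype = s.dtype
    if isinstance(dtype, pd.CategoricalDtype):
        return s.astype("string")
    if pd.api.types.is_float_dtype(dtype):
        return s.astype("Float64")
    # uint64 is skipped: values above 2**63 - 1 would not fit in Int64
    if pd.api.types.is_integer_dtype(dtype) and str(dtype).lower() != "uint64":
        return s.astype("Int64")
    return s


def read_table(
    tables: Mapping[str, pd.DataFrame],
    name: str,
    reset_index: bool = False,
    normalise_types: bool = True,  # on by default, as suggested above
) -> pd.DataFrame:
    """Stand-in for dataset.read_table; `tables` replaces the real dataset."""
    df = tables[name]
    if reset_index:
        df = df.reset_index()  # returns a new frame
    if normalise_types:
        df = df.copy()  # never mutate the stored table
        for col in df.columns:
            df[col] = to_nullable_standard(df[col])
    return df


# Usage with a made-up table indexed by a categorical "country"
tables = {
    "population": pd.DataFrame(
        {
            "year": np.array([2020, 2020], dtype="int16"),
            "population": np.array([68.0, 34.0], dtype="float32"),
        },
        index=pd.CategoricalIndex(["France", "Peru"], name="country"),
    )
}
tb = read_table(tables, "population", reset_index=True)
raw = read_table(tables, "population", normalise_types=False)
print(tb.dtypes)
```

Keeping the flag on by default means callers who rely on the original dtypes have to opt out explicitly, which is the trade-off of making it the standard behaviour.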

github-actions bot added the needs triage label Sep 12, 2024

larsyencken added the enhancement (New feature or request) and priority 2 - important labels and removed the needs triage label Oct 17, 2024

larsyencken changed the title ~~✨ enhance dataset.read_table(...) method~~ ✨ enhance dataset.read_table(...) method with type normalisation Oct 17, 2024

Marigold commented Sep 12, 2024 (edited by larsyencken)

larsyencken commented Oct 17, 2024 (edited)