Managing the library's variables #182

Open
armgilles opened this issue Jan 8, 2025 · 0 comments

armgilles commented Jan 8, 2025

The current path handling for reading files and models inside the package can cause problems once the package is installed (pip install), notably in the front end on Heroku:
https://github.com/armgilles/vcub_keeper/blob/efa5a147b0ee8724a8ad379e27cca830b1a01be1/src/vcub_keeper/config.py#L11C1-L30C48

Use importlib.metadata to resolve the paths to the library's files (quick example):

import os
from pathlib import Path
import importlib.metadata as metadata
from dotenv import load_dotenv

load_dotenv()

# Change config env in production
IS_PROD = False

# Paths
try:
    package_name = "vcub_keeper"
    package_location = metadata.distribution(package_name).locate_file("")
    ROOT_DIR = Path(package_location).resolve()
    print("ROOT_DIR: ", ROOT_DIR)
except Exception:  # e.g. metadata.PackageNotFoundError when the package is not installed
    # Case of heroku env var
    print("Try to find environment variable")
    # Should we drop this hack and the read from the .env file?
    ROOT_DIR = Path(os.environ.get("ROOT_DIR", ""))
    IS_PROD = True

# If package is installed with pip
if "site-packages" in str(ROOT_DIR):  # install via pip
    load_dotenv()
    ROOT_DIR = Path(os.environ.get("ROOT_DIR", ROOT_DIR))  # with .env file in preprod
    print("Package installed with pip: ", ROOT_DIR)
    IS_PROD = True

# Case where ROOT_DIR is None (pre-prod); these variables are not needed there
try:
    ROOT_DATA_RAW = ROOT_DIR / "data/raw/"
    ROOT_DATA_CLEAN = ROOT_DIR / "data/clean/"
    ROOT_DATA_REF = ROOT_DIR / "data/ref/"
    ROOT_MODEL = ROOT_DIR / "model/"
    ROOT_TESTS_DATA = ROOT_DIR / "tests/data_for_tests/"
except Exception:
    print("Can't have repository variables")
    ROOT_DATA_REF = ""  # https://github.com/armgilles/vcub_keeper/issues/56#issuecomment-1007593715
# Only in dev: create the data and model directories if they are missing
if IS_PROD is False:
    for directory in (ROOT_DATA_RAW, ROOT_DATA_CLEAN, ROOT_DATA_REF, ROOT_MODEL, ROOT_TESTS_DATA):
        if not directory.exists():
            directory.mkdir(parents=True, exist_ok=True)
            print("Create " + str(directory))
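
The lookup-then-fallback logic above could be isolated in a small helper, which makes it easier to test outside Heroku. This is only a sketch: `resolve_root_dir` and its `env_var` parameter are hypothetical names, not part of vcub_keeper. Note that for a regular pip install, `locate_file("")` points at the directory that contains the installed package (typically site-packages), not at the repository root.

```python
import os
from importlib import metadata
from pathlib import Path


def resolve_root_dir(package_name: str, env_var: str = "ROOT_DIR") -> tuple[Path, bool]:
    """Return (root_dir, found_in_metadata) for a distribution name.

    Hypothetical helper for illustration only.
    """
    try:
        # Directory that holds the installed distribution's files
        location = metadata.distribution(package_name).locate_file("")
        return Path(str(location)).resolve(), True
    except metadata.PackageNotFoundError:
        # No installed metadata: fall back to the environment variable
        return Path(os.environ.get(env_var, "")), False
```

A caller could then write `ROOT_DIR, found = resolve_root_dir("vcub_keeper")` and decide on `IS_PROD` from `found` and from the resolved path.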

Another solution, more or less equivalent but arguably more elegant and Pythonic; it does require the data and model directories to live inside the package...

# In the pyproject.toml file
[tool.setuptools.package-data]
"vcub_keeper" = ["data/ref/station_attribute.csv", "model*.joblib"] # every file we need

# Example refactoring of the file-reading function read_stations_attributes()

from importlib.resources import files
import io
import polars as pl
import pandas as pd

def read_stations_attributes(
    data: None | io.StringIO = None,
    file_name: str = "station_attribute.csv",
    output_type: str = "",
) -> pl.DataFrame | pd.DataFrame:
    """
    Read the file of Vcub station attributes in Bordeaux. This file comes from
    create.creator.py - create_station_attribute()

    Parameters
    ----------
    data : io.StringIO | None
        In-memory data (used if provided).
    file_name : str
        Name of the CSV file to read (default "station_attribute.csv").
    output_type : str
        Output format, "pandas" for a Pandas DataFrame. By default, returns a Polars DataFrame.

    Returns
    -------
    stations : pl.DataFrame | pd.DataFrame
        DataFrame containing the station attributes.

    Examples
    --------
    stations = read_stations_attributes()
    stations_pandas = read_stations_attributes(output_type="pandas")
    """

    column_dtypes = {"station_id": pl.UInt16}

    # If an `io.StringIO` object is provided, use it directly
    if isinstance(data, io.StringIO):
        file_path = data
    else:
        # Get the file's path inside the package; "data/ref" matches the
        # package-data entry declared in pyproject.toml
        file_path = files("vcub_keeper") / "data" / "ref" / file_name

    # Read the CSV file with Polars
    stations = pl.read_csv(file_path, separator=";", schema_overrides=column_dtypes)

    # Convert to a Pandas DataFrame if requested
    if output_type == "pandas":
        stations = stations.to_pandas()

    return stations
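
If the package might be installed in a form other than plain files on disk (a zip archive, for example), `importlib.resources.as_file` can provide a real filesystem path for the resource. The sketch below uses only the standard library. `read_packaged_text` is a hypothetical helper name, and it assumes the resource is shipped inside an importable package.

```python
from importlib.resources import as_file, files


def read_packaged_text(package: str, *parts: str, encoding: str = "utf-8") -> str:
    """Read a text resource shipped inside an importable package.

    Hypothetical helper for illustration only.
    """
    resource = files(package)
    for part in parts:
        # Walk down one path component at a time
        resource = resource / part
    # as_file() yields a concrete path, extracting the resource to a
    # temporary file first when it is not already a plain file on disk
    with as_file(resource) as path:
        return path.read_text(encoding=encoding)
```

Assuming the CSV is packaged as declared in pyproject.toml, the call would look like `read_packaged_text("vcub_keeper", "data", "ref", "station_attribute.csv")`, and the returned string could be wrapped in `io.StringIO` and passed as `data`.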
@armgilles armgilles added the enhancement New feature or request label Jan 8, 2025
@armgilles armgilles self-assigned this Jan 8, 2025
armgilles added a commit that referenced this issue Jan 8, 2025