Reformat weather datasets into zarr.
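For orientation, zarr stores each array as a grid of compressed chunks keyed by their chunk indices, which is what makes parallel cloud writes practical. A minimal, stdlib-only sketch of that chunk layout (hypothetical shapes; the real datasets here are written with the zarr library, not this code):

```python
import math

def chunk_keys(shape, chunks):
    """Enumerate zarr-v2-style chunk keys ("i.j.k") for an array.

    Illustrative only: shows how an array of `shape` split into
    `chunks`-sized pieces maps to per-chunk store keys.
    """
    counts = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    keys = []

    def walk(prefix, dims):
        if not dims:
            keys.append(".".join(str(i) for i in prefix))
            return
        for i in range(dims[0]):
            walk(prefix + [i], dims[1:])

    walk([], counts)
    return keys

# A tiny (time, lat, lon) array split into 4 chunks along time.
print(chunk_keys((8, 2, 2), (2, 2, 2)))  # ['0.0.0', '1.0.0', '2.0.0', '3.0.0']
```

Because each chunk is an independent object in the store, different workers can write disjoint chunks of the same array without coordinating.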
We use:

- `uv` to manage dependencies and Python environments
- `ruff` for linting and formatting
- `mypy` for type checking
- `pre-commit` to automatically lint and format as you `git commit` (type checking on commit is TODO)
- Install `uv`
- Run `uv run pre-commit install` to set up the git hooks
- If you use VSCode, you may want to install the extensions (ruff, mypy) it will recommend when you open this folder
```bash
uv run main.py --help
uv run main.py noaa-gefs-forecast update-template
uv run main.py noaa-gefs-forecast reformat-local 2024-01-02T00:00
```
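For context on the init time argument above: GEFS initializes forecasts four times a day (00/06/12/18 UTC), so a date range expands into init times at a 6-hour step. A rough sketch of that expansion (hypothetical helper, not the repo's actual date handling):

```python
from datetime import datetime, timedelta

def init_times(start: datetime, end: datetime, step_hours: int = 6):
    """Yield forecast init times in [start, end) at a fixed cadence."""
    t = start
    while t < end:
        yield t
        t += timedelta(hours=step_hours)

times = list(init_times(datetime(2024, 1, 1), datetime(2024, 1, 2)))
print(len(times))  # 4 init times: 00, 06, 12 and 18 UTC on 2024-01-01
```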
- Add a dependency: `uv add <package> [--dev]`. Use `--dev` to add a development-only dependency.
- Lint: `uv run ruff check`
- Type check: `uv run mypy`
- Format: `uv run ruff format`
To reformat a large archive we parallelize the work across multiple cloud servers. We use:

- `docker` to containerize the code
- `kubernetes` indexed jobs to run work in parallel
- Install `docker` and `kubectl`. Make sure `docker` can be found at /usr/bin/docker and `kubectl` at /usr/bin/kubectl.
- Set up a docker image repository and export the DOCKER_REPOSITORY environment variable in your local shell, e.g. `export DOCKER_REPOSITORY=us-central1-docker.pkg.dev/<project-id>/reformatters/main`
- Set up a kubernetes cluster and configure kubectl to point to your cluster, e.g. `gcloud container clusters get-credentials <cluster-name> --region <region> --project <project>`
- Create a kubectl secret containing your Source Coop S3 credentials: `kubectl create secret generic source-coop-key --from-literal='AWS_ACCESS_KEY_ID=XXX' --from-literal='AWS_SECRET_ACCESS_KEY=XXX'`
```bash
uv run main.py noaa-gefs-forecast reformat-kubernetes <INIT_TIME_END> [--jobs-per-pod <int>] [--max-parallelism <int>]
```
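Kubernetes gives each pod of an indexed job its index via the `JOB_COMPLETION_INDEX` environment variable; combined with a jobs-per-pod setting, that determines which slice of the work a pod handles. A simplified sketch of that partitioning (hypothetical function name and numbers; the repo's actual assignment logic may differ):

```python
import os

def pod_slice(num_jobs: int, jobs_per_pod: int, pod_index: int) -> range:
    """Return the contiguous job indices assigned to one pod."""
    start = pod_index * jobs_per_pod
    return range(start, min(start + jobs_per_pod, num_jobs))

# Kubernetes sets JOB_COMPLETION_INDEX on each pod of an indexed job;
# default to pod 0 when running outside the cluster.
index = int(os.environ.get("JOB_COMPLETION_INDEX", "0"))
print(list(pod_slice(num_jobs=10, jobs_per_pod=4, pod_index=index)))
```

`--max-parallelism` then caps how many of these pods run at once, independent of how the jobs are sliced.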