Colton Loftus edited this page Sep 3, 2024 · 15 revisions

Scheduler IoW Notes

Current Challenges

  • build files (implnet_{jobs,ops}_*.py) are tracked in the repo, which makes the git history verbose and PRs harder to review
  • multiple organizations store their configurations in the repo, which increases the burden on maintainers
  • the build is driven by environment variables and multiple coupled components instead of one build script, which makes it harder to debug, test, and refactor

Current understanding for alignment

  • Each organization has 2 config files that are necessary to build the gleanerconfig.yaml
    1. nabuconfig.yaml specifies configuration and context for how to retrieve triplet data and how to store it in minio

pygen uses 1.

Ideas for improvement

  • Condense code into one central Python build program
    • Use https://github.com/docker/docker-py to control the containers instead of shell scripts. (Keeping the whole data pipeline in one language makes it easier to test and debug.)
    • Use a CLI library like https://typer.tiangolo.com/ to validate arguments and fail early. This is easier to debug than reading in the arguments and failing only after the containers are spun up
  • Move all build files to the root of the repo to make the layout clearer for end users
    • (e.g. makefiles, build/ directory, etc.)
  • Refactor such that individual organizations store their configuration outside the repo.
    • The Python build program should be able to read the configuration files from an arbitrary path that the user specifies
  • Add type hints and docstrings for easier long-term maintenance
  • Use Jinja templating instead of writing raw text to the output files
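
As a rough illustration of how several of these ideas could fit together, here is a minimal sketch of a single build entry point. It is not the project's actual code: every name in it (parse_args, render_job, build, the --config/--source/--out flags, and the job_<source>.py output name) is hypothetical. To keep the sketch free of third-party dependencies, argparse stands in for Typer and string.Template stands in for Jinja; the structure would be similar with the real libraries.

```python
"""Sketch of a single build entry point. All names here are hypothetical."""
import argparse
from pathlib import Path
from string import Template

# Stand-in for a Jinja template; string.Template keeps the sketch stdlib-only.
JOB_TEMPLATE = Template(
    "# generated for source: $source\n"
    'SOURCE_NAME = "$source"\n'
    'CONFIG_PATH = "$config"\n'
)


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse and validate arguments before any container work starts."""
    parser = argparse.ArgumentParser(description="Build scheduler files")
    # The config lives wherever the user says, not inside the repo.
    parser.add_argument("--config", type=Path, required=True)
    parser.add_argument("--source", required=True)
    parser.add_argument("--out", type=Path, default=Path("build"))
    args = parser.parse_args(argv)
    if not args.config.is_file():
        # Fail early with a clear message (exits with status 2).
        parser.error(f"config file not found: {args.config}")
    return args


def render_job(source: str, config: Path) -> str:
    """Render the text of one generated job file from the template."""
    return JOB_TEMPLATE.substitute(source=source, config=config)


def build(argv: list[str]) -> Path:
    """Validate arguments, render one job file, and return its path."""
    args = parse_args(argv)
    args.out.mkdir(parents=True, exist_ok=True)
    target = args.out / f"job_{args.source}.py"
    target.write_text(render_job(args.source, args.config))
    return target
```

A caller would pass in the command-line arguments, for example build(sys.argv[1:]). Because validation happens in parse_args before anything else runs, a bad config path is reported immediately rather than after containers have started.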