Home
Colton Loftus edited this page Sep 3, 2024 · 15 revisions
Scheduler IoW Notes
- Build files (`implnet_{jobs,ops}_*.py`) are tracked in the repo, which makes the git history verbose and PRs more work to review
- Multiple organizations store their configurations in the repo, which increases the burden on maintainers
- The build relies on environment variables and multiple coupled components instead of one build script, which makes it harder to debug, test, and refactor
- Each organization has 2 config files that are necessary for the build:
  1. `gleanerconfig.yaml`
  2. `nabuconfig.yaml` specifies configuration and context for how to retrieve triple data and how to store it in MinIO
- `pygen` uses only the first of these, `gleanerconfig.yaml`
- Condense code into one central Python build program
- Use docker-py (https://github.com/docker/docker-py) to control the containers instead of shell scripts. Keeping the whole data pipeline in one language makes it easier to test and debug
- Use a CLI library like https://typer.tiangolo.com/ to validate arguments and fail early, instead of reading in the arguments and only failing after the containers are spun up; this makes debugging easier
- Move all build files (e.g. makefiles, the `build/` directory) to the root of the repo to make the layout clearer for end users
- Refactor so that individual organizations store their configuration outside the repo
  - The Python build program should be able to read the configuration files from an arbitrary, user-specified path
- Add type hints and docstrings for easier long-term maintenance
- Use Jinja templating instead of writing raw text to the output files
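The docker-py bullet above could look roughly like the following minimal sketch. It assumes a docker-py client object is passed in (in practice `docker.from_env()`), which also lets the function be tested with a fake client. `run_step` and `StepFailed` are hypothetical names, and no real image or command from the project is shown.

```python
"""Sketch: run one pipeline step in a container via docker-py instead of a shell script.

Assumes a docker-py style client (e.g. docker.from_env()); the image and command
passed in are placeholders, not the project's real values.
"""
from __future__ import annotations

from typing import Any


class StepFailed(RuntimeError):
    """Raised when a containerized step exits with a non-zero status."""


def run_step(client: Any, image: str, command: list[str]) -> str:
    """Run `command` in `image`, wait for it to finish, and return its logs.

    Raises StepFailed (including the logs) on a non-zero exit, so failures
    surface as ordinary Python exceptions that tests can assert on.
    """
    container = client.containers.run(image, command, detach=True)
    try:
        result = container.wait()  # docker-py returns a dict such as {"StatusCode": 0, ...}
        logs = container.logs().decode("utf-8", errors="replace")
    finally:
        container.remove()  # clean up even if wait() or logs() raised
    if result["StatusCode"] != 0:
        raise StepFailed(f"{image} exited with {result['StatusCode']}:\n{logs}")
    return logs
```

In the real program this would be called as `run_step(docker.from_env(), image, command)` with an actual pipeline step's image and command. Raising an exception that carries the container logs, rather than checking a shell exit code, is what makes failures straightforward to test.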
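For the fail-early CLI bullet and the out-of-repo configuration bullets, a sketch could validate the user-supplied config directory while the arguments are parsed. typer is a third-party package, so this sketch swaps in the standard library's argparse to show the same idea (typer offers comparable checks, such as `exists=True` on `Path` parameters). The `--config-dir` flag, `OrgConfig`, and `parse_args` are hypothetical; only the two file names come from the notes. It also uses type hints and docstrings, in line with the maintenance bullet.

```python
"""Sketch: validate CLI arguments up front, before any container starts.

argparse (stdlib) stands in for typer here; the --config-dir flag is hypothetical.
The two required file names come from the notes.
"""
from __future__ import annotations

import argparse
from dataclasses import dataclass
from pathlib import Path

REQUIRED_FILES = ("gleanerconfig.yaml", "nabuconfig.yaml")


@dataclass(frozen=True)
class OrgConfig:
    """Paths to one organization's config files, stored outside the repo."""

    gleaner: Path
    nabu: Path


def config_dir(value: str) -> OrgConfig:
    """argparse `type=` hook: reject a directory missing either config file."""
    directory = Path(value).expanduser()
    missing = [name for name in REQUIRED_FILES if not (directory / name).is_file()]
    if missing:
        raise argparse.ArgumentTypeError(f"{directory} is missing: {', '.join(missing)}")
    return OrgConfig(gleaner=directory / REQUIRED_FILES[0], nabu=directory / REQUIRED_FILES[1])


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Parse arguments; argparse exits with status 2 on invalid input, before any work."""
    parser = argparse.ArgumentParser(description="Hypothetical scheduler build CLI")
    parser.add_argument(
        "--config-dir",
        type=config_dir,
        required=True,
        help="directory containing gleanerconfig.yaml and nabuconfig.yaml",
    )
    return parser.parse_args(argv)
```

Running it with a directory that lacks `nabuconfig.yaml` prints a usage error and exits with status 2, so nothing downstream (code generation, containers) runs with a bad path.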
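For the Jinja bullet, the sketch below shows the idea of rendering each generated file from one template instead of writing raw strings. To stay dependency-free it uses the standard library's `string.Template` as a stand-in for Jinja2, which offers the same substitution plus loops and conditionals (`jinja2.Template(src).render(...)`). The template body and the `name`/`url` fields of each source are assumptions, not the real `gleanerconfig.yaml` schema; only the `implnet_jobs_*.py` naming comes from the notes.

```python
"""Sketch: render generated files from a template instead of concatenating raw text.

Uses stdlib string.Template as a stand-in for Jinja2. The template body and the
`name`/`url` source fields are hypothetical, not the project's real schema.
"""
from __future__ import annotations

from string import Template

# Hypothetical template for one generated file per source.
JOBS_TEMPLATE = Template(
    "# Generated from gleanerconfig.yaml for source '$name'; do not edit by hand.\n"
    "SOURCE_NAME = $name_literal\n"
    "SOURCE_URL = $url_literal\n"
)


def render_jobs(sources: list[dict[str, str]]) -> dict[str, str]:
    """Return a mapping of output filename -> rendered file contents.

    A source missing a field raises KeyError here, before any file is written.
    """
    rendered: dict[str, str] = {}
    for source in sources:
        name = source["name"]
        rendered[f"implnet_jobs_{name}.py"] = JOBS_TEMPLATE.substitute(
            name=name,
            name_literal=repr(name),  # repr() yields a valid Python string literal
            url_literal=repr(source["url"]),
        )
    return rendered
```

Keeping the template in one place means a change to the generated code becomes a template edit rather than a change to string-building logic scattered through the build program.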