Restructure repo dirs/files/execution #249

Open
justb4 opened this issue Apr 27, 2018 · 2 comments

justb4 commented Apr 27, 2018

Currently each (Stetl-based) ETL process, such as Top10nl, BRK and BGT, has its own configuration and its own way of being executed, even though all of them are very similar. This makes it hard for a user to grasp how to perform a specific ETL, and it also makes Dockerization harder to develop.

The following can or needs to be done to restructure the repo and its (Stetl-based) ETL processes:

  • move each process to its own (sub)directory named after the Basisregistratie, e.g. brt/top10, brk/dkk. Call each a "Project" (or "Process")
  • give each Project consistent directory and file naming, e.g. Stetl config files at brt/top10/etl/config/default.cfg, gfs files, etc.
  • have a single script in the top-level directory, like nlextract.sh (or maybe nlextract.py, to be cross-platform?)
  • give each Project/Process a default args file and possibly an args file named after the host
  • allow the user to easily override default options such as the database host and credentials

Something like:

nlextract.sh -p brt/top250 -a brt/top250/options/default.args -a /home/me/nlx/top250.args 
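
If the top-level script were written in Python, the option handling for the command above could look roughly like this. It is only a sketch: `nlextract.py` does not exist yet, and the function name `parse_cli` and the long option names `--project` and `--args` are made up for illustration.

```python
import argparse


def parse_cli(argv):
    """Parse the command line of a hypothetical nlextract.py."""
    parser = argparse.ArgumentParser(prog="nlextract.py")
    # -p selects the Project directory, e.g. brt/top250 (assumed layout).
    parser.add_argument("-p", "--project", required=True,
                        help="Project directory, e.g. brt/top250")
    # -a may be repeated; argparse collects the values in command-line order.
    parser.add_argument("-a", "--args", dest="args_files", action="append",
                        help="args file; may be given more than once")
    opts = parser.parse_args(argv)
    # Without any -a the attribute is None; normalise it to an empty list.
    opts.args_files = opts.args_files or []
    return opts
```

Calling `parse_cli(["-p", "brt/top250", "-a", "brt/top250/options/default.args", "-a", "/home/me/nlx/top250.args"])` keeps both args files in the order given, so the script can pass them on to Stetl unchanged.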

For Stetl, an issue has been opened to allow multiple -a arguments.

The only problem is how to deal with the BAG, which is not Stetl-based and has more extensive command-line options. Possibly the default "convert to PostGIS" operation could be performed by nlextract.sh|py.

justb4 commented Apr 27, 2018

Note that work on this issue is already underway: PR #244 and #245 by @stvno have been merged on a separate restructure branch of the repo.

justb4 commented May 3, 2018

Stetl (master/latest) now supports multiple -a options. See the example usage in the top10nl README: https://github.com/nlextract/NLExtract/tree/master/brt/top10nl/etl . File names have also been standardized: default.args (now allowed in .gitignore, unlike other .args files) holds all default args, so your own .args file only needs to contain the changes to those, e.g. just the DB credentials.
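
The layering of a default.args file plus a personal .args file can be illustrated with a small sketch. This is not Stetl's code. It assumes that args files are plain `key=value` lines and that a file given later overrides one given earlier, and the helper names `read_args_file` and `merge_args_files` are hypothetical.

```python
def read_args_file(path):
    """Read key=value lines from an args file (assumed format)."""
    values = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comment lines and anything without '='.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values


def merge_args_files(paths):
    """Merge args files in order; later files override earlier ones (assumption)."""
    merged = {}
    for path in paths:
        merged.update(read_args_file(path))
    return merged
```

With this precedence, a personal file that sets only the DB credentials leaves every other value from default.args untouched.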

@justb4 justb4 modified the milestones: Versie 1.4.0, Versie 1.5.0 Feb 14, 2020
@justb4 justb4 modified the milestones: Versie 1.5.0, Versie 1.6.0 Oct 23, 2020
@justb4 justb4 modified the milestones: Versie 1.5.5, Versie 1.6.0 Nov 17, 2023