Use a config file system #240

Open
Bisaloo opened this issue Jun 7, 2021 · 6 comments

@Bisaloo
Contributor

Bisaloo commented Jun 7, 2021

Currently, most functions of this package require specifying the same arguments repeatedly throughout scripts (such as source, hub_repo_path, hub).

This is typo-prone (e.g., #228) and difficult to update (e.g., in the case of a migration from a "local_hub_repo" to Zoltar).

Additionally, information such as possible_locations is derived from the hub argument, which makes it difficult for someone to spin up a new hub without extra work on your end.

Instead, I propose using a config file that would contain all the metadata about the hub. Below is an example of some information that could be stored in a YAML file:

source: local_hub_repo
hub_repo_path: https://raw.githubusercontent.com/epiforecasts/covid19-forecast-hub-europe/main
hub_locations: data-locations/locations_eu.csv
verbose: false

This suggestion is inspired by the way pkgdown handles arguments that remain the same throughout a project.
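
As a rough illustration of the idea (written in Python purely for brevity; covidHubUtils itself is an R package), the sketch below reads a flat config like the one above and uses it to fill in any arguments the caller did not pass explicitly. `parse_flat_config` and `resolve_args` are hypothetical names, and the parser only handles simple `key: value` lines, not full YAML.

```python
CONFIG_TEXT = """\
source: local_hub_repo
hub_repo_path: https://raw.githubusercontent.com/epiforecasts/covid19-forecast-hub-europe/main
hub_locations: data-locations/locations_eu.csv
verbose: false
"""


def parse_flat_config(text):
    """Parse simple `key: value` lines (NOT a full YAML parser)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs in values stay intact.
        key, _, value = line.partition(":")
        value = value.strip()
        if value in ("true", "false"):
            value = value == "true"
        config[key.strip()] = value
    return config


def resolve_args(config, **overrides):
    """Explicit (non-None) arguments win; everything else comes from the config."""
    resolved = dict(config)
    resolved.update({k: v for k, v in overrides.items() if v is not None})
    return resolved


config = parse_flat_config(CONFIG_TEXT)
# Only `source` is overridden; `verbose=None` falls back to the config value.
args = resolve_args(config, source="zoltar", verbose=None)
```

With this pattern, migrating from a local hub repo to Zoltar would mean changing one line in the config file rather than every call site.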

@elray1
Collaborator

elray1 commented Jul 1, 2021

How would this work in practice, in terms of where the YAML file would be located and how it would be accessed? Would you have to edit this file at the time you install covidHubUtils, e.g. by cloning the repo, editing the file to reflect the settings you'll use most often, and then installing? Or is there another approach?

@Bisaloo
Contributor Author

Bisaloo commented Jul 2, 2021

In practice, this file would be located in the hub repository / at the root of the project using covidHubUtils and would be read by covidHubUtils functions when necessary.

We merged a first version of this for settings we use to build the ensemble and generate the reports in the EuroCOVIDhub: european-modelling-hubs/covid19-forecast-hub-europe_archive#511.

We could add the list of countries supported by the hub, etc. there, and covidHubUtils would use this information.

The precise location and name of the file can be discussed, and ideally could be set via a dedicated argument (the config_file argument of get_hub_config() in the first version we merged on our side, though this can be changed).
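
To make the lookup concrete, here is a minimal sketch (in Python for illustration only; the real get_hub_config() is R code in the hub repository and may behave differently). It assumes a default file name of `forecasthub.yml` at the project root and a hypothetical `find_hub_config` helper: an explicit `config_file` is used as given, otherwise the starting directory and its parents are searched.

```python
from pathlib import Path

DEFAULT_CONFIG_NAME = "forecasthub.yml"  # assumed default file name


def find_hub_config(config_file=None, start_dir="."):
    """Return the path to the hub config file.

    If `config_file` is given it is used as-is; otherwise `start_dir` and
    each of its parent directories are searched for DEFAULT_CONFIG_NAME.
    """
    if config_file is not None:
        path = Path(config_file)
        if not path.is_file():
            raise FileNotFoundError(f"Config file not found: {path}")
        return path

    start = Path(start_dir).resolve()
    for directory in (start, *start.parents):
        candidate = directory / DEFAULT_CONFIG_NAME
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"No {DEFAULT_CONFIG_NAME} found in {start} or its parent directories"
    )
```

Searching parent directories means a script in a subfolder of the project would still pick up the config at the root, with no editing of the installed package required.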

@Bisaloo
Contributor Author

Bisaloo commented Jul 2, 2021

As Kath noted in european-modelling-hubs/covid19-forecast-hub-europe_archive#511, it could make sense to have a single file for this and for zoltr since there is already some overlap.

@elray1
Collaborator

elray1 commented Nov 15, 2021

We think a hub needs to keep track of:

  • the name of the hub
  • the url of the hub repo
  • a set of locations with location_id and location_name information
  • a set of targets (maybe tracked as horizon, temporal scale, target variable)
  • maybe target end weekdays and forecast weekdays, per target (stuff involved in date_management.R calculations)
  • zoltar project id

We are wondering whether this should be stored in YAML files, JSON files, R objects, or something else.
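
Whatever the storage format, the list above maps onto a small typed structure. The sketch below (Python for illustration; all field names and example values are hypothetical, not an agreed schema) loads it from JSON, one of the candidate formats. Target end weekdays and forecast weekdays are left out to keep it short.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Target:
    horizon: int          # number of time steps ahead
    temporal_scale: str   # e.g. "wk"
    target_variable: str  # e.g. "inc case"


@dataclass
class HubConfig:
    name: str
    repo_url: str
    locations: dict                          # location_id -> location_name
    targets: list = field(default_factory=list)
    zoltar_project_id: Optional[int] = None  # not every hub uses Zoltar

    @classmethod
    def from_json(cls, text):
        raw = json.loads(text)
        return cls(
            name=raw["name"],
            repo_url=raw["repo_url"],
            locations=raw["locations"],
            targets=[Target(**t) for t in raw.get("targets", [])],
            zoltar_project_id=raw.get("zoltar_project_id"),
        )


# Made-up example values, for illustration only.
EXAMPLE_JSON = """
{
  "name": "Example Hub",
  "repo_url": "https://example.org/hub-repo",
  "locations": {"DE": "Germany", "FR": "France"},
  "targets": [
    {"horizon": 1, "temporal_scale": "wk", "target_variable": "inc case"}
  ]
}
"""

hub_config = HubConfig.from_json(EXAMPLE_JSON)
```

The same structure could be filled from a YAML file or an R list; the point is only that required fields fail loudly when missing, while optional ones such as the Zoltar project id default to empty.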

@Bisaloo
Contributor Author

Bisaloo commented Nov 15, 2021

At the moment, we have this:

https://github.com/epiforecasts/covid19-forecast-hub-europe/blob/main/forecasthub.yml

I agree we're likely missing some info at the moment, but we are slowly adding more and more items. It would be great if we can adopt a compatible syntax!

@elray1
Collaborator

elray1 commented Dec 16, 2021

Serena and I just discussed this issue a little more. We think the best way to handle it would involve a fair amount of redesign, so we propose to delay it until we start on hubUtils (as a separate package). Our thinking is that maybe the package should take a more object-oriented approach and define something like a Hub class. When you instantiate an object of that class, you provide it with a config file or the path to a config file, and maybe some other stuff. Then you call load_forecast and similar functions on that object, and it knows where to find things and how to validate them based on the config file contents it was given at the start.
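
A minimal sketch of that object-oriented idea, in Python for illustration only (hubUtils would be R, and the class, method signatures, and example values here are all hypothetical). The config is passed once at construction; `load_forecast` then validates the request against it instead of taking hub-specific arguments on every call.

```python
class Hub:
    """Hypothetical Hub object that carries its own configuration."""

    def __init__(self, config):
        # `config` is a plain dict here; it could equally be read from a file path.
        self.name = config["name"]
        self.repo_url = config["repo_url"]
        self.locations = set(config["locations"])

    def validate_locations(self, locations):
        """Raise if any requested location is not defined for this hub."""
        unknown = sorted(set(locations) - self.locations)
        if unknown:
            raise ValueError(f"Unknown locations for hub '{self.name}': {unknown}")

    def load_forecast(self, model, locations):
        """Validate the request, then describe where the data would come from.

        A real implementation would read the forecast files; this sketch only
        returns the resolved request so the control flow is visible.
        """
        self.validate_locations(locations)
        return {
            "hub": self.name,
            "source": self.repo_url,
            "model": model,
            "locations": list(locations),
        }


# Made-up example values, for illustration only.
hub = Hub(
    {
        "name": "Example Hub",
        "repo_url": "https://example.org/hub-repo",
        "locations": ["DE", "FR"],
    }
)
request = hub.load_forecast(model="example-model", locations=["DE"])
```

Spinning up a new hub would then mean constructing a `Hub` from a different config, with no changes to the package itself.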
