Remove data folder and add script to pull data from azure blob / original sources #207

Open
arik-shurygin opened this issue Aug 1, 2024 · 2 comments · May be fixed by #296
Labels
enterprise_practices (Governance housekeeping trying to keep projects usable and secure.), governance (CDC practices related)

Comments

@arik-shurygin
Collaborator

Keeping data, especially raw data, in this repository is not good practice.
Wherever possible, we should give users a way to source the data themselves (provided it is public, of course).

This ticket covers removing the data folder from version control: add it to .gitignore and provide scripts that download the data either from the cloud (for VAP users) or from the original raw sources (for external cloners of the repo).
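
A minimal sketch of what such a download script could look like, using only the Python standard library. Everything named here is an assumption for illustration: the `SOURCES` manifest, the placeholder URLs, and the `use_blob` switch are hypothetical and not taken from this repo. It also assumes every URL is readable without credentials; a private Azure container would need a SAS URL or the Azure SDK instead.

```python
"""Sketch: populate a gitignored data/ folder instead of tracking it in git."""
import shutil
import urllib.request
from pathlib import Path

# Hypothetical manifest: file name -> where it can be fetched from.
# Both URLs are placeholders, not real locations used by this project.
SOURCES = {
    "example.csv": {
        "public": "https://example.org/example.csv",
        "blob": "https://example.blob.core.windows.net/container/example.csv",
    },
}


def pick_url(entry: dict, use_blob: bool) -> str:
    """Use the blob copy when requested and available, else the public source."""
    if use_blob and "blob" in entry:
        return entry["blob"]
    if "public" not in entry:
        raise ValueError("no public source listed; blob access is required")
    return entry["public"]


def fetch(url: str, dest: Path, overwrite: bool = False) -> bool:
    """Download url to dest. Return True if downloaded, False if skipped."""
    if dest.exists() and not overwrite:
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return True


def fetch_all(sources: dict, data_dir, use_blob: bool = False) -> list:
    """Fetch every missing file in the manifest; return the names downloaded."""
    downloaded = []
    for name, entry in sources.items():
        if fetch(pick_url(entry, use_blob), Path(data_dir) / name):
            downloaded.append(name)
    return downloaded
```

With something like this in place, `fetch_all(SOURCES, "data", use_blob=True)` would be the path for internal users and `use_blob=False` the path for external cloners. Files that already exist are skipped, so rerunning the script is cheap.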

@SamuelBrand1
Contributor

What's your plan for this? There are a few different data sources, so it might be best to organise them into sources that have a public remote (e.g. https://github.com/cmu-delphi/epidatr ?) and sources we could move from /data into blob storage.

Then you can tick them off as sub-issues.
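
One way to make that split concrete is a small inventory that records, for each dataset, whether it has a public remote, and then prints a checklist to paste into sub-issues. This is only a sketch: the `DataSource` type and the dataset names are hypothetical placeholders, and which real files map to which origin would still need to be worked out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataSource:
    name: str                     # dataset under data/ (placeholder names below)
    public_remote: Optional[str]  # public origin if one exists, else None


def group_by_origin(sources):
    """Split sources into those with a public remote and those needing blob storage."""
    groups = {"public remote": [], "blob storage": []}
    for source in sources:
        key = "public remote" if source.public_remote else "blob storage"
        groups[key].append(source.name)
    return groups


def as_checklist(groups):
    """Render the groups as a markdown task list, one heading line per origin."""
    lines = []
    for origin, names in groups.items():
        lines.append(f"{origin}:")
        lines.extend(f"- [ ] {name}" for name in sorted(names))
    return "\n".join(lines)


# Hypothetical inventory; the names are illustrative only.
inventory = [
    DataSource("dataset_b", None),
    DataSource("dataset_a", "https://github.com/cmu-delphi/epidatr"),
]
print(as_checklist(group_by_origin(inventory)))
```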

arik-shurygin added the enterprise_practices and governance labels on Oct 17, 2024
@arik-shurygin
Collaborator Author

arik-shurygin commented Oct 17, 2024

> What's your plan for this? There are a few different data sources, so it might be best to organise them into sources that have a public remote (e.g. https://github.com/cmu-delphi/epidatr ?) and sources we could move from /data into blob storage.
>
> Then you can tick them off as sub-issues.

That is a good plan. Currently this issue is in limbo because I am waiting to see what happens with the cfa-initialization repo, which creates many of our initialization CSVs and vaccination splines. If that repository is folded in with the model, we will have to change what data the repo grabs (we would need a lot less data from Azure).
