refactor this repo #35

Open
MathewBiddle opened this issue Aug 10, 2023 · 12 comments

MathewBiddle commented Aug 10, 2023

This process is very confusing ATM. I've tried to update the README to document how to update the webpages. However, there are lots of interweaving dependencies and step-wise processes that need to be executed in a specific way to make everything work.

I'm starting this issue to do two things.

  1. Document what I'm doing now.
  2. Make a plan for how to simplify the steps and, hopefully, resolve #29 ("webpage build GHA doesn't run on automated commits") in the end.

What is happening now:

  1. ATN GTS metrics https://ioos.github.io/ioos_metrics/gts_atn.html
    1. Calculated by https://github.com/ioos/ioos_metrics/blob/main/gts_atn_metrics.py (by parsing an HTML directory) and saved to https://github.com/ioos/ioos_metrics/blob/main/gts/GTS_ATN_monthly_totals.csv.
    2. The result is then used to create the website via https://github.com/ioos/ioos_metrics/blob/main/website/create_gts_atn_landing_page.py.
  2. Regional GTS metrics https://ioos.github.io/ioos_metrics/gts_regional.html
    1. Calculated in https://github.com/ioos/ioos_metrics/blob/main/gts_regional_metrics.py from the data hosted at https://www.ndbc.noaa.gov/ioosstats/ and saved to https://github.com/ioos/ioos_metrics/tree/main/gts/.
      1. We also serve those source data via ERDDAP at https://erddap.ioos.us/erddap/search/index.html?page=1&itemsPerPage=1000&searchFor=GTS (I have a script that pulls new data over from NDBC for ERDDAP, which I run manually):
      #!/bin/bash
      # Mirror the monthly NDBC GTS reports, one file per month and category.
      start=2018-01-01  # when NDBC started collecting data
      end=$(date +%Y-%m-%d)
      # ISO dates sort lexicographically, so a string comparison is enough.
      while ! [[ "$start" > "$end" ]]; do
          date_fmt=$(date -d "$start" +%Y_%m)
          start=$(date -d "$start + 1 month" +%Y-%m-%d)
          echo "Downloading $date_fmt..."
          # IOOS Regional
          wget -N "https://www.ndbc.noaa.gov/ioosstats/rpts/${date_fmt}_ioos_regional.csv" -nH -P ioos_regional -a logfile_regional.txt
          # NDBC
          wget -N "https://www.ndbc.noaa.gov/ioosstats/rpts/${date_fmt}_ndbc.csv" -nH -P ndbc -a logfile_ndbc.txt
          # non-NDBC
          wget -N "https://www.ndbc.noaa.gov/ioosstats/rpts/${date_fmt}_non_ndbc.csv" -nH -P non_ndbc -a logfile_non_ndbc.txt
      done
    2. The calculated quarterly files are then read by https://github.com/ioos/ioos_metrics/blob/main/website/create_gts_regional_landing_page.py to create the webpage.
  3. Asset inventory https://ioos.github.io/ioos_metrics/asset_inventory.html
    1. Generated from https://github.com/ioos/ioos-asset-inventory/blob/main/inventory_creation.ipynb and data saved to a yearly directory at https://github.com/ioos/ioos-asset-inventory/tree/main
    2. Those data are read by ERDDAP via a manual git pull of the ioos-asset-inventory repo on the ERDDAP server.
    3. Webpage is then generated with https://github.com/ioos/ioos_metrics/blob/main/website/create_asset_inventory_page.py by reading data from ERDDAP.
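
For reference, the month loop in the shell script under item 2 could also be written in Python. This is only a sketch: the base URL and the three file suffixes are taken from that script, while the function names `month_range` and `report_urls` are made up for illustration and do not exist in the repo.

```python
from datetime import date

BASE_URL = "https://www.ndbc.noaa.gov/ioosstats/rpts"  # from the shell script
CATEGORIES = ("ioos_regional", "ndbc", "non_ndbc")  # file suffixes used by the script


def month_range(start: date, end: date):
    """Yield (year, month) pairs from start's month through end's month, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def report_urls(start: date, end: date) -> list[str]:
    """Build one report URL per month and category, e.g. .../2018_01_ndbc.csv."""
    return [
        f"{BASE_URL}/{year}_{month:02d}_{category}.csv"
        for year, month in month_range(start, end)
        for category in CATEGORIES
    ]
```

Calling `report_urls(date(2018, 1, 1), date.today())` lists the same files the shell loop requests; the download step itself is left out here.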

How can this process be simplified to:

  1. Catch bugs
  2. Update with new data
  3. Make it relatively hands-off
  4. Ensure data are available for other uses (load into ERDDAP)
@MathewBiddle

I would like all of the data used in the metrics website to be available through the IOOS ERDDAP, with lightweight scripts that pull the data in and build the webpages. What I haven't figured out is where to put the scripts that generate the datasets served on the IOOS ERDDAP.
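
As a rough idea of what a "lightweight script" could start from, here is a sketch that builds an ERDDAP tabledap CSV request URL. The server URL is the IOOS ERDDAP mentioned in this thread; the helper name `tabledap_csv_url` and any dataset ID or variable names passed to it are assumptions for illustration, not existing datasets or repo code.

```python
from urllib.parse import quote

ERDDAP_SERVER = "https://erddap.ioos.us/erddap"  # IOOS ERDDAP, from this thread


def tabledap_csv_url(dataset_id, variables=(), constraints=()):
    """Return a tabledap request URL that asks ERDDAP for a CSV response.

    constraints are strings such as 'time>=2023-01-01'.
    """
    query = ",".join(variables)
    for constraint in constraints:
        # Percent-encode characters like '>' and '"' but keep '=' readable.
        query += "&" + quote(constraint, safe="=")
    url = f"{ERDDAP_SERVER}/tabledap/{dataset_id}.csv"
    return f"{url}?{query}" if query else url
```

The resulting URL could be handed to something like `pandas.read_csv(url, skiprows=[1])`, since ERDDAP's `.csv` response carries a units row right after the header.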

@MathewBiddle

current flow

%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#007396',
      'primaryTextColor': '#fff',
      'primaryBorderColor': '#003087',
      'lineColor': '#003087',
      'secondaryColor': '#007396',
      'tertiaryColor': '#CCD1D1'
    },
   'flowchart': { 'curve': 'basis' }
  }
}%%

flowchart LR

pA["html"]
A["gts_atn_metrics.py"]
B["GTS_ATN_monthly_totals.csv"]
C["create_gts_atn_landing_page.py"]

subgraph ATN
A --> pA
A --> B
C --> B
end

D["ioosstats/"]
E["gts_regional_metrics.py"]
F["ioos_metrics/tree/main/gts/"]
G["create_gts_regional_landing_page.py"]

subgraph GTS
E --> D
E --> F
G --> F
end

H["inventory_creation.ipynb"]
I["ioos-asset-inventory/tree/main"]
J["IOOS ERDDAP"]
K["create_asset_inventory_page.py"]

subgraph inventory
H --> I
I --> J
K --> J
end

@MathewBiddle

I think it should look like

%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#007396',
      'primaryTextColor': '#fff',
      'primaryBorderColor': '#003087',
      'lineColor': '#003087',
      'secondaryColor': '#007396',
      'tertiaryColor': '#CCD1D1'
    },
   'flowchart': { 'curve': 'basis' }
  }
}%%

flowchart LR

L["ERDDAP"]
A["gts_atn_metrics.py"]
C["create_gts_atn_landing_page.py"]
E["gts_regional_metrics.py"]
G["create_gts_regional_landing_page.py"]
H["inventory_creation.ipynb"]
K["create_asset_inventory_page.py"]


E --> L
L --> G
A --> L
L --> C
H --> L
L --> K


MathewBiddle self-assigned this Oct 4, 2023

MathewBiddle commented Jan 26, 2024

I am also calculating IOOS by the Numbers in this notebook, https://github.com/ioos/ioos_metrics/blob/main/IOOS_BTN.ipynb, which writes to a CSV file, https://github.com/ioos/ioos_metrics/blob/main/ioos_btn_metrics.csv. I would like to define a process for running that notebook (or the code inside it) and writing out data that can then be hosted on the IOOS ERDDAP, https://erddap.ioos.us/erddap/index.html
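
One possible shape for the "write the data" step is a small helper that appends a row per run to a CSV file. This is a sketch only: `append_metrics_row` is a hypothetical name, and the column names used when calling it are placeholders, not the actual columns of `ioos_btn_metrics.csv`.

```python
import csv
from pathlib import Path


def append_metrics_row(csv_path, row):
    """Append one row of metrics to csv_path, writing a header if the file is new.

    row is a dict; its keys must match the existing header when the file exists.
    """
    path = Path(csv_path)
    is_new = not path.exists() or path.stat().st_size == 0
    if not is_new:
        # Refuse to append if the columns drifted from what is already on disk.
        with path.open(newline="") as f:
            header = next(csv.reader(f))
        if list(row) != header:
            raise ValueError(f"columns {list(row)} do not match header {header}")
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

Checking the header before appending keeps the file in a stable shape, which matters if a server reads it on a schedule.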

related to #8

@MathewBiddle

💡 for 2.a. (#35 (comment)) I could run that shell script as a cron job on AWS. Then, we don't have to worry about the ERDDAP endpoint getting out of sync by forgetting to pull new data. Probably best to run on the 5th of each month...

@MathewBiddle

Set up the cron job:

$ crontab -l
0 12 5 * * get_data.sh

Will need to check on the 5th of the month if it ran.
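
The crontab entry above fires at 12:00 on the 5th of every month. A check along these lines could confirm that a run left the expected files behind. It is a sketch under assumptions: the directory names and file naming come from the shell script earlier in this thread, it assumes `wget -P` drops each file directly into its category directory, and `missing_reports` is a hypothetical helper name.

```python
from pathlib import Path

# Download directories and file suffixes used by the shell script in this thread.
CATEGORIES = ("ioos_regional", "ndbc", "non_ndbc")


def missing_reports(root, year, month):
    """Return the expected report files for one month that are absent under root."""
    expected = [
        Path(root) / category / f"{year}_{month:02d}_{category}.csv"
        for category in CATEGORIES
    ]
    return [path for path in expected if not path.is_file()]
```

An empty list means all three files for that month are present; which month NDBC has published by the 5th is something to confirm separately.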


ocefpaf commented Jan 29, 2024

> I could run that shell script as a cron job on AWS

Could it work as a GHA cron job? Or are there reasons not to go that route?


MathewBiddle commented Jan 29, 2024

I need the data where ERDDAP can access it. And https://erddap.ioos.us/erddap/index.html is currently on AWS.

@MathewBiddle

I think we are at a point where we have successfully migrated the Jupyter notebook into something more functional. Now I'd like to see what we can clean up from the repo and what we can move to more appropriate places. I'm not quite sure where to start, however.


ocefpaf commented Sep 5, 2024

I'm at the NumFOCUS event until the 8th. After that I can take a look at it. We can start by printing the tree of the repo and adding some comments to all the files here so we know what is what.
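
If the `tree` command is not at hand, a listing in the same style can be produced with a few lines of Python. This is a minimal sketch; `tree_lines` is a hypothetical helper, and skipping hidden entries (names starting with a dot) is a choice made here for brevity.

```python
from pathlib import Path


def tree_lines(root, prefix=""):
    """Return the lines of a `tree`-style listing of root, skipping hidden entries."""
    entries = sorted(
        (p for p in Path(root).iterdir() if not p.name.startswith(".")),
        key=lambda p: p.name.lower(),
    )
    lines = []
    for index, entry in enumerate(entries):
        last = index == len(entries) - 1
        lines.append(f"{prefix}{'└── ' if last else '├── '}{entry.name}")
        if entry.is_dir():
            # Children of the last entry get blank padding instead of a rail.
            lines.extend(tree_lines(entry, prefix + ("    " if last else "│   ")))
    return lines
```

`print("\n".join(tree_lines(".")))` run from the repo root gives a listing that can be pasted into a comment and annotated by hand.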


ocefpaf commented Sep 17, 2024

@MathewBiddle this is the file tree with some comments to help us navigate this. Please edit/fix the comments if necessary. I'm not 100% sure on some of those files.

.
├── btn_metrics.py           # called in ./.github/workflows/metrics.yml
├── gts_atn_metrics.py       # called in ./.github/workflows/metrics.yml
├── gts_regional_metrics.py  # called in ./.github/workflows/metrics.yml
├── read_bufr.py  # I believe it is unused
├── conda-lock.yml   # used to lock the environment.yml
├── environment.yml  # requirements to run both the library and the extra script in the CIs
├── ioos_btn_metrics.csv  # current BTN metrics
├── LICENSE                         # metrics package
├── MANIFEST.in                     # metrics package
├── pyproject.toml                  # metrics package
├── README.md                       # metrics package
├── ruff.toml                       # metrics package
├── tests                           # metrics package
│   ├── test_metrics.py             # metrics package
│   └── test_national_platforms.py  # metrics package
├── ioos_metrics                    # metrics package
│   ├── __init__.py                 # metrics package
│   ├── ioos_metrics.py             # metrics package
│   └── national_platforms.py       # metrics package
├── notebooks
│   ├── glider_metrics.ipynb  # gdutil alternative
│   ├── GTS_Totals_weather_act.ipynb  # not sure if this one is used
│   ├── mbon_citation_visualizations.ipynb # not sure if this one is used
│   ├── IOOS_BTN.ipynb  # old notebook before the package
│   └── run_metrics.ipynb  # new notebook that replaces IOOS_BTN.ipynb
├── gts  # everything below is part of the website
│   ├── GTS_ATN_monthly_totals.csv
│   ├── GTS_regional_totals_FY2018_Q2.csv
│   ├── GTS_regional_totals_FY2018_Q3.csv
│   ├── GTS_regional_totals_FY2018_Q4.csv
│   ├── GTS_regional_totals_FY2019_Q1.csv
│   ├── GTS_regional_totals_FY2019_Q2.csv
│   ├── GTS_regional_totals_FY2019_Q3.csv
│   ├── GTS_regional_totals_FY2019_Q4.csv
│   ├── GTS_regional_totals_FY2020_Q1.csv
│   ├── GTS_regional_totals_FY2020_Q2.csv
│   ├── GTS_regional_totals_FY2020_Q3.csv
│   ├── GTS_regional_totals_FY2020_Q4.csv
│   ├── GTS_regional_totals_FY2021_Q1.csv
│   ├── GTS_regional_totals_FY2021_Q2.csv
│   ├── GTS_regional_totals_FY2021_Q3.csv
│   ├── GTS_regional_totals_FY2021_Q4.csv
│   ├── GTS_regional_totals_FY2022_Q1.csv
│   ├── GTS_regional_totals_FY2022_Q2.csv
│   ├── GTS_regional_totals_FY2022_Q3.csv
│   ├── GTS_regional_totals_FY2022_Q4.csv
│   ├── GTS_regional_totals_FY2023_Q1.csv
│   ├── GTS_regional_totals_FY2023_Q2.csv
│   ├── GTS_regional_totals_FY2023_Q3.csv
│   ├── GTS_regional_totals_FY2023_Q4.csv
│   ├── GTS_regional_totals_FY2024_Q1.csv
│   ├── GTS_regional_totals_FY2024_Q2.csv
│   └── GTS_regional_totals_FY2024_Q3.csv
└── website
    ├── asset_inventory_config.json
    ├── create_asset_inventory_page.py
    ├── create_gts_atn_landing_page.py
    ├── create_gts_regional_landing_page.py
    ├── deploy
    │   ├── index.html
    │   └── static
    │       └── main.css
    ├── gts_atn_config.json
    ├── gts_regional_config.json
    └── templates
        ├── asset_inventory_page.html
        ├── gts_atn_landing_page.html
        └── gts_regional_landing_page.html
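
The quarterly file names under `gts/` look like they follow a fiscal year that starts on October 1 (the first file is FY2018_Q2, and the NDBC data start in January 2018). Under that assumption, which is inferred from the names only and not checked against `gts_regional_metrics.py`, a date maps to a file name roughly like this. Both function names are hypothetical.

```python
from datetime import date


def fiscal_quarter(day: date) -> tuple[int, int]:
    """Map a calendar date to (fiscal_year, quarter), assuming the fiscal year starts Oct 1."""
    if day.month >= 10:
        # October through December open the next fiscal year.
        return day.year + 1, 1
    return day.year, (day.month - 1) // 3 + 2


def quarterly_filename(day: date) -> str:
    """Build a file name in the pattern seen under gts/ for the quarter containing day."""
    fiscal_year, quarter = fiscal_quarter(day)
    return f"GTS_regional_totals_FY{fiscal_year}_Q{quarter}.csv"
```

A mapping like this could help spot gaps in the `gts/` directory if the files ever move to ERDDAP.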
