Add archiver for NREL Standard Scenarios #563

Draft: wants to merge 10 commits into main


krivard (Contributor) commented Jan 30, 2025

Overview

Closes #561.

What problem does this address?

  • NREL Standard Scenarios data is not available from static download links, only through a JavaScript-powered web app
  • We reverse-engineered the API calls made by the web app and reproduce them here; they're basic GET and POST requests with no credentials needed
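A minimal stdlib-only sketch of the three request shapes described here: a plain GET, a POST whose response body is JSON, and a POST where the useful value comes back in a response header. The function names, the JSON-encoded request body, and the use of `urllib` are assumptions for illustration; this PR doesn't show the actual endpoints, payloads, or the HTTP client the archiver uses.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any, Optional


def build_json_post(url: str, payload: dict[str, Any]) -> urllib.request.Request:
    """Build an unauthenticated POST request with a JSON body (assumed encoding)."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def get_json(url: str, timeout: float = 30.0) -> Any:
    """Plain GET request whose response body is JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def post_for_json(url: str, payload: dict[str, Any], timeout: float = 30.0) -> Any:
    """POST a JSON payload and decode the JSON response body."""
    with urllib.request.urlopen(build_json_post(url, payload), timeout=timeout) as resp:
        return json.load(resp)


def post_for_header(
    url: str, payload: dict[str, Any], header: str, timeout: float = 30.0
) -> Optional[str]:
    """POST a JSON payload and return one response header, or None if it's absent."""
    with urllib.request.urlopen(build_json_post(url, payload), timeout=timeout) as resp:
        # HTTPMessage lookups are case-insensitive.
        return resp.headers.get(header)
```

Keeping request construction separate from sending makes the payload easy to check without touching the network.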

What did you change in this PR?

  • New archiver for NREL Standard Scenarios
    • Includes some semi-custom fetching:
      • Response JSON from a POST request
      • Response header from a POST request
  • Split hyperlink extraction into two phases to permit extraction of links directly from an HTML string
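The split can be sketched as follows: phase one obtains a page's HTML as a string, and phase two parses any HTML string for links, so HTML that arrives some other way (for instance, inside a POST response) can reuse phase two. This is a minimal sketch using the stdlib `html.parser`, assuming `_HyperlinkExtractor` is an `HTMLParser` subclass that collects `<a href>` values; `get_hyperlinks_from_text` is a hypothetical name, and its `text`/`filter_pattern` parameters mirror the docstring quoted later in this PR. Phase one (fetching) is left out.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser
from typing import Optional


class _HyperlinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser (assumed design)."""

    def __init__(self) -> None:
        super().__init__()
        self.hyperlinks: set[str] = set()

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            for name, value in attrs:
                # Attributes written without a value (e.g. <a href>) arrive as None.
                if name == "href" and value:
                    self.hyperlinks.add(value)


def get_hyperlinks_from_text(
    text: str, filter_pattern: Optional[re.Pattern] = None
) -> set[str]:
    """Phase two (hypothetical helper): pull links out of an HTML string.

    Args:
        text: Text containing HTML.
        filter_pattern: If present, only return links that contain the pattern.
    """
    parser = _HyperlinkExtractor()
    parser.feed(text)
    parser.close()
    links = parser.hyperlinks
    if filter_pattern is not None:
        links = {link for link in links if filter_pattern.search(link)}
    return links
```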

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

Tasks


krivard linked an issue on Jan 30, 2025 that may be closed by this pull request
    "keywords": sorted(
        {
            "nrel",
            "standard scenarios",
krivard (Contributor, Author):

Other keywords that could go in here, cribbed from the nrelatb entry in pudl/metadata/sources.py:

                + KEYWORDS["us_govt"]
                + KEYWORDS["electricity"]
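If those shared lists are added, note that the diff builds the keywords from a set literal, and Python can't `+` a set and a list (that raises `TypeError`). One way to merge them, assuming `KEYWORDS` maps names to lists of strings; the values below are placeholders, not PUDL's actual keyword lists:

```python
# Placeholder lists: the real KEYWORDS mapping lives in PUDL's metadata
# and its contents are not reproduced in this PR.
KEYWORDS = {
    "us_govt": ["us government", "federal"],
    "electricity": ["electricity", "power"],
}

# A set literal can't be concatenated with a list, so turn the shared
# lists into a set and take the union before sorting.
keywords = sorted(
    {"nrel", "standard scenarios"}
    | set(KEYWORDS["us_govt"] + KEYWORDS["electricity"])
)
print(keywords)
```

The union also drops any duplicates between the dataset-specific and shared keywords.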

            "standard scenarios",
        }
    ),
    "license_raw": LICENSES["cc-by-4.0"],
krivard (Contributor, Author):

They have an odd disclaimer that says, approximately, "you have to cite us, but you can't make it look like we endorse you," which seems close enough to CC-BY?

Member:

I would ask our resident license scrutinizer @zaneselvans on this one!

"major cost declines for electricity generation technologies (e.g., using cost"
"inputs from the Annual Technology Baseline)."
"For select scenarios, the models are run using the PLEXOS software and the"
"Cambium tool that assembles structured data sets of hourly cost, emissions, and"
krivard (Contributor, Author), Jan 30, 2025:

We could consider pulling in the Cambium results as well (as a second partition), but A) they only go back to 2020, and B) they're about 6 GB for each year.

Member:

If it's simple to also add the Cambium results, I'd say add them and add a second partition of project or scenario_type or something! But this seems like a lower priority than grabbing just the Standard Scenarios.

Member:

Are the Cambium results 6 GB zipped? If so, together with the Standard Scenarios (assuming they're a similar size), it's pushing up against the 50 GB archive limit.

krivard (Contributor, Author):

Yeah, zipped. The Standard Scenarios are roughly two orders of magnitude smaller, though, since they don't include hourly data, so the question is less "will Cambium push this archiver over the limit" and more "can we archive Cambium on Zenodo at all."

I'll go with "not right now" and write up Cambium as a separate issue.
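For a rough sense of scale, taking this thread's figures at face value (about 6 GB zipped per year of Cambium results and a 50 GB archive limit), Cambium alone would fill an archive after about eight annual releases, before the Standard Scenarios are counted at all:

$$
\left\lfloor \frac{50\ \text{GB}}{6\ \text{GB per year}} \right\rfloor = 8\ \text{years}, \qquad 8 \times 6\ \text{GB} = 48\ \text{GB} \le 50\ \text{GB} < 9 \times 6\ \text{GB} = 54\ \text{GB}.
$$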

        text: Text containing HTML.
        filter_pattern: If present, only return links that contain the pattern.
    """
    parser = _HyperlinkExtractor()
krivard (Contributor, Author):

I believe I've sliced this carefully so it doesn't conflict with Marianne's get_hyperlink changes, but I'll handle any massaging necessary if it does.

krivard changed the title from "Add archiver for NREL Sstandard Scenarios" to "Add archiver for NREL Standard Scenarios" on Jan 31, 2025
krivard force-pushed the 561-nrel-standard-scenarios branch from 811331d to 49a2974 on January 31, 2025, 20:50
krivard force-pushed the 561-nrel-standard-scenarios branch from a967aeb to 7e7b211 on January 31, 2025, 20:57
krivard force-pushed the 561-nrel-standard-scenarios branch from 6df426b to 3b9e2d2 on January 31, 2025, 21:43
Labels: None yet
Projects: Status: New
Development: Successfully merging this pull request may close these issues:
Write an archiver for NREL Standard Scenarios
2 participants