-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
107 scaling HN comparison to all LAs and refactoring of the code #107
base: dev
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @helloaidank from what I could see this looks great - thanks a lot! The script ran perfectly and the results are really encouraging, and I couldnt see any obvious bugs. Just a few suggestions.
# 6. After all LAs are processed, save the combined MAE data as CSV | ||
mae_df = pl.DataFrame(la_mae_data) | ||
csv_path = os.path.join(output_dir, "la_mae_data.csv") | ||
mae_df.write_csv(csv_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to save this to s3 too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @helloaidank ,
Thanks for all this work, it looks good! I didn't test the code as Liz has done that. I didn't find any bugs or anything. I did leave some suggestions for moving functions to other existing files where they would fit in, plus a couple other minor suggestions.
I also think it might be useful to have these functions refactored so they can compare either the Nesta / conventional view scores to the HN zones. It would be interesting to see if the results are very different. This should definitely be in a separate PR and is a nice to have. Would you mind please adding it as an issue into the repo?
Thank you :D
asf_heat_pump_suitability/analysis/hn_zones/comparison_of_hn_zones.py
Outdated
Show resolved
Hide resolved
asf_heat_pump_suitability/analysis/hn_zones/comparison_of_hn_zones.py
Outdated
Show resolved
Hide resolved
asf_heat_pump_suitability/analysis/hn_zones/config/hnz_config.py
Outdated
Show resolved
Hide resolved
asf_heat_pump_suitability/analysis/hn_zones/utils/spatial_utils.py
Outdated
Show resolved
Hide resolved
asf_heat_pump_suitability/analysis/hn_zones/utils/spatial_utils.py
Outdated
Show resolved
Hide resolved
asf_heat_pump_suitability/analysis/hn_zones/utils/spatial_utils.py
Outdated
Show resolved
Hide resolved
asf_heat_pump_suitability/analysis/hn_zones/utils/spatial_utils.py
Outdated
Show resolved
Hide resolved
asf_heat_pump_suitability/analysis/hn_zones/utils/spatial_utils.py
Outdated
Show resolved
Hide resolved
… adding missing lsoa info to dict
Hi both @lizgzil and @crispy-wonton! Thank you both for reviewing the code , very useful suggestions, I've made the suggested changes and where I didn't, I've explained why not! This took longer than I thought as it's clearly no longer morning 🤣 |
Hi @crispy-wonton and @lizgzil, reviving this PR, potentially it would be good to have a quick check if you're happy with the changes I've made and also explanations I've given. I've made some changes to some of the hp suitability util files and also implemented some more concise code. Happy to discuss at stand-up as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@helloaidank great work here - thanks for making the changes. Everything looking good and neat. I spotted a couple of minor changes to make, but all the logic looks good. I ran the following code as given in your PR description:
python asf_heat_pump_suitability/analysis/hn_zones/comparison_of_hn_zones.py --read_in_s3 --save_to_s3
It worked well and I could see it looping through LAs and producing outputs. I interrupted it after a few iterations to save time.
subfolder: str, | ||
): | ||
""" | ||
Upload a local file to an S3 bucket, used for evaluation of our heat network suitability model using the DESNZ Heat Network pilot zones. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upload a local file to an S3 bucket, used for evaluation of our heat network suitability model using the DESNZ Heat Network pilot zones. | |
Upload a local file to an S3 bucket. |
Just to make the docstring more generic to match the function :)
local_file_path: str, | ||
s3_bucket: str, | ||
s3_key_dir: str, | ||
save_to_s3: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
save_to_s3
needs to be removed from here and the docstring
s3_key_dir (str): S3 key (path) where the file should be uploaded. | ||
save_to_s3 (bool): boolean which indicates whether to save or not the file to s3. | ||
filename (str): The actual filename to store in S3. | ||
subfolder (str): Subfolder within S3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused by the diff between s3_key_dir
and subfolder
. Aren't these, plus file name, in their entirety referred to as key
s?
dict: A dictionary containing paths for LIVERPOOL_GPKG_PATH, LSOA_SHP_PATH, and NESTA_HP_SUITABILITY_PARQUET_PATH. | ||
""" | ||
paths = {} | ||
if read_in_s3: | ||
paths["LSOA_SHP_PATH"] = LSOA_SHP_PATH_S3 | ||
paths["NESTA_HP_SUITABILITY_PARQUET_PATH"] = NESTA_HPS_PARQUET_S3 | ||
|
||
else: | ||
paths["LSOA_SHP_PATH"] = LSOA_SHP_PATH_LOCAL | ||
paths["NESTA_HP_SUITABILITY_PARQUET_PATH"] = NESTA_HPS_PARQUET_LOCAL | ||
return paths |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dict is missing the LIVERPOOL_GPKG_PATH
. Not sure which of function or docstring needs updating.
Args: | ||
gdf (gpd.GeoDataFrame): The GeoDataFrame to write. | ||
output_dir (str): Local directory to save the GPKG. | ||
filename_prefix (str): The prefix used for naming the GPKG file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filename_prefix (str): The prefix used for naming the GPKG file. | |
filename_prefix (str): The prefix used for naming the GPKG file. `filename_prefix` will be joined to `_with_desnz_hn_lsoa.gpkg` to generate full filename. |
) | ||
|
||
# Extract unique LSOA codes | ||
unique_lsoa_codes = intersection_gdf["LSOA21CD"].dropna().unique().tolist() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any/a lot of rows that get dropped here because of drop na?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way, it would be good to include a logging warning here to note how many rows are lost.
Description
This PR extends the
comparison_of_hn_zones.py
script to process all available Local Authorities (LAs), rather than just Liverpool. Additionally, the codebase has been refactored for better modularity, maintainability, and efficiency.Key changes include:
Major Changes
Scaling to All LAs
hnz_config.py
rather than being hardcoded to Liverpool.Extracted Utility Functions:
log_save_utils.py
for logging, path management, and file saving.spatial_utils.py
to handle spatial joins and CRS transformations.hnz_comparison_analysis_utils.py
for statistical calculations (e.g., MAE, suitability scores).nesta_hp_suitability_utils.py
to filter heat pump suitability data for the relevant LAs.hnz_config.py
.Improvements to Efficiency and Code Structure
Outputs
Each LA now generates the following output files:
<la_name>_with_desnz_hn_lsoa.gpkg
: Processed GeoDataFrame with LSOA spatial joins.<la_name>_hp_suitability_lsoas.json
: List of unique LSOA codes per LA.<la_name>_hp_suitability_scores_with_desnz.parquet
: Processed suitability scores with DESNZ pilot fraction and MAE.script_output.log
: Detailed logs on calculations and data processing.Instructions for Reviewer
Testing the Script
To run the script and verify the outputs, use:
python asf_heat_pump_suitability/analysis/hn_zones/comparison_of_hn_zones.py --read_in_s3 --save_to_s3
This should output multiple data files in the relevant s3 bucket (
s3://asf-heat-pump-suitability/evaluation/desnz_hn_zone_scores/
) for all LAs.Code Structure Review
Just have a look at the way I've refactored the code and make sure everything makes sense (utils and config done in the right manner, anything obvious missing in this refactoring).
Things to pay special attention to
Correctness of Outputs
plot_comparison_of_hn_zones.py
should still work for the Liverpool outputs from this script, make sure that's the case.Handling of Special Cases
Thank you for taking the time to review this PR! Please do let me know if you have any questions.
Checklist:
notebooks/
pre-commit
and addressed any issues not automatically fixeddev
README
s