107 scaling HN comparison to all LAs and refactoring of the code #107

helloaidank · 2025-01-10T17:32:17Z

Description

This PR extends the comparison_of_hn_zones.py script to process all available Local Authorities (LAs), rather than just Liverpool. Additionally, the codebase has been refactored for better modularity, maintainability, and efficiency.

Key changes include:

Major Changes

Scaling to All LAs

The script now iterates through all LAs listed in hnz_config.py rather than being hardcoded to Liverpool.
Special handling for Greater Manchester and other grouped LAs.
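
A minimal sketch of how a per-LA loop with grouped LAs might look. The `LA_CONFIG` mapping, its keys, and the `expand_las` helper are hypothetical names for illustration; the actual structure of hnz_config.py may differ.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical config: each entry maps a region name to the LAs it contains.
# A standalone LA maps to a single-item list; a grouped region lists its sub-LAs.
LA_CONFIG: Dict[str, List[str]] = {
    "Liverpool": ["Liverpool"],
    "Greater Manchester": ["Manchester", "Salford", "Trafford"],
}


def expand_las(config: Dict[str, List[str]]) -> Iterator[Tuple[str, str]]:
    """Yield (region, la_name) pairs so grouped regions are processed per sub-LA."""
    for region, las in config.items():
        for la_name in las:
            yield region, la_name


for region, la_name in expand_las(LA_CONFIG):
    # Placeholder for the per-LA processing step.
    print(f"Processing {la_name} (region: {region})")
```

Keeping the grouping in the config means the main loop needs no special-case branch for Greater Manchester.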

Extracted Utility Functions:

Created log_save_utils.py for logging, path management, and file saving.
Created spatial_utils.py to handle spatial joins and CRS transformations.
Created hnz_comparison_analysis_utils.py for statistical calculations (e.g., MAE, suitability scores).
Created nesta_hp_suitability_utils.py to filter heat pump suitability data for the relevant LAs.
Moved configurable paths, parameters, and LAs to hnz_config.py.

Improvements to Efficiency and Code Structure

Used config-based file path management to avoid hardcoded paths.
Extracted file I/O and S3 operations into helper functions.

Outputs

Each LA now generates the following output files:

<la_name>_with_desnz_hn_lsoa.gpkg: Processed GeoDataFrame with LSOA spatial joins.
<la_name>_hp_suitability_lsoas.json: List of unique LSOA codes per LA.
<la_name>_hp_suitability_scores_with_desnz.parquet: Processed suitability scores with DESNZ pilot fraction and MAE.
script_output.log: Detailed logs on calculations and data processing.
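
For reference, mean absolute error between two equal-length series can be sketched as follows. The pairing used here (a suitability score against the DESNZ pilot fraction, both assumed to lie on a 0 to 1 scale) and the sample values are illustrative assumptions, not taken from the script.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Return the mean of |predicted - observed| over paired values."""
    if len(predicted) != len(observed):
        raise ValueError("Inputs must have the same length")
    if not predicted:
        raise ValueError("Inputs must not be empty")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Made-up values for three LSOAs.
scores = [0.5, 0.25, 1.0]
pilot_fractions = [0.25, 0.25, 0.5]
mae = mean_absolute_error(scores, pilot_fractions)
```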

Instructions for Reviewer

Testing the Script

To run the script and verify the outputs, use:

python asf_heat_pump_suitability/analysis/hn_zones/comparison_of_hn_zones.py --read_in_s3 --save_to_s3

This should output multiple data files in the relevant s3 bucket (s3://asf-heat-pump-suitability/evaluation/desnz_hn_zone_scores/) for all LAs.

Code Structure Review

Please look at how I've refactored the code and check that it makes sense: are the utils and config organised sensibly, and is anything obvious missing from the refactoring?

Things to pay special attention to

Correctness of Outputs

Do the generated GPKG, JSON, and Parquet files contain expected data?
Are the MAE calculations and suitability scores consistent with previous results?
The plot_comparison_of_hn_zones.py script should still work with the Liverpool outputs from this script; please check that it does.

Handling of Special Cases

For Greater Manchester, the script processes sub-LAs separately. Please check that this works correctly. I expect a secondary issue when plotting this region, given how the current plotting script is structured, but I will address that in the plotting PR.
LSOAs outside LA boundaries (but still in DESNZ HN zones) are not included for now. Should they be? Ideally we would include them, but I'm wondering whether we are happy enough with this validation of our model for the moment.

Thank you for taking the time to review this PR! Please do let me know if you have any questions.

Checklist:

lizgzil

Hey @helloaidank, from what I could see this looks great - thanks a lot! The script ran perfectly and the results are really encouraging, and I couldn't see any obvious bugs. Just a few suggestions.

lizgzil · 2025-01-14T11:19:48Z

asf_heat_pump_suitability/analysis/hn_zones/comparison_of_hn_zones.py

+    # 6. After all LAs are processed, save the combined MAE data as CSV
+    mae_df = pl.DataFrame(la_mae_data)
+    csv_path = os.path.join(output_dir, "la_mae_data.csv")
+    mae_df.write_csv(csv_path)


would be good to save this to s3 too

asf_heat_pump_suitability/analysis/hn_zones/comparison_of_hn_zones.py

crispy-wonton

Hi @helloaidank ,
Thanks for all this work, it looks good! I didn't test the code as Liz has done that. I didn't find any bugs or anything. I did leave some suggestions for moving functions to other existing files where they would fit in, plus a couple other minor suggestions.

I also think it might be useful to have these functions refactored so they can compare either the Nesta / conventional view scores to the HN zones. It would be interesting to see if the results are very different. This should definitely be in a separate PR and is a nice to have. Would you mind please adding it as an issue into the repo?

Thank you :D

asf_heat_pump_suitability/analysis/hn_zones/comparison_of_hn_zones.py

asf_heat_pump_suitability/analysis/hn_zones/config/hnz_config.py

asf_heat_pump_suitability/analysis/hn_zones/utils/spatial_utils.py

helloaidank · 2025-01-16T17:29:57Z

Hi both @lizgzil and @crispy-wonton! Thank you both for reviewing the code. These were very useful suggestions: I've made the suggested changes, and where I haven't, I've explained why not!

This took longer than I thought as it's clearly no longer morning 🤣

helloaidank · 2025-02-10T19:11:47Z

Hi @crispy-wonton and @lizgzil, reviving this PR. It would be good to have a quick check that you're happy with the changes I've made and the explanations I've given. I've also changed some of the hp suitability util files and made some of the code more concise.

Happy to discuss at stand-up as well.

crispy-wonton

@helloaidank great work here - thanks for making the changes. Everything looking good and neat. I spotted a couple of minor changes to make, but all the logic looks good. I ran the following code as given in your PR description:

python asf_heat_pump_suitability/analysis/hn_zones/comparison_of_hn_zones.py --read_in_s3 --save_to_s3

It worked well and I could see it looping through LAs and producing outputs. I interrupted it after a few iterations to save time.

crispy-wonton · 2025-02-11T16:02:16Z

asf_heat_pump_suitability/utils/save_utils.py

+    subfolder: str,
+):
+    """
+    Upload a local file to an S3 bucket, used for evaluation of our heat network suitability model using the DESNZ Heat Network pilot zones.


Suggested change

-    Upload a local file to an S3 bucket, used for evaluation of our heat network suitability model using the DESNZ Heat Network pilot zones.
+    Upload a local file to an S3 bucket.

Just to make the docstring more generic to match the function :)

crispy-wonton · 2025-02-11T16:03:05Z

asf_heat_pump_suitability/utils/save_utils.py

+    local_file_path: str,
+    s3_bucket: str,
+    s3_key_dir: str,
+    save_to_s3: bool,


save_to_s3 needs to be removed from here and the docstring

crispy-wonton · 2025-02-11T16:05:25Z

asf_heat_pump_suitability/utils/save_utils.py

+        s3_key_dir (str): S3 key (path) where the file should be uploaded.
+        save_to_s3 (bool): boolean which indicates whether to save or not the file to s3.
+        filename (str): The actual filename to store in S3.
+        subfolder (str): Subfolder within S3.


I'm a bit confused by the difference between s3_key_dir and subfolder. Aren't these, together with the filename, collectively referred to as the key?
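
For context, an S3 object key is the full path within the bucket, so a directory prefix, a subfolder and a filename all end up as parts of one key. A minimal sketch, assuming a helper named `build_s3_key` (hypothetical, not part of this PR) and an illustrative subfolder value:

```python
import posixpath


def build_s3_key(s3_key_dir: str, subfolder: str, filename: str) -> str:
    """Join the prefix, subfolder and filename into one S3 object key."""
    if not filename.strip("/"):
        raise ValueError("filename must not be empty")
    # Strip surrounding slashes and skip empty parts so no "//" appears in the key.
    parts = [p.strip("/") for p in (s3_key_dir, subfolder, filename)]
    return posixpath.join(*[p for p in parts if p])


# "liverpool" is a made-up subfolder used only for illustration.
key = build_s3_key("evaluation/desnz_hn_zone_scores/", "liverpool", "la_mae_data.csv")
```

With boto3, the resulting string would be passed as the `Key` argument of the S3 client's `upload_file(Filename, Bucket, Key)`.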

crispy-wonton · 2025-02-11T16:15:54Z

asf_heat_pump_suitability/analysis/hn_zones/hnz_utils/hnz_comparison_analysis_utils.py

+        dict: A dictionary containing paths for LIVERPOOL_GPKG_PATH, LSOA_SHP_PATH, and NESTA_HP_SUITABILITY_PARQUET_PATH.
+    """
+    paths = {}
+    if read_in_s3:
+        paths["LSOA_SHP_PATH"] = LSOA_SHP_PATH_S3
+        paths["NESTA_HP_SUITABILITY_PARQUET_PATH"] = NESTA_HPS_PARQUET_S3
+
+    else:
+        paths["LSOA_SHP_PATH"] = LSOA_SHP_PATH_LOCAL
+        paths["NESTA_HP_SUITABILITY_PARQUET_PATH"] = NESTA_HPS_PARQUET_LOCAL
+    return paths


dict is missing the LIVERPOOL_GPKG_PATH. Not sure which of function or docstring needs updating.

crispy-wonton · 2025-02-11T16:18:18Z

asf_heat_pump_suitability/analysis/hn_zones/hnz_utils/hnz_comparison_analysis_utils.py

+    Args:
+        gdf (gpd.GeoDataFrame): The GeoDataFrame to write.
+        output_dir (str): Local directory to save the GPKG.
+        filename_prefix (str): The prefix used for naming the GPKG file.


Suggested change

-        filename_prefix (str): The prefix used for naming the GPKG file.
+        filename_prefix (str): The prefix used for naming the GPKG file. `filename_prefix` will be joined to `_with_desnz_hn_lsoa.gpkg` to generate full filename.

crispy-wonton · 2025-02-11T16:22:10Z

asf_heat_pump_suitability/analysis/hn_zones/hnz_utils/hnz_comparison_analysis_utils.py

+    )
+
+    # Extract unique LSOA codes
+    unique_lsoa_codes = intersection_gdf["LSOA21CD"].dropna().unique().tolist()


Are there any rows (or a lot of rows) that get dropped here because of the dropna?

Either way, it would be good to include a logging warning here to note how many rows are lost.
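
A library-free sketch of the kind of warning suggested here. The function name `unique_codes_with_warning` is hypothetical, and a plain list of optional strings stands in for the `LSOA21CD` column of the GeoDataFrame.

```python
import logging
from typing import Iterable, List, Optional

logger = logging.getLogger(__name__)


def unique_codes_with_warning(codes: Iterable[Optional[str]]) -> List[str]:
    """Return unique non-missing codes in first-seen order, warning about dropped rows."""
    codes = list(codes)
    kept = [c for c in codes if c is not None]
    n_dropped = len(codes) - len(kept)
    if n_dropped:
        logger.warning(
            "Dropped %d of %d rows with missing LSOA21CD", n_dropped, len(codes)
        )
    # dict.fromkeys keeps first-seen order while removing duplicates.
    return list(dict.fromkeys(kept))
```

On the real GeoDataFrame, the equivalent count would be `intersection_gdf["LSOA21CD"].isna().sum()`, taken before the `dropna()` call.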

asf_heat_pump_suitability/analysis/hn_zones/utils/hnz_comparison_analysis_utils.py

scaling to all LAs and refactoring of the code

05c3a11

helloaidank changed the title ~~scaling to all LAs and refactoring of the code~~ 107 scaling to all LAs and refactoring of the code Jan 10, 2025

helloaidank changed the title ~~107 scaling to all LAs and refactoring of the code~~ 107 scaling HN comparison to all LAs and refactoring of the code Jan 10, 2025

helloaidank marked this pull request as ready for review January 10, 2025 18:01

helloaidank requested review from crispy-wonton and lizgzil January 10, 2025 18:02

lizgzil reviewed Jan 14, 2025

View reviewed changes

crispy-wonton reviewed Jan 15, 2025

View reviewed changes

helloaidank added 2 commits January 16, 2025 13:54

changes to saving outputs like la_mae to s3, streamlining of code and…

db6ea34

… adding missing lsoa info to dict

factored out utils functions into other areas, some reorganisation an…

fda196b

…d small typos

crispy-wonton reviewed Feb 11, 2025

View reviewed changes
