Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

08 add map #9

Draft
wants to merge 29 commits into
base: dev
Choose a base branch
from
Draft

08 add map #9

wants to merge 29 commits into from

Conversation

crispy-wonton
Copy link
Contributor

@crispy-wonton crispy-wonton commented Oct 17, 2023


Fixes #8

Changes:

  • add kepler map code to repo and all the supporting functions including utils.py file for using Arial font
  • add documentation of data versions for October 2023 work to README

Instructions for Reviewer

  • please could you checkout the branch and run the code as instructed by the README and test that you get hp_map.html produced and it looks correct

Checklist:

  • I have refactored my code out from notebooks/
  • I have checked the code runs
  • I have tested the code
  • I have run pre-commit and addressed any issues not automatically fixed
  • I have merged any new changes from dev
  • I have documented the code
    • Major functions have docstrings
    • Appropriate information has been added to READMEs
  • I have explained this PR above
  • I have requested a code review

crispy-wonton and others added 29 commits September 21, 2023 17:54
improve and correct set up instructions and update output file extensions to html.
update local_data_dir to use relative file path from config base.yaml so config file can be updated by users rather than editing scripts.
change output graphs from produce_plots.py from .png to .html files to avoid altair CalledProcessError
- add argparser to allow user to specify local_data_dir, epc-, and mcs-batch
- add function to load MCS data directly from S3 bucket
- add function to check for EPC data locally and download from S3 if not located
- update existing functions to work with above
- add global parameters for EPC processing version to base.yaml config file
update instructions in README to explain new way of running script and to record batches for historical analyses
add script to generate stats and charts from asf_senedd_response into produce_plots.py
merge getters from loading.py in asf_senedd_response into get_data.py script
update EPC getter function name to match updates in get_data.py
add files:
- translation_config for welsh translations
- plotting.py for plotting functions
- augmenting.py for data processing and augmentation
update to prevent module not subscriptable error
allows user to specify which directory with supplementary data to use, meaning new supplementary data can be added to analysis as it's updated
Copy link
Contributor

@sofiapinto sofiapinto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @crispy-wonton ,

thanks for this PR - the code for the map looks good. I didn't look to closely at the specific Kepler function calls, but overall looks like it's doing what is supposed to. Also haven't tested if the numbers make sense as in, if areas of higher percentage are correct.

When I open the HTML it shows San Francisco are. I had to zoom out and then look for wales. There my be a way of centering the map in wales. Something like:

wales_map = KeplerGl(...)
wales_map.config = {
    'version': 'v1',
    'config': {
        'mapState': {
            'latitude': xxxx,  # Specific latitude value
            'longitude':xxx,  # Specific longitude value
            'zoom': 10  # Specify the desired zoom level
        }
    }
}

I also gave a suggestion of how to re-define the ranges in the color scale, but haven't tested it so i'm not sure it works.

The code doesn't run as is, given the changes done in the other branch, so I had to change a few things around and comment parts of the code so that the map would be generated, but i managed to generate it after those changes. I'm happy to re-run the code to make sure if runs smoothly after you incorporate changes from main (after merging the other branch).

- You can specify which batch of EPC data to download and MCS data to load from S3 by passing the `--epc_batch` and `--mcs_batch` arguments, both
default to downloading/loading the newest data from S3, respectively. You can also specify which set of supplementary data should be used by passing
the `--supp_data` argument followed by the name of the directory, e.g. data_202310. See the `Historical analyses` section below to see which version was used for each analysis.
Run `python asf_welsh_energy_consultation/analysis/produce_plots.py -h` for more info.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need to be changed after you merge the PR number 3, as it's now produce_plots_and_stats.py

Comment on lines +72 to +80
October 2023 analysis:

- Supplementary data: data_202310
- EPC: 2023_Q2_complete (preprocessed)
- mcs_installations_231009.csv
- mcs_installations_epc_full_231009.csv
- off-gas-live-postcodes-2022.xlsx - check [here](https://www.xoserve.com/a-to-z/) for updates
- rurality.ods - 2011 Rural Urban Classification for small area geographies, see [here](https://www.ons.gov.uk/methodology/geography/geographicalproducts/ruralurbanclassifications)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! :)

@@ -445,3 +445,19 @@ def load_wales_hp(wales_epc):
wales_hp = wales_epc.loc[wales_epc.HP_INSTALLED].reset_index(drop=True)

return wales_hp


def pc_to_coords_df():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing the dostrings :)
I've been finding it helpful to have the autodosctring extension in VSCode. It might also exist for your code editor.

@@ -0,0 +1,9 @@
def arial():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing docstring for this function. Why not use Averta?

@@ -7,6 +7,7 @@ matplotlib
odfpy
selenium==4.2.0
argparse==1.4.0
keplergl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be better to set the version?

@@ -277,6 +278,36 @@ def get_mcs_retrofits():
return mcs_retrofits


def generate_hex_counts(wales_df, pc_df):
Copy link
Contributor

@sofiapinto sofiapinto Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or something similar.. just changing names to make it more obvious what they represent. not required, just a suggestion :)

Suggested change
def generate_hex_counts(wales_df, pc_df):
def generate_hex_counts(wales_epc, location_info):

wales_df_coords = pd.merge(
wales_df, pc_df, on=["POSTCODE"]
) # merge EPC with postcode df
wales_df_hex = add_hex_id(wales_df_coords, 6) # add H3 hex id to each row
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does the 6 mean?


def plot_kepler_graph(base_data, filename):
hex_map = KeplerGl(height=500)
hex_map.add_data(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To define the ranges appearing in the map i think you can do something like this:

# Define the custom color scale
custom_color_scale = {
    "name": "Custom",  # You can use any name you prefer
    "type": "sequential",
    "domain": [0, 0.1, 0.5, 1],  # Change this to whatever you want :)
    "colors": ["#FF0000", "#FFFF00", "#00FF00", "#0000FF"]  # Define colors for each range
}

# Add data with the custom color scale
hex_map.add_data(
    data=base_data[["perc_true", "hex_id"]],
    name="Heat pump proportions",
    color_scale=custom_color_scale
)

def plot_kepler_graph(base_data, filename):
hex_map = KeplerGl(height=500)
hex_map.add_data(
data=base_data[["perc_true", "hex_id"]], name="Heat pump proportions"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one way to go around the name that appears on the plot is to rename "perc_true" before plotting.

file_name=os.path.join(fig_output_path["english"], f"{filename}.html")
)

print("Saved: " + filename + ".html")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is saving to the english folder. is that what you want?

@sofiapinto
Copy link
Contributor

Something else I remembered is that we might want to be strict and only show percentages when there's enough data for a specific hexagon, e.g. only show data for hexagon if denominator (of %) is above X (where X needs to be pre-set by us). I think we have enough data for each hexagon, but maybe worth checking. Hopefully it's not too much trouble - just checking the values of "total" for each hexagon.

Base automatically changed from 03_merge_asf_senedd_repo to dev October 30, 2023 16:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants