Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated Subset of SQL files with Spatial Index Creation for Review/Testing #1068

Merged
merged 1 commit into from
Jan 30, 2025

Conversation

GeorgeRought-NOAA
Copy link
Contributor

@GeorgeRought-NOAA GeorgeRought-NOAA commented Jan 28, 2025

Following a discussion with Lorne, Shawn and Rob - I've created some initial edits to the SQL files in the viz_db_postprocess_sql section of the pipeline to test the proposed changes of spatial index creation and table analysis for tables undergoing spatial querying/processing via PostGIS.

For this initial test, I added spatial indexes to tables being spatially queried in the building fimpact sections of the ana inundation services. I'd like this to be reviewed and considered for other services.

These changes should be tested in TI, and any time savings should be noted in the github ticket before deploying broader SQL file changes to the rest of the pipeline. Initial tests on copies of tables on my EC2 instance indicated significant processing time savings.

Additions

Created spatial index for external.buildings_footprints_fame, and relevant fim_ingest.ana inundation tables used for PostGIS queries/functions.

Added "ANALYZE" to update table statistics after adding spatial index and for more efficient query planning within PostgreSQL.

Removals

Changes

Testing

Screenshots

Notes

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Visually tested in supported browsers and devices (see checklist below 👇)
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Target Environment support

  • Windows
  • Linux
  • Browser

Accessibility

  • Keyboard friendly
  • Screen reader friendly

Other

  • Is useable without CSS
  • Is useable without JS
  • Flexible from small to large screens
  • No linting errors or warnings
  • JavaScript tests are passing

…and ANALYZE table feature for testing query efficiency improvements.
@shawncrawley
Copy link
Collaborator

I don't think we need to be checking and potentially creating an index on the external.building_footprints_fema table during every pipeline run since that table is static. I'd rather just ensure that the index is created as a documented manual procedure - preferrably as part of the Jupyter Notebook used to create the table in the first place. Alternatively or additionally it could be added to our "Deployment Checklist" template as a manual check/apply just prior to a deployment to UAT.

As per the index being added to the fim_ingest tables. Maybe keeping them where they are at now is the path of least resistance, but I think there'd likely be a way to add it to a single spot that standardizes it being applied to all tables in the fim_ingest schema. Let me glance at where that might occur... Yep, I think we should just add it right after the table is dynamically created here: https://github.com/NOAA-OWP/hydrovis/blob/ti/Core/LAMBDA/viz_functions/viz_db_postprocess_sql/fim_caching_templates/0_create_or_truncate_tables.sql#L36-L42.

Let me know what you think about these suggestions.

@GeorgeRought-NOAA
Copy link
Contributor Author

@shawncrawley You're right, the spatial index only needs to be created once on external.building_footprints_fema. I considered adding it to this python script:

https://github.com/NOAA-OWP/hydrovis/blob/ti/Source/Visualizations/aws_loosa/utils/db_load_fema_building_footprints.py

But to run it, I'd need to grab the fema buildings layer as an input, and it'd make changes to the table. For the purpose of this test, I just wanted a simple way to insert this into a small subset of the pipeline so we can compare the run time on these specific queries before making changes across the board. (Running the pipeline/ documenting runtime differences is something I'd need help with).

For the fim_ingest tables, yes this was simply a path of least resistance thing for testing, and I wanted to somewhat isolate the impact of the change. I wasn't sure exactly what tables would be impacted by adding the spatial index to your linked sql file, but yes for the final implementation, the spatial indexes should be created dynamically where possible and not ran in every single sql file.

For the purposes of a sanity check/ documenting potential improvements, does my thinking make sense to you? Or do you think since it's TI, I should go ahead and add these changes to where tables are created dynamically for broader changes?

@shawncrawley
Copy link
Collaborator

Not sure what was going on with my brain, because although I read your title and description, I completely ignored all the cues to the fact that this is intended as a temporary test/check. Because it is TI, I wouldn't mind it being implemented broadly, but I'm also okay with it as is for the sake of an isolated test. We can go ahead and give this a shot.

@GeorgeRought-NOAA GeorgeRought-NOAA marked this pull request as ready for review January 29, 2025 16:27
@shawncrawley shawncrawley merged commit 2abaaf9 into ti Jan 30, 2025
1 check passed
@shawncrawley shawncrawley deleted the faster-queries-vizdb-ana-inundation branch January 30, 2025 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Priority - Low Task 3 GAMA Task 3
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create Updated SQL Files for Subset Viz DB Postprocess Folder for Approval
3 participants