Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Factor: Safety- Street lights/Nighttime lights #53

Open
Tracked by #15
amyburness opened this issue Jun 21, 2024 · 25 comments
Open
Tracked by #15

Factor: Safety- Street lights/Nighttime lights #53

amyburness opened this issue Jun 21, 2024 · 25 comments
Assignees
Labels
🔍️ Factor Analysis factors 🏙️ Place characterization Dimension - place characterization

Comments

@amyburness
Copy link

amyburness commented Jun 21, 2024

Mapillary

endpoint: object--street-light

This indicator should be calculated by creating 20-meter buffers around streetlights. Raster cells in which 80-100% of their area intersects with the buffers should receive a score of 5. Rasters with 60-79% intersection should be scored as 4, 40-59% as 3, 20-39% as 2, and 1-19% as 1. Areas without any overlap with streetlight buffers should be scored as 0. NOTE: Use nighttime lights only if street lights are absent.

Nighttime Lights

Nighttime lights observation data from a Low light imaging sensor part of the Visible and Infrared Imaging Suite (VIIRS) Day Night Band (DNB). Unit - (average, average-masked, max, median, median-masked, min) nW/cm2/sr, CRS - EPSG:4326 (Geographic Latitude/Longitude), Resolution - 15 arc second (~500m at the Equator)

@timlinux timlinux added the 🔍️ Factor Analysis factors label Jun 21, 2024
@amyburness amyburness added the 🏙️ Place characterization Dimension - place characterization label Jun 21, 2024
@carolinamayh
Copy link
Collaborator

carolinamayh commented Jul 1, 2024

If data for streetlights or nighttime lights is unavailable, or if the user has disaggregated statistics, they should have the option to provide data on women's perceived safety at the municipal, district, state, or any other required level. The tool would then standardize these scores, percentages, or statistics on a scale from 0 to 5, where 5 indicates the lowest level of violence or the highest level of perceived safety.

@osundwajeff osundwajeff self-assigned this Jul 22, 2024
@osundwajeff osundwajeff added the Size 3 It will take me between 2 hours to half a day label Jul 22, 2024
@javaftw
Copy link
Collaborator

javaftw commented Jul 24, 2024

SAFStreetLight Function Overview:

Image

Processes streetlight vector data to create a raster of safety scores based on coverage. For detailed explanation, see docs/dev_notes/README.md in the issue-53-factor-safety branch.

Constants:

Light area radius: 20 meters
Score bins: [0, 1, 20, 40, 60, 80, 100]
Scores: [0, 0, 1, 2, 3, 4, 5]

Function:

  • Setup and input processing
  • Buffer streetlight points (20m)
  • Rasterize buffered vector
  • Calculate coverage using convolution
  • Assign safety scores
  • Output final raster

Notes:

  • Uses QGIS for buffering, GDAL for rasterization, and numpy for calculations
  • Called by SAFnightTimeLights when input is vector data

Substitute Streetlights Data Generation

Due to the unavailability of actual streetlight location data in the sample dataset and limited time for sourcing, a method was devised to generate substitute data based on available road network information.
Data Source and Preprocessing
The primary data source was the roads shapefile for the area of interest. A subset of roads was extracted based on the following criteria:

'Residential' roads from the 'highways' category
'Tertiary', 'Secondary', and 'Primary' roads, which were merged into a single category of major roads

This extraction process provided a foundation for estimating streetlight distributions.

Image

Residential Streetlight Generation
For residential roads, points were scattered along the geometry at 50-meter intervals. This interval was chosen to approximate the typical spacing of streetlights in residential areas.

Image

Major Road Streetlight Generation
The process for major roads involved additional steps:

The merged major roads were filtered based on segment lengths to exclude longer cross-country roads that typically have less frequent lighting.
Points were then scattered along the remaining geometry at 150-meter intervals, reflecting the generally wider spacing of streetlights on major roads compared to residential areas.

Image

Resulting Distribution
This methodology produces a calculated estimation of light source distribution for development purposes. The resulting point dataset approximates streetlight locations based on road type and typical lighting patterns.

Image
The final distribution provides a reasonable proxy for actual streetlight locations, allowing for the development and testing of the SAFStreetLight function in the absence of real-world data.

Image

@javaftw
Copy link
Collaborator

javaftw commented Jul 24, 2024

"Perceived Safety Data" handling is still outstanding

@osundwajeff
Copy link
Collaborator

@javaftw

@javaftw
Copy link
Collaborator

javaftw commented Jul 24, 2024

@osundwajeff

@osundwajeff osundwajeff assigned javaftw and unassigned osundwajeff Jul 30, 2024
@javaftw
Copy link
Collaborator

javaftw commented Jul 31, 2024

If data for streetlights or nighttime lights is unavailable, or if the user has disaggregated statistics, they should have the option to provide data on women's perceived safety at the municipal, district, state, or any other required level. The tool would then standardize these scores, percentages, or statistics on a scale from 0 to 5, where 5 indicates the lowest level of violence or the highest level of perceived safety.

Hi @carolinamayh

Could you please elaborate on and/or provide examples of the file format of the perceived safety and disaggregated statistics data? Also, in this regard, could you please verify that the adjusted description text, as per the image, is correct?

Image

@carolinamayh
Copy link
Collaborator

@osundwajeff @javaftw As mentioned yesterday, if streetlights data or nightlights data is not available, the user will have the option to input a safety score for the country or provide disaggregated scores by administrative units. The indicator used will be the number of homicides per 100,000 people, standardized based on the following scale:

Score 5 (Safest): 0 to 1 homicides per 100,000 people
Score 4: 1.1 to 3 homicides per 100,000 people
Score 3: 3.1 to 6 homicides per 100,000 people
Score 2: 6.1 to 10 homicides per 100,000 people
Score 1: 10.1 to 15 homicides per 100,000 people
Score 0 (Least Safe): More than 15 homicides per 100,000 people

@javaftw
Copy link
Collaborator

javaftw commented Aug 1, 2024

or provide disaggregated scores by administrative units

Hi @carolinamayh (Hennie here),

While the information about "the user will have the option to input a safety score for the country" is useful (there are UI elements in the plugin which performs this action), the part about "or provide disaggregated scores by administrative units" is unclear, despite the accompanying breakdown of the scoring mechanic.
Is is a shapefile? A csv? Something else? How is the user to interact with it and the fields it contains?
Could you please shed some more light on this, thanks!

@javaftw
Copy link
Collaborator

javaftw commented Aug 2, 2024

@dragosgontariu
Copy link
Collaborator

  1. there is no option for street lights as per specifications
  2. there is no option for disaggregated data per administrative unit as per specifications and email communications
  3. nighttime light data throws the following error:

2024-08-07T12:32:58 INFO Results: {'OUTPUT': 'temp/countryUTMLayerBuf.shp'}
2024-08-07T12:32:58 INFO gdalwarp -overwrite -t_srs EPSG:32620 -tr 100.0 100.0 -r near -te 705589.018038702 1514124.5410898936 731809.9074413507 1562729.1977703366 -te_srs EPSG:32620 -of GTiff D:\Munca\WB\Geest\St_Lucia\SHPs\Nighttime_lights_2021.tif temp/tempResample.tif
2024-08-07T12:32:58 INFO GDAL command:
2024-08-07T12:32:58 INFO gdalwarp -overwrite -t_srs EPSG:32620 -tr 100.0 100.0 -r near -te 705589.018038702 1514124.5410898936 731809.9074413507 1562729.1977703366 -te_srs EPSG:32620 -of GTiff D:\Munca\WB\Geest\St_Lucia\SHPs\Nighttime_lights_2021.tif temp/tempResample.tif
2024-08-07T12:32:58 INFO GDAL command output:
2024-08-07T12:32:58 INFO Process completed successfully
2024-08-07T12:32:58 INFO Results: {'OUTPUT': 'temp/tempResample.tif'}
2024-08-07T12:32:58 INFO gdal_calc.bat --overwrite --calc "A1000" --format GTiff --type Int32 -A temp/tempResample.tif --A_band 1 --outfile temp/tempCalc.tif
2024-08-07T12:32:58 INFO GDAL command:
2024-08-07T12:32:58 INFO gdal_calc.bat --overwrite --calc "A
1000" --format GTiff --type Int32 -A temp/tempResample.tif --A_band 1 --outfile temp/tempCalc.tif
2024-08-07T12:32:58 INFO GDAL command output:
2024-08-07T12:32:59 CRITICAL Process returned error code 1
2024-08-07T12:32:59 INFO Results: {'OUTPUT': 'temp/tempCalc.tif'}
2024-08-07T12:32:59 INFO Creating output file that is 262P x 486L.
Processing D:\Munca\WB\Geest\St_Lucia\SHPs\Nighttime_lights_2021.tif [1/1] : 0Using internal nodata values (e.g. 3.4e+38) for image D:\Munca\WB\Geest\St_Lucia\SHPs\Nighttime_lights_2021.tif.
Copying nodata values from source D:\Munca\WB\Geest\St_Lucia\SHPs\Nighttime_lights_2021.tif to destination temp/tempResample.tif.
...10...20...30...40...50...60...70...80...90...100 - done.
2024-08-07T12:32:59 CRITICAL RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf . Check the section C-API incompatibility at the Troubleshooting ImportError section at https://numpy.org/devdocs/user/troubleshooting-importerror.html#c-api-incompatibility for indications on how to solve this problem .
Traceback (most recent call last):
File "C:\PROGRA1\QGIS 3.32.0\apps\Python39\Scripts\gdal_calc.py", line 8, in
2024-08-07T12:32:59 CRITICAL from osgeo_utils.gdal_calc import * # noqa
File "C:\PROGRA
1\QGIS 3.32.0\apps\Python39\lib\site-packages\osgeo_utils\gdal_calc.py", line 46, in
2024-08-07T12:32:59 CRITICAL from osgeo import gdal, gdal_array
File "C:\PROGRA~1\QGIS 3.32.0\apps\Python39\lib\site-packages\osgeo\gdal_array.py", line 13, in
2024-08-07T12:32:59 CRITICAL from . import _gdal_array
ImportError: numpy.core.multiarray failed to import

Traceback (most recent call last):
File "rasterio\_base.pyx", line 69, in rasterio._base.get_dataset_driver
File "rasterio\_err.pyx", line 221, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: temp/tempCalc.tif: No such file or directory

         During handling of the above exception, another exception occurred:
         
         Traceback (most recent call last):
          File "C:\Users/Dragos/AppData/Roaming/QGIS/QGIS3\profiles\default/python/plugins\gender_indicator_tool\gender_indicator_tool.py", line 2952, in SAFnightTimeLights
          with rasterio.open(tempCalc, "r+") as src:
          File "C:\Users\Dragos\AppData\Roaming\Python\Python39\site-packages\rasterio\env.py", line 451, in wrapper
          return f(*args, **kwds)
          File "C:\Users\Dragos\AppData\Roaming\Python\Python39\site-packages\rasterio\__init__.py", line 334, in open
          dataset = get_writer_for_path(path, driver=driver)(
          File "C:\Users\Dragos\AppData\Roaming\Python\Python39\site-packages\rasterio\io.py", line 299, in get_writer_for_path
          driver = get_dataset_driver(path)
          File "rasterio\\_base.pyx", line 74, in rasterio._base.get_dataset_driver
         TypeError: temp/tempCalc.tif: No such file or directory

@osundwajeff @mvmaltitz @javaftw

@mvmaltitz mvmaltitz added this to the Indicator Enhancements milestone Aug 13, 2024
@javaftw
Copy link
Collaborator

javaftw commented Aug 13, 2024

Input Type: Raster

Image

Input Type: Shapefile (Points):

Image

Input Type: Shapefile (Polygons)

When the input is a polygon layer, the user may choose the field of interest within that layer to process. If the field contains text (and not numeric) values, the name of the field is appended with (text)

Image

If the user chooses a text element in the dropdown, values may be assigned to the unique text fields

Image

The result of the above operation, using user-assigned text field values:

Image

The results of using a numeric field:

Image

Input Type: User Value

If no input file is specified, the user value is evaluated:

Image

@javaftw
Copy link
Collaborator

javaftw commented Aug 14, 2024

Improvement of point rendering method

Image

@mvmaltitz mvmaltitz removed the Size 3 It will take me between 2 hours to half a day label Aug 15, 2024
@ClaraIV
Copy link

ClaraIV commented Aug 21, 2024

@osundwajeff @mvmaltitz we will provide detailed guidance on the processing of the NTL data for this factor today;

@bennyistanto
Copy link
Collaborator

bennyistanto commented Aug 21, 2024

Enhanced Night Time Light (NTL) Classification for GEEST

Approach and Rationale

We propose refining the NTL classification in GEEST to better reflect local conditions, particularly for small island countries. Instead of using fixed thresholds or coverage percentages, we'll employ local statistics to create a more contextually relevant classification. This approach is supported by Elvidge et al. (2013), who emphasize the importance of considering local context in night-time light analysis.

Standard Classification Procedure:

  1. Clip the global NTL raster to the country boundary.
  2. Calculate statistics for the clipped raster: min, max, mean, median, and 75th percentile.
    • The use of percentiles in NTL analysis is supported by Levin & Zhang (2017).
  3. Apply the classification scheme based on local statistics.

Standard Classification Scheme:

GEEST Class Description NTL Value Range
0 No Access 0 - 0.05
1 Very Low > 0.05 - 0.25 * Median
2 Low > 0.25 * Median - 0.5 * Median
3 Moderate > 0.5 * Median - Median
4 High > Median - 75th percentile
5 Very High > 75th percentile

This classification approach aligns with the methods used by Chen & Nordhaus (2011) in using luminosity data as a proxy for socio-economic factors.

QGIS Implementation:

For implementation in QGIS, use the Raster Calculator with the following expression:

(NTL <= 0.05) * 0 + 
(NTL > 0.05 AND NTL <= [0.25 * median]) * 1 + 
(NTL > [0.25 * median] AND NTL <= [0.5 * median]) * 2 + 
(NTL > [0.5 * median] AND NTL <= [median]) * 3 + 
(NTL > [median] AND NTL <= [75th_percentile]) * 4 + 
(NTL > [75th_percentile]) * 5

Replace [median] and [75th_percentile] with the actual calculated values.

Example Python Script for Calculating Statistics:

Here's a simple Python script to calculate the required statistics for a GeoTIFF raster:

import rasterio
import numpy as np

def calculate_raster_stats(raster_path):
    with rasterio.open(raster_path) as src:
        data = src.read(1)
        data = data[data != src.nodata]  # Remove no data values
        
        min_val = np.min(data)
        max_val = np.max(data)
        mean_val = np.mean(data)
        median_val = np.median(data)
        percentile_75 = np.percentile(data, 75)
        
        return {
            'min': min_val,
            'max': max_val,
            'mean': mean_val,
            'median': median_val,
            '75th_percentile': percentile_75
        }

# Usage
raster_path = 'path/to/your/iso3_ntl.tif'
stats = calculate_raster_stats(raster_path)
print(stats)

This script can be integrated into the GEEST QGIS plugin to automatically calculate the statistics needed for the classification.

Benefits:

  • Adapts to local lighting conditions
  • Provides meaningful classification for areas with varying NTL intensities
  • Maintains consistency with the original GEEST approach while improving granularity

Additional Note: Handling Areas with Low NTL Values:

If the NTL values within the Area of Interest (AOI) are predominantly or entirely below 0.05, which is our baseline for electricity access, we need to adjust our approach. Here's how to proceed:

  1. Check the data range:
    After clipping the NTL raster to our AOI, examine the statistics, particularly the maximum value.

  2. If max value < 0.05:

    • This indicates that the entire area falls below our standard threshold for electricity access.
    • In this case, we need to create a relative scale based on the available data. This approach is informed by Small et al. (2005), who discuss methods for analyzing areas with low light emissions.
  3. Adjusted classification for low-value areas:
    Use this modified classification scheme based on the local maximum value:

Low NTL Classification Scheme:

GEEST Class Description NTL Value Range
0 No Light 0
1 Very Low > 0 - 0.2 * max_value
2 Low > 0.2 * max_value - 0.4 * max_value
3 Moderate > 0.4 * max_value - 0.6 * max_value
4 High > 0.6 * max_value - 0.8 * max_value
5 Highest > 0.8 * max_value - max_value

This adaptive thresholding approach is conceptually similar to methods described by Otsu (1979) for image classification.

QGIS Implementation:

For implementation in QGIS, use the Raster Calculator with the following expression:

with_variable('max_val', maximum("NTL@1"),
    (NTL = 0) * 0 + 
    (NTL > 0 AND NTL <= @max_val * 0.2) * 1 + 
    (NTL > @max_val * 0.2 AND NTL <= @max_val * 0.4) * 2 + 
    (NTL > @max_val * 0.4 AND NTL <= @max_val * 0.6) * 3 + 
    (NTL > @max_val * 0.6 AND NTL <= @max_val * 0.8) * 4 + 
    (NTL > @max_val * 0.8) * 5
)

Note: Replace "NTL@1" with our actual raster layer name.

Additional Considerations:

Temporal Variability:

It's important to note that NTL data can vary seasonally and annually. As Xie et al. (2019) demonstrate, temporal variations in artificial nighttime lights can provide insights into urbanization patterns and socio-economic changes. For GEEST, consider using multi-year averaged NTL data to mitigate short-term fluctuations and capture more stable patterns of lighting infrastructure.

Limitations and Caveats:

Users of this enhanced NTL classification in GEEST should be aware of certain limitations:

  • Overglow Effect: NTL data can suffer from the "overglow" effect, where light appears to extend beyond its actual source (Small et al., 2005). This may lead to overestimation of lit areas in some regions.
  • Saturation in Urban Centers: In highly urbanized areas, NTL sensors may saturate, leading to a loss of differentiation in the brightest areas (Elvidge et al., 2013).
  • Rural Underrepresentation: In sparsely populated or rural areas, the resolution of NTL data may not capture small-scale lighting infrastructure adequately.

References:

  1. Elvidge, C. D., Baugh, K. E., Zhizhin, M., & Hsu, F.-C. (2013). Why VIIRS data are superior to DMSP for mapping nighttime lights. Proceedings of the Asia-Pacific Advanced Network, 35(0), 62. https://doi.org/10.7125/APAN.35.7
  2. Levin, N., & Zhang, Q. (2017). A global analysis of factors controlling VIIRS nighttime light levels from densely populated areas. Remote Sensing of Environment, 190, 366–382. https://doi.org/10.1016/j.rse.2017.01.006
  3. Small, C., Pozzi, F., & Elvidge, C. D. (2005). Spatial analysis of global urban extent from DMSP-OLS night lights. Remote Sensing of Environment, 96(3-4), 277-291. https://doi.org/10.1016/j.rse.2005.02.002
  4. Chen, X., & Nordhaus, W. D. (2011). Using luminosity data as a proxy for economic statistics. Proceedings of the National Academy of Sciences, 108(21), 8589-8594. https://doi.org/10.1073/pnas.1017031108
  5. N. Otsu, "A Threshold Selection Method from Gray-Level Histograms," in IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66, Jan. 1979, https://doi.org/10.1109/TSMC.1979.4310076
  6. Xie, Y., Weng, Q., & Fu, P. (2019). Temporal variations of artificial nighttime lights and their implications for urbanization in the conterminous United States, 2013–2017. Remote Sensing of Environment, 225, 160-174. https://doi.org/10.1016/j.rse.2019.03.008

@ClaraIV
Copy link

ClaraIV commented Aug 22, 2024

@osundwajeff @mvmaltitz please see above, grateful if you could implement the NTL processing accordingly. I will move this to Work in Progress, thank you!

@javaftw
Copy link
Collaborator

javaftw commented Aug 22, 2024 via email

@carolinamayh
Copy link
Collaborator

@osundwajeff, please note that the score 5 ( Very High) is > 75th percentile

@dragosgontariu
Copy link
Collaborator

dragosgontariu commented Aug 22, 2024

@osundwajeff @mvmaltitz @javaftw

When raster data input (nighttime lights) it still gets:

Traceback (most recent call last):
File "C:\Users/Dragos/AppData/Roaming/QGIS/QGIS3\profiles\default/python/plugins\gender_indicator_tool\gender_indicator_tool.py", line 3266, in SAFRasterizerDelegator
self.SAFnightTimeLights()
File "C:\Users/Dragos/AppData/Roaming/QGIS/QGIS3\profiles\default/python/plugins\gender_indicator_tool\gender_indicator_tool.py", line 3550, in SAFnightTimeLights
with rasterio.open(tempCalc, "r+") as src:
File "C:\Users\Dragos\AppData\Roaming\Python\Python39\site-packages\rasterio\env.py", line 451, in wrapper
return f(*args, **kwds)
File "C:\Users\Dragos\AppData\Roaming\Python\Python39\site-packages\rasterio_init_.py", line 334, in open
dataset = get_writer_for_path(path, driver=driver)(
File "C:\Users\Dragos\AppData\Roaming\Python\Python39\site-packages\rasterio\io.py", line 299, in get_writer_for_path
driver = get_dataset_driver(path)
File "rasterio\_base.pyx", line 74, in rasterio._base.get_dataset_driver
TypeError: temp/tempCalc.tif: No such file or directory

Also for the points input it gets this:

Image

where we can see that there are street lights but no data at all

@bennyistanto
Copy link
Collaborator

Good morning Benny, Thank you for this information. Just a few quick questions: - Can you confirm the source and resolution of the NTL data to be used? Is it already available, or do I need to obtain it? - Are there any specific guidelines for the statistics (esp. handling outliers, excluding certain values)? - What should be done if the calculated median/90th percentile is very close or equal to the max value? Kind regards, Hennie Kotze

Hi Hennie

Answering your question below:

  1. After discussed with @ClaraIV, for now let's use the NTL data from this link https://eogdata.mines.edu/nighttime_light/annual/v22/2023/VNL_npp_2023_global_vcmslcfg_v2_c202402081600.average.dat.tif.gz This is the latest annual global composite for 2023, it's 9GB compressed file, and become 11GB as GeoTIFF

You can use the clipped version below for LCA testing
lca_VNL_npp_2023_global_vcmslcfg_v2_c202402081600.average.dat.zip

  1. As the data is categorized as ready-to-use, I don't think there is any additional process to use the data. From the website, handling outlier and other issue are more focused on daily data.

  2. For the last question on median/75th percentile value are very close to the max. I create below flowchart to provides a clear visual representation of the decision-making process in the classification of NTL data, accounting for different scenarios based on the data characteristics.

graph TD
    A[Download NTL data] --> B[Gunzip data]
    B --> C[Clip using country boundary]
    C --> D[Check max_value]
    D -->|max_value > 0.05| E[Check median and 75th percentile]
    D -->|max_value <= 0.05| F[Use modified classification scheme]
    E -->|Difference > 5% of max_value| G[Use original classification scheme]
    E -->|Difference <= 5% of max_value| F
    G -->|Use median and percentile| H[Classify NTL data]
    F -->|Use max_value| H
    H --> I[Output classified NTL map]

    style A fill:#f9f,stroke:#333,stroke-width:2px
    style B fill:#bbf,stroke:#333,stroke-width:2px
    style C fill:#bbf,stroke:#333,stroke-width:2px
    style D fill:#fbb,stroke:#333,stroke-width:2px
    style E fill:#fbb,stroke:#333,stroke-width:2px
    style F fill:#bfb,stroke:#333,stroke-width:2px
    style G fill:#bfb,stroke:#333,stroke-width:2px
    style H fill:#fbf,stroke:#333,stroke-width:2px
    style I fill:#ff9,stroke:#333,stroke-width:2px
Loading

This diagram illustrates the process as follows:

  1. Download the NTL data from the provided URL.
  2. Gunzip the downloaded data.
  3. Clip the data using the country boundary.
  4. Check the max_value of the clipped NTL data.
  5. If max_value > 0.05:
    • Check if the median or 75th percentile is within 5% of the max_value.
    • If the difference is > 5% of max_value, use the original classification scheme (median and percentile based).
    • If the difference is <= 5% of max_value, use the modified classification scheme.
  6. If max_value <= 0.05:
    • Use the modified classification scheme directly.
  7. Classify the NTL data using the selected scheme.
  8. Output the classified NTL map.

Original Classification Scheme:

GEEST Class Description NTL Value Range
0 No Access 0 - 0.05
1 Very Low > 0.05 - 0.25 * Median
2 Low > 0.25 * Median - 0.5 * Median
3 Moderate > 0.5 * Median - Median
4 High > Median - 75th percentile
5 Very High > 75th percentile

Modified Classification Scheme:

GEEST Class Description NTL Value Range
0 No Light 0
1 Very Low > 0 - 0.2 * max_value
2 Low > 0.2 * max_value - 0.4 * max_value
3 Moderate > 0.4 * max_value - 0.6 * max_value
4 High > 0.6 * max_value - 0.8 * max_value
5 Highest > 0.8 * max_value - max_value

@mvmaltitz mvmaltitz changed the title Factor: Safety- Street lights/Nigthttime ligths Factor: Safety- Street lights/Nighttime lights Aug 22, 2024
@mvmaltitz
Copy link

@dragosgontariu
Test Results:
Perceived Safety Value = 80

Image

Street lights (points)
Image

Image

Nighttime lights (raster)

Image

@bennyistanto
Copy link
Collaborator

@mvmaltitz I've obtained different results for the NTL classification for LCA. Here are my findings:

  • Max Value: 28.618135
  • Median: 1.060158
  • 75th Percentile: 2.263107

The resulting classification map is shown below:
geest_ntl_6class

You can review the code I used to generate this map here.

Could you please compare these results with your findings? I'm particularly interested in understanding any discrepancies in the classification thresholds or the final map output.

@ClaraIV
Copy link

ClaraIV commented Aug 26, 2024

@mvmaltitz @osundwajeff please ensure that the descriptive text on the tab is updated and explains briefly the processing of each data type. Also, please rename this factor "Safety" as per the indications in the Indicators excel --> remove "safe urban design"

@javaftw
Copy link
Collaborator

javaftw commented Aug 26, 2024

@bennyistanto Hi Benny
I have modified the classification algorithm as per your instructions.

Modified algo + original supplied input data data:

Input:

Image

Output:

Image

Modified algo + downloaded input data data:

(Data download source )
Input:

Image

Output:

Image

Code snippet:

                # Get valid data (non-NaN values)
                valid_data = ntl_data[~np.isnan(ntl_data)]

                max_value = np.max(valid_data)
                median = np.median(valid_data)
                percentile_75 = np.percentile(valid_data, 75)

                print(f"Max Value: {max_value:.6f}")
                print(f"Median: {median:.6f}")
                print(f"75th Percentile: {percentile_75:.6f}")

                # Determine classification scheme
                if max_value <= 0.05 or (max_value - percentile_75) <= 0.05 * max_value:
                    print("Using max_value classification scheme")
                    classification = np.full_like(ntl_data, 0, dtype=np.uint8)
                    classification[(ntl_data > 0) & (ntl_data <= 0.2 * max_value)] = 1
                    classification[(ntl_data > 0.2 * max_value) & (ntl_data <= 0.4 * max_value)] = 2
                    classification[(ntl_data > 0.4 * max_value) & (ntl_data <= 0.6 * max_value)] = 3
                    classification[(ntl_data > 0.6 * max_value) & (ntl_data <= 0.8 * max_value)] = 4
                    classification[ntl_data > 0.8 * max_value] = 5
                else:
                    print("Using original classification scheme")
                    classification = np.full_like(ntl_data, 0, dtype=np.uint8)
                    classification[(ntl_data > 0.05) & (ntl_data <= 0.25 * median)] = 1
                    classification[(ntl_data > 0.25 * median) & (ntl_data <= 0.5 * median)] = 2
                    classification[(ntl_data > 0.5 * median) & (ntl_data <= median)] = 3
                    classification[(ntl_data > median) & (ntl_data <= percentile_75)] = 4
                    classification[ntl_data > percentile_75] = 5

                # Set NaN values in the original data to 255
                classification[np.isnan(ntl_data)] = 255

                # Update metadata for output
                out_meta.update(dtype=rasterio.uint8, nodata=255)

Using the same approach in both cases using different but very similar input data, the results appear to be significantly different.

@bennyistanto
Copy link
Collaborator

Hey Hennie @javaftw,

I've figured out why our maps look different. The classification patterns we're seeing on the map are different because we're getting different statistical values, especially for the median and 75_percentile.

Here's what I'm guessing you got:

  • Max Value: 28.618135
  • Median: 0.458617
  • 75th Percentile: 1.280255

And here's what I'm seeing:

  • Max Value: 28.618135
  • Median: 1.060158
  • 75th Percentile: 2.263107

The reason for this difference? It's probably how we're dealing with those pesky nodata values.

When we clip the raster with the country boundary, we might end up with areas outside the boundary that aren't properly marked as nodata. If these areas are included as zeros in our calculations, it could potentially lower our median and 75_percentile values, depending on how many of these zero values there are compared to the valid data points inside the country boundary.

I took a look at the GitHub code (https://github.com/kartoza/GEEST/blob/dev/gender_indicator_tool/gender_indicator_tool.py, lines 3640-3685), and I noticed it's following the Colab script. The thing is, Colab handles NaN values correctly during clipping, but in the gender_indicator_tool.py, especially in the gdal:cliprasterbymasklayer part, nodata is set to 0.

I tried to make the Colab script work in the PyQGIS console by tweaking how we set the NODATA value. The goal was to make sure the areas outside the boundary are properly recognized as NODATA. I've got the full code below if you want to take a look.

Screenshot 2024-08-27 194434
https://gist.github.com/bennyistanto/7ec367ba090468ca0e6e888dd13f7f3e

After testing in both Colab and PyQGIS with this updated script, I'm getting the same stats for min, max, median, and 75_percentile.

QGIS test

import numpy as np
from qgis.core import QgsRasterLayer

# File path to your raster
raster_path = r'D:\temp\geest\factor_safety\ntl\lca_VNL_npp_2023_global_vcmslcfg_v2_c202402081600.average.dat.tif'

# Load the raster layer
raster_layer = QgsRasterLayer(raster_path, "NTL")

# Check if the raster layer is valid
if not raster_layer.isValid():
    print("Error: Raster failed to load.")
else:
    provider = raster_layer.dataProvider()
    extent = raster_layer.extent()
    cols = raster_layer.width()
    rows = raster_layer.height()
    block = provider.block(1, extent, cols, rows)
    
    # Convert raster block to numpy array
    data = np.array([[block.value(i, j) for j in range(cols)] for i in range(rows)])
    
    # Handle nodata values (non-NaN)
    nodata = provider.sourceNoDataValue(1)
    valid_data = data[~np.isnan(data) & (data != nodata)]
    
    if valid_data.size > 0:
        # Calculate statistics
        min_value = np.min(valid_data)
        max_value = np.max(valid_data)
        median = np.median(valid_data)
        percentile_75 = np.percentile(valid_data, 75)
        
        print(f"Min Value: {min_value:.6f}")
        print(f"Max Value: {max_value:.6f}")
        print(f"Median: {median:.6f}")
        print(f"75th Percentile: {percentile_75:.6f}")
    else:
        print("No valid data found.")

Colab test

import numpy as np
import rasterio

def analyze_raster(raster_path):
    # Open the raster file
    with rasterio.open(raster_path) as src:
        data = src.read(1)  # Read the first band

        # Retrieve the nodata value from the raster metadata
        nodata = src.nodata

        # Get valid data (non-NaN values and not nodata)
        valid_data = data[~np.isnan(data) & (data != nodata)]

        if valid_data.size > 0:
            # Calculate statistics
            min_value = np.min(valid_data)
            max_value = np.max(valid_data)
            median = np.median(valid_data)
            percentile_75 = np.percentile(valid_data, 75)

            print(f"Min Value: {min_value:.6f}")
            print(f"Max Value: {max_value:.6f}")
            print(f"Median: {median:.6f}")
            print(f"75th Percentile: {percentile_75:.6f}")
        else:
            print("No valid data found.")

# Path to your raster file
raster_path = f'{dir}/ntl/lca_VNL_npp_2023_global_vcmslcfg_v2_c202402081600.average.dat.tif'

# Run the analysis
analyze_raster(raster_path)

Both script returned the same value:

  • Min Value: 0.329669
  • Max Value: 28.618135
  • Median: 1.060158
  • 75th Percentile: 2.263107

What do you think? Does this help explain the differences we're seeing?

@mvmaltitz mvmaltitz mentioned this issue Sep 4, 2024
25 tasks
@hennie-k hennie-k assigned hennie-k and unassigned javaftw Sep 5, 2024
@mvmaltitz
Copy link

@timlinux

Safety trigger a warning on OGR when open the NTL data, although the data is available and loaded into map, dragged from Explorer window.

Screenshot 2024-09-12 211350

2024-09-12T21:05:41     WARNING    Cannot open D:\temp\geest\geest_qgis_templates\layers\iso3\fji_input\fji_place_saf_ntl_2023_8859_clipped.tif ().()

After Execute the button, the label next to the button appear with text:

Processing failed: Unable to execute algorithm
Could not load source layer for INPUT: temp\clippedClassified.tif not found

See below the python warning

2024-09-12T21:09:16     WARNING    warning:C:\PROGRA~1\QGIS33~1.4\apps\Python39\lib\site-packages\osgeo\ogr.py:596: FutureWarning: Neither ogr.UseExceptions() nor ogr.DontUseExceptions() has been explicitly called. In GDAL 4.0, exceptions will be enabled by default.
              warnings.warn(
             
             traceback: File "C:\Users/benny/AppData/Roaming/QGIS/QGIS3\profiles\default/python/plugins\gender_indicator_tool\gender_indicator_tool.py", line 3304, in SAFRasterizerDelegator
              self.SAFnightTimeLights_v2()
              File "C:\Users/benny/AppData/Roaming/QGIS/QGIS3\profiles\default/python/plugins\gender_indicator_tool\gender_indicator_tool.py", line 3740, in SAFnightTimeLights_v2
              processing.run(
              File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\tools\general.py", line 109, in run
              return Processing.runAlgorithm(algOrName, parameters, onFinish, feedback, context)
              File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\core\Processing.py", line 176, in runAlgorithm
              ret, results = execute(alg, parameters, context, feedback, catch_exceptions=False)
              File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\gui\AlgorithmExecutor.py", line 70, in execute
              results, ok = alg.run(parameters, context, feedback, {}, False)
              File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\algs\gdal\GdalAlgorithm.py", line 132, in processAlgorithm
              commands = self.getConsoleCommands(parameters, context, feedback, executing=True)
              File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\algs\gdal\ClipRasterByMask.py", line 170, in getConsoleCommands
              maskLayer, maskLayerName = self.getOgrCompatibleSource(self.MASK, parameters, context, feedback, executing)
              File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\algs\gdal\GdalAlgorithm.py", line 89, in getOgrCompatibleSource
              ogr_layer_name = GdalUtils.ogrLayerName(ogr_data_path)
              File "C:\PROGRA~1/QGIS33~1.4/apps/qgis-ltr/./python/plugins\processing\algs\gdal\GdalUtils.py", line 439, in ogrLayerName
              ds = ogr.Open(basePath)
              File "C:\PROGRA~1\QGIS33~1.4\apps\Python39\lib\site-packages\osgeo\ogr.py", line 7481, in Open
              _WarnIfUserHasNotSpecifiedIfUsingExceptions()
              File "C:\PROGRA~1\QGIS33~1.4\apps\Python39\lib\site-packages\osgeo\ogr.py", line 596, in _WarnIfUserHasNotSpecifiedIfUsingExceptions
              warnings.warn(

And below the Processing log

2024-09-12T21:09:16     INFO    Results: {'OUTPUT': 'output_e54fbcd2_4078_43f6_b086_52d0e9371988'}
2024-09-12T21:09:16     INFO    gdalwarp -overwrite -of GTiff -cutline C:/TEMP/processing_JvEyfI/72efe44926e247ec969f00d98a4ce638/MASK.gpkg -cl MASK -crop_to_cutline -dstnodata -3.4028234663852886e+38 temp\tempClassified.tif temp\clippedClassified.tif
2024-09-12T21:09:16     INFO    GDAL command:
2024-09-12T21:09:16     INFO    gdalwarp -overwrite -of GTiff -cutline C:/TEMP/processing_JvEyfI/72efe44926e247ec969f00d98a4ce638/MASK.gpkg -cl MASK -crop_to_cutline -dstnodata -3.4028234663852886e+38 temp\tempClassified.tif temp\clippedClassified.tif
2024-09-12T21:09:16     INFO    GDAL command output:
2024-09-12T21:09:17     CRITICAL    Process returned error code 1
2024-09-12T21:09:17     INFO    Results: {'OUTPUT': 'temp\\clippedClassified.tif'}
2024-09-12T21:09:20     INFO    Results: {'OUTPUT': 'temp/countryUTMLayerBuf.shp'}
2024-09-12T21:09:22     CRITICAL    Unable to execute algorithm
             Could not load source layer for INPUT: temp\clippedClassified.tif not found
2024-09-12T21:09:22     CRITICAL    Warning 1: Ring Self-intersection at or near point 2617160.3599800956 -2438488.8009277876
2024-09-12T21:09:22     CRITICAL    ERROR 1: Cutline polygon is invalid.

And OGR log

2024-09-12T21:05:41     WARNING    Cannot open D:\temp\geest\geest_qgis_templates\layers\iso3\fji_input\fji_place_saf_ntl_2023_8859_clipped.tif ().()
2024-09-12T21:09:16     WARNING    Cannot open EPSG:8859 ().()

See tickets #134 and #135 for full testing, results and input data

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🔍️ Factor Analysis factors 🏙️ Place characterization Dimension - place characterization
Projects
None yet
Development

No branches or pull requests

10 participants