Preserving w order in local Geary values #195

Merged
merged 1 commit into from
Dec 10, 2021

jeffcsauer
Collaborator

Addressing: #192

Explanation of issue: As noted by @Michel-Anton, the current use of .groupby in geary_local.py and geary_local_mv.py can lose the order of the weight IDs, specifically when the weight IDs use a custom order. As a result, the returned local G values and p-values will be out of order, which makes it possible to map or interpret values incorrectly.

Explanation of fix: As suggested by @Michel-Anton and @ljwolf in the issue ticket, the temporary object where values are stored should be rearranged according to the values of the weight IDs. No dynamic code should be needed, as weights ordered 0-n will be unchanged when reordered, and

To fix the issue I have added a couple of small lines. This is not the most elegant solution (adj_list_gs = adj_list_gs.iloc[w.id_order] should have worked as a one-liner, but for some reason it only worked outside of a function), but it appears to work.

Old code:

adj_list_gs.columns = ["gs", "ID"]
adj_list_gs = adj_list_gs.groupby(by="ID").sum()
localG = adj_list_gs.gs.values
return localG

New code:

adj_list_gs.columns = ["gs", "ID"]
adj_list_gs = adj_list_gs.groupby(by="ID").sum()
# Rearrange data based on w id order
adj_list_gs['w_order'] = w.id_order
adj_list_gs.sort_values(by='w_order', inplace=True)
localG = adj_list_gs.gs.values
return localG
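
The ordering problem described above can be reproduced without any spatial data. The following is a minimal sketch, not the code merged in this PR: the toy values and the `id_order` list (standing in for `w.id_order`) are made up for illustration, and `reindex` is shown only as one label-based way to put grouped rows back into a custom order.

```python
import pandas as pd

# Hypothetical custom ID order, standing in for w.id_order
id_order = [2, 0, 3, 1]

# Toy adjacency-list contributions: one row per (focal, neighbor) pair
adj_list_gs = pd.DataFrame(
    {
        "gs": [0.5, 0.25, 1.0, 2.0, 0.75, 0.125],
        "ID": [2, 2, 0, 3, 3, 1],
    }
)

# groupby sorts the group keys by default, so the custom order is lost here
summed = adj_list_gs.groupby(by="ID").sum()
print("index after groupby:", list(summed.index))

# Label-based reindexing returns the rows in the custom ID order
reordered = summed.reindex(id_order)
local_values = reordered["gs"].to_numpy()
print("values in custom order:", local_values)
```

Because `reindex` selects by index label, each output position holds the summed value for the ID at the same position in `id_order`.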

Demonstration of new code handling weights ordered 0-n and shuffled:

This uses the Guerry data, which cover 85 departments across France. The first analysis uses standard Queen contiguity weights (ordered 0-n).

import libpysal as lp
import geopandas as gpd
from esda.geary_local import Geary_Local
# load the data - you will need to change the filepath
guerry_ds = gpd.read_file("C:/Users/jeffe/Dropbox/LOSH_Letter_2020/Data/Guerry/Guerry.shp")
# generate Queen contiguity weights (IDs ordered 0-n)
w = lp.weights.Queen.from_dataframe(guerry_ds)
print("w weight ordering:", w.id_order[0:10])
# isolate y
y = guerry_ds['Donatns']
# run statistic
lG = Geary_Local(connectivity=w).fit(y)
print("local G values:", lG.localG[0:5])
print("local p values:", lG.p_sim[0:5])

w weight ordering: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
local G values: [0.18208704 0.56001403 0.97529461 0.21590694 0.61737256]
local p values: [0.193 0.053 0.066 0.169 0.448]

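Now we randomly reshuffle the weight IDs and check that the new G and p values are different:

import random
import numpy as np
# build the IDs 0-84 and shuffle them with a fixed seed for reproducibility
neworder = list(np.array(range(0,85)))
random.Random(12345).shuffle(neworder)
# remap the existing weights onto the shuffled IDs
w_remap = lp.weights.remap_ids(w, neworder)
print("new weight order:", w_remap.id_order[0:10])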

new weight order: [39, 19, 30, 59, 43, 50, 28, 9, 56, 70]

This should impact our local G calculations...

lG = Geary_Local(connectivity=w_remap).fit(y)
print("local G values:",lG.localG[0:5])
print("local p values:",lG.p_sim[0:5])

local G values: [0.99314034 1.28169257 0.96518863 1.19722984 0.3317755 ]
local p values: [0.374 0.228 0.055 0.493 0.175]

If needed, next steps would be a quick validation by hand and/or a comparison with the ongoing discussion at r-spatial/spdep#68.
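
A validation by hand could look something like the sketch below. It is only an illustration under stated assumptions: the function `local_geary_by_hand` and the toy inputs are hypothetical, the statistic is taken to be c_i = sum_j w_ij (z_i - z_j)^2 with row-standardized weights, and z is standardized with the population standard deviation. Whether this matches the estimator in esda term for term would need to be checked against the source.

```python
from statistics import mean, pstdev


def local_geary_by_hand(neighbors, y, id_order):
    """Compute c_i = sum_j w_ij * (z_i - z_j)**2 for each ID in id_order.

    neighbors: dict mapping each ID to a list of neighbor IDs
    y: list of values aligned positionally with id_order
    Assumes row-standardized weights (each neighbor gets 1 / n_neighbors).
    """
    mu = mean(y)
    sigma = pstdev(y)  # population standard deviation (an assumption)
    z = {i: (value - mu) / sigma for i, value in zip(id_order, y)}

    result = []
    for i in id_order:
        nbrs = neighbors[i]
        weight = 1.0 / len(nbrs)
        result.append(sum(weight * (z[i] - z[j]) ** 2 for j in nbrs))
    return result


# Toy example: three units on a line, a - b - c
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
by_hand = local_geary_by_hand(neighbors, [1.0, 2.0, 4.0], ["a", "b", "c"])

# Same data listed in a custom order: the output should follow that order
by_hand_custom = local_geary_by_hand(neighbors, [4.0, 1.0, 2.0], ["c", "a", "b"])
print(by_hand)
print(by_hand_custom)
```

The second call is the property this PR is concerned with: when the IDs and their values are listed in a custom order, each returned value should still sit at the position of its own ID.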


codecov bot commented Nov 24, 2021

Codecov Report

Merging #195 (5a7e4ff) into master (9dac8bd) will increase coverage by 0.20%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #195      +/-   ##
==========================================
+ Coverage   77.32%   77.52%   +0.20%     
==========================================
  Files          45       45              
  Lines        4927     4931       +4     
==========================================
+ Hits         3810     3823      +13     
+ Misses       1117     1108       -9     

Impacted Files            Coverage Δ
esda/geary_local.py       88.13% <100.00%> (+0.41%) ⬆️
esda/geary_local_mv.py    100.00% <100.00%> (ø)
esda/_version.py          40.65% <0.00%> (+2.67%) ⬆️

@ljwolf merged commit cd15d8c into pysal:master Dec 10, 2021
@ljwolf mentioned this pull request Dec 10, 2021