Preserving w order in local Geary values #195
Merged
Addressing: #192
Explanation of issue: As noted by @Michel-Anton, the current use of `.groupby` in local_geary.py and local_geary_MV.py can lose the order of the weight IDs, specifically when the weights use a custom ID order. As a result, the returned G and p-values come back out of order, which makes it easy to map or interpret them incorrectly.

Explanation of fix: As suggested by Michel-Anton and @ljwolf in the issue ticket, the temporary object that stores the values should be reordered to match the weight IDs. No dynamic code should be needed, as weights with IDs running 0-n are unchanged when reordered.
To fix the issue I have added a couple of small lines. This is not the most elegant solution (`adj_list_gs = adj_list_gs.iloc[w.id_order]` should have worked as a one-liner, but for some reason it only worked outside of a function), but it appears to work.

Old code:
New code:
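The ordering problem itself can be reproduced on toy data. The following is a minimal sketch, not the code changed in this PR: the IDs, the values in `x`, and the column names are all hypothetical, and it uses the label-based `reindex` rather than the positional `.iloc` mentioned above so that it also works for non-integer IDs.

```python
import pandas as pd

# Hypothetical custom ID order, standing in for w.id_order.
id_order = ["c", "a", "b"]

# Hypothetical attribute values, keyed by observation ID.
x = pd.Series({"c": 0.0, "a": 2.0, "b": -1.0})

# Toy adjacency list: every observation neighbours the other two,
# with row-standardized weights of 0.5.
adj_list = pd.DataFrame({
    "focal":    ["c", "c", "a", "a", "b", "b"],
    "neighbor": ["a", "b", "c", "b", "c", "a"],
    "weight":   [0.5] * 6,
})

# Weighted squared difference for each focal/neighbour pair.
diff = adj_list["focal"].map(x) - adj_list["neighbor"].map(x)
adj_list["wsd"] = adj_list["weight"] * diff ** 2

# groupby sorts the group keys by default, so the custom order is lost:
# the result is indexed a, b, c rather than c, a, b.
grouped = adj_list.groupby("focal")["wsd"].sum()

# Realign the per-observation sums with the custom ID order.
aligned = grouped.reindex(id_order)

print(grouped.index.tolist())  # sorted keys
print(aligned.index.tolist())  # custom order restored
```

When the IDs are already sorted (for example 0-n), the realignment step leaves the result unchanged, which is why no special-casing should be needed.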
Demonstration of new code handling weights ordered 0-n and shuffled:

This uses the `guerry` data, which cover the 85 departments of France. The first analysis uses standard queen contiguity weights (ordered 0-n).

Now we randomly reshuffle the weight IDs and check that the new G and p values are different:
This should impact our local G calculations...
If needed, next steps would be a quick validation by hand and/or comparison to the ongoing discussion at r-spatial/spdep#68
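
One way such a by-hand check could look is sketched below. It is an illustration under stated assumptions, not a test of the library: it uses a six-observation ring with made-up values and shuffled integer IDs, and it takes the local statistic to be c_i = sum_j w_ij (z_i - z_j)^2 on standardized z, which may differ from the package's implementation in scaling. All names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical shuffled integer IDs: the observation at position k has ID id_order[k].
id_order = [3, 0, 5, 1, 4, 2]
x = np.array([1.0, 4.0, 2.0, 8.0, 5.0, 7.0])  # made-up values, in positional order
z = (x - x.mean()) / x.std()                   # standardized values
n = len(x)

# Ring neighbours by position, row-standardized (each row sums to 1).
W = np.zeros((n, n))
for k in range(n):
    W[k, (k - 1) % n] = 0.5
    W[k, (k + 1) % n] = 0.5

# By-hand reference: dense computation in positional order.
by_hand = (W * (z[:, None] - z[None, :]) ** 2).sum(axis=1)

# Same quantity through an adjacency list labelled by ID and a groupby.
rows, cols = np.nonzero(W)
ids = np.asarray(id_order)
adj = pd.DataFrame({
    "focal": ids[rows],
    "neighbor": ids[cols],
    "wsd": W[rows, cols] * (z[rows] - z[cols]) ** 2,
})
grouped = adj.groupby("focal")["wsd"].sum()  # index comes back sorted: 0..5

# Without realignment the values are in sorted-ID order, not input order.
unaligned = grouped.to_numpy()
# With realignment they match the by-hand reference.
realigned = grouped.reindex(id_order).to_numpy()

print(np.allclose(realigned, by_hand))
```

The same comparison could be repeated against values produced by another implementation, provided both use the same standardization and weights.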