Local Geary loses order #192

Closed
Michel-Anton opened this issue Sep 29, 2021 · 3 comments

@Michel-Anton

When running:

local_geary = Geary_Local(spatial_weights)
local_geary_contiguity_ratio = local_geary.fit(x)

The order of values in local_geary_contiguity_ratio does not correspond to the order of values in the input array x.

The loss of order occurs at line 167 in local_geary.py:
adj_list_gs = adj_list_gs.groupby(by="ID").sum()

The groupby function returns the values in the lexicographic order of the weights ids. While the spatial weights class W stores data in lexicographic order by default, a user may impose a different ordering by setting the id_order parameter. In that case, the order of the localG attribute differs from the order of the input, which is misleading and, frankly, a bug.

It would be useful for the fit function to return the values in the same order as the spatial weights. If you choose not to do this, please state in the documentation that the values in the localG attribute follow the lexicographic order of the spatial weights ids, not the order of the input.
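
The reordering described above can be reproduced with pandas alone. This is a minimal sketch, not the code in local_geary.py: the ids, the values, and the id_order list are made up for illustration and only stand in for the adjacency list and the user-imposed ordering.

```python
import pandas as pd

# Hypothetical user-imposed ordering of the weights ids (stand-in for id_order).
id_order = ["c", "a", "b"]

# Toy adjacency-list contributions: one row per (focal, neighbor) pair,
# listed in the same order as id_order.
adj_list = pd.DataFrame(
    {
        "ID": ["c", "c", "a", "b", "b"],
        "gs": [1.0, 2.0, 10.0, 100.0, 200.0],
    }
)

# groupby sorts the group keys by default, so the rows come back as a, b, c
# rather than in the c, a, b order of the input.
summed = adj_list.groupby(by="ID").sum()
result_order = list(summed.index)
result_values = summed["gs"].tolist()

print(result_order)
print(result_values)
```

Any array built from the grouped result inherits the sorted order, which is why the output no longer lines up with the input x.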

@ljwolf
Member

ljwolf commented Sep 29, 2021

Hi! Thank you for the report.

Yes, we've been aware of issues with w/dataframe/weights alignment for a while now (pysal/libpysal#184) ☹️

This is indeed a bug, and not intended behavior. The output should always be aligned with X, and X should ideally be aligned with the spatial weights. The "right" way to work around this is to try to compute the spatial weights as close to the computation as possible, and ensure that the data is sorted by id before construction of the weights.

I think as a fix, this should be addressed by re-ordering adj_list_gs by w.id_order after the groupby. For a full solution, we need to stop sorting ids by default in libpysal (pysal/libpysal#223 needs to get un-stuck).
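
A minimal sketch of the proposed re-ordering, assuming the grouped frame is indexed by the weights ids. The id_order list and the data are hypothetical stand-ins for w.id_order and the real adjacency list, and this is not necessarily how the eventual patch is written.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for w.id_order (not sorted on purpose).
id_order = [3, 1, 2]

# Toy per-pair contributions keyed by the focal id.
adj_list_gs = pd.DataFrame(
    {
        "ID": [3, 3, 1, 2, 2],
        "gs": [1.0, 2.0, 10.0, 100.0, 200.0],
    }
)

# The groupby result is indexed by the sorted ids: 1, 2, 3.
grouped = adj_list_gs.groupby(by="ID").sum()

# Re-order the rows to follow id_order before extracting the array.
aligned = grouped.reindex(id_order)
localG = aligned["gs"].to_numpy()

print(localG)
```

Note that reindex yields NaN for any id in id_order that is missing from the grouped result, so observations without neighbors may need separate handling.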

@ljwolf ljwolf added the bug label Sep 29, 2021
@Michel-Anton
Author

Thank you for the quick answer. I took a look at pysal/libpysal#184. This is a significant change in the structure of libpysal and downstream repercussions are hard to foresee.

The reordering of adj_list_gs is the natural fix. I believe the same fix should be applied to geary_local_mv.py, because lines 113-116 induce a similar loss of order:

temp = pd.DataFrame(gs).T
temp['ID'] = adj_list.focal.values
adj_list_gs = temp.groupby(by='ID').sum()
localG = np.array(adj_list_gs.sum(axis=1) / k)
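
For illustration, the same four lines with a re-ordering step added might look like the sketch below. The inputs (gs, focal, k, id_order) are toy values invented here to mimic the shapes used in geary_local_mv.py, and the reindex call is an assumption about how the fix could be applied, not the patched source.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for w.id_order.
id_order = ["z", "x", "y"]

# Focal id of each adjacency pair (stand-in for adj_list.focal.values).
focal = np.array(["z", "z", "x", "y"])

# One array of per-pair contributions for each of the k variables.
gs = [
    np.array([1.0, 2.0, 10.0, 100.0]),
    np.array([3.0, 4.0, 20.0, 200.0]),
]
k = len(gs)

temp = pd.DataFrame(gs).T          # rows: adjacency pairs, columns: variables
temp["ID"] = focal
# Sum per focal id, then restore the ordering given by id_order.
adj_list_gs = temp.groupby(by="ID").sum().reindex(id_order)
localG = np.array(adj_list_gs.sum(axis=1) / k)

print(localG)
```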

@ljwolf
Member

ljwolf commented Dec 10, 2021

OK, this should be addressed in #195. For example:

import libpysal, geopandas, numpy, esda

guerry = geopandas.read_file("./guerry.shp")

guerry_shuffled = guerry.sample(frac=1, replace=False)
guerry_abet = guerry.sort_values('Dprtmnt')

w = libpysal.weights.Queen.from_dataframe(guerry)
w_shuffled = libpysal.weights.Queen.from_dataframe(guerry_shuffled)
w_abet = libpysal.weights.Queen.from_dataframe(guerry, ids='Dprtmnt')

lG = esda.Geary_Local(connectivity=w).fit(guerry['Donatns'])
lG_shuffled = esda.Geary_Local(connectivity=w_shuffled).fit(guerry_shuffled['Donatns'])
lG_abet = esda.Geary_Local(connectivity=w_abet).fit(guerry_abet['Donatns'])

guerry['localG'] = lG.localG
guerry_shuffled['localG'] = lG_shuffled.localG
guerry_abet['localG'] = lG_abet.localG

numpy.testing.assert_allclose(guerry.sort_values('Dprtmnt').localG, guerry_abet.localG)
numpy.testing.assert_allclose(guerry_shuffled.sort_values('Dprtmnt').localG, guerry_abet.localG)

This indicates that the "right" localG score now corresponds to the "right" department, even when the input weights ids are out of order (whether they are integers or set explicitly as strings). Note, though, that w_abet is still silently sorted lexically. Changing that is tracked in libpysal#184 and needs @sjsrey's approval plus (I think) a major version increment!

@ljwolf ljwolf closed this as completed Dec 10, 2021