Post workshop notes

Levi John Wolf edited this page Jul 13, 2018 · 3 revisions

These are notes taken by @ljwolf during and after the workshop on things that can be improved.

Some should be filed as issues, some might need more general assessment.

setup

Conda Forge issues

What exactly do we need from conda forge, and how can we avoid depending on it? Maybe in the future we aim for only conda & pip, until the frequently encountered "solving environment..." hang goes away.

Rasterio from forge

Finding a consistent way to install rasterio across all platforms proved difficult, with many users hitting distinct errors across Windows & OSX.

install ipykernel or jupyter, whichever ensures that the kernel is added to the user's environment

assume that users may have more than one kernel. Our setup should work by default for someone with no kernels (i.e. ours is the only installed kernel), and ours should also show up in the kernel selector when other kernels are installed. This matters especially when our workshop runs alongside other workshops.

berlin-districts.geojson is "invalid"

because it is not unprojected WGS84.

Rewrite analysis notebooks to use pysal

after the next major release, we need to use pysal proper in this teaching material.

change all instances of language about "neighborhoods" to "residential districts" or "districts" for short

We tend to use the "neighborhood of an observation" language, and this is confusing when an observation is itself a "neighborhood." This change was made in the regression notebook, but not everywhere, and the shape data is still berlin-neighborhoods.geojson. I thought I changed this, but I need to double check git history.

Remove aliasing in the imports

some polite ribbing in the lightning talk suggested we stop importing using aliases.

PySAL devs also might want to revisit the libpysal.api stuff, if this is a prevailing sentiment.

esda

use diverging color maps on notebook 06 maps

Right now, the maps use 'Rd', which is a single color ramp. But, in some cases (such as the box-and-whisker map), we need to use a divergent scheme with a zero point at the median. This might require changes to geopandas, like @slumnitz's centered choropleth stuff. Not sure.
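A minimal sketch of the "zero point at the median" idea, using matplotlib's TwoSlopeNorm. The values are made up for illustration, and whether the geopandas plotting call in notebook 06 accepts this norm directly is an assumption that still needs checking.

```python
import numpy as np
from matplotlib.colors import TwoSlopeNorm

# Made-up, skewed values standing in for the mapped variable.
values = np.array([1.0, 2.0, 3.0, 10.0, 50.0])

# Center the color scale on the median rather than the midpoint of the range.
center = float(np.median(values))
norm = TwoSlopeNorm(vmin=values.min(), vcenter=center, vmax=values.max())

# Values map into [0, 1]; the median lands at 0.5, the middle of a
# diverging colormap such as "RdBu".
scaled = norm(values)
```

The norm would then be passed, together with a diverging colormap name, to any matplotlib plotting call that accepts `norm` and `cmap`.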

visualize one of the synthetic maps next to the observed map

I think this could just be done with y[np.random.permutation(W.n)] or something, but we'll have to see. Also, maybe, a map with the same I/join count but different pattern would help.
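A rough sketch of the permutation idea above. The array `y` and the size `n` are stand-ins for the observed values and `W.n`, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, only for repeatability
n = 10                           # stand-in for W.n
y = np.arange(n, dtype=float)    # stand-in for the observed values

# Shuffle the observed values across locations: same values, same
# geometry, but any spatial pattern in the assignment is broken.
y_synthetic = y[rng.permutation(n)]
```

Mapping `y_synthetic` next to `y` on the same geometries would give the side-by-side comparison; each new permutation is one more synthetic map.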

regression

really make clear what the data borrowing is before you fit a model

eventually, replace this with the gds book spatial regression chapter, which is excellent on this front

make the residual plot for the aspatial model to show how error can cluster

again, made redundant by the forthcoming book chapter on the topic.

clustering

do silhouettes for agglomerative clustering

this will show that cluster "certainty" in spatially-constrained clustering often reveals some latent within-zone heterogeneity that the unconstrained clustering picks up. It helps to build intuition, I think.

show silhouette box plots for identified zones

for a clustering (maybe all, or more than one), plot the distributions of silhouette scores to show how they work within groups, and their relationship to the map-average.
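A sketch of the per-group silhouette scores, on synthetic points rather than the workshop data. The two blobs, the seed, and the choice of two clusters are all assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_samples, silhouette_score

rng = np.random.default_rng(0)  # arbitrary seed
# Two synthetic blobs standing in for the attribute data.
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(6, 1, (30, 2))])

labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)

# One silhouette score per observation, plus the overall mean
# (the "map-average").
scores = silhouette_samples(X, labels)
overall = silhouette_score(X, labels)

# Group the per-observation scores by cluster, ready for a box plot.
by_cluster = {int(k): scores[labels == k] for k in np.unique(labels)}
```

Passing `list(by_cluster.values())` to a box plot function, with a horizontal line at `overall`, would show each group's distribution against the map-average.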

show a silplot for identifying best number of zones, in addition to clusters?

this is just pure sklearn, but we can't assume people have too much sklearn knowledge.
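Since we can't assume sklearn knowledge, a short sketch of picking the number of clusters by mean silhouette may help. The data are synthetic, the candidate range is arbitrary, and for zones one would presumably also pass a `connectivity` matrix to AgglomerativeClustering; that part is left out here.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)  # arbitrary seed
# Three synthetic blobs standing in for the attribute data.
centers = [(0, 0), (10, 0), (0, 10)]
X = np.vstack([rng.normal(c, 1, (30, 2)) for c in centers])

# Mean silhouette score for each candidate number of clusters.
mean_sil = {}
for k in range(2, 7):
    labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
    mean_sil[k] = silhouette_score(X, labels)

# The candidate with the highest mean silhouette.
best_k = max(mean_sil, key=mean_sil.get)
```

Plotting `mean_sil` against `k` gives the plot for choosing the number of clusters.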