Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: optionally perform multiple PIP lookups per doc #300

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

missinglink
Copy link
Member

@missinglink missinglink commented Sep 9, 2020

Hi @orangejulius, @Joxit, @blackmad

I had a thought last night that we can fairly easily improve recall for queries where the user enters the name of a nearby parent, instead of the parent assigned by PIP. ie. they get the neighbourhood wrong.

We already have the postal cities mapping, which works well when a postcode is present.
The postal cities mapping adds aliases to the parent field, so a record can have multiple 'locality' values, for instance.

We can extend on this further by performing multiple point-in-polygon lookups per document and recording each of the additionally matched parents as an alias.

I threw this PR together quickly, so it's not exactly what I would recommend merging, but I wanted to solicit feedback on the general idea, which is:

  • use the doc.getCentroid() for the primary parent info
  • if there is a 'meta' property specified with additional points, use results from those lookups for aliases of the parent.

The wof-admin-lookup module would not be responsible for determining which additional points to use, we can update the importers accordingly to use this functionality as required, varying the amount of points based on geometry type and layer.

  • I think we can begin with adding two additional points to the polyline importer, so we PIP the start and end points of a street additionally to the midpoint
  • We may also want to apply this logic to some of the "lower level" WOF records, such as neighbourhoods. We could provide four additional points at the corners of the bbox, or even extend this to the 8 compass directions.
  • Finally we may want to also apply this to points using a similar method, this would greatly improve the 'postal cities' and 'realestate cities' issues at the cost of significantly more PIP work.

Below is a pretty picture I drew to illustrate how this might work for point, linestring and polygon geometry types, in each case the poorly draw pin is the centroid we're currently using and the crosshairs highlighted in yellow represent additional points we might lookup for aliases

IMG_20200909_095004_2

@blackmad
Copy link

blackmad commented Sep 9, 2020 via email

@orangejulius
Copy link
Member

Interesting. This definitely makes the most sense IMO for roads (lines), since we have the current problem that we can only assign a single neighbourhood, borough, city, etc to the road where clearly it might equally belong to multiple. There would still be issues with display, but at least we can make search function better.

Somewhat related, and more complicated to implement, would be a PIP step at query time for interpolated results. This would prevent a street where the centroid is in City A from forcing all interpolated results to belong to that city, even if a large portion of the road (and therefore the addresses on the road) belong to nearby City B.

@blackmad
Copy link

blackmad commented Sep 9, 2020 via email

@orangejulius
Copy link
Member

Yes, that already exists just fine, using the array-like nature of all Elasticsearch fields (just the same as we use for other aliases).

Code can call

doc.addParent(...)

multiple times and only the first will be used for display.

@Joxit
Copy link
Member

Joxit commented Sep 9, 2020

Interesting, a few years ago (whosonfirst-data/whosonfirst-data#1094 (comment)), I had had some problems with addresses assigned to the neighboring locality. I think this might fix this kind of issue (if it still exists). In my case we fixed this with data update.

What should be the distance from the original point to take ? Delta in degree ? Meter ?

This sound promising !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants