Adjust geo-based queries #494

Open
acka47 opened this issue Oct 8, 2019 · 2 comments

acka47 commented Oct 8, 2019

With hbz/lobid-resources#1031, all bigger regions (NRW itself, Rheinland, Westfalen etc.) will also have geo coordinates. We have to at least adjust the geo-based queries on the home page.

@acka47 acka47 self-assigned this Oct 8, 2019

acka47 commented Oct 8, 2019

On the home page there are two ways of diving into the data:

  1. by clicking a "Kreis" or "kreisfreie Stadt" at https://nwbib.de/
  2. by clicking a "Gemeinde" or "kreisfreie Stadt" at https://nwbib.de/?map=gemeinden

The question is: Which types of places should be covered / not covered by those queries?

For a start, I checked which Wikidata types occur and how often in the NWBib data (using lobid-resources-staging to include focus data from hbz/lobid-resources#1029), see https://gist.github.com/acka47/097b976da0c9eaab7679d9ad80f3e75e.

At this point, seeing 106 different types, I scrapped this whole approach (leaving it here for documentation, though) and just looked at the spatial classification to see which regions are too big and should be excluded from queries based on geo coordinates. I think a good rule of thumb is to exclude all regions on the second level of the concept scheme. They can easily be listed with this SPARQL query, run here via rdflib:

import rdflib

# Load the NWBib spatial classification (SKOS, Turtle serialization).
g = rdflib.Graph()
g.parse("nwbib-spatial.ttl", format="turtle")

# Second-level concepts: concepts whose broader concept has no broader concept itself.
results = g.query("""
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT ?secondLevelConcept
    WHERE {
        ?secondLevelConcept a skos:Concept ;
            skos:broader ?topLevelConcept .
        FILTER NOT EXISTS { ?topLevelConcept skos:broader ?anything }
    }
""")

# Print one concept URI per line.
for row in results:
    print(row.secondLevelConcept)

The result:

https://nwbib.de/spatial#Q7920
https://nwbib.de/spatial#Q1787360
https://nwbib.de/spatial#N69
https://nwbib.de/spatial#N33
https://nwbib.de/spatial#N01
https://nwbib.de/spatial#Q1604680
https://nwbib.de/spatial#N72
https://nwbib.de/spatial#N74
https://nwbib.de/spatial#N57
https://nwbib.de/spatial#Q1787374
https://nwbib.de/spatial#Q251069
https://nwbib.de/spatial#N10
https://nwbib.de/spatial#N54
https://nwbib.de/spatial#Q7927
https://nwbib.de/spatial#N64
https://nwbib.de/spatial#N03
https://nwbib.de/spatial#N46
https://nwbib.de/spatial#N65
https://nwbib.de/spatial#N42
https://nwbib.de/spatial#Q313969
https://nwbib.de/spatial#N62
https://nwbib.de/spatial#Q457468
https://nwbib.de/spatial#N32
https://nwbib.de/spatial#N18
https://nwbib.de/spatial#N76
https://nwbib.de/spatial#N91
https://nwbib.de/spatial#Q1787376
https://nwbib.de/spatial#N28
https://nwbib.de/spatial#N16
https://nwbib.de/spatial#Q7924
https://nwbib.de/spatial#N34
https://nwbib.de/spatial#Q1803148
https://nwbib.de/spatial#N47
https://nwbib.de/spatial#Q7926
https://nwbib.de/spatial#N63
https://nwbib.de/spatial#N52
https://nwbib.de/spatial#Q1689034
https://nwbib.de/spatial#N43
https://nwbib.de/spatial#N45
https://nwbib.de/spatial#N48
https://nwbib.de/spatial#Q896929
https://nwbib.de/spatial#Q1787260
https://nwbib.de/spatial#Q1787322
https://nwbib.de/spatial#N68
https://nwbib.de/spatial#N13
https://nwbib.de/spatial#N36
https://nwbib.de/spatial#N44
https://nwbib.de/spatial#Q1110953
https://nwbib.de/spatial#N20
https://nwbib.de/spatial#N66
https://nwbib.de/spatial#N14
https://nwbib.de/spatial#N12
https://nwbib.de/spatial#N22
https://nwbib.de/spatial#Q1803239
https://nwbib.de/spatial#Q7923
https://nwbib.de/spatial#N05
https://nwbib.de/spatial#N24
https://nwbib.de/spatial#N77
https://nwbib.de/spatial#N70

To exclude these from search, we have to add them to the respective queries like so:

_exists_:spatial AND NOT spatial.id:("https://nwbib.de/spatial#Q7920" OR "https://nwbib.de/spatial#Q1787360")
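
With 59 URIs in the list, writing the query by hand is error-prone. A minimal sketch of generating it from the SPARQL result follows; the helper name `build_exclusion_query` is hypothetical and not part of lobid or NWBib, and only the query syntax shown above is assumed.

```python
def build_exclusion_query(excluded_ids):
    """Return a query string that requires a spatial entry but rejects the given IDs."""
    if not excluded_ids:
        # Nothing to exclude: only require that a spatial entry exists.
        return "_exists_:spatial"
    # Quote every URI, since ':' and '#' would otherwise be parsed as query syntax.
    quoted = " OR ".join('"%s"' % uri for uri in excluded_ids)
    return "_exists_:spatial AND NOT spatial.id:(%s)" % quoted


query = build_exclusion_query([
    "https://nwbib.de/spatial#Q7920",
    "https://nwbib.de/spatial#Q1787360",
])
print(query)
```

For the two URIs above this yields the same string as the hand-written query. The string would still need URL encoding before being put into a request.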

@acka47 acka47 assigned fsteeg and unassigned acka47 Oct 8, 2019
@acka47 acka47 added the bug label Oct 8, 2019

fsteeg commented Oct 9, 2019

The problem with the filtering approach (based on types or the actual coordinates) is that, with a normal query, it would exclude all hits with e.g. https://nwbib.de/spatial#N01, even if that hit had other additional spatial entries.
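
To illustrate the problem, here is a small in-memory sketch, not the actual lobid-resources query logic. The records and the ID `https://nwbib.de/spatial#EXAMPLE` are made up for illustration, and `matches_flat_not` and `matches_per_entry` are hypothetical helpers. The first mimics a record-level `NOT spatial.id:(...)`, the second the per-entry check that a nested query would make.

```python
EXCLUDED = {"https://nwbib.de/spatial#N01"}

# Made-up records; "#EXAMPLE" stands in for any specific place.
records = [
    {"id": "r1", "spatial": [{"id": "https://nwbib.de/spatial#N01"}]},
    {"id": "r2", "spatial": [{"id": "https://nwbib.de/spatial#N01"},
                             {"id": "https://nwbib.de/spatial#EXAMPLE"}]},
    {"id": "r3", "spatial": [{"id": "https://nwbib.de/spatial#EXAMPLE"}]},
]


def matches_flat_not(record):
    """Record-level NOT: reject the record if any spatial entry is excluded."""
    ids = {entry["id"] for entry in record.get("spatial", [])}
    return bool(ids) and not (ids & EXCLUDED)


def matches_per_entry(record):
    """Per-entry check: accept the record if at least one entry is not excluded."""
    return any(entry["id"] not in EXCLUDED for entry in record.get("spatial", []))


flat_hits = [r["id"] for r in records if matches_flat_not(r)]
per_entry_hits = [r["id"] for r in records if matches_per_entry(r)]
print(flat_hits)       # ['r3']
print(per_entry_hits)  # ['r2', 'r3']
```

Record `r2` is lost under the record-level filter although it also refers to a specific place.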

We could in theory set that up as a nested query in lobid-resources, but that would be quite complex and would restrict location queries in general. Or we'd have to add an option, further increasing complexity.

I think the most straightforward approach would be to exclude the geo field for the entities that don't actually describe a location, but an area. We could retain the other focus information, and thus keep the Wikidata links.
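
A rough sketch of that idea, under assumptions: each spatial entry is taken to be a dict with an `id` and a `focus` object that holds the Wikidata link and a `geo` field. That layout and the helper `drop_geo_for_areas` are illustrative guesses, not the actual lobid-resources data model or transformation code.

```python
import copy


def drop_geo_for_areas(spatial_entries, area_ids):
    """Return copies of the entries, without focus.geo for entries listed in area_ids."""
    cleaned = []
    for entry in spatial_entries:
        entry = copy.deepcopy(entry)  # leave the caller's data untouched
        if entry.get("id") in area_ids:
            # Remove only the coordinates; the rest of the focus data stays.
            entry.get("focus", {}).pop("geo", None)
        cleaned.append(entry)
    return cleaned


# Assumed entry layout; the coordinates are placeholder values.
entries = [{
    "id": "https://nwbib.de/spatial#Q7920",
    "focus": {
        "id": "http://www.wikidata.org/entity/Q7920",
        "geo": {"lat": 0.0, "lon": 0.0},
    },
}]
cleaned = drop_geo_for_areas(entries, {"https://nwbib.de/spatial#Q7920"})
print(cleaned[0]["focus"])
```

With the coordinates gone, geo-based queries no longer match these entries, while `spatial.id` queries and the Wikidata link keep working.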
