Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify handler interface #128

Closed
lonvia opened this issue Mar 7, 2020 · 4 comments
Closed

Simplify handler interface #128

lonvia opened this issue Mar 7, 2020 · 4 comments

Comments

@lonvia
Copy link
Member

lonvia commented Mar 7, 2020

The current concepts of handler classes mimics closely the C++ interface of osmium. There is no need to do this. pyosmium should get a simpler interface that is more in the spirit of python. I'm thinking about something along the way:

import osmium

@osmium.node
def do_stuff(node):
    pass

@osmium.way(locations=true)
def do_other_stuff(obj):
    pass

@osmium.node
@osmium.way
@osmium.relation
def do_this_for_all_objects(obj):
    pass

osmium.apply_file("foo.pdf", do_stuff, do_other_stuff, do_this_for_all_objects)
@wiktorn
Copy link
Contributor

wiktorn commented Mar 9, 2020

Looks good to me, just a few comments:

  • I'm not sure if functions should be provided as multiple arguments or as one argument (tuple, list etc.). Making all (or almost all) other arguments as keyword only should secure easy future extensions

And I'm thinking, that we could extend the list of decorators to include:

  • create copies of objects for those, who are working with sets of data
  • input sorting (if locations are required)
  • define behaviour on invalid locations (skip, warn or error)

And probably, I'd made @osmium.locations as separate decorator, which can take index types as arguments.

@lonvia
Copy link
Member Author

lonvia commented Mar 25, 2020

That's some good comments. Thoughts on that:

  • The thought about the multiple arguments was mainly that it saves two more characters. I'm not sure if there is any advantage to one or the other, they are easily convertible.
  • I have strong opinions about providing copies, see TypeError: can't pickle osmium.osm._osm.Node objects #125. So, that's a firm no.
  • Input sorting is more of a function of the input file, so maybe something like:
    osmium.apply(osmium.sorted("foo.pbf"), my_handler)
  • Invalid locations and index types are a good points but they are properties of the location handler which is global to the apply function. So we'd have to add the parameters there:
    `osmium.apply("foo.pbf", my_handler, location_index="sparse_array", location_ignore_errors=False)

@danieldjewell
Copy link

In the spirit of this, +1 to the existing behavior of using callbacks as being not-Pythonic.

One very visible issue is that, as it stands now, the library can't really be used from a REPL/IPython/Jupyter notebook... Which is a real bummer for a couple of reasons:

  • Using <your favorite REPL/Jupyter> provides a great prototyping/testing environment
  • Jupyter notebooks are heavily used in data science applications
  • The structure of using callbacks in the way that PyOsmium does is pretty foreign to a Python developer.... I'm sure it makes perfect sense coming from libosmium but that structure doesn't translate into something Pythonic.

Yes, technically you can write your handler class in a Jupyter notebook and then use it, but it's really atypical... Being able to use the library procedurally is important.

Ultimately, part of a goal of any Python library/binding for a library written in C/C++ is not just being able to access the functionality provided by the library but also to be able to utilize the convenience/ease of Python to do that. (The reason Python has taken off as a major language in data science and other fields is definitely not because Python itself is fast -- it isn't. But it's comparatively easy to write native code [especially with things like Cython], compile it, and run complex stuff from Python at near-native speed (aka nearly the speed of pure C/C++).... )

Given that OSM data can be utilized heavily in data science, analysis, GIS applications, etc, I think it's really important to take a step back and make sure that some clear use cases are defined. Opening an OSM PBF dump/extract in a Jupyter notebook and pulling data for analysis is one such use case.

@lonvia
Copy link
Member Author

lonvia commented Mar 16, 2024

Went a bit of a different way. We have now:

This should simplify the code enough even without decorators.

@lonvia lonvia closed this as completed Mar 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants