Simplify handler interface #128

lonvia · 2020-03-07T15:18:13Z

The current concept of handler classes closely mimics the C++ interface of osmium. There is no need to do this. pyosmium should get a simpler interface that is more in the spirit of Python. I'm thinking of something along the lines of:

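import osmium

# called for every node
@osmium.node
def do_stuff(node):
    pass

# called for every way, with node locations available
@osmium.way(locations=True)
def do_other_stuff(obj):
    pass

# stacked decorators: called for nodes, ways and relations
@osmium.node
@osmium.way
@osmium.relation
def do_this_for_all_objects(obj):
    pass

osmium.apply_file("foo.pbf", do_stuff, do_other_stuff, do_this_for_all_objects)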
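The decorator idea above can be prototyped in plain Python. This is only a sketch: the decorators `node`, `way` and `relation`, the `OSMObject` class and `apply_objects()` are hypothetical stand-ins that work on an in-memory list rather than a file, and none of them is part of pyosmium. Options such as `locations=True` are merely recorded on the function, not acted upon.

```python
from dataclasses import dataclass


@dataclass
class OSMObject:
    """Hypothetical stand-in for an OSM object; real pyosmium objects differ."""
    type: str  # "node", "way" or "relation"
    id: int


def _make_decorator(kind):
    """Build a decorator usable both bare (@node) and with options (@way(locations=True))."""
    def decorator(func=None, **options):
        def tag(f):
            # Copy, then extend, the per-function registry so decorators can be stacked.
            kinds = dict(getattr(f, "_osm_kinds", {}))
            kinds[kind] = options
            f._osm_kinds = kinds
            return f
        if func is None:  # called with options: @way(locations=True)
            return tag
        return tag(func)  # used bare: @node
    return decorator


node = _make_decorator("node")
way = _make_decorator("way")
relation = _make_decorator("relation")


def apply_objects(objects, *handlers):
    """Call every handler registered for each object's type.
    Options such as locations=True are only recorded, not acted on, in this sketch."""
    for obj in objects:
        for handler in handlers:
            if obj.type in getattr(handler, "_osm_kinds", {}):
                handler(obj)


seen = []


@node
def do_stuff(obj):
    seen.append(("do_stuff", obj.id))


@way(locations=True)
def do_other_stuff(obj):
    seen.append(("do_other_stuff", obj.id))


@node
@way
@relation
def do_this_for_all_objects(obj):
    seen.append(("all", obj.id))


data = [OSMObject("node", 1), OSMObject("way", 10), OSMObject("relation", 100)]
apply_objects(data, do_stuff, do_other_stuff, do_this_for_all_objects)
```

Supporting both the bare `@node` form and the `@way(locations=True)` form is what the `func=None` check is for; stacking works because each decorator returns the same function object with an extended registry.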

wiktorn · 2020-03-09T19:01:03Z

Looks good to me, just a few comments:

I'm not sure whether the functions should be provided as multiple arguments or as one argument (tuple, list, etc.). Making all (or almost all) other arguments keyword-only should make future extensions easy.

I'm also thinking that we could extend the list of decorators to include:

creating copies of objects, for those who work with sets of data
sorting the input (if locations are required)
defining behaviour on invalid locations (skip, warn or error)

And I'd probably make @osmium.locations a separate decorator, which could take index types as arguments.
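Both suggestions, handlers passed as varargs with keyword-only options after them and a parameterised `@locations(...)` decorator, can be sketched in plain Python. Everything here is hypothetical: `locations`, `apply`, `sort_input` and the returned dict are illustrative names, not pyosmium API, and `"flex_mem"` is just an illustrative default. The sketch also assumes that one pass over a file uses a single location store, so handlers asking for different settings are rejected.

```python
def locations(index="flex_mem", on_invalid="error"):
    """Hypothetical @locations(...) decorator: records which location index a
    handler wants and how invalid locations should be treated."""
    if on_invalid not in ("skip", "warn", "error"):
        raise ValueError(f"unknown on_invalid policy: {on_invalid!r}")

    def decorator(func):
        func._osm_locations = {"index": index, "on_invalid": on_invalid}
        return func
    return decorator


def apply(source, *handlers, sort_input=False):
    """Hypothetical apply(): handlers are varargs, and every parameter declared
    after *handlers (here only sort_input) is keyword-only, so options can be
    added later without breaking existing calls."""
    for handler in handlers:
        if not callable(handler):
            raise TypeError(f"handler {handler!r} is not callable")
    # Gather the location settings requested via @locations(...).
    requested = {tuple(sorted(h._osm_locations.items()))
                 for h in handlers if hasattr(h, "_osm_locations")}
    if len(requested) > 1:
        # Assumption of this sketch: one pass builds only one location store.
        raise ValueError("handlers request conflicting location settings")
    # A real implementation would now read `source`; this sketch only reports
    # how the arguments were bound.
    return {"source": source, "handlers": handlers, "sort_input": sort_input,
            "locations": dict(requested.pop()) if requested else None}


@locations(index="sparse_array", on_invalid="skip")
def way_lengths(obj):
    pass


def tag_counter(obj):
    pass


call = apply("foo.pbf", way_lengths, tag_counter, sort_input=True)
# A list of handlers converts to varargs with a single `*`.
same = apply("foo.pbf", *[way_lengths, tag_counter], sort_input=True)
```

Passing an option positionally, as in `apply("foo.pbf", tag_counter, True)`, fails with a `TypeError` because `True` is taken as a handler. That is the protection keyword-only parameters provide.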

lonvia · 2020-03-25T21:44:13Z

Those are some good comments. Thoughts on them:

The idea behind the multiple arguments was mainly that it saves two characters. I'm not sure there is any advantage to one or the other; they are easily convertible.
I have strong opinions about providing copies; see #125 (TypeError: can't pickle osmium.osm._osm.Node objects). So that's a firm no.
Input sorting is more of a function of the input file, so maybe something like:
osmium.apply(osmium.sorted("foo.pbf"), my_handler)
Invalid locations and index types are good points, but they are properties of the location handler, which is global to the apply function. So we'd have to add the parameters there:
osmium.apply("foo.pbf", my_handler, location_index="sparse_array", location_ignore_errors=False)
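A minimal sketch of how an input-sorting wrapper and the two location options could fit together, assuming a toy in-memory model with only nodes and ways. `sorted_input()` stands in for the proposed `osmium.sorted()`, and `apply()` takes the two keyword options from the example call; both are hypothetical, and only a plain dict store (`location_index="dict"`) is implemented, not `"sparse_array"`. Ways need the locations of their nodes, so the store must see the nodes first, which is why sorting matters.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an OSM node."""
    id: int
    lon: float
    lat: float


@dataclass
class Way:
    """Hypothetical stand-in for an OSM way."""
    id: int
    refs: list                                  # ids of the way's nodes
    coords: list = field(default_factory=list)  # filled in by apply()


def sorted_input(objects):
    """Stand-in for the proposed osmium.sorted(): nodes before ways, each in
    ascending id order. Holds everything in memory, unlike a real sorter."""
    return iter(sorted(objects, key=lambda o: (isinstance(o, Way), o.id)))


def apply(objects, *handlers, location_index="dict", location_ignore_errors=False):
    """One location store, configured by apply()'s keyword arguments, serves all handlers."""
    if location_index != "dict":
        raise ValueError(f"only location_index='dict' is sketched, got {location_index!r}")
    store = {}  # node id -> (lon, lat); relies on nodes arriving before ways
    for obj in objects:
        if isinstance(obj, Node):
            store[obj.id] = (obj.lon, obj.lat)
        else:
            missing = [ref for ref in obj.refs if ref not in store]
            if missing and not location_ignore_errors:
                raise KeyError(f"way {obj.id}: no location for nodes {missing}")
            # With errors ignored, unresolved nodes get None instead of a location.
            obj.coords = [store.get(ref) for ref in obj.refs]
        for handler in handlers:
            handler(obj)


ways = []


def collect_ways(obj):
    if isinstance(obj, Way):
        ways.append(obj)


# Unsorted input: a way comes before the nodes it references.
data = [Way(10, [1, 2]), Node(2, 1.0, 1.0), Node(1, 0.0, 0.0), Way(11, [1, 99])]
apply(sorted_input(data), collect_ways, location_ignore_errors=True)
```

Without `sorted_input()`, the first way would be processed before any node locations are known, and with `location_ignore_errors=False` this `apply()` would raise a `KeyError`.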

danieldjewell · 2021-09-08T18:33:55Z

In the spirit of this: +1 to calling the existing callback-based behavior not Pythonic.

One very visible issue is that, as it stands now, the library can't really be used from a REPL/IPython/Jupyter notebook, which is a real bummer for a couple of reasons:

Using <your favorite REPL/Jupyter> provides a great prototyping/testing environment
Jupyter notebooks are heavily used in data science applications
The structure of using callbacks the way PyOsmium does is pretty foreign to a Python developer. I'm sure it makes perfect sense coming from libosmium, but that structure doesn't translate into something Pythonic.

Yes, technically you can write your handler class in a Jupyter notebook and then use it, but it's really atypical. Being able to use the library procedurally is important.

Ultimately, part of the goal of any Python library/binding for a library written in C/C++ is not just to access the functionality the library provides, but also to benefit from the convenience and ease of Python while doing so. (The reason Python has taken off as a major language in data science and other fields is definitely not that Python itself is fast; it isn't. But it's comparatively easy to write native code [especially with tools like Cython], compile it, and run complex stuff from Python at near-native speed, i.e. nearly the speed of pure C/C++.)

Given that OSM data is used heavily in data science, analysis, GIS applications, etc., I think it's really important to take a step back and make sure that some clear use cases are defined. Opening an OSM PBF dump/extract in a Jupyter notebook and pulling data for analysis is one such use case.

lonvia · 2024-03-16T21:49:54Z

We went a somewhat different way. We now have:

support for an arbitrary number of handlers in apply() (Generalize osmium.apply() to work with an arbitrary number of handlers #241)
any class can be used as a handler (Allow any Python object to be used as a handler object #242, Convert SimpleHandler to a pure Python class #243)
an iterative interface, so that a simple for loop can be used (Introduce iterative processing #245)

This should simplify the code enough even without decorators.
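The three changes listed above can be illustrated with plain-Python stand-ins. This is a hypothetical sketch, not pyosmium's implementation: `OSMObject`, `apply()` and `iterate()` work on an in-memory list instead of a file. Here `apply()` accepts any number of handlers and calls whichever of the methods `node`, `way` or `relation` a handler happens to have, and `iterate()` is a generator that can be consumed by a plain for loop.

```python
from dataclasses import dataclass, field


@dataclass
class OSMObject:
    """Hypothetical stand-in for an OSM object."""
    type: str  # "node", "way" or "relation"
    id: int
    tags: dict = field(default_factory=dict)


def apply(objects, *handlers):
    """Any number of handlers, and any object can be one: for each OSM object,
    call the handler method named after the object's type, if it exists."""
    for obj in objects:
        for handler in handlers:
            callback = getattr(handler, obj.type, None)
            if callback is not None:
                callback(obj)


def iterate(objects, types=("node", "way", "relation")):
    """Iterative alternative: a generator that a plain for loop can consume."""
    for obj in objects:
        if obj.type in types:
            yield obj


class NodeCounter:  # a plain class, no special base class needed
    def __init__(self):
        self.count = 0

    def node(self, obj):
        self.count += 1


class WayCollector:
    def __init__(self):
        self.ids = []

    def way(self, obj):
        self.ids.append(obj.id)


data = [OSMObject("node", 1), OSMObject("way", 10, {"highway": "residential"}),
        OSMObject("node", 2), OSMObject("relation", 100)]
counter, collector = NodeCounter(), WayCollector()
apply(data, counter, collector)

# The same kind of question answered with a loop instead of a handler class:
highways = [obj.id for obj in iterate(data, types=("way",)) if "highway" in obj.tags]
```

The loop form is the one that fits a REPL or notebook workflow: no class has to be declared before data can be inspected.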

lonvia closed this as completed Mar 16, 2024
