Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Native (de-)serialization based on Rust and PyO3 #520

Closed
wants to merge 5 commits into from
Closed

Native (de-)serialization based on Rust and PyO3 #520

wants to merge 5 commits into from

Conversation

124C41p
Copy link
Contributor

@124C41p 124C41p commented Aug 26, 2023

Proof of concept
Only capable of deserializing (nested) Messages with primitive fields No handling of lists, maps, enums, .. implemented yet
See example.py for a working example

@cetanu
Copy link
Collaborator

cetanu commented Aug 30, 2023

Very cool, I've reviewed so far and looks good. Will wait for the remaining changes.

@124C41p
Copy link
Contributor Author

124C41p commented Sep 6, 2023

Current state of progress:

I think deserialization is feature complete now. However, the code base is a complete mess, so I will try to increase code quality over the next commits. I would recommend not to review before the refactoring is done.

There are two tests failing which I don't know how to fix (or whether they can be fixed at all). I think the problem is related to the fact that I am converting between Rust HashMaps and Python Dicts, and that those are iterated in different orders. So when performing a (de-)serialization round trip of a message containing a map field, the protobuf wire format ends up in a different order than before. (Apparently, the order is not even deterministic.) So it is hard to make assertions about the wire format.

@Gobot1234
Copy link
Collaborator

I think maintaining insertion order is a pretty big deal since its guaranteed by the default dict implementation, it might be worth looking into IndexMap

@Gobot1234
Copy link
Collaborator

Also thank you for all your work on this, I'll try and look into getting CI set up for this so it can compile

@124C41p
Copy link
Contributor Author

124C41p commented Sep 6, 2023

I think maintaining insertion order is a pretty big deal since its guaranteed by the default dict implementation, it might be worth looking into IndexMap

Good to know. Unfortunately, prost-reflect automatically deserializes map fields into HashMaps. If preserving order is important to us, I'll have to work around this by not declaring map fields as such, but as ordinary repeated fields instead. Then prost-reflect will deserialize into Vec<_> which can be converted into Python Dicts with order preserved.

@124C41p
Copy link
Contributor Author

124C41p commented Sep 13, 2023

Should be feature complete now.

major refactoring

We are now (de-)serializing manually by calling prost's low level API.

This is a little bit faster than before, and also a lot cleaner.
@124C41p 124C41p marked this pull request as ready for review September 19, 2023 19:15
@124C41p
Copy link
Contributor Author

124C41p commented Sep 22, 2023

I think the extension module is finished now. What remains is proper integration into the betterproto project. This includes:

  • configuring betterproto-extras as an optional dependency (opt in / opt out / default with Python fallback / whatever you prefer)
  • configuring tests/benchmarks to run with and without native (de-)serialization
  • setting up cross compilation and deployment to PyPI

For all these tasks I'll need support by project maintainers. Please let me know how to proceed.

@124C41p 124C41p changed the title WIP: Native (de-)serialization based on Rust and PyO3 Native (de-)serialization based on Rust and PyO3 Sep 22, 2023
@Gobot1234
Copy link
Collaborator

I'd prefer if it was opt in personally, configuring benchmarks is something I'd like to get working (maybe take a look at #308) and the best way to build things I've found is with https://github.com/pypa/cibuildwheel

@124C41p
Copy link
Contributor Author

124C41p commented Oct 9, 2023

I don't know cibuildwheel - it might be a good fit. But have you considered PyO3/maturin-action? It is especially designed to build maturin projects (and works very well for me). Also maturin autogenerates a CI config file when creating a new project (by executing maturin new). It just needs a few modifications to build for Python >= 3.7.

However, I do struggle with configuring Poetry: The problem is that betterproto now has a dependency which is part of the same repository but has not been published so far. I don't know how to set things up so that the CI scripts of betterproto can run. Do we need to move betterproto-extras into a separate repository?

@Gobot1234
Copy link
Collaborator

Superseded by #545

@Gobot1234 Gobot1234 closed this Dec 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants