Consider native serializer support for numpy.ndarray #231

gsmecher · 2024-04-15T21:59:22Z

Things to check first

I have searched the existing issues and didn't find my feature already requested there

Feature description

Currently, the easiest way to serialize a numpy.ndarray with cbor2 is something like this (neglecting error checks):

import numpy as np
import cbor2

x = np.ones(10)
# cbor2 calls default(encoder, value) for any type it cannot encode natively
y = cbor2.dumps(x, default=lambda encoder, value: encoder.encode(value.tolist()))

This requires NumPy to traverse the array and build a Python list, which is then handed to cbor2 for a second traversal. The round trip costs multiple passes over the data plus transient allocations for the list and its elements.

Because both NumPy and cbor2 have clean C APIs, would you consider a direct conversion implemented in the C extension module? It's worth noting that the orjson JSON library does this already.

Use case

Low-overhead serialization of numpy arrays.

agronholm · 2024-04-15T22:38:56Z

I'm open to the idea if this can be done cleanly. What should ndarrays serialize to, in CBOR terms?

gsmecher · 2024-04-15T23:36:17Z

Oops, I see this is already discussed in #59.

It looks like there are some options:

1. Homogeneous typed arrays per RFC 8746
2. Classic CBOR arrays with individual type tags
3. Some numpy-specific type tag

The combination of (1) and (2) seems ideal, with (1) as a fast path and (2) as a fallback. I'm optimistic that NumPy ndarrays carry enough type metadata (the dtype) to decide between them without traversing the array.

(3) seems easy to rule out, and I'm only including it to say so out loud.
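
To make the fast-path/fallback decision concrete, here is a rough sketch of how the choice could be made from dtype metadata alone. It is not code from cbor2 or tuberd; the function name `rfc8746_tag` is hypothetical, and it takes the dtype's kind, item size and byte order as plain arguments so it runs without NumPy. The tag numbers follow the RFC 8746 typed-array layout (0b010_f_s_e_ll).

```python
from typing import Optional

# Element size in bytes -> RFC 8746 length field "ll".
_INT_LL = {1: 0, 2: 1, 4: 2, 8: 3}
# 16-byte floats are left out on purpose: NumPy's longdouble is often not
# IEEE binary128, so it is safer to send it down the fallback path.
_FLOAT_LL = {2: 0, 4: 1, 8: 2}


def rfc8746_tag(kind: str, itemsize: int, little_endian: bool) -> Optional[int]:
    """Return the RFC 8746 typed-array tag, or None if the fallback is needed.

    `kind` uses NumPy's dtype.kind letters: "u" unsigned, "i" signed, "f" float.
    """
    if kind == "u" and itemsize in _INT_LL:
        f, s, ll = 0, 0, _INT_LL[itemsize]
    elif kind == "i" and itemsize in _INT_LL:
        f, s, ll = 0, 1, _INT_LL[itemsize]
    elif kind == "f" and itemsize in _FLOAT_LL:
        f, s, ll = 1, 0, _FLOAT_LL[itemsize]
    else:
        # object, bool, complex, string, ... -> classic CBOR array (option 2)
        return None
    # Byte order is meaningless for 1-byte elements; the "little-endian"
    # 1-byte slots are uint8-clamped (68) and reserved (76), so avoid them.
    e = 1 if (little_endian and itemsize > 1) else 0
    return 0b0100_0000 | (f << 4) | (s << 3) | (e << 2) | ll
```

With NumPy available, the arguments would presumably come from `arr.dtype.kind`, `arr.dtype.itemsize` and the dtype's byte order, and a non-None result means the array's raw bytes could be wrapped in that tag without visiting individual elements. For example, a little-endian float64 array maps to tag 86, while an object array yields None.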

agronholm · 2024-04-16T07:25:46Z

Option 1 sounds like the best for encoding, but decoding may be an issue, particularly when numpy isn't present.
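
For the case where numpy isn't present, one possibility is to decode RFC 8746 typed arrays into plain Python lists with the standard library. The sketch below is only an illustration under that assumption, not an existing cbor2 feature; `decode_typed_array` is a hypothetical helper that takes the tag number and the tagged byte string.

```python
import struct

# struct format codes keyed by (signed bit, length field ll) for integers,
# and by ll for floats. binary128 (ll == 3) has no struct equivalent.
_INT_CODES = {(0, 0): "B", (0, 1): "H", (0, 2): "I", (0, 3): "Q",
              (1, 0): "b", (1, 1): "h", (1, 2): "i", (1, 3): "q"}
_FLOAT_CODES = {0: "e", 1: "f", 2: "d"}


def decode_typed_array(tag: int, payload: bytes) -> list:
    """Decode an RFC 8746 typed array (tags 64..87) into a Python list."""
    if not 64 <= tag <= 87 or tag == 76:  # 76 is reserved
        raise ValueError(f"not a supported typed-array tag: {tag}")
    bits = tag & 0b1_1111
    f, s, e, ll = bits >> 4, (bits >> 3) & 1, (bits >> 2) & 1, bits & 0b11
    code = _FLOAT_CODES.get(ll) if f else _INT_CODES[(s, ll)]
    if code is None:
        raise ValueError(f"unsupported element type for tag {tag}")
    order = "<" if e else ">"  # e == 1 means little-endian
    size = struct.calcsize(order + code)
    if len(payload) % size:
        raise ValueError("payload length is not a multiple of the element size")
    count = len(payload) // size
    return list(struct.unpack(f"{order}{count}{code}", payload))
```

A helper like this could be called from a `tag_hook` passed to `cbor2.loads`, which receives the decoder and a `CBORTag` carrying `.tag` and `.value`. When numpy is installed, the same hook could hand the bytes to numpy instead of building a list.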

gsmecher · 2024-08-26T15:46:37Z

Note we've got an out-of-tree implementation here:

https://github.com/gsmecher/tuberd/blob/master/tuber/codecs.py

It's BSD 3-clause - if there's any ambiguity about whether you can borrow code I'm happy to chase down the contributor and ask about relicensing. I'd be happy to see this code end up in your project, and we'll eventually be able to remove it from ours.

agronholm · 2024-08-26T15:53:36Z

Nice to hear, but right now I don't have any bandwidth for this, as my attention is currently focused on AnyIO, APScheduler and Typeguard.

gsmecher · 2024-08-26T16:47:07Z

Bandwidth is a terribly scarce resource. :) Thanks for all of your work.

gsmecher added the enhancement label Apr 15, 2024

agronholm mentioned this issue Apr 16, 2024

Release Plan #59

Open
