Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong/misleading documentation for default= argument of encoder, which leads to corrupt CBOR when followed #228

Open
2 tasks done
burnpanck opened this issue Apr 1, 2024 · 0 comments
Labels

Comments

@burnpanck
Copy link

Things to check first

  • I have searched the existing issues and didn't find my bug already reported there

  • I have checked that my bug is still present in the latest release

cbor2 version

5.6.2

Python version

3.12.2

What happened?

The documentation on "Customizing encoding and decoding" currently states the follows in the leading paragraph (emphasis mine):

On the encoder side, this is accomplished by passing a callback as the default constructor argument. This callback will receive an object that the encoder could not serialize on its own. The callback should then return a value that the encoder can serialize on its own, ...

This description seems to roughly match the api provided by the default= argument of the json module.
Based on this description, I would have expected the following code to work:

from types import SimpleNamespace
import cbor2

def default(encoder, o):
    if isinstance(o, SimpleNamespace):
        return vars(ret)
    raise TypeError(type(o).__name__)

orig = SimpleNamespace(a=1)
encoded = cbor2.dumps(orig,default=default)
print(encoded)
decoded = cbor2.loads(encoded) # <--- throws CBORDecodeEOF
assert vars(orig) == decoded

However, in reality, the encoder expects the default callback to instead call encoder.encode(...) to serialise exactly one value. If you forget to do that, you silently end up with an invalid CBOR stream.

Thus, I believe this manifests two related but separate issues:

  • The documentation for the default= argument is misleading
  • The current implementation of the serialiser does not detect wrong use of it's API

The fix for the first one is obvious (in fact, further down the documentation, there is an example that shows the right use of the API - otherwise I wouldn't have figured this out at all).

I argue the second issue should be fixed as-well; it seems too easy to accidentally serialise either none or multiple values to the encoder, and the consequences are too dire and too difficult to debug. Because of the misuse potential for the current API, I further believe that the json-like API is in fact better, but it may be too late to change the API at this point for backwards compatibility reasons. In this case, I suggest that the encoder actively count the number of objects serialised within a call to default and raise an exception if that number is not one. Care must be taken when default= gets called recursively though.

How can we reproduce the bug?

See MWE above.

@burnpanck burnpanck added the bug label Apr 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant