N^2 Performance on decoding large definite bytestrings #240

Open
2 tasks done
kpwelsh opened this issue Jun 14, 2024 · 5 comments
Comments

kpwelsh commented Jun 14, 2024

Things to check first

  • I have searched the existing issues and didn't find my bug already reported there

  • I have checked that my bug is still present in the latest release

cbor2 version

5.6.4

Python version

3.9, 3.10, 3.12.1, 3.12.3

What happened?

I work with a lab that receives fairly high-frequency, large, compressed images in CBOR format from a piece of hardware.
Frequencies range from 10 Hz to 200 Hz, and the sizes of the compressed images range from 1 MB to 24 MB.

After failing to read messages quickly enough, we discovered that cbor2.loads was the culprit and that >90% of the time was spent in decode.c:613/614.

I understand that this N^2 memcpy was implemented in #204 to address memory-crash issues with malicious inputs. However, this implementation is unacceptable for us, and we have switched to a different Python CBOR implementation.

I would prefer to continue using this one, as it is clearly more active/supported :).

How can we reproduce the bug?

It's easy to reproduce, and easy to see from the code.
Here is a script that makes a plot, though, comparing cbor2 to the cbor package :).

import time
import cbor2

MB = 1024 * 1024
messages = [cbor2.dumps(b"0" * (i * MB)) for i in range(1, 20)]
sizes = [len(message) for message in messages]

cbor2_times = []
for message in messages:
    start = time.time()
    cbor2.loads(message)
    end = time.time()
    cbor2_times.append(end - start)


import cbor

cbor_times = []
for message in messages:
    start = time.time()
    cbor.loads(message)
    end = time.time()
    cbor_times.append(end - start)


import matplotlib.pyplot as plt

plt.plot([size / MB for size in sizes], cbor2_times, label="cbor2")
plt.plot([size / MB for size in sizes], cbor_times, label="cbor")
plt.xlabel("Message size (MB)")
plt.ylabel("Time (s)")
plt.legend()
plt.show()
@kpwelsh kpwelsh added the bug label Jun 14, 2024
@agronholm
Owner

agronholm commented Jun 14, 2024

I can see why this would have N^2 performance. But if I were to preallocate the entire thing, it would allow easy attacks using malicious data that specifies a gigantic bytestring without actually providing that data. On the other hand, if I just store an array of buffers, I would eventually need to copy them into a larger one, briefly requiring 2x the amount of memory.

Suggestions?
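One middle ground, sketched here for illustration (not from the thread): keep the incoming chunks in a list and join them once at the end. The declared length is only used to bound how much is read, so a lying header cannot force a huge upfront allocation; the join briefly needs ~2x memory, but total work is O(n) instead of O(n^2). All names below are made up.

```python
# Sketch: accumulate bounded chunks in a list, then join once.
# Each byte is copied a constant number of times, so decoding is linear.
import io

def read_definite(fp, length, chunk_size=65536):
    chunks = []
    left = length
    while left:
        chunk = fp.read(min(left, chunk_size))
        if not chunk:  # stream ended before the declared length was reached
            raise ValueError("premature end of stream")
        chunks.append(chunk)
        left -= len(chunk)
    return b"".join(chunks)  # single linear-time copy

data = b"x" * (1024 * 1024)
assert read_definite(io.BytesIO(data), len(data)) == data
```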

@kpwelsh
Author

kpwelsh commented Jun 14, 2024

Briefly requiring 2x the memory is not an issue for us.

I appreciate the guard against the attacks, and the desire not to regress there.

Could we:

  1. Allow a "trusted" version of loads that doesn't worry about it? In our case at least, we are communicating on a secure LAN between our own managed processes, so malicious attacks are not a concern.
  2. Check to see if we might run out of memory and fall back to the safer load if it might be an issue?

Disclaimer: no idea how to do 2.
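A hypothetical sketch of what option 1 could look like internally (the names are made up, not cbor2 API): preallocate the full buffer from the declared length and fill it with readinto(). This is linear-time but unsafe against untrusted input, since a lying header can request an enormous allocation.

```python
# Hypothetical "trusted" fast path: preallocate, then fill via readinto().
# Only acceptable when the declared length comes from a trusted peer.
import io

def read_definite_trusted(fp, length):
    buffer = bytearray(length)       # one upfront allocation
    view = memoryview(buffer)
    filled = 0
    while filled < length:
        n = fp.readinto(view[filled:])  # write directly into the buffer
        if not n:
            raise ValueError("premature end of stream")
        filled += n
    return bytes(buffer)

data = b"y" * (4 * 1024 * 1024)
assert read_definite_trusted(io.BytesIO(data), len(data)) == data
```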

@kpwelsh
Author

kpwelsh commented Jun 14, 2024

Note:

After realizing that the other cbor library doesn't support RFC 8949 and only supports RFC 7049, we switched back to this one using the pure-Python implementation, but made our own decoder subclass that doesn't use a chunked buffer copy.

This should be an acceptable solution for a while (perhaps forever), but it would be nice to be able to use the C implementation here :). The difference does not completely break our workflow, though.
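The subclass itself isn't shown in the thread. As a standalone illustration of the linear-time approach (stdlib only, all names made up, not cbor2's internals), here is a minimal parser for a top-level definite-length CBOR byte string (major type 2) that extracts the payload with a single slice instead of a chunked copy:

```python
# Minimal sketch: decode one top-level definite-length CBOR byte string.
# Not a general CBOR decoder; indefinite lengths and other major types
# are out of scope.
import struct

def loads_bytestring(data):
    initial = data[0]
    if initial >> 5 != 2:  # top 3 bits: major type 2 = byte string
        raise ValueError("not a byte string")
    info = initial & 0x1F
    if info < 24:                 # length embedded in the initial byte
        length, offset = info, 1
    elif info == 24:              # 1-byte length follows
        length, offset = data[1], 2
    elif info == 25:              # 2-byte big-endian length
        length, offset = struct.unpack_from(">H", data, 1)[0], 3
    elif info == 26:              # 4-byte big-endian length
        length, offset = struct.unpack_from(">I", data, 1)[0], 5
    elif info == 27:              # 8-byte big-endian length
        length, offset = struct.unpack_from(">Q", data, 1)[0], 9
    else:
        raise ValueError("indefinite length not handled in this sketch")
    if len(data) < offset + length:
        raise ValueError("premature end of data")
    return data[offset:offset + length]  # one O(n) slice

payload = b"0" * (1024 * 1024)
# 0x5a = major type 2 with a 4-byte length field
encoded = b"\x5a" + struct.pack(">I", len(payload)) + payload
assert loads_bytestring(encoded) == payload
```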

@Changaco
Contributor

I suggest using an mmap instead of a bytearray. Something like this:

                 left = length
-                buffer = bytearray()
+                try:
+                    buffer = mmap(-1, left)
+                except (OSError, OverflowError):
+                    raise CBORDecodeValueError("invalid length for bytestring 0x%x" % left)
                 while left:
-                    chunk_size = min(left, 65536)
-                    buffer.extend(self.read(chunk_size))
-                    left -= chunk_size
+                    chunk = self.read(min(left, 65536))
+                    buffer.write(chunk)
+                    left -= len(chunk)
 
                 result = bytes(buffer)

and its equivalent in C.
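For reference, a runnable stdlib-only version of the mmap idea above (the function name and error type are illustrative, not cbor2's): an anonymous mmap sized from the declared length backs the buffer, so an absurd length fails fast instead of being slowly accumulated.

```python
# Runnable sketch of the mmap-backed buffer approach.
import io
from mmap import mmap

def read_via_mmap(fp, length, chunk_size=65536):
    if length == 0:
        return b""
    try:
        buffer = mmap(-1, length)  # anonymous mapping of `length` bytes
    except (OSError, OverflowError, ValueError):
        raise ValueError("invalid length for bytestring 0x%x" % length)
    try:
        left = length
        while left:
            chunk = fp.read(min(left, chunk_size))
            if not chunk:
                raise ValueError("premature end of stream")
            buffer.write(chunk)
            left -= len(chunk)
        return bytes(buffer)  # in pure Python this conversion still copies
    finally:
        buffer.close()

data = b"z" * (2 * 1024 * 1024)
assert read_via_mmap(io.BytesIO(data), len(data)) == data
```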

@Changaco
Contributor

In pure Python the “conversion” to a bytestring at the end is actually a copy, so both the current code and the one I suggested require 2× the memory. In C it should be possible to create a bytestring pointing directly to the mmapped data, thus avoiding the copy.
