Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(python): guide cython to optimize code generation #1905

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

penguin-wwy
Copy link
Contributor

@penguin-wwy penguin-wwy commented Oct 24, 2024

What does this PR do?

Optimize the C++ code generated by Cython by modifying the function implementations of the Buffer class.

Using get_int8 as an example, the following C++ code is generated before modification.

static PyObject *__pyx_pf_6pyfury_5_util_6Buffer_26get_int8(struct __pyx_obj_6pyfury_5_util_Buffer *__pyx_v_self, uint32_t __pyx_v_offset) {
  ...
  __pyx_t_1 = __pyx_f_6pyfury_5_util_6Buffer_get_int8(__pyx_v_self, __pyx_v_offset, 1);
  if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 119, __pyx_L1_error)
  __pyx_t_2 = __Pyx_PyInt_From_int8_t(__pyx_t_1);
  if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 119, __pyx_L1_error)
  ...

Since get_int32 returns a C++ type, the cython needs to generate calling PyLong_FromLong, and guard code such as calling PyErr_Occurred.
However, it is known that int32 can always be used to generate a PyLongObject with PyLong_FromLong. Therefore, when we manually call this, the cython only needs to check if the return value is null, as shown in the following code.

  __pyx_t_1 = __pyx_f_6pyfury_5_util_6Buffer_get_int8(__pyx_v_self, __pyx_v_offset, 1);
  if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 130, __pyx_L1_error)

Related issues

Does this PR introduce any user-facing change?

  • Does this PR introduce any public API change?
  • Does this PR introduce any binary protocol compatibility change?

Benchmark

Microbenchmark case:

def benchmark_get_bool(buffer):
    buffer.get_bool(0)
    buffer.get_bool(1)
    buffer.get_bool(100)
    buffer.get_bool(1000)

def benchmark_get_int32(buffer):
    buffer.get_int32(0)
    buffer.get_int32(1)
    buffer.get_int32(100)
    buffer.get_int32(1000)


def benchmark_get_float(buffer):
    buffer.get_float(0)
    buffer.get_float(1)
    buffer.get_float(100)
    buffer.get_float(1000)


def benchmark_read(buffer):
    buffer.reader_index = 0
    buffer.read_int8()
    buffer.read_int16()
    buffer.read_int24()
    buffer.read_int32()
    buffer.read_int64()
    buffer.read_float()
    buffer.read_double()

Result:

# before
python benchmark.py --affinity 0
.....................
benchmark_get_bool: Mean +- std dev: 220 ns +- 4 ns
.....................
benchmark_get_int32: Mean +- std dev: 276 ns +- 10 ns
.....................
benchmark_get_float: Mean +- std dev: 254 ns +- 2 ns
.....................
benchmark_read: Mean +- std dev: 409 ns +- 11 ns

# after
python benchmark.py --affinity 0                                                                                                                                                                                                                                                                                    
.....................
benchmark_get_bool: Mean +- std dev: 215 ns +- 4 ns
.....................
benchmark_get_int32: Mean +- std dev: 264 ns +- 9 ns
.....................
benchmark_get_float: Mean +- std dev: 243 ns +- 6 ns
.....................
benchmark_read: Mean +- std dev: 380 ns +- 4 ns

@chaokunyang
Copy link
Collaborator

Hi @penguin-wwy , thanks for your effort. I see we replaced all specific types to generic pyobject. This may introduce overhead if we invoke from cython side. Are there some annotation to reduce checks for us automatically so we can avoiding change return types?

@penguin-wwy
Copy link
Contributor Author

Hi @penguin-wwy , thanks for your effort. I see we replaced all specific types to generic pyobject. This may introduce overhead if we invoke from cython side. Are there some annotation to reduce checks for us automatically so we can avoiding change return types?

Yes, I found that this modification indeed disrupts cython's annotation optimization. However, I think we should write an appropriate Python microbenchmark, similar to what is done in integration_tests. This way, it will be easier to identify differences during optimization. I will draft this PR and proceed with the modifications once the microbenchmark is completed.

@penguin-wwy penguin-wwy marked this pull request as draft October 28, 2024 11:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants