feat(python): guide cython to optimize code generation #1905

penguin-wwy · 2024-10-24T11:04:17Z

What does this PR do?

Optimize the C++ code generated by Cython by modifying the function implementations of the Buffer class.

Using get_int8 as an example, the following C++ code is generated before modification.

static PyObject *__pyx_pf_6pyfury_5_util_6Buffer_26get_int8(struct __pyx_obj_6pyfury_5_util_Buffer *__pyx_v_self, uint32_t __pyx_v_offset) {
  ...
  __pyx_t_1 = __pyx_f_6pyfury_5_util_6Buffer_get_int8(__pyx_v_self, __pyx_v_offset, 1);
  if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 119, __pyx_L1_error)
  __pyx_t_2 = __Pyx_PyInt_From_int8_t(__pyx_t_1);
  if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 119, __pyx_L1_error)
  ...

Since get_int32 returns a C++ type, the cython needs to generate calling PyLong_FromLong, and guard code such as calling PyErr_Occurred.
However, it is known that int32 can always be used to generate a PyLongObject with PyLong_FromLong. Therefore, when we manually call this, the cython only needs to check if the return value is null, as shown in the following code.

  __pyx_t_1 = __pyx_f_6pyfury_5_util_6Buffer_get_int8(__pyx_v_self, __pyx_v_offset, 1);
  if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 130, __pyx_L1_error)

Related issues

[Question] Could use Python bindings instead of Cython #1887

Does this PR introduce any user-facing change?

Does this PR introduce any public API change?
Does this PR introduce any binary protocol compatibility change?

Benchmark

Microbenchmark case:

def benchmark_get_bool(buffer):
    buffer.get_bool(0)
    buffer.get_bool(1)
    buffer.get_bool(100)
    buffer.get_bool(1000)

def benchmark_get_int32(buffer):
    buffer.get_int32(0)
    buffer.get_int32(1)
    buffer.get_int32(100)
    buffer.get_int32(1000)


def benchmark_get_float(buffer):
    buffer.get_float(0)
    buffer.get_float(1)
    buffer.get_float(100)
    buffer.get_float(1000)


def benchmark_read(buffer):
    buffer.reader_index = 0
    buffer.read_int8()
    buffer.read_int16()
    buffer.read_int24()
    buffer.read_int32()
    buffer.read_int64()
    buffer.read_float()
    buffer.read_double()

Result:

# before
python benchmark.py --affinity 0
.....................
benchmark_get_bool: Mean +- std dev: 220 ns +- 4 ns
.....................
benchmark_get_int32: Mean +- std dev: 276 ns +- 10 ns
.....................
benchmark_get_float: Mean +- std dev: 254 ns +- 2 ns
.....................
benchmark_read: Mean +- std dev: 409 ns +- 11 ns

# after
python benchmark.py --affinity 0                                                                                                                                                                                                                                                                                    
.....................
benchmark_get_bool: Mean +- std dev: 215 ns +- 4 ns
.....................
benchmark_get_int32: Mean +- std dev: 264 ns +- 9 ns
.....................
benchmark_get_float: Mean +- std dev: 243 ns +- 6 ns
.....................
benchmark_read: Mean +- std dev: 380 ns +- 4 ns

chaokunyang · 2024-10-26T14:34:39Z

Hi @penguin-wwy , thanks for your effort. I see we replaced all specific types to generic pyobject. This may introduce overhead if we invoke from cython side. Are there some annotation to reduce checks for us automatically so we can avoiding change return types?

penguin-wwy · 2024-10-28T11:06:47Z

Hi @penguin-wwy , thanks for your effort. I see we replaced all specific types to generic pyobject. This may introduce overhead if we invoke from cython side. Are there some annotation to reduce checks for us automatically so we can avoiding change return types?

Yes, I found that this modification indeed disrupts cython's annotation optimization. However, I think we should write an appropriate Python microbenchmark, similar to what is done in integration_tests. This way, it will be easier to identify differences during optimization. I will draft this PR and proceed with the modifications once the microbenchmark is completed.

feat(python): guide cython to optimize code generation

95fdb21

penguin-wwy requested a review from chaokunyang as a code owner October 24, 2024 11:04

Fix code style

8d08fb4

penguin-wwy marked this pull request as draft October 28, 2024 11:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(python): guide cython to optimize code generation #1905

feat(python): guide cython to optimize code generation #1905

penguin-wwy commented Oct 24, 2024 •

edited

Loading

chaokunyang commented Oct 26, 2024

penguin-wwy commented Oct 28, 2024

feat(python): guide cython to optimize code generation #1905

Are you sure you want to change the base?

feat(python): guide cython to optimize code generation #1905

Conversation

penguin-wwy commented Oct 24, 2024 • edited Loading

What does this PR do?

Related issues

Does this PR introduce any user-facing change?

Benchmark

chaokunyang commented Oct 26, 2024

penguin-wwy commented Oct 28, 2024

penguin-wwy commented Oct 24, 2024 •

edited

Loading