Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add otel codegen forward serialization #42

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

sfc-gh-jopel
Copy link
Collaborator

@sfc-gh-jopel sfc-gh-jopel commented Nov 8, 2024

Description of PR

  • Adds code generator as a protoc plugin to generate pure python marshaling logic
    • The marshaler serializes forwards to a bytearray buffer, front to back by pre-calculating sizes
    • The code generator is triggered using ./scripts/proto_codegen.sh, and the logic of the protoc plugin is in ./scripts/plugin.py. The plugin parses the protoc CodeGeneratorRequest which contains descriptors for the custom message definitions. It extracts relevant information for serialization and then uses the Jinja2 template ./scripts/templates/template.py.jinja2 to generate the code
    • The generated code is under src/snowflake/telemetry/_internal/opentelemetry/proto/**
    • The code generator adds a dev and test dependency on black, isort, grpcio-tools, and Jinja2, but does not add any runtime dependencies
    • The generated code size is 3k lines and ~100kB, which is much smaller than protobuf package at 2MB
  • A major performance improvement of this PR was inlining all of the primitive type serialization and size calculation utils from src/snowflake/telemetry/_internal/serialize/**
    • Function calls in python are very expensive so this led to 25% performance improvement alone
    • The downside is that it increases complexity of code generator a little and also increases generated code size
  • bytearray with resizing was used instead of BytesIO w/ fixed size buffer because BytesIO writing is slower than bytearray even for buffers up to 4MB, this change was about a 10% performance improvement

Follow up PRs:

Testing

  • Adds a merge gate check-codegen to ensure that the generated code is the latest generated code from
  • Adds property-based testing of marshaler message types using hypothesis module. This module randomly generates inputs, and the test ensure the serialized bytestrings of this custom serialization module and protobuf are the exact same
  • Adds a simple unit test to execute the code generator and ensure it works as expected

@sfc-gh-jopel sfc-gh-jopel marked this pull request as ready for review November 8, 2024 20:21
@sfc-gh-jopel sfc-gh-jopel requested a review from a team as a code owner November 8, 2024 20:21
# If generating the code produces any changes from what is currently checked in, the workflow will fail and prompt the user to regenerate the code.
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: Check Codegen
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

self._size = self.calculate_size()
return self._size

def __bytes__(self) -> bytes:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems exactly the same as SerializeToString. I am curious what this is for?

@sfc-gh-sili
Copy link
Collaborator

Thanks for the work! Looks great

Things verified:

Nit:

  • inline_size_function could have an example in the docstring
  • gen-requirements.txt could go into scripts

Copy link
Collaborator

@sfc-gh-tmonk sfc-gh-tmonk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good to me. I like the demonstration of interface equivalence in test_proto_serialization.py.

One nit is to remove the from __future__ import annotations lines, because they should not be required in this library. It only supports Python 3.8 and above.

Thanks for all the great work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants