V2 roadmap proposal #230
base: main
Conversation
Signed-off-by: Federico Busetti <[email protected]>
for more information, see https://pre-commit.ci
Signed-off-by: Federico Busetti <[email protected]>
@xSAVIKx pinging you in case you missed this. I realized it's a bit verbose; if it helps, I can add a class/methods diagram for the current version and the proposed changes to better show the differences.
@febus982 thanks a lot for your work! This is indeed a great proposal and roadmap. For now, I've just added some comments to the places where I think we should discuss things a little bit more.
I'd love to dedicate more time to this right away, but there are multiple other things everyone needs to get done before the end of the year, so I don't think I'll have any available slots soon; we can keep this async, though.
Again, thanks a lot for your time and effort, and let's try to make this happen 👍 🙏
* Validate required fields and generation of defaults
* Marshalling/unmarshalling the cloudevents core and extensions fields
* Marshalling/unmarshalling the `data` field
* Encode/decode the marshalled fields to/from the specific binding format
I'd arguably say this is the responsibility of the specific binding format extension rather than the core library
Yes, this is a list of all the operations needed to transform a Python class into a specific event format implementation.
* `cloudevents.formats` - This would contain the structured formats implementations (JSON, AVRO, Protobuf)
* `cloudevents.bindings` - This would contain the specific binding implementations (kafka, http)
I like the idea of moving forward and scoping modules like they are scoped in the spec. Looks like a good go-to solution to me
The `data` field depends on the specific event and the `datacontenttype` attribute, and
its marshalling/unmarshalling logic should be the same for any binding/format. It seems
a `CloudEvent` class responsibility to take care of this. (We should implement this logic
in overridable methods)
So from one perspective I do agree with the statement, but from the other, it is sometimes beneficial to have the marshalling/unmarshalling logic moved elsewhere. IMO we can achieve that by using mixin classes. Again, this is more of an implementation detail.
a `CloudEvent` class responsibility to take care of this. (We should implement this logic
in overridable methods)

Ideally the end user will be able to extend the `CloudEvent` class for their necessity. Ie:
Extend it, or create a separate one from scratch; the behavior should be the same either way. I guess we may benefit from having a `Protocol` with a behavior description.
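In case it helps the discussion, here is a minimal sketch of what such a behavioral `Protocol` could look like. All names here are hypothetical, not the SDK's actual API:

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class CloudEventProtocol(Protocol):
    """Hypothetical behavior contract any event class could satisfy."""

    def get_attributes(self) -> dict:
        """Return the CloudEvents context attributes (core + extensions)."""
        ...

    def get_data(self) -> Any:
        """Return the event payload."""
        ...
```

With `runtime_checkable`, any class implementing these methods would satisfy `isinstance(obj, CloudEventProtocol)`, regardless of whether it inherits from the SDK base class.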
Ideally the end user will be able to extend the `CloudEvent` class for their necessity. Ie:

* adding new extensions fields
It's the responsibility of the library's CloudEvent implementation to handle all the possible extension fields, as far as I can tell. There may be special extension-specific mixins that provide additional behavior beneficial for this or that implementation, but overall we should be able to digest any extension fields with just the base implementation.
> There may be special extension-specific mixins that provide additional behavior
Yes, I used the word "extend" in a generic sense. It could also be a mixin with some custom implementation.
The idea is that it should be possible (and easy) to create a `class MyBaseEvent(MyMixin, CloudEvent)`
at application level to extend the SDK functionalities.
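To illustrate the pattern, here is a hypothetical sketch; the `CloudEvent` stand-in, the mixin, and the `traceid` extension accessor are purely illustrative:

```python
class CloudEvent:
    """Stand-in for the SDK base class (illustrative only)."""

    def __init__(self, attributes: dict, data=None):
        self.attributes = attributes
        self.data = data


class TracingMixin:
    """Application-level mixin adding a convenience accessor
    for a hypothetical tracing extension attribute."""

    @property
    def trace_id(self):
        return self.attributes.get("traceid")


class MyBaseEvent(TracingMixin, CloudEvent):
    """Application base event combining the mixin and the SDK class."""
```

The application then instantiates `MyBaseEvent` everywhere instead of the raw SDK class, gaining the extra behavior without the SDK knowing about it.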
We also want a DTO that can be accepted by the most common libraries (ie. JSON) to keep the
`CloudEvent` class decoupled from the formats/bindings. Using a dictionary seems a good idea.
So what we want is a canonical representation. That's what `Event` and `BaseEvent` are in the current implementation. I don't like the way it is right now.
But I am also not sure if we want to always keep the canonical representation as a dict. Maybe it's better to have a dataclass instead, or to expose some immutable data structure over a dict.
I would expect the end user to never deal with this DTO. They should always rely on either the `CloudEvent` class or the specific format output, which could be a class (ie. `KafkaEvent`).
This DTO structure was intended to be used only internally, to move data between the base class, the formats and the bindings.
I thought about dictionaries because they are fast, and the libraries we might want to use for formats and bindings will probably support them out of the box (the classic example is the JSON library, which maps JSON objects to dicts).
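A rough sketch of that internal dict DTO flow, with an illustrative stand-in for the proposed class (the method names are assumptions, not a settled API):

```python
class CloudEvent:
    """Stand-in for the proposed base class (illustrative only)."""

    def __init__(self, attributes: dict, data=None):
        self.attributes = attributes
        self.data = data

    def to_dict(self) -> dict:
        # Flatten context attributes and data into one internal DTO;
        # "data" is a reserved key per the CloudEvents spec, so no clash.
        return {**self.attributes, "data": self.data}

    @classmethod
    def from_dict(cls, dto: dict) -> "CloudEvent":
        dto = dict(dto)  # copy, to avoid mutating the caller's dict
        data = dto.pop("data", None)
        return cls(attributes=dto, data=data)
```

Formats and bindings would then only ever see the dict, never the class itself.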
Maybe we'll then use the Protocol definition as the base DTO. It already is immutable, well-defined in the spec, and has `to_json` and `to_text` implementations out of the box. The drawback here is that we're dragging Protobuf inside the core.
I'm kinda just thinking out loud here, as I don't have a strong opinion on this at the moment. Also, depending on whether the canonical representation is gonna be part of the `CloudEvent` instance or can just be created on the fly, we might need to use different approaches.
> Maybe we'll then use the Protocol definition as the base DTO. It already is immutable, well-defined in the spec, and has `to_json` and `to_text` implementations out of the box. The drawback here is that we're dragging Protobuf inside the core.
Ideally I think it would be more maintainable to keep the formats implementation separate from the base class, so that they can be implemented and tested independently.
The more formats we implement, the bigger the class will become, and the created objects as a consequence. I am thinking about performance in systems where we want to handle a high number of events/sec, where allocating memory for larger objects could become impactful.
We could also keep the existing implementation interface, where we have `get_attributes` and `get_data`, and use them in bindings and formats.
If we find at some point that having a dict in the middle of the pipeline might help, we can introduce it.
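For comparison, here is a sketch of a format helper consuming those accessors directly, with no dict DTO in between. The function and class names are assumptions for illustration:

```python
import json


def format_to_json(event) -> str:
    """Serialize an event by reading its accessors directly."""
    payload = dict(event.get_attributes())
    payload["data"] = event.get_data()
    return json.dumps(payload)


class FakeEvent:
    """Minimal stand-in implementing the accessor interface."""

    def get_attributes(self):
        return {"id": "1", "source": "/demo", "type": "com.example.demo"}

    def get_data(self):
        return {"hello": "world"}
```

This keeps the format decoupled from any concrete event class: anything exposing the two accessors would work.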
QUESTION: Can we define a data type for the `data` field after it's been marshalled?
(ie. can we say it will be either `str` or binary data?)
I believe the type should be known in advance.
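If we go with that assumption, the marshalled type could be expressed as a simple union; the alias name is illustrative:

```python
from typing import Union

# Assumption: once marshalled, `data` is either text or raw bytes.
MarshalledData = Union[str, bytes]


def check_marshalled(value: object) -> bool:
    """Return True when a value matches the assumed marshalled type."""
    return isinstance(value, (str, bytes))
```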
### The `cloudevents.formats` module

Formats do solve a single responsibility:

* Marshalling/unmarshalling the cloudevents core and extensions fields

Their implementation has to be used when other bindings implement their
structured input/output and make sure their implementation is compatible.

Each format accepts the dictionary from the `CloudEvent` as param
and produces an output based on the format itself. Ie.

```python
def to_json(event: dict) -> str: ...
def from_json(event: str) -> dict: ...
```

TODO: Review typing as JSON produces a string but other formats might be different.
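A minimal sketch of those two signatures for the JSON case, using only the standard-library `json` module and assuming the dict values are already JSON-serializable:

```python
import json


def to_json(event: dict) -> str:
    # Assumes all values in the dict are JSON-serializable; binary `data`
    # would need extra handling (e.g. base64) per the JSON format spec.
    return json.dumps(event)


def from_json(event: str) -> dict:
    return json.loads(event)
```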
This part is also a little bit tricky. I'd rather avoid doing a `CloudEvent` -> `dict` -> `json` conversion if we can do `CloudEvent` -> `json` without the dict part (also not sure if that's possible; most probably it'd be done somewhere under the hood anyway).
The next part is the API. As we've faced in the current implementation, it should be structured in a way that makes it super easy to swap the underlying implementation (e.g. replace native `json` with something else), or at least to propagate some configuration options. I'm also not sure if we want to have that capability as part of the API, or if we should just provide an easy way to extend the API instead, e.g. have our own different versions of `to_json` where we use different libraries and have them as optional dependencies (or maybe just as a separate library like `cloudevents-ext`).
Bindings do solve these responsibilities:

* Marshalling/unmarshalling the cloudevents core and extensions fields **(if binary format)**
bindings sometimes support structured formats as well
The idea is for the structured mode to use the formats implementation to marshall the specific fields. (The assumption here is that formats are the same for all the binding implementations.)
Ie. if I need to marshall a CloudEvent into a JSON-structured HTTP event, I expect the SDK to:
- use the JSON formatter (which would marshall the single fields as specified by the format)
- compose the headers and data fields (but not marshall the single fields)
I believe that assumption is not true, taking the spec into account. I'm not quite sure I fully understand the example, but in an HTTP event the attributes are converted to headers and the data is the HTTP request payload, AFAIK.
So I'm not sure how the JSON formatter is relevant here. Can you maybe expand your example a little bit?
In the HTTP structured mode the whole event (attributes and data) is marshalled into the body.
Headers MAY include the attributes, but attributes MUST be in the body.
For JSON conversion I'd expect something like this:

```python
def to_http_json(cloudevent: CloudEvent):
    dict_event = cloudevent.to_dict()
    body = formats.to_json(dict_event)
    headers = []  # compose headers according to spec, eventually grabbing data from dict_event
    return headers, body
```
## Pydantic support

The current Pydantic implementation is implemented as a possible substitute of
the HTTP Event class, by looking at its conversion module and tests, but it's used
by calling the functions in the conversion module.

Pydantic offers good support and performance for data validation, defaults and
JSON serialization (some of the requirements we have identified).

We need to make a decision:

* We use pydantic as a base for the `CloudEvent` class:
  * PROs
    * No need to implement data validation
    * Performance (pydantic V2 is written in Rust)
    * We can lay down some base logic on `data` field serialization (ie. nested pydantic models)
    * Enable users to use pydantic for automatic documentation functionalities
  * CONs
    * We introduce a new package dependency, and depend on its implementation
    * _minor_ If we use pydantic JSON serialization, it is implemented in the model class (slightly different from what is proposed, but we can use wrapper functions in the JSON format module)
* We remove pydantic support:
  * PROs
    * We have more control over validation and defaults implementation
    * We don't depend on other packages
  * CONs
    * Performance on validation and defaults (we can still use libraries like `orjson` for JSON performance)
    * Additional effort in writing and maintaining data validation and required/default fields
...
Even though I love Pydantic, I don't think we should have it as a core dependency and I'd rather have a Pydantic-specific extension/implementation of CloudEvents.
I do agree that it kinda solves a lot of questions out of the box, but still is a dependency we can't drag in. Especially while there are multiple projects that are still using Pydantic v1 while the v2 is kinda the mainstream already.
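One common way to keep Pydantic out of the core is the guarded-import pattern: the base implementation stays dependency-free, and a Pydantic-backed event lives behind an optional extra. This is only a sketch, and the class names are made up:

```python
# Core module stays importable with no third-party dependency installed.
try:
    import pydantic  # noqa: F401
    HAS_PYDANTIC = True
except ImportError:
    HAS_PYDANTIC = False


def preferred_event_class_name() -> str:
    """Pick the richest event implementation available at runtime
    (hypothetical names, not the SDK's actual classes)."""
    return "PydanticCloudEvent" if HAS_PYDANTIC else "CloudEvent"
```

Downstream projects that want the Pydantic features would install something like `cloudevents[pydantic]`, while everyone else keeps a dependency-free core.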
I don't have a strong opinion here. I just thought it might be easier to maintain a single approach for more consistency.
It definitely might be, but as I mentioned, some libraries currently depend on CloudEvents and they may not be able to move forward if CloudEvents brings Pydantic as a required dependency.
@xSAVIKx I replied to some comments but I'm happy to see we're roughly on the same page, with only details to be discussed. I'm happy to continue asynchronously.
Hey @xSAVIKx , unfortunately because of unforeseen circumstances I don't think I can dedicate more time to this roadmap. Feel free to pick up from this idea, if you think it's still relevant, and iterate over it.
Hey @febus982. Thx for the update. I hope everything is ok. I also hope to find some extra time myself at some point to move forward with the v2 |
Initial draft for a major refactor of the repository.
One line description for the changelog: V2 roadmap proposal