Introduce Schema Versioning #697

Closed
damaru-inc opened this issue Jan 21, 2022 · 20 comments
Labels
stale 💭 Strawman (RFC 0) RFC Stage 0 (See CONTRIBUTING.md)

Comments

@damaru-inc
Contributor

It is common in Event Driven Architectures to support several versions of a schema. One can imagine an application that sends two different message types to two different channels of a broker, each represented by a different version of a schema, because different downstream applications consume different versions of the message.

Currently AsyncAPI does not explicitly support different schema versions in the same file (unless each is directly in the payload section of a different message). If we want different schema versions to be under components schemas, we need to introduce a hack such as appending the version to the schema name like so:

components:
  schemas:
    Person_1_0_0:
      title: Person
      properties: ...
    Person_1_1_0:
      title: Person
      properties: ...

This is not ideal, as any software consuming this would have to know the convention of how the schema name and version get combined. It would be better if the spec supported a construct like this:

components:
  schemas:
    Person:
      versions:
        "1.0.0":
          title: Person
          properties: ...
        "1.1.0":
          title: Person
          properties: ...
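
To make the difference concrete, here is a minimal Python sketch (not part of AsyncAPI or its tooling) of what a consumer has to do in each case. The helper names `split_versioned_key` and `list_versions` are hypothetical, and the underscore convention is only the one used in the first example above.

```python
import re

# Assumed convention from the first example: <Name>_<major>_<minor>_<patch>
_KEY = re.compile(r"^(?P<name>.+)_(?P<major>\d+)_(?P<minor>\d+)_(?P<patch>\d+)$")


def split_versioned_key(key):
    """Return (name, version) if the key follows the convention, else (key, None)."""
    match = _KEY.match(key)
    if match is None:
        return key, None
    version = ".".join(match.group("major", "minor", "patch"))
    return match.group("name"), version


def list_versions(schema):
    """Read the versions from the proposed (non-standard) `versions` map."""
    return sorted(schema.get("versions", {}))


# The two shapes discussed above, as parsed YAML.
flat = {"Person_1_0_0": {"title": "Person"}, "Person_1_1_0": {"title": "Person"}}
nested = {"Person": {"versions": {"1.0.0": {"title": "Person"}, "1.1.0": {"title": "Person"}}}}
```

With the flat shape every consumer has to agree on the regular expression, and a name that itself ends in digits (say `Order_2_1_0_0`) becomes ambiguous. With the nested shape the versions are plain keys.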
@damaru-inc damaru-inc added the 💭 Strawman (RFC 0) RFC Stage 0 (See CONTRIBUTING.md) label Jan 21, 2022
@dalelane
Collaborator

Another hack - one that I've used - is to use oneOf when describing a message, allowing me to describe all possible versions inside of that. I then rely on hints like the ones you describe (names, descriptions, etc.) to explain to human readers what the intent is of the different schemas inside the oneOf.

That's not to disagree with the point you're making - that being able to describe versions (without depending on ad-hoc descriptions or naming conventions) would help.

@damaru-inc
Contributor Author

Thanks Dale. Using oneOf in a message, if you wanted the schema versions to share the same name, they would have to be inline under messages, right? If they were references to /components/schemas/ they would have to have distinct names. That makes it harder to reuse one of these schema versions.

@dalelane
Collaborator

yes, that's true

@damaru-inc
Contributor Author

Come to think of it, if they were embedded in a message, they wouldn't have a name, they'd be anonymous, in terms of their path within the AsyncAPI file.

@magicmatatjahu
Member

magicmatatjahu commented Jan 24, 2022

As I remember @damaru-inc we talked about this once in a discussion on slack :)

The idea with versions is fine and is already possible in AsyncAPI today, i.e. a document written this way is still valid against the specification. For people who read this: the Schema Object in AsyncAPI allows you to define custom fields, but they have no meaning in the validation/parsing sense. However, there are two problems with this idea that have already been mentioned in this discussion:

  • referencing - the user must know that a given schema has several versions and must make absolute references to the version field i.e. $ref: #/components/schemas/${schemaName}/versions/${version} ($ref: #/components/schemas/${schemaName} will paste all versions).

  • tooling support. None of the officially supported tools in our org knows what "versions" means inside a schema, and e.g. Studio will not render such a schema:

    components:
      schemas:
        Person:
          versions:
            "1.0.0":
              title: Person
              properties: ...
            "1.1.0":
              title: Person
              properties: ...
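
The referencing point can be illustrated with a minimal Python sketch of local JSON Pointer resolution (RFC 6901). The function name `resolve_pointer` is hypothetical, and the sketch assumes the document is already parsed into dictionaries; it handles only local `#/...` pointers through objects, with no array indices and no percent-decoding.

```python
def resolve_pointer(document, pointer):
    """Resolve a local JSON Pointer written as a URI fragment, e.g. '#/a/b'."""
    if not pointer.startswith("#/"):
        raise ValueError("only local '#/...' pointers are handled in this sketch")
    node = document
    for token in pointer[2:].split("/"):
        # RFC 6901 escaping: '~1' stands for '/', '~0' for '~' (in that order).
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[token]
    return node


doc = {
    "components": {
        "schemas": {
            "Person": {
                "versions": {
                    "1.0.0": {"title": "Person"},
                    "1.1.0": {"title": "Person"},
                }
            }
        }
    }
}

# Pointing at the schema name yields the wrapper with every version inside it.
whole = resolve_pointer(doc, "#/components/schemas/Person")
# Only the full path selects a single version.
one = resolve_pointer(doc, "#/components/schemas/Person/versions/1.1.0")
```

Here `whole` is the object that contains the `versions` map, which is why a tool that expects a schema at `#/components/schemas/Person` finds nothing it can render.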

The JSON Schema specification has keywords that are used for referencing (like $ref, $id) or for annotating a schema (like $comment) and that mean nothing to the end user, only to the developer. We could therefore go with a keyword $versions, which would specify the versions of a given schema but would have no validation meaning. We can of course support this keyword in our tooling, like ParserJS. As for the referencing problem, it seems to me that we won't be able to solve it: the user will have to know about these additional versions and the proper ref path.

Additionally, having this keyword we can always create several versions and specify the default version like:

components:
  schemas:
    Person:
      # keep in mind that we haven't clarified how to override part of the spec - related issue https://github.com/asyncapi/spec/issues/649
      $ref: "#/components/schemas/Person/$versions/latest"
      $versions:
        "1.0.0":
          title: Person
          properties: ...
        "1.1.0":
          title: Person
          properties: ...
        "latest":
          $ref: "#/components/schemas/Person/$versions/1.1.0"
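
As a rough illustration of how a tool could follow the `latest` alias, here is a minimal Python sketch. Both `resolve_pointer` and `deref` are hypothetical helpers, `$versions` is the keyword proposed above and not part of any specification, and how siblings of `$ref` should be treated is exactly the open question from issue #649, so the sketch simply ignores them.

```python
def resolve_pointer(document, pointer):
    """Resolve a local '#/...' JSON Pointer through nested dictionaries."""
    if not pointer.startswith("#/"):
        raise ValueError("only local '#/...' pointers are handled in this sketch")
    node = document
    for token in pointer[2:].split("/"):
        node = node[token.replace("~1", "/").replace("~0", "~")]
    return node


def deref(document, node, max_hops=10):
    """Follow $ref links until a node without $ref is reached."""
    hops = 0
    while isinstance(node, dict) and "$ref" in node:
        if hops >= max_hops:
            raise ValueError("too many $ref hops (possible cycle)")
        node = resolve_pointer(document, node["$ref"])
        hops += 1
    return node


doc = {
    "components": {
        "schemas": {
            "Person": {
                "$ref": "#/components/schemas/Person/$versions/latest",
                "$versions": {
                    "1.0.0": {"title": "Person"},
                    "1.1.0": {"title": "Person"},
                    "latest": {"$ref": "#/components/schemas/Person/$versions/1.1.0"},
                },
            }
        }
    }
}

person = doc["components"]["schemas"]["Person"]
default = deref(doc, person)  # Person -> latest -> 1.1.0
```

The hop limit is there because an alias such as `latest` makes it easy to create a reference cycle by accident.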

Currently, if someone doesn't need to keep several versions of the same schema in one AsyncAPI file, they can always split those versions into a separate file, which is already supported (also by tooling):

# asyncapi.yaml
asyncapi: 2.2.0
info: ...
servers: ...
channels:
  someChannel:
    message:
      payload:
        $ref: './some-schema.yaml#/1.0.0'
        
# some-schema.yaml
"1.0.0": 
  ...
"2.0.0":
  ...

@damaru-inc
Contributor Author

Thanks, that's helpful. I think we need to figure out a standard way to do this so that parser-js can return the list of schema versions properly. As you noted, Studio will not render the schemas if we add the versions layer, and that also means that other code generators won't be able to see them either.

@magicmatatjahu
Member

I wrote about our problem in the official json-schema Slack channel and will report back: https://json-schema.slack.com/archives/C5CF75URH/p1643044315100800 Maybe the core team can help us and suggest another solution :)

@magicmatatjahu
Member

I'm pasting the conversation with Ben Hutton (core maintainer of JSON Schema):

Me: Hello everyone! I haven't looked at this channel for a long time and maybe someone has solved this problem in the past, also I used Github search to find similar issue but didn't find anything similar.
Well, my problem: how to define several versions of the same schema, not in separate files but in the same one, e.g. in definitions?
The basis of the problem: in AsyncAPI we have just opened issue #697, where we have the problem of how to define several versions of the same schema (under the same "name", or if you prefer, the same item in definitions). Of course we can define the versions of a given schema in a separate file and make references to the corresponding version, but how to do it when we want to have everything in one file and not in several? We have two custom solutions:
The first one is to add a suffix to the "name" of the schema, e.g. SomeSchema_1_0_0 etc., but this is not a good solution.
The second solution is to use a custom meta keyword like $versions and define those versions there, e.g. as:

SomeSchema:
 $versions:
   "1.0.0": ...
   "2.0.0": ... 

A neater solution, but with references you still have to remember to build a path like /SomeSchema/$versions/1.0.0. Has anyone had a similar problem and come up with something better?


Ben: The general solution is multiple files. Good validation tooling can cope with multiple files. Trying to avoid multiple files for tooling moving forward is going to be increasingly frustrating. A lot of the best practice documentation we are going to put out will suggest using multiple files.
The bundling process brings multiple schemas into one file, but tooling still has to get on board with references. Assume you’ve read the bundling blog post?


Me: Thanks for the reply! Yes, I read that, but I don't know if it will help us with AsyncAPI. Currently we are still on draft 07, I don't know when we will move to draft 2020, and from what I remember, in draft 07 $id still plays the role of the newer $anchor. Additionally, in AsyncAPI we have the problem that schemas are located in components/schemas (as well as in other places), so the bundling system would have to be "pinned" only to those sections and not to the whole specification. Not to mention, of course, the support of other schema formats like Avro/RAML etc... The idea with identifiers would be very good, e.g. $id: https://example.com/schemas/Person/1.0.0, and then we can define schemas like:

Person1.0:
 $id: https://example.com/schemas/Person/1.0.0
Person2.0:
 $id: https://example.com/schemas/Person/2.0.0

so in referencing we would operate not on the "name" of the schema but on its id.
Although I think splitting the schemas into several files would be the best solution.


Ben: You can totally do that with draft 07.
$id can take a full or relative URI, or plain name fragment.
The most popular tool which generates schemas uses plain name fragments everywhere which are also pointless.
Here’s an example of where I’ve used full URIs in $id and referenced them to compose schemas from multiple files
https://github.com/ga4gh-discovery/ga4gh-case-discovery/tree/master/json_schema/schemas_source

The bundling process was only defined in 2020-12, but it’s the same process I would happily take for draft-07, and good validators (like ajv) will support the resulting schema.


Me: I forgot about the fact that id's in draft 07 can have URI's 😅 Thanks for all the replies! 🙂 I wonder if the current json-schema-ref-parser version supports this or not. However, in the long run, this still doesn't fix the problem of supporting other custom schema formats (not just JSON Schema).


Ben: json-schema-ref-parser is notoriously broken.
You can use the bundler Jason Desrosiers created for draft-04 through 2020-12: https://github.com/hyperjump-io/json-schema-bundle


Mentioned blog post about bundling JSON Schema: https://json-schema.org/blog/posts/bundling-json-schema-compound-documents.

At the moment we don't have a clue how referencing in AsyncAPI should work (issue #649), but if we add the possibility to use $id, then we could identify versions by those identifiers, so our spec with several versions of the same schema would look like this:

# I use the proposal for 3.0.0 https://github.com/asyncapi/spec/issues/618
asyncapi: 3.0.0
info: ...

operations:
  someOperation:
    message: 
      payload: 
        $ref: https://example.com/schemas/Person/1.0.0
        
components:
  schemas:
    Person_1_0_0:
      $id: https://example.com/schemas/Person/1.0.0
    Person_2_0_0:
      $id: https://example.com/schemas/Person/2.0.0

@fmvilas
Member

fmvilas commented Feb 2, 2022

Just leaving a thought here for the future. IMHO, I think whatever solution we come up with, we should make sure it can easily integrate with external schema registries.

Also, this feature together with #628, opens the door for a pure schema registry file. And that's exciting! 🙌

@fmvilas
Member

fmvilas commented Feb 2, 2022

Random thought. We can probably encode the version number in the schema name as @damaru-inc is proposing in his first example but in a more standardized way. Example:

components:
  schemas:
    Person: # No version; or version = undefined?
      title: Person
      properties: ...
    Person@1.0.0:
      title: Person
      properties: ...
    Person@1.1.0:
      title: Person
      properties: ...

This seems to be more readable but it has a big constraint: if we ever want to evolve into some kind of schema changelog and want to add more info about what this new version provides compared to the previous one, we would not be able to. Not saying we should evolve into a schema changelog but just saying this should be taken into account.

@magicmatatjahu
Member

magicmatatjahu commented Feb 2, 2022

Thanks for today's meeting! While we were talking about this issue, someone suggested (I don't remember who exactly, sorry) that it should be possible to define "info" for schemas, which gave me the idea of having an info field inside the schema itself. Last week @fmvilas and I also talked about my topic (Proposal to allow defining schema format other than default one (AsyncAPI Schema)), and he suggested (thanks!) making the Schema Object an object with definition and meta fields. We can add a third one, info, so we would end up with:

components:
  schemas:
    Person:
      info:
        id: 'some:company:Person' # unique id for schema. It will be very helpful for schema registries
        version: '1.0.0'
        description: '...'
        comment: '' # developer comment, e.g. describing changes compared to the previous version
        ... # other fields like externalDocs, contact, license etc
      meta: # field related to the https://github.com/asyncapi/spec/issues/622 issue to define meta information related to the schema format type
        format: ... # schema format
        ... # other fields like `namespace` for avro, `rootObject` for XSD when we want to use external definition by $ref
      definition:
        ... # definition for that schema in JSON Schema or other format
        type: object
        properties: ...
        ... etc
    Person@1.1.0:
      info:
        id: 'some:company:Person'
        version: '1.1.0'
        ...
    ...

I hope the idea itself is quite understandable. The only problem would still be in referencing a given schema. We would still have to use the absolute path to a given schema under components/schemas, but the question is: shouldn't we change the way (or rather the logic) of referencing in AsyncAPI itself to allow more, like in XPointer? I'm just thinking out loud, but maybe we should be able to reference by complex statements like $ref: ./components/schemas/^Person(info.id=='some:company:Person' && info.version=='1.0.0')? I think this is a topic for another discussion.
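
What such a predicate-style reference would have to do can be sketched in a few lines of Python. The function `find_schema` is hypothetical and only mirrors the `info.id` / `info.version` condition from the expression above; the expression syntax itself is not defined anywhere.

```python
def find_schema(schemas, schema_id, version):
    """Return (key, schema) for the single entry whose info.id and info.version match."""
    matches = [
        (key, schema)
        for key, schema in schemas.items()
        if schema.get("info", {}).get("id") == schema_id
        and schema.get("info", {}).get("version") == version
    ]
    if len(matches) != 1:
        raise LookupError("expected exactly one match, found {}".format(len(matches)))
    return matches[0]


# Parsed components/schemas, reduced to the info fields that matter here.
schemas = {
    "Person": {"info": {"id": "some:company:Person", "version": "1.0.0"}},
    "Person@1.1.0": {"info": {"id": "some:company:Person", "version": "1.1.0"}},
}

key, schema = find_schema(schemas, "some:company:Person", "1.1.0")
```

Unlike a JSON Pointer, a query like this can match zero or several entries, so any referencing scheme built on it has to say what happens in those cases; the sketch simply raises.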

Of course the same idea with the info field could be applied to message, operation and channel (and to other objects if needed), which could then also carry the version of the given structure, its id etc., like:

components:
  messages:
    SomeMessage:
      info:
        id: ...
        description: ...
        comment: ...
      definition:
        payload: ...
        headers: ...

WDYT? @fmvilas @jessemenning @damaru-inc @derberg @jonaslagoni

EDIT: We should consider using JSONPath in AsyncAPI instead of the current JSON Pointer: https://support.smartbear.com/alertsite/docs/monitors/api/endpoint/jsonpath.html

@fmvilas
Member

fmvilas commented Feb 3, 2022

components:
  schemas:
    Person:
      info:
        id: 'some:company:Person' # unique id for schema. It will be very helpful for schema registries
        version: '1.0.0'
        description: '...'
        comment: '' # developer comment, e.g. describing changes compared to the previous version
        ... # other fields like externalDocs, contact, license etc
      meta: # field related to the https://github.com/asyncapi/spec/issues/622 issue to define meta information related to the schema format type
        format: ... # schema format
        ... # other fields like `namespace` for avro, `rootObject` for XSD when we want to use external definition by $ref
      definition:
        ... # definition for that schema in JSON Schema or other format
        type: object
        properties: ...
        ... etc
    Person@1.1.0:
      info:
        id: 'some:company:Person'
        version: '1.1.0'
        ...
    ...

@magicmatatjahu How do you plan to have multiple versions of the same schema using this structure? It's not clear to me 🤔 If you're gonna rely on the @1.1.0 at the end of the schema name, then what if you get something like this?

components:
  schemas:
    Person:
      info:
        version: '1.1.0'
      definition:
        type: object
        properties: ...
    Person@1.1.0:
      info:
        id: 'some:company:Person'
        version: '1.1.0'

Notice you get two objects with version 1.1.0. I know you can make it fail at parser level but IMHO we should encourage a structure that doesn't allow this kind of conflict in the first place. Also the @1.1.0 and version: 1.1.0 is redundant.
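
For reference, the parser-level check mentioned here could look roughly like the following Python sketch. The function `find_version_conflicts` is hypothetical and does not exist in any AsyncAPI parser; it assumes the logical schema name is whatever precedes `@` in the key and that the version comes from `info.version`, falling back to the key suffix.

```python
def find_version_conflicts(schemas):
    """Report keys that disagree with info.version or that repeat a (name, version) pair."""
    seen = {}
    problems = []
    for key, schema in schemas.items():
        base, _, suffix = key.partition("@")
        version = schema.get("info", {}).get("version")
        if suffix and version is not None and suffix != version:
            problems.append(
                "{}: key says {} but info.version says {}".format(key, suffix, version)
            )
        identity = (base, version or suffix or None)
        if identity in seen:
            problems.append(
                "{} and {} both describe {} version {}".format(
                    seen[identity], key, base, identity[1]
                )
            )
        else:
            seen[identity] = key
    return problems


# The conflicting example from above, as parsed YAML.
schemas = {
    "Person": {"info": {"version": "1.1.0"}, "definition": {"type": "object"}},
    "Person@1.1.0": {"info": {"id": "some:company:Person", "version": "1.1.0"}},
}

problems = find_version_conflicts(schemas)
```

The check works, but it only exists because the structure allows the same version to be stated in two places, which is the redundancy being pointed out.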

@magicmatatjahu
Member

@fmvilas

How do you plan to have multiple versions of the same schema using this structure? It's not clear to me 🤔 If you're gonna rely on the @1.1.0 at the end of the schema name, then what if you get something like this?

Personally I don't like the solution with @1.0.0 and I prefer to go with the version inside the schema (I only reused your previous idea). I remember Jessy and Michael mentioning that they want a schema registry inside AsyncAPI (or rather that AsyncAPI should have some mechanism/architecture that could be read by and integrated with schema registries from Solace/Mulesoft etc.). Currently this is not possible (or possible only to a small extent). If we can come to a consensus on what Jonas is working on - multiple meanings of the AsyncAPI Document file - then the registry idea itself will become easier to implement (not only for schemas), but the schema "metadata" will still be missing, which is why I propose the info object. We need to think about the impact of putting the version inside the schema "key", because it is a very custom solution and it may block some functionality like referencing: is @ allowed in JSON Pointer?

Notice you get two objects with version 1.1.0. I know you can make it fail at parser level but IMHO we should encourage a structure that doesn't allow this kind of conflict in the first place. Also the @1.1.0 and version: 1.1.0 is redundant.

I don't think the key itself in components/schemas should have any important value in the specification; it is just an identifier for referencing. If you have a name and version inside a schema, e.g. with info.title and info.version, you can always use that data for e.g. rendering in HTML. With $ref: #/components/schemas/Person@1.0.0 you lose the name and version information carried by the key, so they still have to be inside the schema. I know it's redundant, but it's a safer solution. Also, someone can define multiple versions of a given schema in separate files. It is wrong, but still possible, in which case it is the fault of the person and not the specification, and that is how we should look at it.

There is also another option using the versions (mentioned by Michael in first comment) field and there defining separate versions of given schema, e.g. as:

components:
  schemas:
    Person:
      versions:
        "1.0.0":
          info:
            ...
          definition:
            ...
        "2.0.0":
          info:
            ...
          definition:
            ...
      definition: # default definition
        $ref: "#/components/schemas/Person/versions/2.0.0"

NOTE: I know about duplicated definition field inside schema root and versions objects, but it's only an idea

Before you ask, why do we need info object if we can define version, externalDocs, title inside schema itself? If we use JSON Schema (aka AsyncAPI Schema Object) then we can, but if we have custom format like avro/raml/xsd? Well, we should standardize it.

@derberg
Member

derberg commented Feb 3, 2022

I'm not sure about @VERSION. We would add a feature that is super hard to extend and can be a breaking change in the future. I think the idea from @magicmatatjahu follows one of our principles: add features that are as future-proof as possible because they can be easily extended. The only issue with the example from @magicmatatjahu is that @ is still there and that we could simplify it:

components:
  schemas:
    Person_v1: 
      metadata:
        version: '1.0.0' # all others like description, format can be added when we have a use case (for format we actually have a use case already)
      schema:
        type: object
        properties: ...
        ... etc
    Person_v2:
      metadata:
        version: '2.0.0' # all others like description, format can be added when we have a use case (for format we actually have a use case already)
      schema:
        type: object
        properties: ...
        ... etc
    ...

But yeah, tbh I don't understand how would that be consumed on a message level. I understand the need of versioning but don't understand how all suddenly thanks to versioning we solve:

One can imagine an application that sends two different message types to two different channels of a broker, each represented by a different version of a schema, because different downstream applications consume different versions of the message.

How would that work on code generation level for example? I know version but still do not know which one to use

@magicmatatjahu
Member

But yeah, tbh I don't understand how would that be consumed on a message level. I understand the need of versioning but don't understand how all suddenly thanks to versioning we solve:

How would that work on code generation level for example? I know version but still do not know which one to use

Yeah, I agree, it's very problematic. I think the bigger problem is the inability to define the version for the schema itself and then use the corresponding reference in the message. The version itself (as well as the metadata.id field - we can consider it as schemaID, just like we have operationID) can then be rendered for the end user in HTML and pass information that the given schema is used (by id) but in a different version.

@jonaslagoni
Member

jonaslagoni commented Feb 3, 2022

My 5 cents, I fear you might be mixing and matching solutions to problems that have not been raised 😅

  1. Are you trying to define a group of related schemas?
  2. Are you trying to explicitly define a version for a schema?
  3. Are you trying to explicitly define the changes between versions?
  4. ???

Based on your issue @damaru-inc I am assuming (please correct me if I misinterpreted your issue) you are trying to solve the problem of having multiple versions of a specific schema. I.e. grouping all related schema versions for your Person.

   Person:
      versions:
        "1.0.0":
          title: Person
          properties: ...
        "1.1.0":
          title: Person
          properties: ...

If this is the case, you don't really care about explicitly defining the version through something like @magicmatatjahu's suggestion with the info object and the version: '1.0.0' property.

Some of the solutions proposed here all fall back to what @damaru-inc (presumably) does not want to do: define multiple schemas in an unrelated way. Regardless of whether you use @ or an absolute name like Person_1_0_0 (or any other for that matter 😆) to define the name of a schema, it does not group related schema versions together.

components:
  schemas:
    Person_1_0_0:
      title: Person
      properties: ...
    Person@1.1.0:
      title: Person
      properties: ...

This might need multiple issues to keep the discussion on track 🤔 Otherwise I fear it will be a problem reaching a consensus 😅

@magicmatatjahu
Member

Right, I'm starting to get lost in this problem too. We should create other issues where we should try to solve other problems discussed here and leave this issue only for schema versioning. So the issues I see to be separated:

  • schema definition compatible with schema registry
  • what was mentioned in our conversation yesterday and what we are missing, i.e. a new schema version causes the message version to be bumped, a new message version causes the operation version to be bumped, etc.
  • a nicer way of referencing those "versioned" schemas

@github-actions

github-actions bot commented Jun 4, 2022

This issue has been automatically marked as stale because it has not had recent activity 😴

It will be closed in 120 days if no further activity occurs. To unstale this issue, add a comment with a detailed explanation.

There can be many reasons why some specific issue has no activity. The most probable cause is lack of time, not lack of interest. AsyncAPI Initiative is a Linux Foundation project not owned by a single for-profit company. It is a community-driven initiative ruled under open governance model.

Let us figure out together how to push this issue forward. Connect with us through one of many communication channels we established here.

Thank you for your patience ❤️

@github-actions github-actions bot added the stale label Jun 4, 2022
@derberg derberg removed the stale label Jun 7, 2022
@nicholasdgoodman

A somewhat outsider perspective - and playing the devil's advocate here - what exactly is the pressing need to declare multiple schemas of the same object type within a single AsyncAPI document?

AFAIK the sister spec of OpenAPI does not have nor need this capability, and I see nothing particularly different about an event-driven world that warrants this added complexity and capability.

In the REST + microservices world, it is fairly common that a single service or client will need to speak multiple versions of a particular object type to various upstream services -- but this does not mean the API spec supports all of them at once. To the contrary, the downstream application resolves this by having multiple client versions each derived from a single, versioned OpenAPI specification.

In an event-driven architecture world, it is possible to invoke two versions of the same action -- each with their respective data schemas and broker channels -- by simply consuming two API specs and generating the respective client-side code.

Doing so keeps an API and likewise an Async API specification as "atomically versioned" - all channels and data types within represent a fully-compatible snapshot in time: a versioned release. Doing anything else opens up some very complex questions about cross-dependencies and coupling, and muddies the water on "what version of the API" a client is attempting to use.

@github-actions

github-actions bot commented Nov 4, 2022

This issue has been automatically marked as stale because it has not had recent activity 😴



7 participants