Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a tool to compare two ATD files for compatibility #352

Open
mjambon opened this issue Sep 29, 2023 · 7 comments
Open

Provide a tool to compare two ATD files for compatibility #352

mjambon opened this issue Sep 29, 2023 · 7 comments
Labels
feature request Big and small feature requests new target Issue about deriving code for a new target language

Comments

@mjambon
Copy link
Collaborator

mjambon commented Sep 29, 2023

@jbergler wrote:

To detect breaking changes [in an ATD file] I currently do something like this and manually review the changes

~/co/semgrep-interfaces % git diff v1.17.0..main *.atd

But these diffs are huge, so I don't really get much confidence out of asking a human to review the changes.
My past experience tells me the only way we get high confidence outcomes is to automate the review for these kinds of nuanced details.
It's ok if we want to say we don't think the effort to get there is required. Before we do that I want to brainstorm how we might achieve that.
In protobuf land, there are cli tools that compare two proto files and print all the breaking changes. Could you have a guess at how much work it would be to build something similar for atd?

I think it would be a relatively low effort to produce a tool (atddiff?) that compares two ATD files and reports differences:

  • incompatibilities in one direction (left: writer, right: reader)
  • incompatibilities in the other direction (left: reader, right: writer)
  • <...> annotations that changed (but without trying to interpret them)
  • bonus: track renames and ignore name changes that don't affect data compatibility

For example, let's compare two files, Interface_v1.atd against Interface_v2.atd:

(* Interface, version 1 *)

type fruit = [ Kiwi | Strawberry ]

type veggie = [ Potato | Eggplant ]

type basket = {
  owner: string;
  fruits: fruit list;
  veggies: veggie list;
  ?notes: string option;
}
(* Interface, version 2 *)

type fruit = [ Kiwi | Strawberry | Plum ]

type vegetable = [ Eggplant | Potato ]

type basket = {
  fruits: fruit list;
  ~veggies: vegetable list;
  notes: string;
  price: int;
}

type drink = [ Water | Unicorn_blood ]
$ atddiff Interface_v1.atd Interface_v2.atd
Backward incompatibilities result in the following problems:
- cannot read old data using new implementation
- cannot read responses from old server in new client
- cannot read requests from old client in new server
Forward incompatibilities result in the following problems:
- cannot read new data using old implementation
- cannot read responses from new server in old client
- cannot read requests from new client in old server

Old file Interface_v1.atd, line 123,
New file Interface_v2.atd, line 321:
Backward incompatibility: the 'notes' field is required by newer implementations but used to be optional.

Old file Interface_v1.atd, line 123,
New file Interface_v2.atd, line 321:
Backward incompatibility: the new 'price' field is required by newer implementations.

Old file Interface_v1.atd, line 123,
New file Interface_v2.atd, line 321:
Forward incompatibility: the 'owner' field was removed but is still required by older implementations.

Old file Interface_v1.atd, line 123,
New file Interface_v2.atd, line 321:
Forward incompatibility: the new 'Plum' case is unsupported by older implementations.

Old file Interface_v1.atd, line 123,
New file Interface_v2.atd, line 321:
Forward incompatibility: the 'veggies' field is required by older implementations but is now optional.

Regarding types whose name changes, I think this is generally not a problem and we can ignore these changes e.g. veggievegetable. However, name changes for root types (types that are not referenced by other types) may have to be reported because it's possible that a root type gets both renamed and changes its structure. In such a case, we don't know for sure that it's the same type and we can't report incompatibilities confidently.

Update: changed language from left-to-right/right-to-left to backward/forward. There's a 50% chance I got this right.

@mjambon mjambon added feature request Big and small feature requests new target Issue about deriving code for a new target language labels Sep 29, 2023
@jbergler
Copy link
Contributor

jbergler commented Oct 2, 2023

Thanks @mjambon - this looks great. A couple of thoughts:

First, I'd love a --output json or similar so that we can get a machine parseable representation of these, with your example, maybe something like

$ atddiff --output=json Interface_v1.atd Interface_v2.atd
[
  {
    "location": { "old": "Interface_v1.atd:123", "new": "Interface_v2.atd:321" },
    "field": "my_type.notes",
    "change": { "old": "optional", "new": "required" },
    "kind": "incompatible backwards"
  },
  ...
]

Second, if we wanted to use this as a commit hook, we'd probably need to be able to specify the constraints to enforce.
Until atd has a "required on create, optional on parse" I think that means we need to be able to say that we want to allow the 'soft' transition (optional -> required) as a warning, while having the 'hard' transition (missing to required) be an error.

In the JSON version above that might mean a pair of fields for type and severity instead of just kind

@mjambon
Copy link
Collaborator Author

mjambon commented Oct 2, 2023

@jbergler JSON output makes sense. Let's do this in addition to text.

@zyannes
Copy link
Contributor

zyannes commented Oct 2, 2023

What about having the MVP only display a diff of the ASTs (no checking) at the type-level. Something like

$ atddiff --output=json Interface_v1.atd Interface_v2.atd
[
  {
    "type": "basket",
    "location": { "old": "Interface_v1.atd:123", "new": "Interface_v2.atd:321" },
    "fields": [
    {
        "old": {"name": "owner", "type": "string", "required": true},
        "new": {"removed": true}
    },
    {
        "old": {"name": "veggies", "type": "veggie list", "required": true},
        "new": {"name": "veggies", "type": "vegetable list", "required": false}
    },
    ...
]

@mjambon
Copy link
Collaborator Author

mjambon commented Oct 2, 2023

Tentative design for the command-line interface:

atddiff OLD_ATD_FILE NEW_ATD_FILE [OPTIONS]

Exit status:
- 0 if there's nothing to report
- 1 if anything is being reported
- 2 if anything goes wrong

Options:
--help
  ...
--output {text|json|quiet}
  Specify the output format. Default: 'text'
--report {all|backward|forward}
  Specify which kind of incompatibilities to report. Default: 'all'
  [explain]
--report-type-name-changes
  Report type name changes.
  [explain]
--no-error
  Exit with success status (0) instead of 1 when something is being reported.

GNU diff and git diff have a lot of nice formatting options that ideally we wouldn't have to reimplement. I'm not sure about the best approach here.

Update:

git difftool (man git-difftool) provides a convenient way to select and compare two versions of the same file(s). Here, we tell it to use the plain diff command without options. It gets invoked as diff OLD_FILE NEW_FILE.

$ git difftool -x diff HEAD^ cli/Makefile

Viewing (1/1): 'cli/Makefile'
Launch 'diff' [Y/n]? 
99c99
< 	PYTEST_USE_OSEMGREP=true  $(PYTEST) -n auto -m 'osempass and not todo' tests/e2e
---
> 	PYTEST_USE_OSEMGREP=true $(PYTEST) -n auto -m 'osempass and not todo' tests/e2e
103c103,105
< 	PYTEST_USE_OSEMGREP=true  $(PYTEST) -n auto -v --no-summary -m 'not no_semgrep_cli and not todo and not osempass' tests/e2e
---
> 	PYTEST_USE_OSEMGREP=true $(PYTEST) -n auto -v --no-summary \
> 	  -m 'not no_semgrep_cli and not todo and not osempass' tests/e2e \
> 	 | (grep PASSED || echo 'Found no new passing tests')

@mjambon
Copy link
Collaborator Author

mjambon commented Oct 2, 2023

@zyannes imo, an MVP is this:

$ diff -u <(atdcat old.atd) <(atdcat new.atd)

atdcat removes the comments are reformats the type definitions, resulting in good diffs (except if the definitions are reordered). For better results, we could add an option to atdcat to sort the type definitions alphabetically since their order doesn't matter.

But yeah, the issue is probably to explain the implications of the diffs e.g. explain what the following will break:

$ diff -u old.atd new.atd 
--- old.atd	2023-10-02 12:33:41.470105275 -0700
+++ new.atd	2023-10-02 12:33:54.258098342 -0700
@@ -1,3 +1,3 @@
 type a = {
-  things: int list;
+  ~things: int list;
 }

@mjambon
Copy link
Collaborator Author

mjambon commented Oct 2, 2023

@jbergler see #354 regarding "required on create, optional on parse"

@aryx
Copy link

aryx commented Oct 3, 2023

Looks great! Let's do it!

mjambon added a commit to mjambon/opam-repository that referenced this issue Oct 16, 2023
CHANGES:

* atdts: Stop compiler errors on generated typescript (ahrefs/atd#348)
* atdts: Don't fail on `wrap` constructs (ahrefs/atd#353)
* atdcat: New option `-remove-wraps` which pretty-prints the type
  definitions without `wrap` constructs (ahrefs/atd#353)
* atdd: Add `dlang` backend to generate D code from ATD definitions (ahrefs/atd#349)
* new tool: atddiff. Compares two versions of an ATD file and reports
  possible incompatibilities in the JSON data. Atddiff ships as part of the
  `atd` package together with `atdcat` (ahrefs/atd#352, ahrefs/atd#358)
nberth pushed a commit to nberth/opam-repository that referenced this issue Jun 18, 2024
CHANGES:

* atdts: Stop compiler errors on generated typescript (ahrefs/atd#348)
* atdts: Don't fail on `wrap` constructs (ahrefs/atd#353)
* atdcat: New option `-remove-wraps` which pretty-prints the type
  definitions without `wrap` constructs (ahrefs/atd#353)
* atdd: Add `dlang` backend to generate D code from ATD definitions (ahrefs/atd#349)
* new tool: atddiff. Compares two versions of an ATD file and reports
  possible incompatibilities in the JSON data. Atddiff ships as part of the
  `atd` package together with `atdcat` (ahrefs/atd#352, ahrefs/atd#358)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Big and small feature requests new target Issue about deriving code for a new target language
Projects
None yet
Development

No branches or pull requests

4 participants