Automatic ADT generation from JSON Schema docs? #2137

Open
nfi-hashicorp opened this issue Jan 30, 2025 · 4 comments

@nfi-hashicorp

Hi there, new Rascal user, just kicking the tires. 👋

I'm wondering if it would be easy/possible/palatable to add some feature to the Rascal stdlib to generate ADT code given a JSON Schema. I don't know Rascal or JSON Schema super well, but they seem to map pretty closely.

My use case is that I'm sketching a DSL for GitHub Workflows that's a little nicer to work with than their YAML representation. Right now, I'm using lang::yaml::Model to do a source-to-source transformation of my DSL ADTs into YAML, and then dumpYAMLing into files. I was thinking I could also write an ADT that is closer to the YAML representation, and transform into that, which would then transform to lang::yaml::Model. There's already a JSON Schema for workflows and it would be nice if I could just generate ADTs from it.

Just an idea!

@jurgenvinju
Member

It's a very nice feature request. Been thinking about it for a while myself. There are many many applications that would benefit.

@nfi-hashicorp
Author

It doesn't seem like a ton of work so I might take a crack at it. I could use some guidance.

I'm thinking of taking the code generation approach, but maybe there are others? I can imagine just dynamically loading the schema into in-memory ADTs perhaps, but I'm not sure if that sort of mechanism is exposed.

Is there some prior art I can look at for doing generation of Rascal code? Kind of a tricky thing to search for :)

@jurgenvinju
Member

jurgenvinju commented Feb 3, 2025

Cool!

Steps could be:

  • Manually working through some small JSON schemas to get the hang of it. The schemas should align with the functionality of readJSON. This is probably the hardest part. JSON schemas have features that Rascal data types do not, so some simplification or simulation is necessary. Typically we are not worried if the contract becomes weaker (more instances allowed than intended), but we are worried if the Rascal types become stronger (excluding valid JSON instances). An example that comes to mind is "sub-typing" or "extend": JSON Schema has it but Rascal does not. Using the most general supertype everywhere is a possible solution.
  • Manually developing a set of data types for JSON schemas, such that lang::json::IO can be used to bind a JSON schema to Rascal constructors. This is also a good learning experience, because here we learn about all the possible constructs of JSON schemas.
  • Writing a conversion from JSON schemas to Rascal ADTs in three (plus one) steps:
    1. Schema-to-schema cleanup: removal of schema features by simulation or over-approximation using simpler features of JSON Schema.
    2. Transformation to a Rascal ADT grammar in Production and Symbol format (see the Type module). This could be a one-to-one mapping.
    3. Cleanup of the Rascal ADT grammar, for example renaming or all kinds of small refactorings to make the grammar more idiomatic. The resulting in-memory definitions can be used with readJSON immediately.
    4. Using lang::rascal::grammars::format to print the grammar to Rascal source code.
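
To make the shape of that pipeline concrete, here is a rough sketch of steps 1, 2, and 4, written in Python only because it is easy to run; it is not Rascal's API and not how the stdlib feature would be implemented. Everything in it is an assumption made for illustration: the function names (simplify, to_adts, render), the tiny schema subset (objects, arrays, scalars), the plain dict standing in for the Production/Symbol grammar, and the choice to model optional properties as keyword fields with defaults. Step 3 (idiomatic renaming) is left out, and property names that are not valid Rascal identifiers are not handled.

```python
import json

# Assumed mapping from JSON Schema scalar types to Rascal types.
# "number" goes to the most general numeric type, num, so the contract only gets weaker.
SCALARS = {"string": "str", "integer": "int", "number": "num", "boolean": "bool"}
DEFAULTS = {"str": '""', "int": "0", "num": "0", "bool": "false"}


def simplify(schema):
    """Step 1: drop validation-only keywords (an over-approximation)."""
    keep = {"type", "properties", "required", "items"}
    out = {k: v for k, v in schema.items() if k in keep}
    if "properties" in out:
        out["properties"] = {n: simplify(s) for n, s in out["properties"].items()}
    if "items" in out:  # assumes "items" is a single schema, not a list of schemas
        out["items"] = simplify(out["items"])
    return out


def to_adts(name, schema, adts):
    """Step 2: collect {AdtName: [(field, type, required)]}; return the type name."""
    kind = schema.get("type")
    if kind in SCALARS:
        return SCALARS[kind]
    if kind == "array":
        return "list[" + to_adts(name, schema.get("items", {}), adts) + "]"
    if kind == "object":
        adt = name[:1].upper() + name[1:]
        required = set(schema.get("required", []))
        adts[adt] = [(prop, to_adts(prop, sub, adts), prop in required)
                     for prop, sub in schema.get("properties", {}).items()]
        return adt
    return "value"  # anything unrecognised falls back to the most general type


def default_for(typ):
    if typ.startswith("list["):
        return "[]"
    if typ in DEFAULTS:
        return DEFAULTS[typ]
    raise NotImplementedError("no default chosen for optional field of type " + typ)


def render(adts):
    """Step 4: print each collected definition as a Rascal data declaration."""
    lines = []
    for adt, fields in adts.items():
        params = []
        # Positional (required) fields first, then keyword (optional) fields.
        for field, typ, required in sorted(fields, key=lambda f: not f[2]):
            params.append(f"{typ} {field}" if required
                          else f"{typ} {field} = {default_for(typ)}")
        cons = adt[:1].lower() + adt[1:]
        lines.append(f"data {adt} = {cons}({', '.join(params)});")
    return "\n".join(lines)


# A made-up schema, not the real GitHub Workflows schema.
SCHEMA = json.loads("""
{
  "type": "object",
  "required": ["name", "steps"],
  "properties": {
    "name": {"type": "string", "minLength": 1},
    "steps": {"type": "array", "items": {
      "type": "object",
      "required": ["run"],
      "properties": {"run": {"type": "string"},
                     "timeout": {"type": "integer", "minimum": 0}}}},
    "verbose": {"type": "boolean"}
  }
}
""")

adts = {}
to_adts("Job", simplify(SCHEMA), adts)
print(render(adts))
```

Note how minLength and minimum simply disappear in step 1: the generated types accept more instances than the schema does, which is the direction described above as acceptable.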

All this assumes that readJSON is perfect as it is, which is probably not the case. But it's good to minimize changes there, since the read/write JSON semantics is a contract for many Rascal applications, including the LSP and VS Code.

Remember that JSON is a serialization format and not an expression language. For example, integers have no binary bounds.
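
A tiny illustration of the point about integer bounds, again in Python purely as a runnable sketch: the JSON text below holds 2^64, which does not fit in a 64-bit integer, yet it is perfectly valid JSON. A generated data type would therefore presumably want an unbounded integer type for "integer" rather than a fixed-width one.

```python
import json

# 2**64 is one past the largest unsigned 64-bit value, but still valid JSON.
text = "18446744073709551616"
big = json.loads(text)

# Python's parser keeps the full value because its int is arbitrary precision.
print(type(big).__name__, big == 2**64)

# Serialising it again reproduces the original text, so nothing was lost.
print(json.dumps(big) == text)
```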

What do you think of the above?

@jurgenvinju
Member

If you look in the Type module you will find the Production ADT, which has a constructor that represents a single alternative rule for a data type. The left-hand side and right-hand side of such a rule are made of Symbols; they represent types in Rascal, such as \int() to represent int.

If you use the type reification operator on an existing definition then you can get examples of the target format you are converting to; try #int or #list[int] or #MyDataType. Working from these examples can save a lot of time.
