Automatic ADT generation from JSON Schema docs? #2137

nfi-hashicorp · 2025-01-30T18:22:14Z

Hi there, new Rascal user, just kicking the tires. 👋

I'm wondering if it would be easy/possible/palatable to add a feature to the Rascal stdlib that generates ADT code from a JSON Schema. I don't know Rascal or JSON Schema super well, but they seem to map onto each other pretty closely.

My use case is that I'm sketching a DSL for GitHub Workflows that's a little nicer to work with than their YAML representation. Right now, I'm using `lang::yaml::Model` to source-to-source transform my DSL ADTs into YAML, and then `dumpYAML`-ing them into files. I was thinking I could also write an ADT that is closer to the YAML representation and transform into that, which would then be transformed into `lang::yaml::Model`. There's already a JSON Schema for workflows, and it would be nice if I could just generate ADTs from it.

Just an idea!


jurgenvinju · 2025-01-31T10:22:23Z

It's a very nice feature request. I've been thinking about it for a while myself. There are many, many applications that would benefit.

nfi-hashicorp · 2025-01-31T18:28:05Z

It doesn't seem like a ton of work so I might take a crack at it. I could use some guidance.

I'm thinking of taking the code generation approach, but maybe there are others? I can imagine just dynamically loading the schema into in-memory ADTs perhaps, but I'm not sure if that sort of mechanism is exposed.

Is there some prior art I can look at for doing generation of Rascal code? Kind of a tricky thing to search for :)

jurgenvinju · 2025-02-03T09:00:02Z

Cool!

Steps could be:

1. Manually working through some small JSON schemas to get the hang of it. The schemas should align with the functionality of `readJSON`. This is probably the hardest part. JSON schemas have features that Rascal data types do not, so some simplification or simulation is necessary. Typically we are not worried if the contract becomes weaker (more instances allowed than intended), but we are worried if the Rascal types are stronger (excluding valid JSON instances). An example that comes to mind is "sub-typing" or "extend": JSON Schema has it, but Rascal does not. Using the most general supertype everywhere is a possible solution.
2. Manually developing a set of data types for JSON schemas, such that `lang::json::IO` can be used to bind a JSON schema to Rascal constructors. This is also a good learning experience, because here we learn about all the possible constructs of JSON schemas.
3. Writing a conversion of JSON schemas to Rascal ADTs in three (plus one) steps:
   1. Schema-to-schema cleanup: removal of schema features by simulation or over-approximation using simpler features of JSON.
   2. Transformation to a Rascal ADT grammar in `Production` and `Symbol` format (see the `Type` module). This could be a one-to-one mapping.
   3. Cleanup of the Rascal ADT grammar, for example renaming or all kinds of small refactorings to make the grammar more idiomatic. The resulting in-memory definitions can be used with `readJSON` immediately.
   4. Using `lang::rascal::grammars::format` to print the grammar to Rascal source code.

All this assumes that `readJSON` is perfect as it is, which is probably not the case. Still, it's good to minimize changes there, since the read/write JSON semantics are a contract for many Rascal applications, including the LSP and VS Code.
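A rough, untested sketch of steps 1 and 2 for a tiny fragment of JSON Schema might look like the module below. The `Schema` data type, its constructors, and the functions `simplify`, `toSymbol`, `toProduction` and `toDefinitions` are all hypothetical; only `Symbol`, `Production` and their constructors come from the `Type` module. Mapping object properties to keyword fields is an assumption, not something checked against `readJSON`.

```rascal
module JsonSchemaToAdt

import Type;

// Hypothetical, hand-written ADT for a tiny JSON Schema subset.
// Real JSON Schema has many more keywords ($ref, allOf, oneOf, enum, ...).
// Assumption: each object schema carries a name, e.g. from its `title`
// or from its key under `$defs`.
data Schema
  = string()
  | integer()
  | number()
  | boolean()
  | array(Schema items)
  | object(str name, map[str, Schema] properties)
  | anyOf(list[Schema] alternatives)
  | anything()
  ;

// Step 1: schema-to-schema cleanup by over-approximation.
// A union with exactly one alternative is just that alternative; any other
// union is widened to `anything()`, which admits more instances than
// intended (a weaker contract) but never rejects a valid JSON instance.
Schema simplify(Schema s) = visit (s) {
  case anyOf([Schema only]) => only
  case anyOf(_) => anything()
};

// Step 2: map schema types to Rascal type symbols.
// JSON integers are unbounded, and so is Rascal's `int`.
Symbol toSymbol(string())         = \str();
Symbol toSymbol(integer())        = \int();
Symbol toSymbol(number())         = \num();
Symbol toSymbol(boolean())        = \bool();
Symbol toSymbol(array(Schema s))  = \list(toSymbol(s));
Symbol toSymbol(object(str n, _)) = \adt(n, []);
Symbol toSymbol(anything())       = \value();
Symbol toSymbol(anyOf(_))         = \value(); // only reached if step 1 was skipped

// One constructor per object schema: cons(def, positional fields, keyword
// fields, attributes). Properties become keyword fields so that absent
// (optional) properties are tolerated; step 3 could rename the constructor.
Production toProduction(object(str n, map[str, Schema] props))
  = \cons(\label(n, \adt(n, [])), [], [\label(k, toSymbol(props[k])) | k <- props], {});

// Collect one `choice` definition per named object schema anywhere in the
// (simplified) tree. Assumes object names are unique.
map[Symbol, Production] toDefinitions(Schema root)
  = (\adt(n, []) : \choice(\adt(n, []), {toProduction(o)}) | /o:object(str n, _) := simplify(root));
```

Calling `toDefinitions(object("Step", ("run": string(), "shell": string())))` should yield a one-entry map from `\adt("Step", [])` to its `choice` rule, which step 3 could then rename or otherwise tidy up.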

Remember that JSON is a serialization format, not an expression language. For example, integers have no binary bounds.
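Because Rascal's `int` is also arbitrary-precision, a JSON Schema `integer` can plausibly map straight to `int`, and range keywords such as `maximum` can simply be dropped, which only weakens the contract. A tiny check using nothing beyond the core language (the module and names are illustrative):

```rascal
module UnboundedInts

// Largest signed 64-bit value; Rascal integers are arbitrary-precision,
// so adding one does not wrap around to a negative number.
int longMax = 9223372036854775807;

test bool noOverflow() = longMax + 1 == 9223372036854775808;
```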

What do you think of the above?

jurgenvinju · 2025-02-03T09:12:04Z

If you look in the `Type` module you will find the `Production` ADT, which has a `\cons` constructor that represents a single alternative rule for a data type. The left-hand side and right-hand side of such a rule are made of `Symbol`s; they represent types in Rascal, such as `\int()` to represent `int`.

If you use the type reification operator on an existing definition, you can get examples of the target format you are converting to; try `#int`, `#list[int]` or `#MyDataType`. Working from these examples can save a lot of time.
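For instance, with a small hypothetical `Step` data type standing in for `#MyDataType`, the reified values can be printed and compared with a hand-built rule. The data type and the `runRule` value are illustrations only, and the exact printed form may differ between Rascal versions, so no output is shown here.

```rascal
module ReifyExamples

import IO;
import Type;

// Hypothetical stand-in for "#MyDataType".
data Step
  = run(str command)
  | uses(str action, map[str, str] inputs = ());

// A hand-built rule for the `run` alternative, in the shape we assume a
// generator would emit: cons(def, positional fields, keyword fields, attributes).
Production runRule = \cons(\label("run", \adt("Step", [])), [\label("command", \str())], [], {});

void main() {
  println(#int);        // a reified type: type(symbol, definitions)
  println(#list[int]);

  type[Step] t = #Step;
  // `definitions` maps each ADT symbol to a `choice` over its constructors.
  println(t.definitions[t.symbol]);
  println(runRule);     // compare with the matching alternative printed above
}
```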

DavyLandman added enhancement library labels Jan 31, 2025
