Commit

AVRO-3404: Add IDL syntax for schema definitions (#1589)
Add schema syntax to the IDL syntax. This allows a full schema file
(.avsc) equivalent: the IDL can now define any named or unnamed schema
to be returned by the parser.
The IDL documentation also includes examples.
opwvhk authored Sep 19, 2023
1 parent 1b78e69 commit afa8ea6
Showing 20 changed files with 552 additions and 155 deletions.
164 changes: 120 additions & 44 deletions doc/content/en/docs/++version++/IDL Language/_index.md
@@ -34,7 +34,7 @@ This document defines Avro IDL, a higher-level language for authoring Avro schem
The aim of the Avro IDL language is to enable developers to author schemata in a way that feels more similar to common programming languages like Java, C++, or Python. Additionally, the Avro IDL language may feel more familiar for those users who have previously used the interface description languages (IDLs) in other frameworks like Thrift, Protocol Buffers, or CORBA.

### Usage
Each Avro IDL file defines a single Avro Protocol, and thus generates as its output a JSON-format Avro Protocol file with extension .avpr.
Each Avro IDL file defines either a single Avro Protocol or an Avro Schema with supporting named schemata in a namespace. When parsed, it thus yields either a Protocol or a Schema. These can be written, respectively, to JSON-format Avro Protocol files with extension .avpr or JSON-format Avro Schema files with extension .avsc.

To convert a _.avdl_ file into a _.avpr_ file, it may be processed by the `idl` tool. For example:
```shell
@@ -44,6 +44,16 @@ $ head /tmp/namespaces.avpr
"protocol" : "TestNamespace",
"namespace" : "avro.test.protocol",
```
To convert a _.avdl_ file into a _.avsc_ file, it may be processed by the `idl` tool too. For example:
```shell
$ java -jar avro-tools.jar idl src/test/idl/input/schema_syntax_schema.avdl /tmp/schema_syntax.avsc
$ head /tmp/schema_syntax.avsc
{
"type": "array",
"items": {
"type": "record",
"name": "StatusUpdate",
```
The `idl` tool can also process input to and from _stdin_ and _stdout_. See `idl --help` for full usage information.
A Maven plugin is also provided to compile .avdl files. To use it, add something like the following to your pom.xml:
@@ -56,7 +66,7 @@ A Maven plugin is also provided to compile .avdl files. To use it, add something
<executions>
<execution>
<goals>
<goal>idl-protocol</goal>
<goal>idl</goal>
</goals>
</execution>
</executions>
@@ -65,6 +75,48 @@ A Maven plugin is also provided to compile .avdl files. To use it, add something
</build>
```
## Defining a Schema in Avro IDL
An Avro IDL file consists of exactly one (main) schema definition. The minimal schema is defined by the following code:
```java
schema int;
```
This is equivalent to (and generates) the following JSON schema definition:
```json
{
"type": "int"
}
```
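Translating a primitive main schema is mechanical. The following minimal sketch, which is not part of Avro, handles only the `schema <primitive>;` form; the class `PrimitiveSchemaSketch` and its `toJson` method are hypothetical names, and named schemata such as `schema Message;` are deliberately rejected.

```java
import java.util.Set;

/** Hypothetical helper, not an Avro API: turns "schema <primitive>;" into JSON. */
final class PrimitiveSchemaSketch {
    // The primitive type names defined by the Avro specification.
    private static final Set<String> PRIMITIVES =
            Set.of("null", "boolean", "int", "long", "float", "double", "bytes", "string");

    static String toJson(String declaration) {
        String trimmed = declaration.trim();
        if (!trimmed.startsWith("schema ") || !trimmed.endsWith(";")) {
            throw new IllegalArgumentException("Expected 'schema <type>;': " + declaration);
        }
        // Drop the leading keyword and the trailing semicolon.
        String type = trimmed.substring("schema ".length(), trimmed.length() - 1).trim();
        if (!PRIMITIVES.contains(type)) {
            throw new IllegalArgumentException("Not a primitive type: " + type);
        }
        return "{\"type\": \"" + type + "\"}";
    }
}
```

For instance, `PrimitiveSchemaSketch.toJson("schema int;")` returns `{"type": "int"}`, which is equivalent to the JSON above.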
More complex schemata can also be defined, for example by adding named schemata like this:
```java
namespace default.namespace.for.named.schemata;
schema Message;

record Message {
string? title = null;
string message;
}
```
This is equivalent to (and generates) the following JSON schema definition:
```json
{
"type" : "record",
"name" : "Message",
"namespace" : "default.namespace.for.named.schemata",
"fields" : [ {
"name" : "title",
"type" : [ "null", "string" ],
"default": null
}, {
"name" : "message",
"type" : "string"
} ]
}
```
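In the generated JSON, `name` and `namespace` together form the full name `default.namespace.for.named.schemata.Message`. A minimal sketch of how a namespace and a name combine, following the full-name rules of the Avro specification; `FullNameSketch` is a hypothetical helper, not an Avro class:

```java
/** Hypothetical helper combining a namespace and a name into an Avro full name. */
final class FullNameSketch {
    static String fullName(String namespace, String name) {
        // A name that already contains a dot is treated as a full name; the namespace is ignored.
        if (name.contains(".")) {
            return name;
        }
        // The null (or empty) namespace leaves the name unqualified.
        if (namespace == null || namespace.isEmpty()) {
            return name;
        }
        return namespace + "." + name;
    }
}
```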
Schemata in Avro IDL can contain the following items:
* Imports of external protocol and schema files (only named schemata are imported).
* Definitions of named schemata, including records, errors, enums, and fixeds.
## Defining a Protocol in Avro IDL
An Avro IDL file consists of exactly one protocol definition. The minimal protocol is defined by the following code:
```java
@@ -109,7 +161,7 @@ Files may be imported in one of three formats:
`import schema "foo.avsc";`
Messages and types in the imported file are added to this file's protocol.
When importing into an IDL schema file, only (named) types are imported into this file. When importing into an IDL protocol, messages are imported into the protocol as well.
Imported file names are resolved relative to the current IDL file.
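Resolving relative to the current IDL file means resolving against the directory that contains it. Assuming plain filesystem paths, this can be sketched with the JDK's `Path.resolveSibling`; `ImportPathSketch` is hypothetical, and any additional lookup the real tool may perform is not modeled:

```java
import java.nio.file.Path;

/** Hypothetical helper: resolves an imported file name against the importing IDL file. */
final class ImportPathSketch {
    static Path resolveImport(Path currentIdlFile, String importedName) {
        // resolveSibling resolves against the parent directory of the current file;
        // normalize() collapses segments such as "..".
        return currentIdlFile.resolveSibling(importedName).normalize();
    }
}
```

With `src/idl/main.avdl` as the current file, `import schema "foo.avsc";` would resolve to `src/idl/foo.avsc` under this sketch.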
@@ -135,7 +187,7 @@ Fixed fields are defined using the following syntax:
```
fixed MD5(16);
```
This example defines a fixed-length type called MD5 which contains 16 bytes.
This example defines a fixed-length type called MD5, which contains 16 bytes.
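Any value stored in a field of type `MD5` must be exactly 16 bytes long. As an illustration only, the JDK's `MessageDigest` produces an MD5 digest of that size; `Md5FixedSketch` is a hypothetical helper:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: produces a value sized for the type declared as "fixed MD5(16);". */
final class Md5FixedSketch {
    static final int MD5_SIZE = 16;

    static byte[] md5Of(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            // A fixed value must have exactly the declared size.
            if (digest.length != MD5_SIZE) {
                throw new IllegalStateException("Unexpected digest size: " + digest.length);
            }
            return digest;
        } catch (NoSuchAlgorithmException e) {
            // Not expected on a standard JDK, which ships an MD5 provider.
            throw new IllegalStateException(e);
        }
    }
}
```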
## Defining Records and Errors
Records are defined in Avro IDL using a syntax similar to a struct definition in C:
@@ -161,19 +213,20 @@ A type reference in Avro IDL must be one of:
* A primitive type
* A logical type
* A named schema defined prior to this usage in the same Protocol
* A named schema (either defined or imported)
* A complex type (array, map, or union)
### Primitive Types
The primitive types supported by Avro IDL are the same as those supported by Avro's JSON format. This list includes _int_, _long_, _string_, _boolean_, _float_, _double_, _null_, and _bytes_.
### Logical Types
Some of the logical types supported by Avro's JSON format are also supported by Avro IDL. The currently supported types are:
Some of the logical types supported by Avro's JSON format are directly supported by Avro IDL. The currently supported types are:
* _decimal_ (logical type [decimal]({{< relref "specification#decimal" >}}))
* _date_ (logical type [date]({{< relref "specification#date" >}}))
* _time_ms_ (logical type [time-millis]({{< relref "specification#time-millisecond-precision" >}}))
* _timestamp_ms_ (logical type [timestamp-millis]({{< relref "specification#timestamp-millisecond-precision" >}}))
* _local_timestamp_ms_ (logical type [local-timestamp-millis]({{< relref "specification#local_timestamp_ms" >}}))
* _uuid_ (logical type [uuid]({{< relref "specification#uuid" >}}))
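Each keyword stands for a logical type that annotates an underlying Avro type, as described in the specification. A minimal lookup-table sketch of that mapping for the keywords that take no parameters (_decimal_ is omitted because it also needs a precision and scale); `LogicalTypeSketch` is hypothetical, and the JSON it emits is only meant to resemble what the IDL tool generates:

```java
import java.util.Map;

/** Hypothetical lookup table, not an Avro API: IDL keyword -> underlying type and logicalType. */
final class LogicalTypeSketch {
    // Each entry holds {underlying Avro type, JSON logicalType name}.
    private static final Map<String, String[]> KEYWORDS = Map.of(
            "date", new String[] {"int", "date"},
            "time_ms", new String[] {"int", "time-millis"},
            "timestamp_ms", new String[] {"long", "timestamp-millis"},
            "local_timestamp_ms", new String[] {"long", "local-timestamp-millis"},
            "uuid", new String[] {"string", "uuid"});

    static String toJson(String keyword) {
        String[] mapping = KEYWORDS.get(keyword);
        if (mapping == null) {
            throw new IllegalArgumentException("Not covered by this sketch: " + keyword);
        }
        return "{\"type\": \"" + mapping[0] + "\", \"logicalType\": \"" + mapping[1] + "\"}";
    }
}
```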
For example:
@@ -226,23 +279,25 @@ record RecordWithUnion {
union { decimal(12, 6), float } number;
}
```
Note that the same restrictions apply to Avro IDL unions as apply to unions defined in the JSON format; namely, a record may not contain multiple elements of the same type. Also, fields/parameters that use the union type and have a default parameter must specify a default value of the same type as the **first** union type.
Note that the same restrictions apply to Avro IDL unions as apply to unions defined in the JSON format; namely, a union may not contain multiple elements of the same type. Also, fields/parameters that use the union type and have a default value must use a default of the same type as the **first** union type.
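A simplified checker for these two restrictions, treating each union branch as a type name and supporting only a few primitive defaults, might look like this; `UnionRulesSketch` is a hypothetical class and is not how Avro itself validates schemas:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical checker for the two union restrictions; not Avro's validation code. */
final class UnionRulesSketch {
    static void check(List<String> branches, Object defaultValue, boolean hasDefault) {
        if (branches.isEmpty()) {
            throw new IllegalArgumentException("This sketch expects at least one branch");
        }
        // Restriction 1: no two branches may name the same type.
        Set<String> seen = new HashSet<>();
        for (String branch : branches) {
            if (!seen.add(branch)) {
                throw new IllegalArgumentException("Duplicate union branch: " + branch);
            }
        }
        // Restriction 2: a default value must match the FIRST branch.
        if (hasDefault && !matches(branches.get(0), defaultValue)) {
            throw new IllegalArgumentException(
                    "Default " + defaultValue + " does not match first branch " + branches.get(0));
        }
    }

    // Only a few primitive types are covered; anything else is treated as a mismatch.
    private static boolean matches(String type, Object value) {
        switch (type) {
            case "null":
                return value == null;
            case "string":
                return value instanceof String;
            case "int":
                return value instanceof Integer;
            case "float":
                return value instanceof Float;
            default:
                return false;
        }
    }
}
```

Under this sketch, `union { null, string } s = null;` passes, while `union { string, null } s = null;` is rejected because `null` does not match the first branch.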
Because it occurs so often, there is a special shorthand to denote a union of `null` with another type. In the following snippet, the first three fields have identical types:
Because it occurs so often, there is a special shorthand to denote a union of `null` with one other schema. The first three fields in the following snippet have identical schemata, as do the last two fields:
```java
record RecordWithUnion {
union { null, string } optionalString1 = null;
string? optionalString2 = null;
string? optionalString3; // No default value
string? optionalString4 = "something";
union { string, null } optionalString4 = "something";
string? optionalString5 = "something else";
}
```
Note that unlike explicit unions, the position of the `null` type is fluid; it will be the first or last type depending on the default value (if any). So in the example above, all fields are valid.
Note that unlike explicit unions, the position of the `null` type is fluid; it will be the first or last type depending on the default value (if any). So all fields are valid in the example above.
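The placement rule for the shorthand can be sketched as follows. `NullableShorthandSketch` and its parameters are hypothetical, and the sketch assumes the caller already knows whether a default is present and whether it is `null`:

```java
import java.util.List;

/** Hypothetical sketch of how the "T?" shorthand expands into a union. */
final class NullableShorthandSketch {
    static List<String> expand(String type, boolean hasDefault, boolean defaultIsNull) {
        // No default, or a null default: null is placed first.
        if (!hasDefault || defaultIsNull) {
            return List.of("null", type);
        }
        // A non-null default must match the first branch, so null moves to the end.
        return List.of(type, "null");
    }
}
```

So `optionalString3` (no default) expands to `[null, string]`, and `optionalString5 = "something else"` expands to `[string, null]`.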
## Defining RPC Messages
The syntax to define an RPC message within a Avro IDL protocol is similar to the syntax for a method declaration within a C header file or a Java interface. To define an RPC message add which takes two arguments named _foo_ and _bar_, returning an _int_, simply include the following definition within the protocol:
The syntax to define an RPC message within an Avro IDL protocol is similar to the syntax for a method declaration within a C header file or a Java interface. To define an RPC message _add_ which takes two arguments named _foo_ and _bar_, returning an _int_, simply include the following definition within the protocol:
```java
int add(int foo, int bar = 0);
```
@@ -252,7 +307,7 @@ To define a message with no response, you may use the alias _void_, equivalent t
```java
void logMessage(string message);
```
If you have previously defined an error type within the same protocol, you may declare that a message can throw this error using the syntax:
If you have defined or imported an error type within the same protocol, you may declare that a message can throw this error using the syntax:
```java
void goKaboom() throws Kaboom;
```
@@ -263,20 +318,22 @@ void fireAndForget(string message) oneway;
## Other Language Features
### Comments
### Comments and documentation
All Java-style comments are supported within an Avro IDL file. Any text following _//_ on a line is ignored, as is any text between _/*_ and _*/_, possibly spanning multiple lines.
Comments that begin with _/**_ are used as the documentation string for the type or field definition that follows the comment.
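How the comment text becomes a documentation string can be approximated as below: the delimiters and any leading `*` on each line are removed. This is a hypothetical sketch; Avro's own handling of whitespace and line breaks may differ in detail:

```java
/** Hypothetical sketch: turns a doc comment into a documentation string. */
final class DocCommentSketch {
    static String docString(String comment) {
        if (comment.length() < 5 || !comment.startsWith("/**") || !comment.endsWith("*/")) {
            throw new IllegalArgumentException("Not a doc comment: " + comment);
        }
        // Drop the opening "/**" and the closing "*/".
        String body = comment.substring(3, comment.length() - 2);
        StringBuilder doc = new StringBuilder();
        for (String line : body.split("\n")) {
            // Trim each line and remove one leading '*', as in Javadoc-style comments.
            String text = line.trim();
            if (text.startsWith("*")) {
                text = text.substring(1).trim();
            }
            // Skip lines that are empty after stripping; keep the rest on separate lines.
            if (!text.isEmpty()) {
                if (doc.length() > 0) {
                    doc.append('\n');
                }
                doc.append(text);
            }
        }
        return doc.toString();
    }
}
```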
### Escaping Identifiers
Occasionally, one will need to use a reserved language keyword as an identifier. In order to do so, backticks (`) may be used to escape the identifier. For example, to define a message with the literal name error, you may write:
Occasionally, one may want to distinguish between identifiers and language keywords. To do so, backticks (`) may be used to escape the identifier. For example, to define a message with the literal name error, you may write:
```java
void `error`();
```
This syntax is allowed anywhere an identifier is expected.
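Conceptually, the backticks only mark the token as an identifier and are not part of the resulting name. A hypothetical helper illustrating that idea (it is not the IDL parser's code):

```java
/** Hypothetical helper: strips the backticks that escape an identifier. */
final class EscapedIdentifierSketch {
    static String unescape(String token) {
        // A backtick-quoted token loses its quotes; tokens without backticks are returned unchanged.
        if (token.length() >= 2 && token.startsWith("`") && token.endsWith("`")) {
            return token.substring(1, token.length() - 1);
        }
        return token;
    }
}
```

Under this sketch, the message declared above ends up named `error`.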
### Annotations for Ordering and Namespaces
Java-style annotations may be used to add additional properties to types and fields throughout Avro IDL.
Java-style annotations may be used to add properties to types and fields throughout Avro IDL. These can be custom properties, or special properties such as those used in the JSON-format Avro Schema and Protocol files.
For example, to specify the sort order of a field within a record, one may use the `@order` annotation before the field name as follows:
```java
@@ -319,46 +376,64 @@ record MyRecord {
string @aliases(["oldField", "ancientField"]) myNewField;
}
```
Some annotations like those listed above are handled specially. All other annotations are added as properties to the protocol, message, schema or field.
Some annotations like those listed above are handled specially. All other annotations are added as properties to the protocol, message, schema or field. You can use any identifier or series of identifiers separated by dots and/or dashes as a property name.
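The property-name rule can be approximated with a regular expression. This sketch assumes ASCII identifiers (a letter or underscore followed by letters, digits, or underscores), which may be narrower than what the IDL grammar actually accepts; `PropertyNameSketch` is hypothetical:

```java
import java.util.regex.Pattern;

/** Hypothetical validator for annotation property names; an approximation, not the IDL grammar. */
final class PropertyNameSketch {
    // One identifier (ASCII-only approximation).
    private static final String IDENTIFIER = "[A-Za-z_][A-Za-z0-9_]*";
    // A series of identifiers separated by dots and/or dashes.
    private static final Pattern PROPERTY_NAME =
            Pattern.compile(IDENTIFIER + "(?:[.-]" + IDENTIFIER + ")*");

    static boolean isValidPropertyName(String name) {
        return PROPERTY_NAME.matcher(name).matches();
    }
}
```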
## Complete Example
The following is an example of an Avro IDL file that shows most of the above features:
The following is an example of two Avro IDL files that together show most of the above features:
### schema.avdl
```java
/*
* Header with license information.
*/
/**
* An example protocol in Avro IDL
* Header with license information.
*/
@namespace("org.apache.avro.test")
protocol Simple {
/** Documentation for the enum type Kind */
@aliases(["org.foo.KindOf"])
enum Kind {
FOO,
BAR, // the bar enum value
BAZ
} = FOO; // For schema evolution purposes, unmatched values do not throw an error, but are resolved to FOO.
// Optional default namespace (if absent, the default namespace is the null namespace).
namespace org.apache.avro.test;
// Optional main schema definition; if used, the IDL file is equivalent to a .avsc file.
schema TestRecord;
/** Documentation for the enum type Kind */
@aliases(["org.foo.KindOf"])
enum Kind {
FOO,
BAR, // the bar enum value
BAZ
} = FOO; // For schema evolution purposes, unmatched values do not throw an error, but are resolved to FOO.
/** MD5 hash; good enough to avoid most collisions, and smaller than (for example) SHA256. */
fixed MD5(16);
/** MD5 hash; good enough to avoid most collisions, and smaller than (for example) SHA256. */
fixed MD5(16);
record TestRecord {
/** Record name; has no intrinsic order */
string @order("ignore") name;
record TestRecord {
/** Record name; has no intrinsic order */
string @order("ignore") name;
Kind @order("descending") kind;
Kind @order("descending") kind;
MD5 hash;
MD5 hash;
/*
Note that 'null' is the first union type. Just like .avsc / .avpr files, the default value must be of the first union type.
*/
union { null, MD5 } /** Optional field */ @aliases(["hash"]) nullableHash = null;
// Shorthand syntax; the null in this union is placed based on the default value (or first if there's no default).
MD5? anotherNullableHash = null;
/*
Note that 'null' is the first union type. Just like .avsc / .avpr files, the default value must be of the first union type.
*/
union { null, MD5 } /** Optional field */ @aliases(["hash"]) nullableHash = null;
array<long> arrayOfLongs;
}
```
array<long> arrayOfLongs;
}
### protocol.avdl
```java
/*
* Header with license information.
*/
/**
* An example protocol in Avro IDL
*/
@namespace("org.apache.avro.test")
protocol Simple {
// Import the example file above
import idl "schema.avdl";
/** Errors are records that can be thrown from a method */
error TestError {
@@ -375,6 +450,7 @@ protocol Simple {
void ping() oneway;
}
```
Additional examples may be found in the Avro source tree under the `src/test/idl/input` directory.
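In `schema.avdl`, the enum `Kind` declares `FOO` as its default. Under Avro's schema resolution rules, a written symbol that the reader's enum does not contain resolves to that default instead of causing an error. A hypothetical sketch of that decision:

```java
import java.util.List;

/** Hypothetical sketch of reading an enum symbol when the reader's enum declares a default. */
final class EnumDefaultSketch {
    static String resolve(List<String> readerSymbols, String defaultSymbol, String writtenSymbol) {
        // A symbol the reader knows is used as-is.
        if (readerSymbols.contains(writtenSymbol)) {
            return writtenSymbol;
        }
        // An unknown symbol falls back to the enum default instead of failing.
        if (defaultSymbol != null) {
            return defaultSymbol;
        }
        throw new IllegalArgumentException("Unknown symbol and no default: " + writtenSymbol);
    }
}
```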
## IDE support
6 changes: 3 additions & 3 deletions doc/content/en/docs/++version++/Specification/_index.md
@@ -812,7 +812,7 @@ The following schema represents a date:
}
```

### Time (millisecond precision)
### Time (millisecond precision) {#time_ms}
The `time-millis` logical type represents a time of day, with no reference to a particular calendar, time zone or date, with a precision of one millisecond.

A `time-millis` logical type annotates an Avro `int`, where the int stores the number of milliseconds after midnight, 00:00:00.000.
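With `java.time`, converting between a `LocalTime` and this representation is a matter of scaling nanoseconds of the day. This is a sketch with hypothetical names, not Avro's own conversion code:

```java
import java.time.LocalTime;

/** Hypothetical helpers: time-millis stores milliseconds after midnight in an int. */
final class TimeMillisSketch {
    static int toTimeMillis(LocalTime time) {
        // A day has 86,400,000 ms, so the result always fits in an int.
        // Sub-millisecond precision is truncated.
        return (int) (time.toNanoOfDay() / 1_000_000L);
    }

    static LocalTime fromTimeMillis(int millis) {
        return LocalTime.ofNanoOfDay(millis * 1_000_000L);
    }
}
```

For example, 01:02:03.004 maps to 3,723,004 (3,600,000 + 120,000 + 3,000 + 4 milliseconds).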
@@ -822,7 +822,7 @@ The `time-micros` logical type represents a time of day, with no reference to a

A `time-micros` logical type annotates an Avro `long`, where the long stores the number of microseconds after midnight, 00:00:00.000000.

### Timestamp (millisecond precision)
### Timestamp (millisecond precision) {#timestamp_ms}
The `timestamp-millis` logical type represents an instant on the global timeline, independent of a particular time zone or calendar, with a precision of one millisecond. Please note that time zone information gets lost in this process. Upon reading a value back, we can only reconstruct the instant, but not the original representation. In practice, such timestamps are typically displayed to users in their local time zones, therefore they may be displayed differently depending on the execution environment.

A `timestamp-millis` logical type annotates an Avro `long`, where the long stores the number of milliseconds from the unix epoch, 1 January 1970 00:00:00.000 UTC.
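A sketch with `java.time` shows why the time zone is lost: two representations of the same instant in different offsets produce the same stored value. The class and method names are hypothetical:

```java
import java.time.Instant;
import java.time.OffsetDateTime;

/** Hypothetical helpers: timestamp-millis stores milliseconds since the Unix epoch in a long. */
final class TimestampMillisSketch {
    static long toTimestampMillis(Instant instant) {
        // Milliseconds since 1970-01-01T00:00:00.000Z; sub-millisecond digits are dropped.
        return instant.toEpochMilli();
    }

    static long toTimestampMillis(OffsetDateTime dateTime) {
        // Only the instant survives; the offset itself is not stored.
        return dateTime.toInstant().toEpochMilli();
    }

    static Instant fromTimestampMillis(long millis) {
        return Instant.ofEpochMilli(millis);
    }
}
```

Both `1970-01-02T00:00Z` and `1970-01-02T01:00+01:00` map to 86,400,000, so reading the value back cannot tell them apart.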
@@ -832,7 +832,7 @@ The `timestamp-micros` logical type represents an instant on the global timeline

A `timestamp-micros` logical type annotates an Avro `long`, where the long stores the number of microseconds from the unix epoch, 1 January 1970 00:00:00.000000 UTC.

### Local timestamp (millisecond precision)
### Local timestamp (millisecond precision) {#local_timestamp_ms}
The `local-timestamp-millis` logical type represents a timestamp in a local timezone, regardless of what specific time zone is considered local, with a precision of one millisecond.

A `local-timestamp-millis` logical type annotates an Avro `long`, where the long stores the number of milliseconds from 1 January 1970 00:00:00.000.
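Unlike `timestamp-millis`, the value describes a wall-clock reading rather than an instant. A hypothetical sketch using `java.time.LocalDateTime`, where `ZoneOffset.UTC` serves only as a fixed reference for the arithmetic:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

/** Hypothetical helpers: local-timestamp-millis counts wall-clock milliseconds from 1970-01-01T00:00. */
final class LocalTimestampMillisSketch {
    static long toLocalTimestampMillis(LocalDateTime local) {
        // ZoneOffset.UTC is used purely for the arithmetic; no time zone conversion takes place.
        return local.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    static LocalDateTime fromLocalTimestampMillis(long millis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), ZoneOffset.UTC);
    }
}
```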
21 changes: 19 additions & 2 deletions lang/java/idl/src/main/java/org/apache/avro/idl/IdlFile.java
@@ -32,25 +32,39 @@
* the protocol containing the schemas.
*/
public class IdlFile {
private final Schema mainSchema;
private final Protocol protocol;
private final String namespace;
private final Map<String, Schema> namedSchemas;
private final List<String> warnings;

IdlFile(Protocol protocol, List<String> warnings) {
this(protocol.getNamespace(), protocol.getTypes(), protocol, warnings);
this(protocol.getNamespace(), protocol.getTypes(), null, protocol, warnings);
}

private IdlFile(String namespace, Iterable<Schema> schemas, Protocol protocol, List<String> warnings) {
IdlFile(String namespace, Schema mainSchema, Iterable<Schema> schemas, List<String> warnings) {
this(namespace, schemas, mainSchema, null, warnings);
}

private IdlFile(String namespace, Iterable<Schema> schemas, Schema mainSchema, Protocol protocol,
List<String> warnings) {
this.namespace = namespace;
this.namedSchemas = new LinkedHashMap<>();
for (Schema namedSchema : schemas) {
this.namedSchemas.put(namedSchema.getFullName(), namedSchema);
}
this.mainSchema = mainSchema;
this.protocol = protocol;
this.warnings = Collections.unmodifiableList(new ArrayList<>(warnings));
}

/**
* The (main) schema defined by the IDL file.
*/
public Schema getMainSchema() {
return mainSchema;
}

/**
* The protocol defined by the IDL file.
*/
@@ -106,6 +120,9 @@ String outputString() {
if (protocol != null) {
return protocol.toString();
}
if (mainSchema != null) {
return mainSchema.toString();
}
if (namedSchemas.isEmpty()) {
return "[]";
} else {
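The new branch in `outputString()` gives the main schema priority over the list of named schemata, while a protocol still comes first. A sketch of just that precedence, with hypothetical names (the formatting of a non-empty list of named schemata is cut off in the diff and is not modeled):

```java
/** Hypothetical model of the precedence in IdlFile.outputString(); not Avro code. */
final class IdlOutputSketch {
    enum Source { PROTOCOL, MAIN_SCHEMA, NAMED_SCHEMAS }

    static Source choose(Object protocol, Object mainSchema) {
        // A protocol IDL file is written as the protocol (.avpr equivalent).
        if (protocol != null) {
            return Source.PROTOCOL;
        }
        // A schema IDL file with "schema ...;" is written as that schema (.avsc equivalent).
        if (mainSchema != null) {
            return Source.MAIN_SCHEMA;
        }
        // Otherwise only the named schemata remain; with none, the diff returns "[]".
        return Source.NAMED_SCHEMAS;
    }
}
```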