
Initial support for Avro formats #386

Merged
merged 12 commits into from
Oct 31, 2023

Conversation

@mwylde (Member) commented Oct 30, 2023

This PR adds initial support for the Avro format, currently only for deserialization in sources:

CREATE TABLE pizza_orders_avro (
    value TEXT
) WITH (
    connector = 'kafka',
    bootstrap_servers = 'localhost:9092',
    topic = 'pizza_orders_avro',
    type = 'source',
    format = 'avro',
    'schema_registry.endpoint' = 'http://localhost:8081',
    'avro.confluent_schema_registry' = 'true'
);

Initially supported:

  • Reading Avro data with either an embedded schema, a pre-registered schema, or schemas fetched from the Confluent Schema Registry
  • Struct generation from a subset of Avro features (records, primitives, and unions of [T, null])
  • "Unstructured" Avro (via Avro-to-JSON conversion)
  • Schema evolution (via confluent schema registry or pre-registered schema)
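
As an illustration of the supported subset, a hypothetical Avro schema for the pizza_orders_avro topic in the example above might look like the following (the field names here are invented for illustration; note the [T, null]-style union for the nullable field):

```json
{
  "type": "record",
  "name": "PizzaOrder",
  "fields": [
    {"name": "store_id", "type": "long"},
    {"name": "order_total", "type": "double"},
    {"name": "coupon_code", "type": ["null", "string"], "default": null}
  ]
}
```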

Follow-up work for more complete Avro support will include:

  • Serialization (sink) support
  • Complete union/array support

The current implementation is fairly inefficient. To support Avro records with features that we don't yet handle statically, Avro is first parsed, then converted to JSON, and then deserialized into the struct via our normal JSON deserialization pathway. This lets us reuse the existing RawJson approach for unsupported features.
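To make the interim pathway concrete, here is a minimal, self-contained sketch of the Avro-to-JSON step described above. This is not the actual Arroyo code: `AvroValue` is a hypothetical stand-in for the parsed Avro value type, and the real implementation would cover many more types; the point is only the shape of the parse → JSON text → existing JSON deserialization flow.

```rust
// Hypothetical stand-in for a parsed Avro value (the real parser's value
// type is much richer); shown only to illustrate the conversion pathway.
#[derive(Debug)]
enum AvroValue {
    Null,
    Boolean(bool),
    Long(i64),
    Str(String),
    Record(Vec<(String, AvroValue)>),
}

// Convert a parsed Avro value into JSON text; in the pathway described
// above, this text would then flow through the normal JSON deserializer.
fn avro_to_json(v: &AvroValue) -> String {
    match v {
        AvroValue::Null => "null".to_string(),
        AvroValue::Boolean(b) => b.to_string(),
        AvroValue::Long(n) => n.to_string(),
        AvroValue::Str(s) => format!("\"{}\"", s.replace('"', "\\\"")),
        AvroValue::Record(fields) => {
            let body: Vec<String> = fields
                .iter()
                .map(|(k, v)| format!("\"{}\":{}", k, avro_to_json(v)))
                .collect();
            format!("{{{}}}", body.join(","))
        }
    }
}

fn main() {
    let record = AvroValue::Record(vec![
        ("store_id".to_string(), AvroValue::Long(4)),
        ("coupon_code".to_string(), AvroValue::Null),
    ]);
    // Prints {"store_id":4,"coupon_code":null}
    println!("{}", avro_to_json(&record));
}
```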

This PR also reworks how we interact with Confluent Schema Registry. The schema registry is now configured as part of the Kafka connection, so you don't need to redefine it each time you create a table. We also now do schema resolution as part of connection table creation, rather than having the frontend request the schema and then fill it in.

@jacksonrnewhouse (Contributor) left a comment


Just a couple of comments

}

/// A schema resolver that returns errors when schemas are requested; this is intended
/// to be used when schemas are embedded into the message and we do not expect to
Contributor

There's a wording issue here.


#[async_trait]
pub trait SchemaResolver: Send {
    async fn resolve_schema(&self, id: u32) -> Result<Option<String>, String>;
Contributor

Why not use anyhow errors?

Member Author

Because the errors are meant to be used as the message in a UserError, which takes a string. You could rely on anyhow::Error::to_string, but that includes various formatting logic that I don't want; the SchemaResolver should be responsible for producing a good String error message that can be shown to the user.
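To illustrate the convention described here, below is a minimal, dependency-free sketch of a resolver that reports a user-facing String error. This is hypothetical, not the actual Arroyo implementation: the real trait is async (see the #[async_trait] snippet above), while this version is synchronous so it can stand alone without the async-trait crate, and the `FailingSchemaResolver` name is invented for the example.

```rust
// Synchronous stand-in for the async SchemaResolver trait; the point is the
// error type: a plain String suitable for showing directly to the user in a
// UserError, rather than an anyhow::Error with its own formatting.
pub trait SyncSchemaResolver {
    fn resolve_schema(&self, id: u32) -> Result<Option<String>, String>;
}

/// A resolver for sources whose messages embed their schemas; any attempt to
/// look up a schema by id is unexpected and reported as a user-facing error.
pub struct FailingSchemaResolver;

impl SyncSchemaResolver for FailingSchemaResolver {
    fn resolve_schema(&self, id: u32) -> Result<Option<String>, String> {
        Err(format!(
            "received a message referencing schema id {}, but this source is \
             not configured to fetch schemas from a registry",
            id
        ))
    }
}

fn main() {
    let resolver = FailingSchemaResolver;
    // The Err value is a ready-to-display message, no extra formatting needed.
    let err = resolver.resolve_schema(42).unwrap_err();
    println!("{}", err);
}
```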

@mwylde mwylde enabled auto-merge (squash) October 31, 2023 17:58
@mwylde mwylde merged commit 5fcec8c into master Oct 31, 2023
8 checks passed
zh4ngx pushed a commit to StrikeTeamOne/arroyo that referenced this pull request Nov 16, 2023