I created a file named "avro_writer.py" which uses the avro-python3 library (not the avro library) in Python to write Avro files. The Getting Started with Python post on the Apache Avro site helped me get going in under 30 minutes (note: it's written for Python 2, so it's a bit outdated).
In "avro_writer.py" I ran the following tests (a minimal writer sketch follows the list):
- write to a file with a given schema, and also check that incorrect writes (records that don't match the schema) fail.
- write to a new file with an updated schema.
- write to a single file with both schemas.
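Roughly what the writer looks like; a minimal sketch assuming a simple User schema with name and favorite_number fields (the actual schemas and values in my tests may differ):

```python
import avro.schema
from avro.datafile import DataFileWriter
from avro.io import DatumWriter

# avro-python3 uses Parse (capital P); the python2 avro package calls it parse.
schema_v1 = avro.schema.Parse("""
{
  "namespace": "example.avro",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]}
  ]
}
""")

writer = DataFileWriter(open("users_v1.avro", "wb"), DatumWriter(), schema_v1)
writer.append({"name": "Alyssa", "favorite_number": 256})
writer.append({"name": "Ben", "favorite_number": 7})
# Appending a record that doesn't match the schema raises an AvroTypeException,
# which is what the "incorrect writing" test checks.
writer.close()
```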
I created a file named "avro_reader.py" which also uses the avro-python3
library.
I ran the following tests (a minimal reader sketch follows the list):
- read without a schema
- read with the right schema
- read with an old schema
- read a mixed schema file with the new schema
- read with an incomplete schema (missing fields)
- read with an incompatible schema to see if it fails
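Roughly what the plain read looks like; a minimal sketch reusing the users_v1.avro file from the writer sketch above. No schema is needed because the writer's schema is embedded in the file; the other cases pass a different reader schema to DatumReader (see the evolution sketch after the compatibility list below).

```python
from avro.datafile import DataFileReader
from avro.io import DatumReader

# No reader schema supplied: records are decoded with the writer's schema
# stored in the file header.
reader = DataFileReader(open("users_v1.avro", "rb"), DatumReader())
for user in reader:
    print(user)
reader.close()
```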
Following the tutorial Reading and Writing Avro Files from the Command Line, I'm able to use the command line to read the files with and without a schema. You can download avro-tools-1.8.2.jar and run it against a file:
java -jar avro-tools-1.8.2.jar tojson users_v1.avro
Notes on schema changes that stay compatible (from the Avro spec / Confluent docs linked below); a sketch of the first case follows the list.
- A field with a default value is added.
- A field that was previously defined with a default value is removed.
- A field's doc attribute is changed, added, or removed.
- A field's order attribute is changed, added, or removed.
- A field's default value is added or changed.
- Field or type aliases are added or removed.
- A non-union type may be changed to a union that contains only the original type, or vice versa.
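For example, adding a field with a default is a compatible change: a file written with the v1 schema can still be read with a v2 reader schema, and the new field is filled from its default. A minimal sketch, again using the illustrative User schema; the writer and reader schemas are passed to DatumReader positionally here because the keyword names differ slightly between the avro and avro-python3 packages:

```python
import avro.schema
from avro.datafile import DataFileReader
from avro.io import DatumReader

# v2 adds favorite_color with a default -- a compatible change per the list above.
schema_v2 = avro.schema.Parse("""
{
  "namespace": "example.avro",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": "string", "default": "unknown"}
  ]
}
""")

# Read the v1 file with the v2 reader schema: favorite_color comes back as "unknown".
# An incompatible reader schema raises a schema resolution error instead.
reader = DataFileReader(open("users_v1.avro", "rb"), DatumReader(None, schema_v2))
for user in reader:
    print(user)
reader.close()
```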
Other links
- https://avro.apache.org/docs/1.8.1/spec.html
- https://docs.confluent.io/current/schema-registry/docs/avro.html
- https://github.com/ottomata/kafka-connect-jsonschema
Installing via the Confluent Quick Start instructions: I went through all the steps to set up the cp-all-in-one container, got it running, and tested it.
For Python, I'm using the Confluent Python library (confluent-kafka); a short producer/consumer sketch follows the topic-creation commands below.
docker-compose exec broker kafka-topics --create --zookeeper zookeeper:2181 --replication-factor 1 --partitions 1 --topic example.users
docker-compose exec broker kafka-topics --create --zookeeper zookeeper:2181 --replication-factor 1 --partitions 1 --topic example.numtest
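A minimal sketch of producing to and consuming from the example.numtest topic with confluent-kafka, assuming the quick-start broker is reachable on localhost:9092 (the host listener the cp-all-in-one compose file exposes by default):

```python
from confluent_kafka import Consumer, Producer

BROKER = "localhost:9092"  # assumed host listener from the cp-all-in-one compose file

# Produce a few test numbers to the topic created above.
producer = Producer({"bootstrap.servers": BROKER})
for i in range(10):
    producer.produce("example.numtest", value=str(i))
producer.flush()

# Consume them back; Ctrl+C to stop.
consumer = Consumer({
    "bootstrap.servers": BROKER,
    "group.id": "example-numtest-reader",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["example.numtest"])
try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print("consumer error: {}".format(msg.error()))
            continue
        print(msg.value().decode("utf-8"))
finally:
    consumer.close()
```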
Note: our stage Kafka should have a check that monitors which topics exist. Producers can auto-create topics, which could get unruly if not maintained.
Note: schema registry kind of sucks.