This post continues my earlier posts on Apache Avro - Introduction and Apache Avro - Generating classes from Schema.
In this post, we will discuss reading (deserialization) and writing (serialization) of Avro-generated classes.
"Apache Avro™ is a data serialization system." We use DatumReader<T> and DatumWriter<T> for deserialization and serialization of data, respectively.
Apache Avro formats
Apache Avro supports two formats, JSON and Binary.
Let's move to an example using the JSON format.
Employee employee = Employee.newBuilder().setFirstName("Gaurav").setLastName("Mazra").setSex(SEX.MALE).build();

DatumWriter<Employee> employeeWriter = new SpecificDatumWriter<>(Employee.class);
byte[] data;
try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
    Encoder jsonEncoder = EncoderFactory.get().jsonEncoder(Employee.getClassSchema(), baos);
    employeeWriter.write(employee, jsonEncoder);
    jsonEncoder.flush();
    data = baos.toByteArray();
}

// serialized data
System.out.println(new String(data));

DatumReader<Employee> employeeReader = new SpecificDatumReader<>(Employee.class);
Decoder decoder = DecoderFactory.get().jsonDecoder(Employee.getClassSchema(), new String(data));
employee = employeeReader.read(null, decoder);
// data after deserialization
System.out.println(employee);
Explanation on the way :)
Line 1: We create an object of class Employee (Avro-generated).
Line 3: We create an object of SpecificDatumWriter<T>, which implements DatumWriter<T>. Other implementations of DatumWriter also exist, viz. GenericDatumWriter and ReflectDatumWriter.
Line 6: We create a JsonEncoder by passing the Schema and the OutputStream where we want the serialized data. In our case, it is an in-memory ByteArrayOutputStream.
Line 7: We call the #write method on the DatumWriter with the object and the Encoder.
Line 8: We flush the JsonEncoder. Internally, it flushes the OutputStream passed to the JsonEncoder.
Line 15: We create an object of SpecificDatumReader<T>, which implements DatumReader<T>. Other implementations of DatumReader also exist, viz. GenericDatumReader and ReflectDatumReader.
Line 16: We create a JsonDecoder by passing the Schema and the input String that will be deserialized.
Line 17: We call the #read method on the DatumReader with null (no existing object to reuse) and the Decoder. It returns the deserialized Employee.
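The flush on Line 8 matters whenever a writer buffers its output: bytes may not reach the underlying stream until flush is called. The sketch below illustrates this with JDK classes only. BufferedOutputStream stands in for a buffering encoder, so this is an analogy and not Avro's actual implementation; the class name FlushDemo and its method are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Avro.
final class FlushDemo {

    // Returns {bytes visible in the sink before flush, bytes visible after flush}.
    static int[] visibleBytesAroundFlush(String payload) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Stand-in for a buffering encoder wrapped around the sink.
        BufferedOutputStream buffered = new BufferedOutputStream(sink);
        try {
            buffered.write(payload.getBytes(StandardCharsets.UTF_8));
            int before = sink.size(); // small payloads are still held in the buffer
            buffered.flush();
            int after = sink.size();  // now the sink has received every byte
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = visibleBytesAroundFlush("{\"firstName\":\"Gaurav\"}");
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```

Calling baos.toByteArray() before the flush would, in the same way, risk returning incomplete data.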
Let's move to a serialization and deserialization example with the Binary format.
Employee employee = Employee.newBuilder().setFirstName("Gaurav").setLastName("Mazra").setSex(SEX.MALE).build();

DatumWriter<Employee> employeeWriter = new SpecificDatumWriter<>(Employee.class);
byte[] data;
try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
    Encoder binaryEncoder = EncoderFactory.get().binaryEncoder(baos, null);
    employeeWriter.write(employee, binaryEncoder);
    binaryEncoder.flush();
    data = baos.toByteArray();
}

// serialized data, printed as byte values because the binary form is not readable text
System.out.println(java.util.Arrays.toString(data));

DatumReader<Employee> employeeReader = new SpecificDatumReader<>(Employee.class);
Decoder binaryDecoder = DecoderFactory.get().binaryDecoder(data, null);
employee = employeeReader.read(null, binaryDecoder);
// data after deserialization
System.out.println(employee);
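The example is the same as the JSON one except for Line 6 and Line 16, where we create a BinaryEncoder and a BinaryDecoder. The null passed as the second argument in both calls means there is no existing encoder or decoder to reuse.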
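To give a feel for why the binary output is compact and not human-readable, here is a minimal sketch of how the Avro specification encodes two primitives: a long is written as a zig-zag, variable-length integer, and a string is written as its UTF-8 byte length (as a long) followed by the bytes. This is an illustrative re-implementation using only the JDK, not Avro's own code; the class AvroBinarySketch and its methods are hypothetical names. The actual bytes for Employee depend on its schema, which is not shown in this post.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of Avro's binary encoding of long and string values.
final class AvroBinarySketch {

    // Zig-zag maps signed values to unsigned ones (0, -1, 1, -2 -> 0, 1, 2, 3);
    // the result is then written 7 bits at a time, low-order group first.
    static byte[] encodeLong(long value) {
        long zigzag = (value << 1) ^ (value >> 63);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((zigzag & ~0x7FL) != 0) {
            out.write((int) ((zigzag & 0x7F) | 0x80)); // high bit set: more bytes follow
            zigzag >>>= 7;
        }
        out.write((int) zigzag); // final byte, high bit clear
        return out.toByteArray();
    }

    // A string is its UTF-8 length (encoded as a long) followed by the UTF-8 bytes.
    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] length = encodeLong(utf8.length);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(length, 0, length.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // Formats bytes as space-separated two-digit hex values.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(toHex(encodeLong(64)));         // 80 01
        System.out.println(toHex(encodeString("Gaurav"))); // 0c 47 61 75 72 61 76
    }
}
```

Unlike the JSON form, no field names or quotes are written, which is why the reader needs the schema to decode the bytes.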
This is how we can serialize and deserialize data with Apache Avro. I hope you found this article informative and useful. You can find the full example on GitHub.