I compared Java's default serialization with Apache Avro serialization, and the results were astonishing.
You can read my older posts on the Java serialization process and on Apache Avro serialization.
Apache Avro consumed 15-20 times less memory to store the serialized data. I created a class with three fields (two String and one enum) and serialized it with both Avro and Java.
Avro used 14 bytes and Java used 231 bytes (the length of the resulting byte[]).
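For the Java side, the measurement can be reproduced with something like the sketch below. The Employee class, its field names, and the Department enum are hypothetical stand-ins for the class used in the comparison, so the exact byte count will differ from 231 (class and package names are part of the stream).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class JavaSizeDemo {

    // Hypothetical enum and class: two String fields and one enum field.
    enum Department { SALES, ENGINEERING }

    static final class Employee implements Serializable {
        private static final long serialVersionUID = 1L;
        final String firstName;
        final String lastName;
        final Department department;

        Employee(String firstName, String lastName, Department department) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.department = department;
        }
    }

    // Serializes the value with Java's default mechanism and returns the raw bytes.
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = toBytes(new Employee("John", "Smith", Department.SALES));
        // The size reported here is simply the length of the byte[].
        System.out.println("Java serialized size: " + bytes.length + " bytes");
    }
}
```

Most of the bytes come from class descriptors (class names, serialVersionUID, field names and types) rather than from the nine characters of actual data.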
Why Avro generates fewer bytes
Java Serialization
The default serialization mechanism for an object writes the class of the object, the class signature, and the values of all non-transient and non-static fields. References to other objects (except in transient or static fields) cause those objects to be written also. Multiple references to a single object are encoded using a reference sharing mechanism so that graphs of objects can be restored to the same shape as when the original was written.
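The reference-sharing behaviour described above can be checked with a small round trip. This is only an illustrative sketch: the Person class and the roundTrip helper are hypothetical names, not part of the original example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SharedReferenceDemo {

    // Hypothetical class used only to demonstrate shared references.
    static final class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        Person(String name) {
            this.name = name;
        }
    }

    // Writes the value to a byte[] and reads it back with default serialization.
    static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Person ann = new Person("Ann");
        // The same object is referenced twice from the array.
        Person[] copy = (Person[]) roundTrip(new Person[] {ann, ann});

        // Both slots still point at one single restored object.
        System.out.println(copy[0] == copy[1]); // true
        // The restored object is a new instance, not the original.
        System.out.println(copy[0] == ann);     // false
    }
}
```

The second occurrence of the object is written as a back-reference (a handle) instead of a second full copy, which is how the graph keeps its shape.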
Apache Avro
Avro writes only the schema, as a String, and the data of the class being serialized. There is no per-field overhead of writing the class of the object or the class signature, as there is in Java. Also, the fields are serialized in a predetermined order.
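To see why the data part is so small, here is a hand-rolled sketch of how Avro's binary encoding lays out a record with two strings and an enum. It does not use the Avro library. It only mimics the encoding rules as I understand them from the specification (strings as a zig-zag varint length plus UTF-8 bytes, enums as the zero-based symbol index, fields back to back in schema order). The field values and the Department enum are hypothetical, and the enum's Java declaration order is assumed to match the schema's symbol order.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class AvroStyleEncodingDemo {

    // Hypothetical enum; declaration order is assumed to match the schema symbols.
    enum Department { SALES, ENGINEERING }

    // Zig-zag plus variable-length encoding, the scheme Avro uses for int and long.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long zigzag = (value << 1) ^ (value >> 63);
        while ((zigzag & ~0x7FL) != 0) {
            out.write((int) ((zigzag & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            zigzag >>>= 7;
        }
        out.write((int) zigzag);
    }

    // A string is its UTF-8 byte length followed by the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // A record is just its fields in schema order: no names, no type tags.
    static byte[] encode(String firstName, String lastName, Department department) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, firstName);
        writeString(out, lastName);
        writeLong(out, department.ordinal()); // enum as zero-based index
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encode("John", "Smith", Department.SALES);
        // 1 + 4 bytes for "John", 1 + 5 bytes for "Smith", 1 byte for the enum.
        System.out.println(bytes.length); // 12
    }
}
```

Because the reader already has the schema, nothing in these bytes describes the fields themselves, which is where the saving over Java's stream comes from.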
You can find the full Java example on GitHub.
Avro cannot handle circular references and fails with java.lang.StackOverflowError, whereas Java's default serialization can handle them.
(example code for Avro and example code for Java serialization)
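A minimal sketch of the Java side, assuming the case in question is a circular reference between two objects. The Node class and the roundTrip helper are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class CircularReferenceDemo {

    // Hypothetical node type that can point back at an earlier node.
    static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Node next;

        Node(String name) {
            this.name = name;
        }
    }

    // Serializes and deserializes the node with Java's default mechanism.
    static Node roundTrip(Node node) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(node);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Node) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a; // cycle: a -> b -> a

        Node copy = roundTrip(a);
        // The cycle survives: following next twice leads back to the start.
        System.out.println(copy.next.next == copy); // true
    }
}
```

The stream records each object once and writes a handle when it meets the same object again, so the traversal terminates instead of recursing forever.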
Another observation is that Avro has no direct way of defining inheritance in the schema (classes), whereas Java's default serialization supports inheritance with its own constraints: each superclass must either implement the Serializable interface or have a no-args constructor accessible to the subclass, all the way up the hierarchy (otherwise deserialization throws java.io.InvalidClassException).
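The superclass constraint can be illustrated with the sketch below. All class names are hypothetical. As far as I know, with a non-serializable superclass that lacks an accessible no-args constructor, writing succeeds and the failure surfaces at read time as java.io.InvalidClassException ("no valid constructor").

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class InheritanceDemo {

    // Not Serializable, but has an accessible no-args constructor.
    static class Base {
        String label = "default";
        Base() { }
        Base(String label) { this.label = label; }
    }

    static class Child extends Base implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        Child(String label, String name) { super(label); this.name = name; }
    }

    // Not Serializable and no no-args constructor.
    static class StrictBase {
        String label;
        StrictBase(String label) { this.label = label; }
    }

    static class StrictChild extends StrictBase implements Serializable {
        private static final long serialVersionUID = 1L;
        StrictChild(String label) { super(label); }
    }

    // Round trip through a byte[]; checked exceptions are wrapped as the cause.
    static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Child copy = (Child) roundTrip(new Child("custom", "Ann"));
        // name comes from the stream; label is reset by Base() because the
        // state of a non-serializable superclass is never written.
        System.out.println(copy.name + " / " + copy.label); // Ann / default

        try {
            roundTrip(new StrictChild("custom"));
        } catch (IllegalStateException e) {
            System.out.println(e.getCause().getClass().getName()); // java.io.InvalidClassException
        }
    }
}
```

Note that the fields of a non-serializable superclass are not carried across at all; they are whatever its no-args constructor sets them to.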