
This post is a continuation of my previous post on Apache Avro - Introduction. In this post, we will discuss generating classes from a schema.

How to generate Apache Avro classes from a schema?

There are two ways to generate Avro classes from a schema:

  • Programmatically generating classes
  • Using the Avro Maven plugin

Consider that we have the following schema in "src/main/avro/employee.avsc":

{
  "type" : "record",
  "name" : "Employee",
  "namespace" : "com.gauravbytes.avro",
  "doc" : "Schema to hold employee object",
  "fields" : [{
    "name" : "firstName",
    "type" : "string"
  }, {
    "name" : "lastName",
    "type" : "string"
  }, {
    "name" : "sex",
    "type" : {
      "name" : "SEX",
      "type" : "enum",
      "symbols" : ["MALE", "FEMALE"]
    }
  }]
}

Programmatically generating classes

Classes can be generated from a schema using SpecificCompiler.

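import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.compiler.specific.SpecificCompiler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PragmaticSchemaGeneration {
 private static final Logger LOGGER = LoggerFactory.getLogger(PragmaticSchemaGeneration.class);

 public static void main(String[] args) {
  try {
   // Parse employee.avsc, then generate Java sources for it under src/main/java
   SpecificCompiler compiler = new SpecificCompiler(new Schema.Parser().parse(new File("src/main/avro/employee.avsc")));
   compiler.compileToDestination(new File("src/main/avro"), new File("src/main/java"));
  } catch (IOException e) {
   LOGGER.error("Exception occurred parsing schema: ", e);
  }
 }
}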

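Here, we parse employee.avsc and create an instance of SpecificCompiler from the resulting Schema. SpecificCompiler provides constructors that take either a Protocol or a Schema as an argument. compileToDestination(src, dst) then writes the generated Java sources under the destination directory (src/main/java here), unless an existing generated file is newer than src.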

Using the Maven plugin to generate classes

The Apache Avro Maven plugin (org.apache.avro:avro-maven-plugin) can also generate the classes for you. Add the following configuration to your pom.xml.


This is how we can generate classes from an Avro schema. I hope you find this post informative and helpful. You can find the full project on GitHub.

In this post, we will discuss the following items:

  • What is Apache Avro?
  • What is Avro schema and how to define it?
  • Serialization in Apache Avro.

What is Apache Avro?

"Apache Avro is a data serialization system." That's it, huh? This is what you will see when you open the official page. Apache Avro is:

  • A schema-based data serialization library.
  • Support for RPC (remote procedure calls).
  • Rich data structures (primitive types such as null, boolean, int, long, float, double, bytes and string; complex types such as record, enum, array, map, union and fixed).
  • A compact, fast, binary data format.

What is Avro schema and how to define it?

Apache Avro serialization is based on schemas. When data is written, its schema is written along with it, so the schema is always present when the data is read. Keeping the schema together with the data makes the data fully self-describing.

A schema describes the structure of an Avro datum (for example, a record). Schema types fall into two categories: primitive and complex.

Primitive types

These are the basic types supported by Avro: null, boolean, int, long, float, double, bytes and string. A quick example:

{"type": "string"}
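To make the primitive types more concrete, here is a minimal hand-rolled sketch of how Avro's binary encoding, as described in the Avro specification, writes some of them. It does not use the Avro Java library; the class `PrimitiveEncodingSketch` and its methods are hypothetical names for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical sketch (not the Avro library) of Avro's binary encoding for some primitive types.
// null needs no method: it is encoded as zero bytes.
class PrimitiveEncodingSketch {

    // boolean: a single byte, 0 for false and 1 for true.
    static void writeBoolean(ByteArrayOutputStream out, boolean value) {
        out.write(value ? 1 : 0);
    }

    // int and long: zig-zag encoding, then a variable-length encoding with 7 bits per byte.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long z = (value << 1) ^ (value >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // high bit set: more bytes follow
            z >>>= 7;
        }
        out.write((int) z);
    }

    // double: the bits from Double.doubleToLongBits, written as 8 little-endian bytes.
    static void writeDouble(ByteArrayOutputStream out, double value) {
        long bits = Double.doubleToLongBits(value);
        for (int i = 0; i < 8; i++) {
            out.write((int) (bits >>> (8 * i)) & 0xFF);
        }
    }

    // string: the UTF-8 byte length as a long, followed by the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Runs one writer against a fresh buffer and returns the bytes it produced.
    static byte[] encode(Consumer<ByteArrayOutputStream> writer) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writer.accept(out);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(out -> writeString(out, "foo")))); // [6, 102, 111, 111]
        System.out.println(Arrays.toString(encode(out -> writeLong(out, -1))));      // [1]
    }
}
```

Note how the string "foo" costs only one extra byte for its length, because small lengths fit in a single varint byte.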

Complex types

Apache Avro supports six complex types: record, enum, array, map, union and fixed.


A record uses the type name "record" and supports the following attributes:

  • name: a JSON string providing the name of the record (required).
  • namespace: a JSON string that qualifies the name.
  • doc: a JSON string providing documentation for the record.
  • aliases: a JSON array of strings providing alternate names for the record.
  • fields: a JSON array listing the fields (required). Each field is a JSON object with its own attributes:
    • name: a JSON string providing the name of the field (required).
    • type: a schema defining the type of the field (required).
    • doc: a JSON string providing documentation for the field.
    • default: a default value for the field, used when reading data that lacks this field (for example, data written with an older schema).

For example, the following record describes a node of a singly linked list; its "next" field is a union that is either null or another Node:

{
  "type": "record",
  "name": "Node",
  "aliases": ["SinglyLinkedNodes"],
  "fields" : [
    {"name": "value", "type": "string"},
    {"name": "next", "type": ["null", "Node"]}
  ]
}
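A record has no framing of its own in Avro's binary encoding: according to the specification, its fields are simply written one after another in the order the schema declares them. The hand-rolled sketch below (not the Avro library; `NodeRecordSketch` and its nested `Node` class are hypothetical) encodes the recursive Node record, writing the "next" union as its branch index followed by the value.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch (not the Avro library): binary-encodes the recursive "Node" record.
class NodeRecordSketch {

    // Plain Java stand-in for a Node datum; next == null selects the union's "null" branch.
    static final class Node {
        final String value;
        final Node next;

        Node(String value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    // Zig-zag + variable-length encoding used for Avro int and long values.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long z = (value << 1) ^ (value >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Strings are written as a long byte length followed by the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Fields in schema order: "value" (string), then "next" (union ["null", "Node"]).
    static void writeNode(ByteArrayOutputStream out, Node node) {
        writeString(out, node.value);
        if (node.next == null) {
            writeLong(out, 0);          // branch 0 = "null": nothing else is written
        } else {
            writeLong(out, 1);          // branch 1 = "Node": the nested record follows
            writeNode(out, node.next);
        }
    }

    static byte[] encode(Node node) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeNode(out, node);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Node list = new Node("a", new Node("b", null));
        System.out.println(Arrays.toString(encode(list))); // [2, 97, 2, 2, 98, 0]
    }
}
```

The recursion in writeNode mirrors the recursion in the schema; no field names or type tags appear in the output, which is why the reader needs the schema.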

An enum uses the type "enum" and supports the attributes name (required), namespace, aliases, doc and symbols (a JSON array of symbol names, required).

{
  "type": "enum",
  "name": "Move",
  "symbols" : ["LEFT", "RIGHT", "UP", "DOWN"]
}

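Per the Avro specification, an enum value is binary-encoded as an int: the zero-based position of its symbol in the schema's "symbols" array. Here is a small hand-rolled sketch for the Move enum (not the Avro library; `EnumEncodingSketch` and its methods are hypothetical names).

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not the Avro library): an enum is written as the int index of its symbol.
class EnumEncodingSketch {

    static final List<String> MOVE_SYMBOLS = Arrays.asList("LEFT", "RIGHT", "UP", "DOWN");

    // Zig-zag + variable-length encoding used for Avro int and long values.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long z = (value << 1) ^ (value >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Reads one zig-zag varint from the start of the buffer.
    static long readLong(byte[] buf) {
        long z = 0;
        int shift = 0;
        for (byte b : buf) {
            z |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return (z >>> 1) ^ -(z & 1); // undo zig-zag
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated varint");
    }

    static byte[] encodeSymbol(List<String> symbols, String symbol) {
        int index = symbols.indexOf(symbol);
        if (index < 0) {
            throw new IllegalArgumentException("unknown symbol: " + symbol);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, index);
        return out.toByteArray();
    }

    static String decodeSymbol(List<String> symbols, byte[] encoded) {
        return symbols.get((int) readLong(encoded));
    }

    public static void main(String[] args) {
        byte[] up = encodeSymbol(MOVE_SYMBOLS, "UP");
        System.out.println(Arrays.toString(up));            // [4]
        System.out.println(decodeSymbol(MOVE_SYMBOLS, up)); // UP
    }
}
```

Because only the index is stored, reordering the symbols in a schema changes how existing data is decoded.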
An array uses the type "array" and supports a single attribute, items, which gives the schema of the array's elements.

{"type": "array", "items": "string"}
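In the binary encoding, the specification writes an array as a series of blocks: each block starts with an item count (a long) followed by that many items, and a block with count 0 marks the end. The hand-rolled sketch below (not the Avro library; names are hypothetical) encodes an array of strings, matching the schema above, and lets the caller choose a block size to show that one array may span several blocks. The specification also allows a negative count followed by the block's size in bytes; this sketch never writes that form.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not the Avro library): block encoding of {"type": "array", "items": "string"}.
class ArrayEncodingSketch {

    // Zig-zag + variable-length encoding used for Avro int and long values.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long z = (value << 1) ^ (value >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Strings are written as a long byte length followed by the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    static byte[] encodeStringArray(List<String> items, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int start = 0; start < items.size(); start += blockSize) {
            int end = Math.min(start + blockSize, items.size());
            writeLong(out, end - start);      // block header: number of items in this block
            for (String item : items.subList(start, end)) {
                writeString(out, item);
            }
        }
        writeLong(out, 0);                    // zero-count block: end of the array
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b");
        System.out.println(Arrays.toString(encodeStringArray(items, 2))); // [4, 2, 97, 2, 98, 0]
        System.out.println(Arrays.toString(encodeStringArray(items, 1))); // [2, 2, 97, 2, 2, 98, 0]
    }
}
```

Blocking lets a writer stream a large array without knowing its total length up front.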

A map uses the type "map" and supports a single attribute, values, which gives the schema of the map's values. Map keys are always strings.

{"type": "map", "values": "long"}
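Maps are binary-encoded like arrays, per the Avro specification: blocks of entries, each block prefixed with its entry count, ending with a zero count; each entry is a string key followed by its value. This hand-rolled sketch (not the Avro library; names are hypothetical) encodes the map of longs above as a single block.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not the Avro library): binary encoding of {"type": "map", "values": "long"}.
class MapEncodingSketch {

    // Zig-zag + variable-length encoding used for Avro int and long values.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long z = (value << 1) ^ (value >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Strings are written as a long byte length followed by the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // One block: entry count, then key/value pairs, then a zero-count block to end the map.
    static byte[] encodeLongMap(Map<String, Long> map) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!map.isEmpty()) {
            writeLong(out, map.size());
            for (Map.Entry<String, Long> entry : map.entrySet()) {
                writeString(out, entry.getKey()); // keys are always strings
                writeLong(out, entry.getValue()); // values follow the "values" schema (long here)
            }
        }
        writeLong(out, 0);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, Long> map = new LinkedHashMap<>(); // insertion order keeps the output predictable
        map.put("a", 1L);
        map.put("b", -1L);
        System.out.println(Arrays.toString(encodeLongMap(map))); // [4, 2, 97, 2, 2, 98, 1, 0]
    }
}
```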

Unions are represented using JSON arrays. For example, ["null", "string"] declares a value that may be either null or a string.
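In the binary encoding, the specification writes a union value as the zero-based index of the branch it uses (a long), followed by the value encoded with that branch's schema. A minimal hand-rolled sketch for ["null", "string"] (not the Avro library; `UnionEncodingSketch` is a hypothetical name):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch (not the Avro library): binary encoding of the union ["null", "string"].
class UnionEncodingSketch {

    // Zig-zag + variable-length encoding used for Avro int and long values.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long z = (value << 1) ^ (value >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Strings are written as a long byte length followed by the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    static byte[] encodeNullableString(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value == null) {
            writeLong(out, 0);        // branch 0: "null", whose value takes zero bytes
        } else {
            writeLong(out, 1);        // branch 1: "string"
            writeString(out, value);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodeNullableString(null))); // [0]
        System.out.println(Arrays.toString(encodeNullableString("a")));  // [2, 2, 97]
    }
}
```

This is why ["null", "string"] is the usual way to make a field optional: a missing value costs a single byte.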


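Fixed uses the type "fixed" and supports the attributes name and size (both required), where size is the number of bytes per value. It also accepts namespace and aliases.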

{"type": "fixed", "size": 16, "name": "md5"}
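Since the size is part of the schema, the specification encodes a fixed value as exactly that many raw bytes, with no length prefix, unlike the bytes type. The hand-rolled sketch below (not the Avro library; `FixedEncodingSketch` and its methods are hypothetical) contrasts the two for a 16-byte MD5 digest computed with the JDK's `MessageDigest`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch (not the Avro library): {"type": "fixed", "size": 16, "name": "md5"} vs. bytes.
class FixedEncodingSketch {

    static final int MD5_SIZE = 16;

    // Zig-zag + variable-length encoding used for Avro int and long values.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long z = (value << 1) ^ (value >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // fixed: exactly `size` raw bytes; the reader learns the size from the schema.
    static byte[] encodeFixed(byte[] value, int size) {
        if (value.length != size) {
            throw new IllegalArgumentException("expected " + size + " bytes, got " + value.length);
        }
        return value.clone();
    }

    // bytes, for contrast: a long length prefix, then the raw bytes.
    static byte[] encodeBytes(byte[] value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, value.length);
        out.write(value, 0, value.length);
        return out.toByteArray();
    }

    static byte[] md5(String text) {
        try {
            return MessageDigest.getInstance("MD5").digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] digest = md5("hello");
        System.out.println(encodeFixed(digest, MD5_SIZE).length); // 16
        System.out.println(encodeBytes(digest).length);           // 17 (1 length byte + 16 data bytes)
    }
}
```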

Serialization in Apache Avro

Apache Avro data is always serialized with its schema. Avro supports two encodings: binary and JSON. You can read more about serialization in the official specification and/or see the example usage here.
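One reason the binary encoding is compact is how it writes int and long values: per the specification, they are zig-zag encoded (so small negative numbers become small unsigned numbers) and then written as a variable-length integer, 7 bits per byte, with the high bit marking that more bytes follow. Below is a minimal hand-rolled sketch of both directions; it is not the Avro library's implementation, and `ZigZagVarIntSketch` is a hypothetical name. The sample values in `main` are the examples listed in the specification (for instance, 64 encodes to the two bytes 0x80 0x01).

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch (not the Avro library) of the zig-zag varint used for Avro int and long.
class ZigZagVarIntSketch {

    static byte[] encodeLong(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long z = (value << 1) ^ (value >> 63); // zig-zag: 0, -1, 1, -2, 2 ... -> 0, 1, 2, 3, 4 ...
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits first; high bit = more bytes follow
            z >>>= 7;
        }
        out.write((int) z);
        return out.toByteArray();
    }

    static long decodeLong(byte[] buf) {
        long z = 0;
        int shift = 0;
        for (byte b : buf) {
            z |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return (z >>> 1) ^ -(z & 1); // undo zig-zag
            }
            shift += 7;
            if (shift > 63) {
                throw new IllegalArgumentException("varint too long");
            }
        }
        throw new IllegalArgumentException("truncated varint");
    }

    public static void main(String[] args) {
        long[] samples = {0, -1, 1, -2, 2, -64, 64};
        for (long n : samples) {
            byte[] encoded = encodeLong(n);
            System.out.println(n + " -> " + Arrays.toString(encoded) + " -> " + decodeLong(encoded));
        }
    }
}
```

Java prints bytes as signed values, so 0x80 appears as -128 in the output.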