Introduction to Avro

Author: halfempty | Published: 2021-10-26 21:24

    1. What

    Avro is a data serialization system built for applications that exchange large volumes of data.

    Apache Avro™ is a data serialization system.

    Avro provides:

    • Rich data structures.
    • A compact, fast, binary data format.
    • A container file, to store persistent data.
    • Remote procedure call (RPC).
    • Simple integration with dynamic languages. Code generation is not required to read or write data files nor to use or implement RPC protocols. Code generation is an optional optimization, only worth implementing for statically typed languages.

    2. Why

    • Dynamic typing, so reading and writing are more flexible
      A key feature of Avro is robust support for data schemas that change over time — often called schema evolution. Avro handles schema changes like missing fields, added fields and changed fields; as a result, old programs can read new data and new programs can read old data.

    • Untagged data: less information is serialized, so the output is smaller and faster to process
      Since the schema is present when data is read, considerably less type information need be encoded with data, resulting in smaller serialization size.
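
To make the size argument concrete, here is a minimal sketch of what "untagged" means at the byte level. It hand-rolls the binary encoding rules from the Avro specification (zig-zag variable-length integers, length-prefixed UTF-8 strings, a union written as a branch index followed by the value) using only the JDK. The class `UntaggedEncodingSketch`, its methods, and the sample value `"Ann"` are illustrative assumptions, not part of the Avro API; the record layout assumed is the `User` schema from section 3.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: a hand-written version of Avro's binary encoding rules, not the Avro API.
final class UntaggedEncodingSketch {

    // Avro int/long: zig-zag encode, then emit 7 bits per byte, lowest group first.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long zz = (value << 1) ^ (value >> 63);
        while ((zz & ~0x7FL) != 0) {
            out.write((int) ((zz & 0x7F) | 0x80));
            zz >>>= 7;
        }
        out.write((int) zz);
    }

    // Avro string: the UTF-8 byte length as a long, followed by the bytes themselves.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Encodes name: string, favorite_number: ["int","null"], favorite_color: ["string","null"].
    // No field names or type tags are written; the field order comes from the schema.
    static byte[] encodeUser(String name, Integer favoriteNumber, String favoriteColor) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        if (favoriteNumber != null) {
            writeLong(out, 0);               // union branch 0: int
            writeLong(out, favoriteNumber);
        } else {
            writeLong(out, 1);               // union branch 1: null, which has no payload
        }
        if (favoriteColor != null) {
            writeLong(out, 0);               // union branch 0: string
            writeString(out, favoriteColor);
        } else {
            writeLong(out, 1);               // union branch 1: null
        }
        return out.toByteArray();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] encoded = encodeUser("Ann", 150, null);
        // prints "8 bytes: 06 41 6E 6E 00 AC 02 02"
        System.out.println(encoded.length + " bytes: " + hex(encoded));
    }
}
```

The eight bytes hold only values and union branch indexes. A reader without the schema cannot tell where one field ends and the next begins, which is why the schema must be available at read time.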

    3. How

    Add the dependency

    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <version>1.10.2</version>
    </dependency>
    

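    Both the write and read examples below rely on a schema, which reflects how Avro stores data: values are interpreted through the schema rather than through per-field tags.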

    3.1 Writing Avro

    package org.leon;
    
    
    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.DatumWriter;
    
    import java.io.File;
    import java.io.IOException;
    
    public class App {
    
        public static void main(String[] args) throws IOException {
    
            String SCHEMA = "{\n" +
                    "    \"type\": \"record\",\n" +
                    "    \"name\": \"User\",\n" +
                    "    \"fields\": [\n" +
                    "        {\"name\": \"name\", \"type\": \"string\"},\n" +
                    "        {\"name\": \"favorite_number\",  \"type\": [\"int\", \"null\"]},\n" +
                    "        {\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}\n" +
                    "    ]\n" +
                    "}";
    
            Schema schema = new Schema.Parser().parse(SCHEMA);
    
            GenericRecord u1 = new GenericData.Record(schema);
            u1.put("name", "刘小M");
            u1.put("favorite_number", 150);
    
            GenericRecord u2 = new GenericData.Record(schema);
            u2.put("name", "李二蛋");
            u2.put("favorite_color", "blue");
    
            File file = new File("user.avro");
            DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
            // try-with-resources flushes and closes the file even if an append fails
            try (DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<>(datumWriter)) {
                // create() writes the file header, which embeds the schema
                dataFileWriter.create(schema, file);
                dataFileWriter.append(u1);
                dataFileWriter.append(u2);
            }
        }
    }
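
The schema passed to `create` ends up inside the file itself. As a rough sketch of what that means, the JDK-only code below reads the header of an Avro object container file as laid out in the Avro specification: the four magic bytes `Obj` plus `1`, followed by a metadata map whose `avro.schema` entry holds the writer's schema as JSON. `ContainerHeaderSketch` and its methods are hypothetical helpers written for this illustration; in real code, `DataFileReader.getSchema()` returns the same schema.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a hand-written reader for the container file header, not the Avro API.
final class ContainerHeaderSketch {

    // Decodes one zig-zag variable-length long, the encoding Avro uses for counts and lengths.
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated varint");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return (raw >>> 1) ^ -(raw & 1);
            }
        }
        throw new IOException("varint too long");
    }

    // Reads a length-prefixed byte sequence (Avro "bytes" and "string" share this layout).
    static byte[] readBytes(DataInputStream in) throws IOException {
        byte[] buf = new byte[(int) readLong(in)];
        in.readFully(buf);
        return buf;
    }

    // Returns the header metadata (for example "avro.schema" and "avro.codec") as strings.
    static Map<String, String> readMetadata(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
            throw new IOException("not an Avro container file");
        }
        Map<String, String> meta = new LinkedHashMap<>();
        // A map is written as blocks of key/value pairs, terminated by a block count of 0.
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {
                count = -count;   // a negative count is followed by the block's byte size
                readLong(in);     // the size is not needed here, so it is skipped
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                String value = new String(readBytes(in), StandardCharsets.UTF_8);
                meta.put(key, value);
            }
        }
        return meta;
    }

    public static void main(String[] args) throws IOException {
        File file = new File("user.avro");   // assumes the writer above has been run
        if (!file.exists()) {
            System.out.println("user.avro not found; run the writer first");
            return;
        }
        try (InputStream in = new FileInputStream(file)) {
            System.out.println(readMetadata(in).get("avro.schema"));
        }
    }
}
```

Run against the `user.avro` produced above, this should print the `User` schema JSON, which is how a reader can decode the file without being told the writer's schema separately.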
    

    3.2 Reading Avro

    package org.leon;
    
    
    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.DatumReader;
    
    import java.io.File;
    import java.io.IOException;
    
    public class App {
    
        public static void main(String[] args) throws IOException {
            String SCHEMA = "{\n" +
                    "    \"type\": \"record\",\n" +
                    "    \"name\": \"User\",\n" +
                    "    \"fields\": [\n" +
                    "        {\"name\": \"name\", \"type\": \"string\"},\n" +
                    "        {\"name\": \"favorite_number\",  \"type\": [\"int\", \"null\"]},\n" +
                    "        {\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}\n" +
                    "    ]\n" +
                    "}";
    
            Schema schema = new Schema.Parser().parse(SCHEMA);
    
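            File file = new File("user.avro");
            // The schema passed here is the one the reader expects; the writer's schema comes from the file header
            DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
            // try-with-resources closes the underlying file when iteration finishes or fails
            try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(file, datumReader)) {
                while (dataFileReader.hasNext()) {
                    GenericRecord user = dataFileReader.next();
                    System.out.println(user);
                }
            }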
        }
    }
    

    Output:

    {"name": "刘小M", "favorite_number": 150, "favorite_color": null}
    {"name": "李二蛋", "favorite_number": null, "favorite_color": "blue"}
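
The reader above uses the same schema the file was written with. To illustrate the schema evolution described in section 2, here is a simplified, JDK-only sketch of the record resolution rules from the Avro specification: fields are matched by name, a writer field the reader does not declare is ignored, and a reader field the writer never wrote takes the reader's default. The `Field` class, the `resolve` method, the `email` field, and the sample values are hypothetical and exist only for this illustration; with the real library the equivalent is to pass both schemas to `new GenericDatumReader<>(writerSchema, readerSchema)`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy model of Avro record resolution, not the Avro API.
final class SchemaResolutionSketch {

    // A reader-side field: its name, whether it declares a default, and that default.
    static final class Field {
        final String name;
        final boolean hasDefault;
        final Object defaultValue;

        Field(String name) {
            this(name, false, null);
        }

        Field(String name, Object defaultValue) {
            this(name, true, defaultValue);
        }

        private Field(String name, boolean hasDefault, Object defaultValue) {
            this.name = name;
            this.hasDefault = hasDefault;
            this.defaultValue = defaultValue;
        }
    }

    // Builds the record the reader sees from the values the writer stored.
    static Map<String, Object> resolve(Map<String, Object> written, List<Field> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Field f : readerFields) {
            if (written.containsKey(f.name)) {
                result.put(f.name, written.get(f.name));   // in both schemas: copy the value
            } else if (f.hasDefault) {
                result.put(f.name, f.defaultValue);        // only in the reader: use its default
            } else {
                throw new IllegalStateException("no value or default for field " + f.name);
            }
        }
        return result;   // writer fields unknown to the reader were never copied
    }

    public static void main(String[] args) {
        // A record as written with the User schema from this article.
        Map<String, Object> written = new LinkedHashMap<>();
        written.put("name", "Ann");
        written.put("favorite_number", 150);
        written.put("favorite_color", null);

        // A newer reader schema: drops favorite_color, adds email with a default.
        List<Field> reader = Arrays.asList(
                new Field("name"),
                new Field("favorite_number"),
                new Field("email", "unknown"));

        // prints "{name=Ann, favorite_number=150, email=unknown}"
        System.out.println(resolve(written, reader));
    }
}
```

The real resolution rules also cover type promotion, aliases, and unions, which this sketch leaves out.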

          Article link: https://www.haomeiwen.com/subject/ixvyaltx.html