1. What
Apache Avro™ is a data serialization system designed for applications that exchange large volumes of data.
Avro provides:
- Rich data structures.
- A compact, fast, binary data format.
- A container file, to store persistent data.
- Remote procedure call (RPC).
- Simple integration with dynamic languages. Code generation is not required to read or write data files nor to use or implement RPC protocols. Code generation is an optional optimization, only worth implementing for statically typed languages.
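"Rich data structures" refers to Avro's complex types: records, enums, arrays, maps, unions and fixed. The sketch below assumes Avro 1.10.x is on the classpath. It parses a schema that nests several of these types and prints the type of each top-level field. The `Profile` schema and the `describe` helper are hypothetical, made up for illustration. No generated classes are involved, which also illustrates the last bullet.

```java
import org.apache.avro.Schema;

import java.util.ArrayList;
import java.util.List;

public class RichSchemaDemo {
    // Hypothetical schema combining enum, array, map, union and a nested record
    static final String PROFILE_SCHEMA = "{"
            + "\"type\": \"record\", \"name\": \"Profile\", \"fields\": ["
            + "{\"name\": \"id\", \"type\": \"long\"},"
            + "{\"name\": \"status\", \"type\": {\"type\": \"enum\", \"name\": \"Status\","
            + " \"symbols\": [\"ACTIVE\", \"BLOCKED\"]}},"
            + "{\"name\": \"tags\", \"type\": {\"type\": \"array\", \"items\": \"string\"}},"
            + "{\"name\": \"scores\", \"type\": {\"type\": \"map\", \"values\": \"int\"}},"
            + "{\"name\": \"address\", \"type\": [\"null\", {\"type\": \"record\", \"name\": \"Address\","
            + " \"fields\": [{\"name\": \"city\", \"type\": \"string\"}]}], \"default\": null}"
            + "]}";

    // Returns one "field -> TYPE" line per top-level field of the schema
    static List<String> describe() {
        Schema schema = new Schema.Parser().parse(PROFILE_SCHEMA);
        List<String> lines = new ArrayList<>();
        for (Schema.Field field : schema.getFields()) {
            // getType() returns the Schema.Type enum constant, e.g. ARRAY or UNION
            lines.add(field.name() + " -> " + field.schema().getType());
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

Running `main` should print one line per field, for example `tags -> ARRAY` and `address -> UNION`.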
2. Why
- Dynamic typing: more flexible reading and writing.
  A key feature of Avro is robust support for data schemas that change over time, often called schema evolution. Avro handles schema changes like missing fields, added fields and changed fields. As a result, old programs can read new data and new programs can read old data.
- Untagged data: less type information is serialized, so the data is smaller and faster to process.
  Since the schema is present when data is read, considerably less type information needs to be encoded with the data, resulting in a smaller serialization size.
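Both points can be seen in a small sketch that assumes Avro 1.10.x. It works in three steps:

1. It encodes one record with a hypothetical "old" schema `V1`, using Avro's raw binary encoder.
2. It prints the resulting bytes.
3. It decodes those bytes with a hypothetical "new" schema `V2`, which adds `favorite_color` with a default value.

The bytes carry only values: a length-prefixed string, a union branch index and a zig-zag varint. There are no field names or type tags. Passing both schemas to `GenericDatumReader` triggers schema resolution, which fills in the missing field from its default. The class and helper names are illustrative only.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class EvolutionDemo {
    // Hypothetical old (writer) schema
    static final Schema V1 = new Schema.Parser().parse(
            "{\"type\": \"record\", \"name\": \"User\", \"fields\": ["
            + "{\"name\": \"name\", \"type\": \"string\"},"
            + "{\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]}]}");

    // Hypothetical new (reader) schema: adds a field that has a default value
    static final Schema V2 = new Schema.Parser().parse(
            "{\"type\": \"record\", \"name\": \"User\", \"fields\": ["
            + "{\"name\": \"name\", \"type\": \"string\"},"
            + "{\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},"
            + "{\"name\": \"favorite_color\", \"type\": \"string\", \"default\": \"unknown\"}]}");

    // Serializes one record with V1 using the raw binary encoding (no file header)
    static byte[] encodeV1(String name, Integer number) throws IOException {
        GenericRecord user = new GenericData.Record(V1);
        user.put("name", name);
        user.put("favorite_number", number);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(V1).write(user, encoder);
        encoder.flush();
        return out.toByteArray();
    }

    // Reads V1 bytes as V2: writer schema first, reader schema second
    static GenericRecord decodeAsV2(byte[] bytes) throws IOException {
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(V1, V2);
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
        return reader.read(null, decoder);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encodeV1("ab", 150);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X ", b));
        }
        // 04 = string length 2, 61 62 = "ab", 00 = union branch 0 (int), AC 02 = 150
        System.out.println(bytes.length + " bytes: " + hex.toString().trim());
        // favorite_color is absent from the data and is taken from V2's default
        System.out.println(decodeAsV2(bytes));
    }
}
```

The first line printed should be `6 bytes: 04 61 62 00 AC 02`. The whole record fits in six bytes precisely because nothing but the values is stored.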
3. How
Add the Maven dependency:
```xml
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.10.2</version>
</dependency>
```
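As the sample code below shows, both writing and reading Avro data go through a schema. This reflects the storage characteristics described above: the encoded data carries no type information of its own, so a schema is needed to interpret it.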
3.1 Writing Avro
```java
package org.leon;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;

import java.io.File;
import java.io.IOException;

public class App {
    // favorite_number and favorite_color are optional: each is a union with null
    private static final String SCHEMA = "{\n" +
            "  \"type\": \"record\",\n" +
            "  \"name\": \"User\",\n" +
            "  \"fields\": [\n" +
            "    {\"name\": \"name\", \"type\": \"string\"},\n" +
            "    {\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},\n" +
            "    {\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}\n" +
            "  ]\n" +
            "}";

    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA);

        // Fields that are never set stay null, which the unions above permit
        GenericRecord u1 = new GenericData.Record(schema);
        u1.put("name", "刘小M");
        u1.put("favorite_number", 150);

        GenericRecord u2 = new GenericData.Record(schema);
        u2.put("name", "李二蛋");
        u2.put("favorite_color", "blue");

        File file = new File("user.avro");
        DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
        // try-with-resources flushes and closes the file even if an append fails
        try (DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<>(datumWriter)) {
            // create() writes the file header, which includes the schema
            dataFileWriter.create(schema, file);
            dataFileWriter.append(u1);
            dataFileWriter.append(u2);
        }
    }
}
```
3.2 Reading Avro
```java
package org.leon;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;

import java.io.File;
import java.io.IOException;

public class App {
    // Same schema as the writer; here it acts as the reader (expected) schema
    private static final String SCHEMA = "{\n" +
            "  \"type\": \"record\",\n" +
            "  \"name\": \"User\",\n" +
            "  \"fields\": [\n" +
            "    {\"name\": \"name\", \"type\": \"string\"},\n" +
            "    {\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},\n" +
            "    {\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}\n" +
            "  ]\n" +
            "}";

    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA);
        File file = new File("user.avro");
        DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
        // try-with-resources guarantees the file handle is released
        try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(file, datumReader)) {
            GenericRecord user = null;
            while (dataFileReader.hasNext()) {
                // Passing the previous record lets Avro reuse it instead of allocating a new one
                user = dataFileReader.next(user);
                System.out.println(user);
            }
        }
    }
}
```
Output:

```
{"name": "刘小M", "favorite_number": 150, "favorite_color": null}
{"name": "李二蛋", "favorite_number": null, "favorite_color": "blue"}
```
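`DataFileWriter.create` stores the writer schema in the file header. Because of that, a reader can also be built without passing any schema, and `GenericDatumReader` then adopts the schema found in the file. The sketch below assumes Avro 1.10.x and a `user.avro` file produced by the 3.1 example. The class name `ReadWithoutSchema` and its helper methods are hypothetical.

```java
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ReadWithoutSchema {
    // Returns the writer schema stored in the container file's header
    static Schema embeddedSchema(File file) throws IOException {
        GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
        try (DataFileReader<GenericRecord> reader = new DataFileReader<>(file, datumReader)) {
            return reader.getSchema();
        }
    }

    // Reads every record; no schema is supplied, so the file's own schema is used
    static List<GenericRecord> readAll(File file) throws IOException {
        GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
        List<GenericRecord> records = new ArrayList<>();
        try (DataFileReader<GenericRecord> reader = new DataFileReader<>(file, datumReader)) {
            for (GenericRecord record : reader) {
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        File file = new File("user.avro");
        System.out.println("Embedded schema: " + embeddedSchema(file));
        readAll(file).forEach(System.out::println);
    }
}
```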