Rust and Big Data

Author: 天之見證 | Published: 2024-01-17 11:26

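I work in the big data industry and have recently become interested in Rust, so I looked into how the Rust ecosystem for big data is developing. Below are a few big data frameworks written in Rust; readers who are interested can follow the projects.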

Apache Arrow Ballista

VS Spark:

Although Ballista is largely inspired by Apache Spark, there are some key differences.

The choice of Rust as the main execution language means that memory usage is deterministic and avoids the overhead of GC pauses.

Ballista is designed from the ground up to use columnar data, enabling a number of efficiencies such as vectorized processing (SIMD and GPU) and efficient compression. Although Spark does have some columnar support, it is still largely row-based today.

The combination of Rust and Arrow provides excellent memory efficiency and memory usage can be 5x - 10x lower than Apache Spark in some cases, which means that more processing can fit on a single node, reducing the overhead of distributed compute.

The use of Apache Arrow as the memory model and network protocol means that data can be exchanged between executors in any programming language with minimal serialization overhead.

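In summary, three points:

No GC: Rust avoids garbage-collection pauses, so execution is more efficient
Columnar data throughout
The Arrow memory model makes processing and data exchange more efficient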
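To make the row-based versus columnar distinction concrete, here is a minimal sketch in plain Rust. It does not use the Arrow or Ballista APIs; the `TripRow` and `TripColumns` types and the sample data are hypothetical and exist only for illustration. The point it shows is that a columnar layout keeps each field in its own contiguous array, so an aggregate over one field scans only that array.

```rust
// Row-oriented layout: each record stores all of its fields together.
// (Hypothetical type, for illustration only.)
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct TripRow {
    id: u64,
    distance_km: f64,
    fare: f64,
}

// Column-oriented layout: each field lives in its own contiguous array.
// This is the general idea behind columnar formats, not Arrow's actual types.
#[allow(dead_code)]
#[derive(Debug, Default)]
struct TripColumns {
    id: Vec<u64>,
    distance_km: Vec<f64>,
    fare: Vec<f64>,
}

impl TripColumns {
    // Transpose a slice of rows into per-field columns.
    fn from_rows(rows: &[TripRow]) -> Self {
        let mut cols = TripColumns::default();
        for r in rows {
            cols.id.push(r.id);
            cols.distance_km.push(r.distance_km);
            cols.fare.push(r.fare);
        }
        cols
    }
}

// Row layout: summing one field still walks over every whole record.
fn total_fare_rows(rows: &[TripRow]) -> f64 {
    rows.iter().map(|r| r.fare).sum()
}

// Columnar layout: summing one field touches only that field's array,
// a tight loop over contiguous f64 values that compilers can often vectorize.
fn total_fare_columns(cols: &TripColumns) -> f64 {
    cols.fare.iter().sum()
}

// Made-up sample data.
fn sample_rows() -> Vec<TripRow> {
    vec![
        TripRow { id: 1, distance_km: 3.0, fare: 10.0 },
        TripRow { id: 2, distance_km: 4.5, fare: 12.5 },
        TripRow { id: 3, distance_km: 2.0, fare: 7.5 },
    ]
}

fn main() {
    let rows = sample_rows();
    let cols = TripColumns::from_rows(&rows);
    println!("row total    = {}", total_fare_rows(&rows));
    println!("column total = {}", total_fare_columns(&cols));
}
```

Both functions return the same total; the difference is in how much memory each one has to read to get there, which matters more as the number of fields per record grows.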

Arroyo

VS Flink:

Serverless operations: Arroyo pipelines are designed to run in modern cloud environments, supporting seamless scaling, recovery, and rescheduling

High performance SQL: SQL is a first-class concern, with consistently excellent performance

Designed for non-experts: Arroyo cleanly separates the pipeline APIs from its internal implementation. You don’t need to be a streaming expert to build real-time data pipelines.

In summary, three points:

Serverless operations, better suited to cloud environments
High-performance SQL
Easy to get started with

Databend

VS Snowflake:

Cloud-Friendly: Seamlessly integrates with various cloud storages like AWS S3, Azure Blob, Google Cloud, and more.

High Performance: Built in Rust, utilizing SIMD and vectorized processing for rapid analytics. See ClickBench.

Cost-Efficient Elasticity: Innovative design for separate scaling of storage and computation, optimizing both costs and performance.

Easy Data Management: Integrated data preprocessing during ingestion eliminates the need for external ETL tools.

Data Version Control: Offers Git-like multi-version storage, enabling easy data querying, cloning, and reverting from any point in time.

Rich Data Support: Handles diverse data formats and types, including JSON, CSV, and Parquet files and ARRAY, TUPLE, MAP, and JSON types.

AI-Enhanced Analytics: Offers advanced analytics capabilities with integrated AI Functions.

Community-Driven: Benefit from a friendly, growing community that offers an easy-to-use platform for all your cloud analytics.

In summary, four points:

Cloud-friendly
High performance at low cost
Rich data support and data management
Open source
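The "Git-like multi-version storage" item above can be illustrated with a toy snapshot store. This is only a sketch of the general idea (every write publishes a new immutable snapshot, and old snapshots stay readable); it is not Databend's implementation, and the `VersionedTable` type and its methods are hypothetical names. For simplicity the sketch copies all rows on each append, whereas a real system would share unchanged data between versions.

```rust
use std::sync::Arc;

/// A toy table in which every write produces a new immutable snapshot.
/// (Hypothetical type, for illustration only.)
#[derive(Debug)]
struct VersionedTable {
    // snapshots[i] holds the full table content as of version i.
    snapshots: Vec<Arc<Vec<String>>>,
}

impl VersionedTable {
    /// Start with version 0: an empty table.
    fn new() -> Self {
        VersionedTable { snapshots: vec![Arc::new(Vec::new())] }
    }

    fn current_version(&self) -> usize {
        self.snapshots.len() - 1
    }

    /// Append rows and return the id of the new version.
    /// Earlier snapshots are never modified.
    fn append(&mut self, rows: &[&str]) -> usize {
        let latest = self.current_version();
        let mut next: Vec<String> = self.snapshots[latest].to_vec();
        next.extend(rows.iter().map(|r| r.to_string()));
        self.snapshots.push(Arc::new(next));
        self.current_version()
    }

    /// Read the table as it was at a given version.
    fn read_at(&self, version: usize) -> Option<&[String]> {
        self.snapshots.get(version).map(|s| s.as_slice())
    }

    /// Revert by publishing an old snapshot as the newest version.
    /// Only the Arc pointer is cloned, so no row data is copied.
    fn revert_to(&mut self, version: usize) -> Option<usize> {
        let snapshot = Arc::clone(self.snapshots.get(version)?);
        self.snapshots.push(snapshot);
        Some(self.current_version())
    }
}

fn main() {
    let mut table = VersionedTable::new();
    let v1 = table.append(&["a", "b"]);
    let v2 = table.append(&["c"]);
    let v3 = table.revert_to(v1).expect("v1 exists");
    println!("v{} = {:?}", v1, table.read_at(v1));
    println!("v{} = {:?}", v2, table.read_at(v2));
    println!("v{} = {:?}", v3, table.read_at(v3));
}
```

Because a revert adds a new version instead of deleting history, the state before the revert remains queryable, which is the property the "reverting from any point in time" wording suggests.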

Article link: https://www.haomeiwen.com/subject/upmlodtx.html
