Hadoop 3 Updates

Author: clive0x | Published: 2019-02-17 10:05

Hadoop 3.0

JDK 8+

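Support for erasure coding in HDFS. Useful for archiving historical data: roughly 1.x copies of raw storage give protection comparable to keeping 3 replicas.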
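To make the "1.x copies instead of 3 replicas" point concrete, here is a minimal sketch of the storage arithmetic. It assumes a Reed-Solomon layout with 6 data units and 3 parity units; the function names are illustrative and not part of any Hadoop API.

```python
def replication_overhead(replicas: int) -> float:
    """Raw bytes stored per byte of user data under plain replication."""
    return float(replicas)


def erasure_coding_overhead(data_units: int, parity_units: int) -> float:
    """Raw bytes stored per byte of user data under a (data, parity) code."""
    return (data_units + parity_units) / data_units


# Assumed layout: Reed-Solomon with 6 data units and 3 parity units.
rep = replication_overhead(3)
ec = erasure_coding_overhead(6, 3)
print(f"3x replication stores {rep} bytes per byte and survives 2 lost copies")
print(f"RS(6,3) stores {ec} bytes per byte and survives 3 lost units")
```

Under these assumptions the erasure-coded layout halves the raw storage while tolerating one more failure, which is why it suits cold data that is rarely read.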

YARN Timeline Service v.2

Shell script rewrite

Shaded client jars. Dependency conflicts have long been a pain point in Hadoop, and shaded jars are the only practical way around them.

Support for Opportunistic Containers and Distributed Scheduling.

MapReduce task-level native optimization

Support for more than 2 NameNodes.

Default ports of multiple services have been changed.
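As a rough illustration of the port change, the sketch below maps some old HDFS default ports to their Hadoop 3.0 defaults. The table reflects my reading of the 3.0.0 release notes and is not exhaustive, so check it against the documentation for your exact version; `migrate_port` is a hypothetical helper.

```python
# Old (Hadoop 2.x) default port -> Hadoop 3.0 default port.
# Assumed values; verify against the release notes for your version.
HDFS_PORT_CHANGES = {
    50070: 9870,  # NameNode HTTP
    50470: 9871,  # NameNode HTTPS
    50090: 9868,  # Secondary NameNode HTTP
    50091: 9869,  # Secondary NameNode HTTPS
    50010: 9866,  # DataNode data transfer
    50020: 9867,  # DataNode IPC
    50075: 9864,  # DataNode HTTP
    50475: 9865,  # DataNode HTTPS
}


def migrate_port(port: int) -> int:
    """Return the new default for a known old port, else the port unchanged."""
    return HDFS_PORT_CHANGES.get(port, port)


print(migrate_port(50070))
```

A lookup like this is mainly useful when auditing scripts, firewall rules, or bookmarks that still hard-code the old defaults.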

Support for Microsoft Azure Data Lake and Aliyun Object Storage System filesystem connectors. Big data is moving toward the cloud.

Intra-datanode balancer

Reworked daemon and task heap management

S3Guard: Consistency and Metadata Caching for the S3A filesystem client

HDFS Router-Based Federation

API-based configuration of Capacity Scheduler queue configuration

YARN Resource Types. Beyond CPU and memory, GPUs and other resources can now be scheduled.
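The idea behind extensible resource types can be sketched as a resource vector with arbitrary named entries. This is a simplified model, not the YARN scheduler; the resource names mirror the ones I believe YARN uses (`memory-mb`, `vcores`, `yarn.io/gpu`), and `fits` and `allocate` are hypothetical helpers.

```python
from typing import Dict

Resources = Dict[str, int]


def fits(request: Resources, available: Resources) -> bool:
    """True if every requested resource type is available in sufficient amount."""
    return all(available.get(name, 0) >= amount for name, amount in request.items())


def allocate(request: Resources, available: Resources) -> Resources:
    """Return what remains on the node after granting the request."""
    if not fits(request, available):
        raise ValueError("request does not fit on this node")
    remaining = dict(available)
    for name, amount in request.items():
        remaining[name] = remaining[name] - amount
    return remaining


# Hypothetical node and container request.
node = {"memory-mb": 65536, "vcores": 16, "yarn.io/gpu": 2}
request = {"memory-mb": 8192, "vcores": 4, "yarn.io/gpu": 1}
print(allocate(request, node))
```

Because the vector is keyed by name, adding a new resource type does not require changing the matching logic, only the node and request descriptions.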

Most of these new features come from the 2.9 release.

Hadoop 3.1

The YARN Service framework provides first-class support and APIs to host long-running services natively in YARN. Used for Docker integration.

In a nutshell, it serves as a container orchestration platform for managing containerized services on YARN. It supports both docker container and traditional process based containers in YARN.

First-class GPU scheduling and isolation (for both Docker and non-Docker containers) on YARN. Used for Docker and deep learning.

First-class FPGA scheduling and isolation (for both Docker and non-Docker containers) on YARN. Used for Docker and deep learning.

Support more expressive placement constraints in YARN. Such constraints can be crucial for the performance and resilience of applications, especially those that include long-running containers, such as services, machine-learning and streaming workloads.

For example, it may be beneficial to co-locate the allocations of a job on the same rack (affinity constraints) to reduce network costs, spread allocations across machines (anti-affinity constraints) to minimize resource interference, or allow up to a specific number of allocations in a node group (cardinality constraints) to strike a balance between the two. Placement decisions also affect resilience. For example, allocations placed within the same cluster upgrade domain would go offline simultaneously.
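The three constraint kinds described above can be sketched as simple checks over a finished placement. This is a toy model for illustration only, not the YARN placement constraint API; all names and the sample cluster are hypothetical.

```python
from collections import Counter
from typing import Dict

# allocation id -> node it was placed on (hypothetical placement result)
Placement = Dict[str, str]


def satisfies_affinity(placement: Placement, node_to_rack: Dict[str, str]) -> bool:
    """Affinity: every allocation lands on the same rack."""
    racks = {node_to_rack[node] for node in placement.values()}
    return len(racks) <= 1


def satisfies_anti_affinity(placement: Placement) -> bool:
    """Anti-affinity: no two allocations share a node."""
    nodes = list(placement.values())
    return len(nodes) == len(set(nodes))


def satisfies_cardinality(placement: Placement, node_to_group: Dict[str, str],
                          max_per_group: int) -> bool:
    """Cardinality: at most max_per_group allocations in any node group."""
    counts = Counter(node_to_group[node] for node in placement.values())
    return all(count <= max_per_group for count in counts.values())


node_to_rack = {"n1": "rackA", "n2": "rackA", "n3": "rackB"}
placement = {"c1": "n1", "c2": "n2", "c3": "n2"}
print(satisfies_affinity(placement, node_to_rack))        # all on rackA
print(satisfies_anti_affinity(placement))                 # c2 and c3 share n2
print(satisfies_cardinality(placement, node_to_rack, 2))  # rackA holds 3
```

In this sample the placement keeps network traffic inside one rack but concentrates risk: losing rackA takes every allocation offline at once, which is the resilience trade-off the text describes.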

Administrators can specify absolute resources (X memory, Y vcores, Z GPUs, etc.) for a queue instead of percentage-based values. This gives admins finer control over the amount of resources configured for a given queue.
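A small arithmetic sketch shows why absolute values are easier to reason about. The cluster sizes and the 25% figure are made up for illustration, and the helper is hypothetical rather than part of the Capacity Scheduler.

```python
def percentage_to_absolute_gb(cluster_memory_gb: float, percent: float) -> float:
    """Memory a queue is entitled to when its capacity is a percentage."""
    return cluster_memory_gb * percent / 100.0


# Hypothetical queue configured as 25% of cluster memory: its entitlement
# changes whenever nodes are added to or removed from the cluster.
for cluster_gb in (400, 800):
    share = percentage_to_absolute_gb(cluster_gb, 25)
    print(f"cluster={cluster_gb} GB -> percentage-based queue gets {share} GB")

# With an absolute setting, the configured number itself does not scale
# with cluster size.
ABSOLUTE_QUEUE_GB = 100
print(f"absolute queue is configured as {ABSOLUTE_QUEUE_GB} GB")
```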

Provided storage allows data stored outside HDFS to be mapped to and addressed from HDFS. It builds on heterogeneous storage by introducing a new storage type, PROVIDED, to the set of media in a DataNode.

If you want to run Docker or deep learning workloads, start from Hadoop 3.1.

Hadoop 3.2

Node Attributes Support in YARN

Node Attributes help tag nodes with multiple labels based on their attributes, and support placing containers based on expressions over these labels.

More details are available in the Node Attributes documentation.
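Attribute-based placement can be sketched as filtering nodes by equality and inequality constraints. This is a simplified model and not YARN's expression syntax or API; the attribute names, the operator set, and the sample nodes are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (attribute name, operator, value); operator is "=" or "!="
Constraint = Tuple[str, str, str]


def node_matches(attributes: Dict[str, str], constraints: List[Constraint]) -> bool:
    """True if the node's attributes satisfy every constraint."""
    for name, op, value in constraints:
        actual = attributes.get(name)  # None when the attribute is not set
        if op == "=":
            if actual != value:
                return False
        elif op == "!=":
            # In this sketch a node lacking the attribute passes "!=".
            if actual == value:
                return False
        else:
            raise ValueError(f"unsupported operator: {op}")
    return True


def candidate_nodes(nodes: Dict[str, Dict[str, str]],
                    constraints: List[Constraint]) -> List[str]:
    """Names of nodes eligible to host a container, in sorted order."""
    return sorted(n for n, attrs in nodes.items() if node_matches(attrs, constraints))


# Hypothetical cluster.
nodes = {
    "n1": {"os": "centos", "python": "3"},
    "n2": {"os": "ubuntu", "python": "2"},
}
print(candidate_nodes(nodes, [("os", "=", "centos")]))
print(candidate_nodes(nodes, [("python", "!=", "3")]))
```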

Hadoop Submarine on YARN

Hadoop Submarine enables data engineers to easily develop, train and deploy deep learning models (in TensorFlow) on the very same Hadoop YARN cluster where the data resides. This is distributed deep learning.

More details are available in the Hadoop Submarine documentation.

Storage Policy Satisfier

Lets HDFS (Hadoop Distributed File System) applications move blocks between storage types when they set storage policies on files/directories.

More details are available in the Storage Policy Satisfier documentation.

ABFS Filesystem connector

Supports the latest Azure Datalake Gen2 Storage.

Enhanced S3A connector

The S3A connector is enhanced, including better resilience to throttled AWS S3 and DynamoDB IO.

Upgrades for YARN long running services

Supports in-place seamless upgrades of long running containers via YARN Native Service API and CLI.

More details are available in the YARN Service Upgrade documentation.

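This is the deep-learning release for big data.

Storage and compute have not seen much that is new for a long time. Overall, the project is leaning toward distributed deep learning and following the current trend. I am glad I switched to deep learning two years ago.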