Hadoop 3.0
JDK 8+
Support for erasure coding in HDFS. Useful for archiving historical data: roughly 1.x copies' worth of storage does the job of 3 replicas.
YARN Timeline Service v.2
Shell script rewrite
Shaded client jars. Dependency conflicts are a long-standing Hadoop pain point, and shaded jars are the only way to work around them.
Support for Opportunistic Containers and Distributed Scheduling.
MapReduce task-level native optimization
Support for more than 2 NameNodes.
Default ports of multiple services have been changed.
Support for Microsoft Azure Data Lake and Aliyun Object Storage System filesystem connectors. Big data is moving toward the cloud.
Intra-datanode balancer
Reworked daemon and task heap management
S3Guard: Consistency and Metadata Caching for the S3A filesystem client
HDFS Router-Based Federation
API-based configuration of Capacity Scheduler queue configuration
YARN Resource Types. Beyond CPU and memory, adds GPU and other resources.
Most of the new features come from the 2.9 release.
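The storage saving from erasure coding listed above can be made concrete with a little arithmetic. The sketch below is not HDFS code; the function names are made up for illustration. It assumes a Reed-Solomon layout with a given number of data and parity units (HDFS ships policies such as RS-6-3 and RS-10-4) and compares it with plain 3-way replication.

```python
def replication_overhead(replicas: int) -> float:
    """Raw bytes stored per byte of user data under n-way replication."""
    return float(replicas)


def erasure_coding_overhead(data_units: int, parity_units: int) -> float:
    """Raw bytes stored per byte of user data under an RS(data, parity) layout."""
    return (data_units + parity_units) / data_units


def replication_tolerated_losses(replicas: int) -> int:
    """Copies that can be lost while at least one replica survives."""
    return replicas - 1


def erasure_coding_tolerated_losses(parity_units: int) -> int:
    """Units per stripe that can be lost and still be reconstructed."""
    return parity_units


if __name__ == "__main__":
    print("3x replication:", replication_overhead(3), "x storage,",
          replication_tolerated_losses(3), "losses tolerated")
    for data, parity in [(3, 2), (6, 3), (10, 4)]:
        print(f"RS({data},{parity}):",
              round(erasure_coding_overhead(data, parity), 2), "x storage,",
              erasure_coding_tolerated_losses(parity), "losses tolerated")
```

Under these assumptions RS(6,3) stores 1.5 bytes per byte of data and survives three lost units, against 3 bytes and two lost copies for 3-way replication. The trade-off is reconstruction cost on reads after a failure, which is why it suits cold, historical data.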
Hadoop 3.1
YARN Service framework provides first-class support and APIs to host long-running services natively in YARN. Used for Docker integration.
In a nutshell, it serves as a container orchestration platform for managing containerized services on YARN. It supports both docker container and traditional process based containers in YARN.
First-class GPU scheduling and isolation (for both docker/non-docker containers) on YARN. Used for Docker and deep learning.
First-class FPGA scheduling and isolation (for both docker/non-docker containers) on YARN. Used for Docker and deep learning.
Support more expressive placement constraints in YARN. Such constraints can be crucial for the performance and resilience of applications, especially those that include long-running containers, such as services, machine-learning and streaming workloads.
For example, it may be beneficial to co-locate the allocations of a job on the same rack (affinity constraints) to reduce network costs, spread allocations across machines (anti-affinity constraints) to minimize resource interference, or allow up to a specific number of allocations in a node group (cardinality constraints) to strike a balance between the two. Placement decisions also affect resilience. For example, allocations placed within the same cluster upgrade domain would go offline simultaneously.
Allows administrators to specify absolute resources (X Memory, Y VCores, Z GPUs, etc.) for a queue instead of percentage-based values. This gives admins better control over the amount of resources configured for a given queue.
Provided storage allows data stored outside HDFS to be mapped to and addressed from HDFS. It builds on heterogeneous storage by introducing a new storage type, PROVIDED, to the set of media in a DataNode.
If you want to run Docker or deep learning workloads, start from Hadoop 3.1.
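The three kinds of placement constraint described above (affinity, anti-affinity, cardinality) can be illustrated with a toy check. This is not the YARN placement constraint API; the data model (a dict from container to node, and a dict from node to rack or node group) and all function names are assumptions made for the example.

```python
from collections import Counter
from typing import Dict


def satisfies_affinity(placement: Dict[str, str],
                       node_to_rack: Dict[str, str]) -> bool:
    """Affinity: every container lands on the same rack."""
    racks = {node_to_rack[node] for node in placement.values()}
    return len(racks) <= 1


def satisfies_anti_affinity(placement: Dict[str, str]) -> bool:
    """Anti-affinity: no two containers share a node."""
    nodes = list(placement.values())
    return len(nodes) == len(set(nodes))


def satisfies_cardinality(placement: Dict[str, str],
                          node_to_group: Dict[str, str],
                          max_per_group: int) -> bool:
    """Cardinality: at most max_per_group containers in any node group."""
    counts = Counter(node_to_group[node] for node in placement.values())
    return all(count <= max_per_group for count in counts.values())


if __name__ == "__main__":
    # Hypothetical three-node cluster spread over two racks.
    node_to_rack = {"n1": "rackA", "n2": "rackA", "n3": "rackB"}
    # Hypothetical placement of three containers.
    placement = {"c1": "n1", "c2": "n2", "c3": "n2"}

    print("affinity (same rack):", satisfies_affinity(placement, node_to_rack))
    print("anti-affinity (distinct nodes):", satisfies_anti_affinity(placement))
    print("cardinality (<= 2 per rack):",
          satisfies_cardinality(placement, node_to_rack, 2))
```

In this sample the placement keeps everything on one rack, so it passes the affinity check, but it puts two containers on the same node and three on the same rack, so it fails both the anti-affinity check and a cardinality limit of two per rack.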
Hadoop 3.2
Node Attributes support in YARN
Node Attributes help tag nodes with multiple labels based on their attributes and support placing containers based on expressions over these labels.
More details are available in the Node Attributes documentation.
Hadoop Submarine enables data engineers to easily develop, train and deploy deep learning models (in TensorFlow) on the very same Hadoop YARN cluster where the data resides. Distributed deep learning.
More details are available in the Hadoop Submarine documentation.
Lets HDFS (Hadoop Distributed File System) applications move blocks between storage types as they set storage policies on files/directories.
More details are available in the Storage Policy Satisfier documentation.
Supports the latest Azure Datalake Gen2 Storage.
Support of an enhanced S3A connector, including better resilience to throttled AWS S3 and DynamoDB IO.
Upgrades for YARN long running services
Supports in-place seamless upgrades of long running containers via YARN Native Service API and CLI.
More details are available in the YARN Service Upgrade documentation.
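The big-data deep learning release.
On the storage and compute side there has been nothing really new for a long time. Overall the project is leaning toward distributed deep learning, chasing the current trend. I'm glad I switched to deep learning two years ago.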