002 9 个Hadoop 最受欢迎的特性

作者: 胡巴Lei特 | 来源:发表于2019-07-29 22:37 被阅读0次

002 9 个Hadoop 最受欢迎的特性
大数据分析Hadoop优秀书籍推荐
hadoop分布式环境搭建
Python Web开发系列课程之——初探 Django Adm
JavaScript编程中容易出BUG的几点小知识
扩展Hadoop 3.x新特性概述
002 为什么学习大数据和 Hadoop-学习 Hadoop 的
Java8之十分钟学会《Lambda表达式》
从“类型系统”角度看Kotlin空类型
15个最受欢迎的Python开源框架

002 9 Features Of Hadoop That Made It The Most Popular

1. Features of Hadoop and Design Principles – Objective

1. Hadoop 特点及设计原则--目标

In this features of Hadoop and design principles tutorial we will discuss the Hadoop Features, characteristics and design principles of Hadoop. In this feratures of hadoop blog we will also discuss the assumptions around which the Hadoop in built. So let’s get started with the characteristics of Hadoop and design principles tutorial.
To install and configure Hadoop follow this installation guide.

我们将讨论 Hadoop 的特性、特性和设计原则.在这篇 hadoop 博客的文章中，我们还将讨论 Hadoop 内置的假设.让我们从 Hadoop 的特性和设计原则教程开始.
安装和配置 Hadoop遵循本安装指南.

List of Hadoop Features and design principles.

2. Features of Hadoop

Apache Hadoop is the most popular and powerful big data tool, Hadoop provides world’s most reliable storage layer – HDFS, a batch Processing engine – MapReduce and a Resource Management Layer – YARN. In this section of features of Hadoop, Let us discuss important features of Hadoop which are given below-

Apache Hadoop 是最受欢迎和最强大的 大数据 Hadoop 提供了世界上最可靠的存储层 HDFS 一个批量处理引擎MapReduce 和资源管理层-Yarn.在 Hadoop 特性的这一部分中，让我们讨论 Hadoop 的重要特性，如下所示:

2.1. Open Source
Apache Hadoop is an open source project. It means its code can be modified according to business requirements.

Apache Hadoop 是一个开源项目.这意味着它的代码可以根据业务需求进行修改.

2.2. Distributed Processing
As data is stored in a distributed manner in HDFS across the cluster, data is processed in parallel on a cluster of nodes.

2.2.分布式处理
由于数据以分布式的方式存储在集群中的 HDFS 中，所以数据在节点集群上并行处理.

2.3. Fault Tolerance
This is one of the very important features of Hadoop. By default 3 replicas of each block is stored across the cluster in Hadoop and it can be changed also as per the requirement. So if any node goes down, data on that node can be recovered from other nodes easily with the help of this characteristic. Failures of nodes or tasks are recovered automatically by the framework. This is how Hadoop is fault tolerant.

2.3.容错
这是 Hadoop 非常重要的特性之一.默认情况下，每个副本有 3 个块存储在 Hadoop 中的整个集群中，也可以根据要求进行更改.因此，如果任何节点出现故障，借助这一特性，该节点上的数据可以很容易地从其他节点恢复.框架会自动恢复节点或任务的故障.Hadoop 就是这样的容错.

2.4. Reliability
Due to replication of data in the cluster, data is reliably stored on the cluster of machine despite machine failures. If your machine goes down, then also your data will be stored reliably due to this charecteristic of Hadoop.

2.4.可靠性
由于集群中数据的复制，尽管机器出现故障，数据还是可靠地存储在机器集群中.如果你的机器停机，那么由于 Hadoop 的这种特性，你的数据也将被可靠地存储.

2.5. High Availability
Data is highly available and accessible despite hardware failure due to multiple copies of data. If a machine or few hardware crashes, then data will be accessed from another path.

2.5.高可用性
数据是高可用尽管由于多个数据副本导致硬件故障，但仍然可以访问.如果一台机器或几个硬件崩溃，那么数据将从另一个路径访问.

2.6. Scalability
Hadoop is highly scalable in the way new hardware can be easily added to the nodes. This feature of Hadoop also provides horizontal scalability which means new nodes can be added on the fly without any downtime.

2.6.可扩展性
Hadoop 在新的硬件可以很容易地添加到节点的方式是高度可扩展的.Hadoop 的这一功能还提供了横向可扩展性，这意味着可以在不停机的情况下动态添加新节点.

2.7. Economic

2.7.经济

Apache Hadoop is not very expensive as it runs on a cluster of commodity hardware. We do not need any specialized machine for it. Hadoop also provides huge cost saving also as it is very easy to add more nodes on the fly here. So if requirement increases, then you can increase nodes as well without any downtime and without requiring much of pre-planning.

Apache Hadoop 运行在一个商品硬件集群上，成本并不高.我们不需要专门的机器.Hadoop 还提供了巨大的成本节约，因为在这里很容易动态添加更多节点.因此，如果需求增加，那么您也可以在不停机的情况下增加节点，而不需要太多的预规划.

2.8. Easy to use
No need of client to deal with distributed computing, the framework takes care of all the things. So this feature of Hadoop is easy to use.

2.8.使用方便
该框架不需要客户端来处理分布式计算，它可以处理所有的事情.因此，Hadoop 的这一功能易于使用.

2.9. Data Locality
This one is a unique features of Hadoop that made it easily handle the Big Data. Hadoop works on data locality principle which states that move computation to data instead of data to computation. When a client submits the MapReduce algorithm, this algorithm is moved to data in the cluster rather than bringing data to the location where the algorithm is submitted and then processing it.

2.9.数据局部性
这是 Hadoop 的一个独特的特性，它可以很容易地处理大数据.Hadoop 的工作原理是将计算转移到数据上，而不是将数据移动到计算上.当客户端提交 MapReduce 算法时，该算法会移动到集群中的数据，而不是将数据带到提交算法的位置，然后进行处理.

These were the Characteristics of Hadoop that differentiates it from the other Data Management systems. Now, after the features of Hadoop, lets go through some Hadoop Assumptions that are to be considered before using Hadoop.

这些是 Hadoop 区别于其他数据管理系统的特点.现在，在了解了 Hadoop 的特性之后，让我们来看看在使用 Hadoop 之前需要考虑的一些 Hadoop 假设.

Read: Hadoop Ecosystem components

阅读:Hadoop 生态系统组件

3. Hadoop Assumptions

Hadoop is written with large clusters of computers in mind and is built around the following hadoop assumptions:

Hadoop 是在考虑大型计算机集群的情况下编写的，它是围绕以下 hadoop 假设构建的:

Hardware may fail, (as commodity hardware can be used)
Processing will be run in batches. Thus there is an emphasis on high throughput as opposed to low latency.
Applications that run on HDFS have large data sets. A typical file in HDFS is gigabytes to terabytes in size.
Applications need a write-once-read-many access model.
Moving Computation is Cheaper than Moving Data.
硬件可能会出现故障 (因为可以使用普通硬件)
将分批运行处理.因此，与低延迟相比，强调高吞吐量.
在 HDFS 上运行的应用程序有大量数据集.HDFS 中的典型文件大小为千兆字节到兆字节.
应用程序需要多次写一次访问模型.
移动计算比移动数据便宜.

Hadoop Quiz

4. Design Principles of Hadoop

4. Hadoop 设计原则

Below are the design principles of Hadoop on which it works:
a) System shall manage and heal itself

下面是 Hadoop 的设计原则，它在 Hadoop 上工作:
A) 系统应自我管理和自我修复

Automatically and transparently route around failure (Fault Tolerant)
Speculatively execute redundant tasks if certain nodes are detected to be slow
围绕故障自动透明地路由 (容错)
投机如果检测到某些节点速度较慢，则执行冗余任务

b) Performance shall scale linearly

B) 性能应线性扩展

Proportional change in capacity with resource change (Scalability)
容量随资源变化的比例变化 (可扩展性)

c) Computation should move to data

C) 计算应该移动到数据

Lower latency, lower bandwidth (Data Locality)
低延迟、低带宽 (数据局部性)

d) Simple core, modular and extensible (Economical)

D) 简单的核心，模块化和可扩展 (经济)

5. Conclusion – Characteristics of Hadoop and Design Principles Tutorial

5. 结论 -- Hadoop 的特性和设计原理教程

In this tutorial we have not only covered features of Hadoop but also its design principle on which Hadoop works.Now when you know about the characteristics of hadoop and its design principles, go through our Hadoop installation guide to use Hadoop functionality.

在本教程中，我们不仅介绍了 Hadoop 的特性，还介绍了 Hadoop 的设计原理.Hadoop 安装指南使用 Hadoop 功能.

This was all about the characteristics of Hadoop and design principles tutorial. Hope you like the characteristics of Hadoop tutorial.
See Also-

这是关于 Hadoop 和设计原则教程的所有特性.希望大家喜欢 Hadoop 教程的特点.
另见-

Got any queries or feedback about this features of hadoop and design principle tutorial? Just leave your message in the comment section and we will get back to you.

有关于 hadoop 和设计原理教程的这些功能的任何查询或反馈？在评论部分留下你的信息，我们会给你回复.

https://data-flair.training/blogs/features-of-hadoop-and-design-principles

002 9 个Hadoop 最受欢迎的特性
002 9 Features Of Hadoop That Made It The Most Popular 1....
大数据分析Hadoop优秀书籍推荐
还在苦恼的四处搜寻Hadoop书籍吗？本篇文章就来为您推荐时下最权威最受欢迎的hadoop及其相关组件学习书籍，这...
hadoop分布式环境搭建
环境介绍 hadoop001：192.168.199.102 hadoop002：192.168.199.103 ...
Python Web开发系列课程之——初探 Django Adm
【前置课程】Django中Form的妙用 Django Admin 是 Django 框架中最受欢迎的特性之一。该...
JavaScript编程中容易出BUG的几点小知识
JavaScript是如今最受欢迎的编程语言之一，但受欢迎同时就是该语言自身的各种特性带来的副作用，无论该语言多美...
扩展Hadoop 3.x新特性概述
扩展Hadoop 3.x新特性概述 Hadoop3.x中增强了很多特性，在Hadoop3.x中，不再允许使用jdk...
002 为什么学习大数据和 Hadoop-学习 Hadoop 的
002 Why Learn Big Data And Hadoop – Top Reasons To Learn ...
Java8之十分钟学会《Lambda表达式》
简介 Lambda表达式（也称闭包），是Java8发布的新特性中最受期待和欢迎的新特性之一。在Java语法层面La...
从“类型系统”角度看Kotlin空类型
前言要说Kotlin哪个特性最受欢迎，我觉得毫无疑问是“Null safety”（空安全/空类型安全），有图为证...
15个最受欢迎的Python开源框架
15个最受欢迎的Python开源框架从GitHub中整理出的15个最受欢迎的Python开源框架。这些框架包括事...