2018-01-10 Hadoop Platform and A

Author: 鸭鸭学语言 | Published 2018-01-10 19:00

YARN

It supports the classic MapReduce framework.

It also supports other open-source and commercial applications, such as Impala and Storm, which can run on it without modification.

It also supports user-developed applications.

It also enables frameworks such as Tez and Spark.

Execution Frameworks: YARN, Tez, Spark

Support DAGs (directed acyclic graphs) of tasks.

In-memory caching of data.

MapReduce

Application engine.

Applications that fit the MapReduce paradigm: the data can be split into distributed chunks that are processed independently of each other in the map phase, and a shuffle step then feeds the map output into the reduce phase.
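
The map, shuffle, and reduce steps can be illustrated with a small word count. This is a minimal single-process sketch in plain Python, not Hadoop code; the function names map_phase, shuffle, reduce_phase, and word_count are made up for illustration.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit (word, 1) for every word in one chunk.

    Each chunk is handled on its own, so chunks could be mapped in parallel.
    """
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group the emitted values by key across all chunks."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))


counts = word_count(["yarn runs spark", "spark runs tez"])
print(counts)
```

The point of the sketch is the shape of the computation: the map calls do not depend on each other, and only the shuffle brings values for the same key together.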

Applications that do not fit the MapReduce paradigm:

Interactive data exploration - load data into memory once to avoid reading it from disk again and again.

Iterative data processing - for example, machine learning algorithms.

Tez

Application engine.

Features:

Handles dataflow graphs with an expressive API.

Supports custom data types and custom application logic, so it is not restricted to the MapReduce model.

Can run complex DAGs of tasks.

Dynamic DAG changes.

Reuses resources (containers) to avoid container startup costs, which makes it more efficient.
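
To make "a DAG of tasks" concrete, here is a small sketch that runs tasks in dependency order. It is not the Tez API; it uses Python's standard graphlib module (Python 3.9+), and the helper run_dag and the task names are hypothetical.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, deps):
    """Run every task once, after all the tasks it depends on.

    tasks: name -> function that takes a dict of upstream results
    deps:  name -> set of upstream task names (missing means no upstream)
    """
    graph = {name: deps.get(name, set()) for name in tasks}
    results = {}
    order = []
    for name in TopologicalSorter(graph).static_order():
        upstream = {dep: results[dep] for dep in graph[name]}
        results[name] = tasks[name](upstream)
        order.append(name)
    return results, order


# Hypothetical four-task graph: two scans feed a join, which feeds a count.
tasks = {
    "scan_a": lambda up: [1, 2, 3],
    "scan_b": lambda up: [2, 3, 4],
    "join": lambda up: sorted(set(up["scan_a"]) & set(up["scan_b"])),
    "count": lambda up: len(up["join"]),
}
deps = {"join": {"scan_a", "scan_b"}, "count": {"join"}}

results, order = run_dag(tasks, deps)
print(order, results["count"])
```

Unlike a fixed map-then-reduce pipeline, the graph here can have any number of stages, and each stage passes its output straight to the next one.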

Comparing MapReduce and Tez on a use case:

Use case:

SELECT a.vendor, COUNT(*), AVG(c.cost) FROM a JOIN b ON (a.id = b.id) JOIN c ON (a.itemid = c.itemid) GROUP BY a.vendor

MapReduce

Tez

Spark

Application engine.

It can run directly on HDFS without YARN, and it can also run on other storage systems.

Features:

Advanced DAG execution engine - data can be shared across DAGs and reused between iterations, which can make it much faster than other DAG engines.

Supports cyclic data flow.

In-memory computing. When the data does not fit in memory, it spills to disk gracefully.

Can be accessed from Java, Scala, Python, and R.

Existing optimized libraries
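
The benefit of keeping data in memory between iterations can be shown without Spark. The sketch below is plain Python, not Spark code; CachedDataset, slow_load, and iterate_mean are hypothetical names, and slow_load stands in for an expensive read from disk or HDFS.

```python
class CachedDataset:
    """Hypothetical helper: loads the data once, then serves it from memory."""

    def __init__(self, loader):
        self._loader = loader
        self._data = None
        self.loads = 0  # how many times the slow loader actually ran

    def get(self):
        if self._data is None:
            self._data = self._loader()
            self.loads += 1
        return self._data


def slow_load():
    # Stand-in for reading a dataset from disk or HDFS.
    return [1.0, 2.0, 3.0, 4.0]


def iterate_mean(dataset, steps, rate=0.5):
    """Move an estimate toward the data mean; every step rereads the dataset."""
    estimate = 0.0
    for _ in range(steps):
        data = dataset.get()
        mean = sum(data) / len(data)
        estimate += rate * (mean - estimate)
    return estimate


dataset = CachedDataset(slow_load)
estimate = iterate_mean(dataset, steps=10)
print(dataset.loads, estimate)
```

Ten iterations touch the data ten times, but the loader runs only once. An iterative algorithm on an engine that rereads its input from disk at every step would pay the load cost each time.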

Hadoop Resource Scheduling

Schedulers:

FIFO (default)

Fairshare - balances resources between applications. The default resource is memory, but CPU can be added as a resource.

Balances resource allocation among apps over time.

Can organize apps into queues/sub-queues

Guarantees minimum shares

Weighted app priorities
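
The idea of minimum shares plus weighted priorities can be sketched with a little arithmetic. This is a simplified illustration, not the actual Hadoop Fair Scheduler algorithm: it assumes every app wants as much as it can get, and the function fair_shares and the app names are hypothetical.

```python
def fair_shares(total, weights, min_shares=None):
    """Split `total` units of a resource among apps.

    Each app first receives its guaranteed minimum share; whatever is left
    is divided in proportion to the app weights.
    """
    min_shares = min_shares or {}
    guaranteed = {app: min_shares.get(app, 0) for app in weights}
    reserved = sum(guaranteed.values())
    if reserved > total:
        raise ValueError("minimum shares exceed the cluster total")
    remaining = total - reserved
    weight_sum = sum(weights.values())
    return {
        app: guaranteed[app] + remaining * weights[app] / weight_sum
        for app in weights
    }


# Hypothetical cluster with 100 GB of memory and three apps.
shares = fair_shares(
    total=100,
    weights={"etl": 2, "adhoc": 1, "ml": 1},
    min_shares={"adhoc": 20},
)
print(shares)
```

Here adhoc is guaranteed 20 GB, and the remaining 80 GB is split 2:1:1, so etl ends up with 40, adhoc with 40, and ml with 20.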

Capacity - guarantees resources for each application

Queues and sub-queues

Capacity Guarantee with elasticity

ACLs for security

Runtime changes/draining apps

Resource based scheduling
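
"Capacity guarantee with elasticity" can be sketched as a two-pass allocation: each queue first gets its demand up to its guaranteed capacity, and any idle capacity is then lent to queues that want more. This is a simplified model, not the Hadoop Capacity Scheduler implementation; the function allocate and the queue names are hypothetical, and spare capacity is lent in declaration order purely to keep the example short.

```python
def allocate(total, capacity_pct, demand):
    """Allocate `total` resource units across queues.

    capacity_pct: queue -> guaranteed percentage of the cluster
    demand:       queue -> units the queue currently asks for
    """
    guaranteed = {q: total * pct / 100 for q, pct in capacity_pct.items()}

    # Pass 1: every queue gets what it asks for, up to its guarantee.
    alloc = {q: min(demand.get(q, 0), guaranteed[q]) for q in capacity_pct}

    # Pass 2 (elasticity): lend idle capacity to queues with unmet demand.
    spare = total - sum(alloc.values())
    for q in capacity_pct:
        extra = min(demand.get(q, 0) - alloc[q], spare)
        alloc[q] += extra
        spare -= extra
    return alloc


# Hypothetical cluster: prod is guaranteed 60%, dev 40%.
alloc = allocate(
    total=100,
    capacity_pct={"prod": 60, "dev": 40},
    demand={"prod": 30, "dev": 90},
)
print(alloc)
```

With prod using only 30 of its guaranteed 60 units, dev can grow past its 40-unit guarantee to 70. In this model, running the allocation again after prod's demand rises gives prod its guaranteed share back.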

Lesson 4 Slides


Article link: https://www.haomeiwen.com/subject/poignxtx.html
