MapReduce

Author: daxiadoraemon | Published 2018-07-18 02:21

    MapReduce: Simplified Data Processing on Large Clusters

    1. Execution Overview:

    (1). The MapReduce library splits the input files into M pieces (typically 16-64 MB each) and starts copies of the program on the cluster.

    (2). The master assigns map tasks and reduce tasks to idle workers.

    (3). A map worker reads its input split, parses key/value pairs, and passes each pair to the user's map function; intermediate pairs are buffered in memory.

    (4). Periodically, the buffered pairs are written to local disk, partitioned into R regions by the partitioning function, and the locations of these regions are passed back to the master.

    (5). A reduce worker reads the intermediate data from the map workers' local disks and sorts it by intermediate key.

    (6). The reduce worker applies the user's reduce function to each key and its set of values, appending the output to a final output file for its partition.

    (7). When all map and reduce tasks have finished, the master wakes up the user program. A single-machine sketch of this flow follows these steps.
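
    A minimal single-process sketch of steps (1)-(7), using word count as the user-supplied map/reduce pair. There is no real cluster, master, or GFS here, and all names (map_fn, reduce_fn, run_mapreduce) are illustrative assumptions, not the paper's actual API.

```python
from collections import defaultdict

R = 3  # number of reduce partitions / output files

def map_fn(filename, contents):
    # User-supplied map: emit (word, 1) for every word.
    for word in contents.split():
        yield word, 1

def reduce_fn(key, values):
    # User-supplied reduce: sum the counts for one word.
    return sum(values)

def partition(key):
    # Step (4): hash-partition intermediate keys into R regions.
    return hash(key) % R

def run_mapreduce(input_splits):
    # Steps (3)-(4): run every "map task", bucketing its output into R regions.
    regions = [defaultdict(list) for _ in range(R)]
    for name, contents in input_splits.items():
        for k, v in map_fn(name, contents):
            regions[partition(k)][k].append(v)

    # Steps (5)-(6): each "reduce task" sorts its region by key, then reduces.
    outputs = []
    for region in regions:
        outputs.append({k: reduce_fn(k, vs) for k, vs in sorted(region.items())})

    # Step (7): hand the per-partition results back to the caller.
    return outputs

print(run_mapreduce({"split-0": "a b a", "split-1": "b c"}))
```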

    2. Master Data Structure:

    Stores the state (idle, in-progress, or completed) of each map and reduce task, and the identity of each worker machine executing a non-idle task.

    Also stores the locations and sizes of the intermediate file regions produced by completed map tasks; this information is pushed incrementally to workers with in-progress reduce tasks. A rough sketch of this bookkeeping follows.
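
    A rough Python sketch of this bookkeeping; the class and field names are assumptions for illustration, not the structures used in the paper's C++ implementation.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from enum import Enum

class State(Enum):
    IDLE = "idle"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"

@dataclass
class Task:
    state: State = State.IDLE
    worker_id: str | None = None   # identity of the worker for a non-idle task

@dataclass
class MasterState:
    map_tasks: dict[int, Task] = field(default_factory=dict)
    reduce_tasks: dict[int, Task] = field(default_factory=dict)
    # For each (completed map task, reduce partition): the location and size
    # of the intermediate file region, forwarded to in-progress reduce tasks.
    intermediate_regions: dict[tuple[int, int], tuple[str, int]] = field(default_factory=dict)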

    3. Fault Tolerance:

    (1). Worker failure:

    The master pings every worker periodically; if a worker fails to respond, any task it was running is reset to idle and becomes eligible for rescheduling.

    Completed map tasks on a failed worker must also be re-executed, because their output is stored on that machine's local disk and is now unreachable; completed reduce tasks need not be, because their output is stored in GFS.

    When a map task is re-executed, the master notifies all workers with in-progress reduce tasks so they read the intermediate data from the new map worker (see the sketch after this subsection).
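
    A sketch of this rule, reusing the illustrative MasterState/Task/State classes from the earlier sketch; notify_reduce_workers is a hypothetical helper standing in for the master-to-reducer notification.

```python
def notify_reduce_workers(master, remapped_map_task):
    # Hypothetical stand-in: tell workers with in-progress reduce tasks to
    # read this map task's intermediate data from its new worker.
    pass

def handle_worker_failure(master, failed_worker):
    # Map tasks that ran on the failed worker (even completed ones) go back
    # to idle: their intermediate output lived on that machine's local disk.
    for task_id, task in master.map_tasks.items():
        if task.worker_id == failed_worker:
            task.state, task.worker_id = State.IDLE, None
            notify_reduce_workers(master, remapped_map_task=task_id)
    # Reduce tasks are rescheduled only if still in progress; completed
    # reduce output already lives in GFS and survives the failure.
    for task in master.reduce_tasks.values():
        if task.worker_id == failed_worker and task.state == State.IN_PROGRESS:
            task.state, task.worker_id = State.IDLE, None
```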

    (2). Master failure:

    The master writes periodic checkpoints, so a new copy can be started from the last checkpointed state.

    (3). Semantics in the presence of failure:

    When the user's map and reduce operators are deterministic, the distributed execution produces the same output as a non-faulting sequential execution; this relies on atomic commits of map and reduce task outputs.

    4. Locality

    The master tries to schedule a map task on or near a machine that holds a GFS replica of its input split, so most input data is read locally and network bandwidth is conserved.

    5. Task granularity

    M and R should be much larger than the number of worker machines, which improves dynamic load balancing and speeds up recovery from worker failures; in practice R is also bounded because each reduce task produces a separate output file, and the master keeps O(M*R) state.

    6. Backup tasks

    When a MapReduce operation is close to completion, the master schedules backup executions of the remaining in-progress tasks; a task is marked completed when either the primary or the backup execution finishes, which greatly reduces the impact of stragglers. A toy sketch follows.
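
    A toy sketch of backup scheduling on top of these notes' illustrative MasterState classes; the completion threshold and assign_backup_copy are assumptions.

```python
def assign_backup_copy(task, worker):
    # Hypothetical dispatch: run a duplicate of the task on an idle worker.
    pass

def maybe_schedule_backups(master, idle_workers, near_done=0.95):
    tasks = list(master.map_tasks.values()) + list(master.reduce_tasks.values())
    completed = sum(1 for t in tasks if t.state == State.COMPLETED)
    if not tasks or completed / len(tasks) < near_done:
        return  # only bother once the operation is close to completion
    stragglers = [t for t in tasks if t.state == State.IN_PROGRESS]
    for task, worker in zip(stragglers, idle_workers):
        # Whichever of the primary or backup execution finishes first
        # marks the task as completed.
        assign_backup_copy(task, worker)
```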

    7. Refinement:

    (1). Partition function:

    The default is hash(key) mod R, which tends to result in fairly well-balanced partitions; users can supply a custom partitioning function, e.g. hash(Hostname(urlkey)) mod R to put all URLs from the same host in the same output file. A sketch of both follows.
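
    A small sketch of the default partitioning function and the paper's custom example; Python's built-in hash stands in for whatever hash the real implementation uses.

```python
from urllib.parse import urlparse

R = 16  # number of reduce tasks / output files

def default_partition(key: str) -> int:
    # Default: hash(key) mod R.
    return hash(key) % R

def host_partition(url_key: str) -> int:
    # Custom: hash(Hostname(urlkey)) mod R, so URLs from the same host
    # end up in the same output file.
    return hash(urlparse(url_key).hostname) % R

print(default_partition("apple"), host_partition("http://example.com/a"))
```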

    (2). Ordering guarantee:

    Within a partition, the intermediate key/value pairs are processed in increasing key order, which makes it easy to generate a sorted output file per partition and supports efficient lookups by key.

    (3). Combiner function:

    Typically the same code as the reduce function; the only difference is how the output is handled: combiner output is written to an intermediate file that is sent to a reduce task, while reduce output goes to the final output file. Partial combining on the map side can significantly speed up operations such as word count (see the sketch below).
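
    A sketch of a combiner for word count: the combining logic is the same as the reduce function, but it runs over a map task's output before that output is written to local disk. Names are illustrative.

```python
from collections import Counter

def map_fn(contents):
    # Emits ("the", 1) many times for common words.
    return [(word, 1) for word in contents.split()]

def reduce_fn(key, values):
    return sum(values)

def combine(map_output):
    # Partial merge on the map side: collapse repeated keys into one pair
    # each, so far less intermediate data crosses the network.
    merged = Counter()
    for key, value in map_output:
        merged[key] += value
    return list(merged.items())

print(combine(map_fn("the cat sat on the mat the end")))
```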

    (4). Input & output types
