Keywords: programming model, programming method, divide and conquer
YARN: the resource manager introduced in Hadoop 2.0. Every MR (MapReduce) job runs under its control. It has three parts:
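1. ResourceManager
- Allocates and schedules resources
- Launches and monitors the ApplicationMaster
- Monitors the NodeManagers
2. ApplicationMaster
- Requests resources for an MR program and assigns them to its internal tasks
- Splits the input data
- Monitors task execution and handles fault tolerance
3. NodeManager
- Manages the resources of a single node
- Handles commands from the ResourceManager
- Handles commands from the ApplicationMaster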
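The toy Python 3 model below illustrates this division of labor. Everything in it is a hypothetical simplification and not the YARN API. Resources are reduced to memory in MB, and the ResourceManager uses a first-fit placement policy that was chosen only for illustration. Real YARN delegates placement to a pluggable scheduler.

```python
# Toy model of how YARN's three roles cooperate.
# All class and method names are hypothetical; this is NOT the YARN API.


class NodeManager:
    """Manages the resources of one node (modelled here as free memory in MB)."""

    def __init__(self, name, free_mb):
        self.name = name
        self.free_mb = free_mb


class ResourceManager:
    """Knows every NodeManager and grants containers to applications."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, count, mb):
        """Grant up to `count` containers of `mb` MB each, first-fit (an assumed policy)."""
        granted = []
        for _ in range(count):
            node = next((n for n in self.nodes if n.free_mb >= mb), None)
            if node is None:
                break  # no node has room; the rest of the request stays unmet
            node.free_mb -= mb
            granted.append(node.name)
        return granted


class ApplicationMaster:
    """One per job: asks the ResourceManager for containers and assigns tasks to them."""

    def __init__(self, rm):
        self.rm = rm

    def schedule(self, tasks, mb_per_task):
        nodes = self.rm.allocate(len(tasks), mb_per_task)
        # Tasks beyond the granted containers stay unscheduled (zip stops early).
        return dict(zip(tasks, nodes))


rm = ResourceManager([NodeManager("node1", 2048), NodeManager("node2", 1024)])
am = ApplicationMaster(rm)
plan = am.schedule(["map-0", "map-1", "map-2", "reduce-0"], 1024)
print(plan)  # {'map-0': 'node1', 'map-1': 'node1', 'map-2': 'node2'}
```

In this example the cluster has 3 GB in total and each task needs 1 GB, so only three of the four tasks receive a container. An ApplicationMaster would typically ask for the remaining container again later.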
MapReduce programming model
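The model applies divide and conquer in three phases:
- Map turns each input split into key/value pairs, independently of the other splits.
- Shuffle/sort brings equal keys together.
- Reduce combines the values for each key.

The following minimal Python 3 sketch runs those phases in a single process on the same word-count data as the example below. The function names are illustrative and not part of Hadoop. On a real cluster, each split would be mapped on a different node.

```python
# In-memory sketch of the MapReduce phases: split -> map -> shuffle/sort -> reduce.
# Function names are illustrative, not part of Hadoop.
from itertools import groupby
from operator import itemgetter


def map_phase(split):
    """Map: turn one input split into (word, 1) pairs."""
    for line in split:
        for word in line.split():
            yield (word, 1)


def reduce_phase(key, values):
    """Reduce: combine all values that share a key."""
    return (key, sum(values))


def run_job(splits):
    # Each split could be mapped on a different node; here they run one after another.
    intermediate = [pair for split in splits for pair in map_phase(split)]
    # Shuffle/sort: bring equal keys next to each other.
    intermediate.sort(key=itemgetter(0))
    return [
        reduce_phase(key, (v for _, v in group))
        for key, group in groupby(intermediate, key=itemgetter(0))
    ]


result = run_job([["a b c"], ["d a a b"]])
print(result)  # [('a', 3), ('b', 2), ('c', 1), ('d', 1)]
```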

A simple MapReduce programming example
# map file: map.py
import sys

def read_input(file):
    # Yield the words of each input line as a list
    for line in file:
        yield line.split()

def main():
    data = read_input(sys.stdin)
    for words in data:
        for word in words:
            # Emit "word<TAB>1" for every occurrence of a word
            print("%s\t%d" % (word, 1))

if __name__ == '__main__':
    main()
# reduce file: reduce.py
import sys
from operator import itemgetter
from itertools import groupby

def read_mapper_output(file, separator='\t'):
    # Split each "word<TAB>count" line into [word, count]
    for line in file:
        line = line.rstrip()
        if line:  # skip blank lines, which would not unpack into two fields
            yield line.split(separator, 1)

def main():
    data = read_mapper_output(sys.stdin)
    # groupby merges only adjacent equal keys, so the input must already be sorted by word
    for current_word, group in groupby(data, itemgetter(0)):
        total_count = sum(int(count) for _, count in group)
        print("%s\t%d" % (current_word, total_count))

if __name__ == '__main__':
    main()
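reduce.py relies on itertools.groupby, which only merges adjacent records that share a key. For this reason, the run command below pipes the mapper output through sort first. On Hadoop, the framework's shuffle/sort phase does this before the reducer runs. The short Python 3 demonstration below shows the difference sorting makes.

```python
# Why the pipeline needs `sort` before reduce.py:
# itertools.groupby only merges *adjacent* records with equal keys.
from itertools import groupby
from operator import itemgetter

pairs = [("a", 1), ("b", 1), ("a", 1)]  # mapper output before sorting


def count(pairs):
    """Sum the values of each run of equal keys, exactly as reduce.py does."""
    return [(k, sum(v for _, v in g)) for k, g in groupby(pairs, key=itemgetter(0))]


unsorted_counts = count(pairs)
sorted_counts = count(sorted(pairs))
print(unsorted_counts)  # [('a', 1), ('b', 1), ('a', 1)]
print(sorted_counts)    # [('a', 2), ('b', 1)]
```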
# Run the code
echo "a b c d a a b"|python2.7 map.py | sort -k1 | python2.7 reduce.py
a 3
b 2
c 1
d 1