Running a Python Wordc with Hadoop Streaming

Author: 苟雨 | Published 2017-07-14 08:56 | Read 72 times


Writing the map function
wordcount_mapper.py

#!/usr/bin/env python

# ---------------------------------------------------------------
# This mapper reads lines of text from stdin and outputs <word, 1>
# for every whitespace-separated word.
# ---------------------------------------------------------------

import sys

for line in sys.stdin:
    line = line.strip()      # drop leading and trailing whitespace
    keys = line.split()      # split the line into words
    for key in keys:
        value = 1
        # {0} and {1} are replaced by the 0th and 1st arguments of format()
        print('{0}\t{1}'.format(key, value))

Writing the reduce function
wordcount_reducer.py

#!/usr/bin/env python

# ---------------------------------------------------------------
# This reducer reads <word, count> lines from stdin, sorted by word,
#    and outputs <word, total-count>
# ---------------------------------------------------------------
import sys

last_key      = None
running_total = 0

# -----------------------------------
# Loop over the input and accumulate the counts
# -----------------------------------
for input_line in sys.stdin:
    input_line = input_line.strip()
    this_key, value = input_line.split("\t", 1)
    value = int(value)

    if last_key == this_key:
        running_total += value   # same key: add value to running total

    else:
        if last_key is not None:
            # key changed: emit the total for the previous key
            print("{0}\t{1}".format(last_key, running_total))

        running_total = value    # reset values for the new key
        last_key = this_key

# Emit the final key. Testing last_key avoids a NameError on empty input,
# where this_key would never have been assigned.
if last_key is not None:
    print("{0}\t{1}".format(last_key, running_total))
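
Hadoop Streaming sorts the mapper output by key before it reaches the reducer, which is why the reducer above only needs to compare each key with the previous one. A minimal local sketch of that map, sort, reduce flow, with no Hadoop involved, might look like the following. The function names `map_words`, `reduce_counts`, and `run_local` are hypothetical helpers introduced for illustration; they are not part of the original scripts.

```python
# Local simulation of the map -> sort -> reduce flow (no Hadoop needed).
# map_words, reduce_counts and run_local are illustrative names only.


def map_words(lines):
    """Mimic wordcount_mapper.py: yield a (word, 1) pair for every word."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


def reduce_counts(sorted_pairs):
    """Mimic wordcount_reducer.py: sum the values of consecutive equal keys."""
    last_key = None
    running_total = 0
    for key, value in sorted_pairs:
        if key == last_key:
            running_total += value
        else:
            if last_key is not None:
                yield last_key, running_total
            last_key = key
            running_total = value
    if last_key is not None:
        yield last_key, running_total


def run_local(lines):
    """Sort the mapper output by key, as the shuffle phase does, then reduce."""
    pairs = sorted(map_words(lines))
    return list(reduce_counts(pairs))


if __name__ == "__main__":
    sample = ["hello world", "hello hadoop"]
    for word, total in run_local(sample):
        print("{0}\t{1}".format(word, total))
```

If the `sorted` call is removed, equal words are no longer adjacent and the reducer logic emits the same word more than once, which shows how much the reducer depends on sorted input.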



If you are running on YARN, you need to download the streaming jar separately ([reference](http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-streaming/2.7.3)). Prepare some files in the input directory beforehand.

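Give the streaming jar by its absolute path. The output must not be a directory that already exists. The mapper and reducer must also be given by absolute path.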

hadoop jar /Download/hadoop-streaming-2.7.3.jar \
-input /hello \
-output /output \
-mapper /usr/local/yarn/hadoop-2.7.3/wordcount/wordcount_mapper.py \
-reducer /usr/local/yarn/hadoop-2.7.3/wordcount/wordcount_reducer.py
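
The path rules above can be checked before the job is submitted. The sketch below builds the same command as an argument list and rejects relative paths for the jar, mapper, and reducer. `build_streaming_command` is a hypothetical helper, not part of Hadoop, and it does not check whether the output directory already exists, since that would require querying HDFS.

```python
# Sketch: assemble the streaming command as an argument list and enforce the
# absolute-path rule. build_streaming_command is a hypothetical helper.
import posixpath


def build_streaming_command(jar, input_path, output_path, mapper, reducer):
    """Return the argv list for `hadoop jar ...`; reject relative paths."""
    for name, path in [("jar", jar), ("mapper", mapper), ("reducer", reducer)]:
        if not posixpath.isabs(path):
            raise ValueError(
                "{0} must be an absolute path: {1}".format(name, path))
    return [
        "hadoop", "jar", jar,
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]


if __name__ == "__main__":
    cmd = build_streaming_command(
        "/Download/hadoop-streaming-2.7.3.jar",
        "/hello",
        "/output",
        "/usr/local/yarn/hadoop-2.7.3/wordcount/wordcount_mapper.py",
        "/usr/local/yarn/hadoop-2.7.3/wordcount/wordcount_reducer.py",
    )
    print(" ".join(cmd))
    # On a machine where the `hadoop` command is installed, the list could be
    # passed to subprocess.check_call(cmd) to launch the job.
```

Passing the arguments as a list rather than one shell string avoids the line-continuation and quoting problems that come up when the command is typed by hand.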

Then check /output to see the results.
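
Each result line has the form the reducer prints: a word, a tab, and its total count. As a rough sketch, lines in that format can be parsed and ranked as follows. `parse_counts` and `top_words` are hypothetical helper names, and the sample lines are made-up illustrations rather than real job output.

```python
# Sketch: parse reducer output lines ("word<TAB>count") and rank the words.
# parse_counts and top_words are hypothetical helper names.


def parse_counts(lines):
    """Turn word<TAB>count lines into a dict, skipping blank lines."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        word, count = line.split("\t", 1)
        counts[word] = counts.get(word, 0) + int(count)
    return counts


def top_words(counts, n=3):
    """Return the n most frequent (word, count) pairs, ties broken by word."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


if __name__ == "__main__":
    # Illustrative lines only; real input would come from the job output.
    sample_output = ["hadoop\t1", "hello\t2", "world\t1"]
    for word, count in top_words(parse_counts(sample_output), n=2):
        print("{0}\t{1}".format(word, count))
```

In practice the lines could be piped in from the job's output files, for example by reading `sys.stdin` instead of the sample list.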

Article link: https://www.haomeiwen.com/subject/rsfihxtx.html