
python+hadoop learning notes 1

Author: 肠粉白粥_Hoben | Published 2018-02-07 01:32

I recently have a requirement involving distributed computing, so I want to try driving Hadoop with Python.

There were quite a few pitfalls today.

1. Xshell wouldn't connect. Uh... it had to be set up the way shown in the screenshot (not reproduced here). Pure voodoo, I don't know why either. Baffling.

2. Writing the Python map-reduce program

The mapper code is as follows:

#!/usr/bin/env python

import sys

# input comes from STDIN (standard input)
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    # increase counters
    for word in words:
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the
        # Reduce step, i.e. the input for reducer.py
        # tab-delimited; the trivial word count is 1
        print '%s\t%s' % (word, 1)

The blogger I copied this from really set some traps... the \t was written wrong, a colon was missing, and that import at the top wasn't even used.
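One more potential trap that isn't the original blogger's fault: the shebang just says python, and on machines where that resolves to Python 3 the print statement above won't even parse. If that applies to you, a Python 3 flavoured sketch of the same mapper (the logic is identical, only print() changes) would look like this:

#!/usr/bin/env python3
import sys

# read lines from STDIN and emit a tab-separated "word 1" pair per word
for line in sys.stdin:
    for word in line.strip().split():
        print('%s\t%s' % (word, 1))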

Copy it to /usr/local/hadoop, and remember to run

chmod +x mapper.py

to make the script executable.

One more thing! When I opened it with vim mapper.py... my god, what a nasty conversion to run into.
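The screenshot of what vim showed isn't reproduced here, but my guess is the usual culprit: the file picked up Windows-style CRLF line endings (vim shows them as ^M), which also break the #!/usr/bin/env python shebang. If that's what you hit, converting the file from inside vim is enough:

:set fileformat=unix
:wq

(or run dos2unix mapper.py from the shell, if it's installed).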

Finally, run

echo "foo foo quux labs foo bar zoo zoo hying" | ./mapper.py

to test it.

The mapper works!
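The result screenshot isn't reproduced here, but for that input the mapper should print one tab-separated (word, 1) pair per word, in input order, along the lines of:

foo	1
foo	1
quux	1
labs	1
foo	1
bar	1
zoo	1
zoo	1
hying	1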

3. Reducer

The code is as follows. Note that the reducer assumes its input is sorted by key, which is what the sort -k1,1 in the test command below (and Hadoop's shuffle phase in a real job) provides:

#!/usr/bin/env python

from operator import itemgetter
import sys

current_word = None
current_count = 0
word = None

# input comes from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # parse the input we got from mapper.py
    word, count = line.split('\t', 1)
    # convert count (currently a string) to int
    try:
        count = int(count)
    except ValueError:
        # count was not a number, so silently discard this line
        continue
    # this only works because the input is sorted, so equal keys are adjacent
    if current_word == word:
        current_count += count
    else:
        if current_word:
            print '%s\t%s' % (current_word, current_count)
        current_count = count
        current_word = word

# do not forget to output the last word
if word == current_word:
    print '%s\t%s' % (current_word, current_count)

Run

echo "pib foo foo quux labs foo bar quux" | ./mapper.py | sort -k1,1 | ./reducer.py

The word count comes out right. Hooray!
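The result screenshot is missing again, but the sorted pipeline should print one count per distinct word, roughly:

bar	1
foo	3
labs	1
pib	1
quux	2

Once both scripts behave locally, the next step is to hand them to Hadoop via the streaming jar. The jar path and the HDFS input/output paths below are only a guess at a typical layout, so adjust them to your Hadoop version and install:

hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -file ./mapper.py  -mapper ./mapper.py \
    -file ./reducer.py -reducer ./reducer.py \
    -input /user/hadoop/input -output /user/hadoop/output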
