A First Look at Spark Streaming

Author: 东皇Amrzs | Published: 2016-07-20 12:01


    [root@test spark]# uname -a
    Linux test 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
    [root@test spark]# cat /etc/issue
    CentOS release 6.5 (Final)
    [root@test ~]# ls
    jdk-7u79-linux-x64.tar.gz  spark-1.6.0-bin-hadoop2.6.tgz

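    I assume you have already installed and configured an environment that can run Spark. This post only records how I ran the Python version of the Spark Streaming WordCount program from the official tutorial.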


    cd  /usr/local/spark



    Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
    Usage: network_wordcount.py <hostname> <port>
    <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.

    To run this on your local machine, you need to first run a Netcat server
    $ nc -lk 9999
    and then run the example
    $ bin/spark-submit examples/src/main/python/streaming/network_wordcount.py localhost 9999
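
For reference, the core of such a word count can be sketched in PySpark as follows. This is a minimal sketch modelled on the description above, not a verbatim copy of the bundled `network_wordcount.py`; the helper names `split_words`, `to_pair`, `add` and `run`, and the app name, are my own choices, and the 1-second batch interval is taken from the "every second" in the usage text.

```python
def split_words(line):
    """Split one line of text on single spaces."""
    return line.split(" ")


def to_pair(word):
    """Pair a word with an initial count of 1."""
    return (word, 1)


def add(a, b):
    """Combine two partial counts for the same word."""
    return a + b


def run(hostname, port, batch_seconds=1):
    """Count words per batch from a TCP text source and print them."""
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="NetworkWordCountSketch")  # arbitrary name
    ssc = StreamingContext(sc, batch_seconds)            # one batch per interval

    # Each record in the DStream is one '\n'-delimited line from the socket.
    lines = ssc.socketTextStream(hostname, port)
    counts = lines.flatMap(split_words).map(to_pair).reduceByKey(add)
    counts.pprint()  # print the first elements of every batch

    ssc.start()
    ssc.awaitTermination()
```

Appending a call such as `run("localhost", 9999)` to the file and passing it to `bin/spark-submit` should behave much like the bundled example, assuming a text source is already listening on that port.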


    1. Download the netcat package
    wget http://sourceforge.net/projects/netcat/files/netcat/0.7.1/netcat-0.7.1-1.i386.rpm
    1. Run the install: rpm -ihv netcat-0.7.1-1.i386.rpm
    rpm -ihv netcat-0.7.1-1.i386.rpm  
    warning: netcat-0.7.1-1.i386.rpm: Header V3 DSA/SHA1 Signature, key ID b2d79fc1: NOKEY  
    error: Failed dependencies:  
            libc.so.6 is needed by netcat-0.7.1-1.i386  
            libc.so.6(GLIBC_2.0) is needed by netcat-0.7.1-1.i386  
            libc.so.6(GLIBC_2.1) is needed by netcat-0.7.1-1.i386  
            libc.so.6(GLIBC_2.3) is needed by netcat-0.7.1-1.i386  
    1. Resolve the missing dependencies
    [root@test streaming]# yum list glibc*
    Loaded plugins: fastestmirror
    Loading mirror speeds from cached hostfile
     * base: mirrors.aliyun.com
     * epel: ftp.cuhk.edu.hk
     * extras: mirrors.aliyun.com
     * rpmforge: ftp.neowiz.com
     * updates: mirrors.aliyun.com
    Installed Packages
    glibc.i686                                          2.12-1.192.el6                                @base
    glibc.x86_64                                        2.12-1.192.el6                                @base
    glibc-common.x86_64                                 2.12-1.192.el6                                @base
    glibc-devel.x86_64                                  2.12-1.192.el6                                @base
    glibc-headers.x86_64                                2.12-1.192.el6                                @base
    glibc-static.x86_64                                 2.12-1.192.el6                                @base
    glibc-utils.x86_64                                  2.12-1.192.el6                                @base
    Available Packages
    glibc-devel.i686                                    2.12-1.192.el6                                base 
    glibc-static.i686                                   2.12-1.192.el6                                base 
    1. Install the dependency:
    yum install glibc.i686
    1. Run the install again:
    rpm -ihv netcat-0.7.1-1.i386.rpm
    warning: netcat-0.7.1-1.i386.rpm: Header V3 DSA/SHA1 Signature, key ID b2d79fc1: NOKEY  
    Preparing...                ########################################### [100%]  
       1:netcat                 ########################################### [100%]  


    1. Run the command nc -lk 9999
    nc: invalid option -- 'k'
    Try `nc --help' for more information.


    S O L V E D The consultant installed netcat so I uninstalled netcat and then nc was not working. So I also removed and reinstalled nc again. Now -k option is working now Thanks for your helps – Murat Apr 1 '15 at 10:03

    1. Fix the netcat problem: remove netcat and install nc instead
    [root@test ~]# yum remove netcat
    Loaded plugins: fastestmirror
    Setting up Remove Process
    Resolving Dependencies
    --> Running transaction check
    ---> Package netcat.i386 0:0.7.1-1 will be erased
    --> Finished Dependency Resolution


    [root@test ~]# yum install nc
    Loaded plugins: fastestmirror
    Loading mirror speeds from cached hostfile
     * base: mirrors.aliyun.com
     * epel: mirror.premi.st
     * extras: mirrors.aliyun.com
     * rpmforge: ftp.neowiz.com
     * updates: mirrors.aliyun.com
    Setting up Install Process
    Resolving Dependencies
    --> Running transaction check
    ---> Package nc.x86_64 0:1.84-24.el6 will be installed
    --> Finished Dependency Resolution
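
The reason `-k` matters is that it keeps `nc` listening after a client disconnects. If no `nc` build with `-k` is at hand, the same behaviour can be imitated in plain Python. The following is a rough sketch, assuming Python 3; `make_line_server` and `read_all` are hypothetical helper names, and unlike an interactive `nc` session it sends a fixed list of lines to each client and then closes the connection.

```python
import socket
import socketserver
import threading


def make_line_server(lines, host="127.0.0.1", port=0):
    """Return a TCP server that sends `lines` to every client that connects."""

    class Handler(socketserver.BaseRequestHandler):
        def handle(self):
            for line in lines:
                # One UTF-8 encoded, '\n'-terminated record per line.
                self.request.sendall((line + "\n").encode("utf-8"))

    class Server(socketserver.ThreadingTCPServer):
        allow_reuse_address = True
        daemon_threads = True

    return Server((host, port), Handler)


def read_all(host, port):
    """Connect as a client and read until the server closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=5) as conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")


# port=0 asks the OS for a free port; pass 9999 to mirror this post.
server = make_line_server(["hello nihao my name is xzp hello world!"])
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

first = read_all(host, port)
second = read_all(host, port)  # a second client can still connect, as with -k
server.shutdown()
server.server_close()
```

Here `read_all` only stands in for the Spark receiver to show that the server survives more than one connection; for real interactive input, `nc -lk` remains the simpler tool.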
    1. Run the program
    [root@test spark]# nc -lk 9999


    [root@test spark]# bin/spark-submit examples/src/main/python/streaming/network_wordcount.py localhost 9999
    1. Check the output

    In the window running nc, type:

    hello nihao my name is xzp hello world!


    Time: 2016-07-20 11:56:41
    (u'my', 1)
    (u'is', 1)
    (u'nihao', 1)
    (u'world!', 1)
    (u'xzp', 1)
    (u'name', 1)
    (u'hello', 2)
    Time: 2016-07-20 11:56:42
    Time: 2016-07-20 11:56:43
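
The counts in the first batch can be checked by hand with plain Python. The sketch below applies the same two steps (split on spaces, then count) to the input line; it does not use Spark, and `Counter` is only a stand-in for the per-batch reduce. It also shows why `world!` keeps its exclamation mark: the text is split on spaces only, so punctuation stays attached to the word. The `u'...'` prefix in the Spark output is simply how Python 2 prints unicode strings, and the batches at 11:56:42 and 11:56:43 are empty because no new input arrived in those seconds.

```python
from collections import Counter

line = "hello nihao my name is xzp hello world!"

# Same steps as the streaming job, applied to a single batch:
# split on spaces, then count occurrences of each token.
counts = Counter(line.split(" "))

for word, n in sorted(counts.items()):
    print((word, n))
```

The order of the printed pairs differs from the Spark output above, which is expected: the streaming job does not sort its results.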




