Setting up Hadoop in standalone mode and running its built-in examples

Author: 波洛的汽车电子世界 | Published 2019-08-25 03:35

    There are three installation modes: standalone, pseudo-distributed, and fully distributed.
    Standalone mode is the default: no configuration files need to be changed (only JAVA_HOME), and no daemons need to be started.
    Pseudo-distributed mode also runs on a single machine, but you must edit the configuration files (JAVA_HOME, the directory for PID files, core-site.xml, hdfs-site.xml) and start the corresponding daemons.
    Fully distributed mode requires multiple hosts, each of which must be configured properly and run its daemons.

    Installation

    1. ssh localhost
      If this fails, generate an SSH key pair:
    $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
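      If ssh localhost still prompts for a password after this, the Hadoop single-node setup guide also tightens the permissions on authorized_keys (this extra step is taken from that guide, not from the original post):
    $ chmod 0600 ~/.ssh/authorized_keys
    $ ssh localhost  # should now log in without asking for a password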
    
    2. brew install hadoop
    3. Check which Hadoop version was installed: run hadoop version in a terminal. It shows Hadoop is installed under /usr/local/Cellar/hadoop/3.1.2.
    4. Set JAVA_HOME in /usr/local/Cellar/hadoop/3.1.2/libexec/etc/hadoop/hadoop-env.sh.
      To find the Java installation path, run /usr/libexec/java_home -V in a terminal,
      which returns /Library/Java/JavaVirtualMachines/openjdk-12.0.2.jdk/Contents/Home.
      Write this path into JAVA_HOME and remove the leading #, giving:
    export JAVA_HOME=/Library/Java/JavaVirtualMachines/openjdk-12.0.2.jdk/Contents/Home
    

    Note: in standalone mode this is the only configuration you need to change. Do not modify the other configuration files and then try to run in standalone mode, or you will get a connection refused error!
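    As a quick sanity check (this step comes from the Hadoop single-node setup guide linked below, not from the original post), run the hadoop script with no arguments; if JAVA_HOME is set correctly it prints its usage help, otherwise it fails with an error:

    Huizhi$ cd /usr/local/Cellar/hadoop/3.1.2/libexec
    libexec Huizhi$ bin/hadoop  # prints the usage documentation when JAVA_HOME is set correctly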
    Next, let's run the standalone-mode example.

    Standalone mode

    Source: https://hadoop.apache.org/docs/r3.1.2/hadoop-project-dist/hadoop-common/SingleCluster.html
    The page above provides a grep example. Suppose we want to find, across many files, the words that start with 'dfs' and count how often they occur. We need a directory of input files: first create a directory called input and copy all of Hadoop's XML configuration files into it, then run the MapReduce job, write the results to output, and finally print the contents of output.

    Run

    Huizhi$ cd /usr/local/Cellar/hadoop/3.1.2/libexec
    libexec Huizhi$ mkdir input  # create the input directory
    libexec Huizhi$ cp etc/hadoop/*.xml input  # copy the config files
    libexec Huizhi$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar grep input output 'dfs[a-z.]+'  # run the MapReduce grep job
    libexec Huizhi$ cat output/*  # print the output
    

    To check the output here, I added two extra words beginning with dfs to the input (see the sketch after the result below). The final result is:

    $ cat output/*
    1   dfstwo
    1   dfsone
    1   dfsadmin
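    A sketch of how such test words can be added before running the job (the original post does not show the exact command; words.txt is just a hypothetical file name placed in the input directory):

    libexec Huizhi$ echo "dfsone dfstwo" > input/words.txt  # hypothetical test file with two dfs-prefixed words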
    

    The regex 'dfs[a-z.]+' matches 'dfs' followed by one or more lowercase letters or dots, which is why dfstwo, dfsone, and dfsadmin are all counted.

    If the following error appears, it is because other configuration files were changed. Either restore the default configuration (see the sketch after the error), or, as I did, uninstall and reinstall:

    2019-08-23 16:14:16,220 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2019-08-23 16:14:17,048 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:9000
    java.net.ConnectException: Call From HuizhiXu.local/172.16.233.171 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
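    If you would rather restore the defaults than reinstall: the stack trace shows the job trying to reach a daemon on localhost:9000 that is not running, which usually means settings added for pseudo-distributed mode are still in core-site.xml or hdfs-site.xml. A minimal sketch of what to check, assuming the standard file locations; in standalone mode both files should contain only an empty <configuration> element:

    libexec Huizhi$ grep -A1 defaultFS etc/hadoop/core-site.xml  # if this prints hdfs://localhost:9000, remove that property
    libexec Huizhi$ cat etc/hadoop/core-site.xml                 # after cleanup, only an empty <configuration></configuration> should remain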
    

    After running this example I wanted to see what other programs the bundled share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar provides. Run

    libexec Huizhi$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar 
    

    and the list of available example programs is printed.
    To see how to use one of these programs, run

    hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar <program-name>
    For example: hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar aggregatewordcount
    

    which prints

    usage: inputDirs outDir [numOfReducer [textinputformat|seq [specfile [jobName]]]]
    
    aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
      This program counts how many times each word appears in the input.
      Note: when the input is plain text, the following error appears:
    Caused by: java.io.IOException: file:/usr/local/Cellar/hadoop/3.1.2/libexec/input/capacity-scheduler.xml not a SequenceFile
    

    This is because aggregatewordcount can only parse binary SequenceFiles; you need the Hadoop API to convert plain text into a SequenceFile first.

    aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
      This program builds a histogram of word occurrences.
    bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.

    dbcount: An example job that count the pageview counts from a database.
    distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
    grep: A map/reduce program that counts the matches of a regex in the input.
    Searches the input files for matches of a regular expression and writes the results to the output.
    join: A job that effects a join over sorted, equally partitioned datasets
    multifilewc: A job that counts words from several files.
    pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
    pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
    Format:
    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar pi <number of maps> <samples per map>
    Example:
    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar pi 10 50
    Result:

    Job Finished in 2.973 seconds
    Estimated value of Pi is 3.16000000000000000000
    

    randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
    randomwriter: A map/reduce program that writes 10GB of random data per node.
    secondarysort: An example defining a secondary sort to the reduce.
    sort: A map/reduce program that sorts the data written by the random writer.
    sudoku: A sudoku solver.
    teragen: Generate data for the terasort
    terasort: Run the terasort
    teravalidate: Checking results of terasort
    wordcount: A map/reduce program that counts the words in the input files.
    This is the best-known example: it counts the occurrences of each word in the input files (see the usage sketch after this list).
    wordmean: A map/reduce program that counts the average length of the words in the input files.
    wordmedian: A map/reduce program that counts the median length of the words in the input files.
    wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
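    For instance, wordcount can be run on the same input directory as the grep example above. A sketch following the same pattern (the original post does not show this run; output2 is just a fresh output directory, since a MapReduce job refuses to write into one that already exists):

    libexec Huizhi$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar wordcount input output2
    libexec Huizhi$ cat output2/*  # one line per word: the word, a tab, and its count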

    Linux command notes:

    1. cd: changes the current working directory to dirName (the directory argument).
      Format: cd [dirName]
      "~" means the home directory, "." is the current directory, and ".." is the parent of the current directory.
      Examples: cd ~    cd ../..

    2. grep is a command-line tool originally written for Unix. Given a list of files or standard input, it searches for text matching one or more regular expressions and prints only the matching lines or text. (Source: Wikipedia)

    3. mkdir: creates a directory named dirName.
      Format: mkdir dirName

    4. cp: copies files or directories.
      Format: cp [options] source dest
      The most commonly used option is -r.
      Example: $ cp -r test/ newtest    copies all files and subdirectories under test/ into newtest
      scp uses a similar syntax to cp, but copies files over SSH to or from a remote host, so it may prompt for the remote password (see the example below).
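      A sketch of an scp call (the remote host and paths here are placeholders):
      $ scp -r test/ user@remote-host:/home/user/newtest  # copies test/ to the remote machine over SSH, prompting for the remote password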

    References:
    Notes on the AggregateWordCount source code
    MapReduce input formats
    https://docs.microsoft.com/bs-latn-ba/azure/hdinsight/hadoop/apache-hadoop-run-samples-linux?view=netcore-2.0
