Hadoop installation (standalone and pseudo-distributed modes) and Spark installation, running wordcount


Author: MountSong | Published 2019-04-02 16:08

    Installing the Hadoop stack (standalone and pseudo-distributed modes):

    Using an Ubuntu system.

    1. Install the JDK and configure the environment variables (in .bashrc);
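
    A minimal sketch of the .bashrc entries, assuming the JDK is unpacked to /home/hp/jdk1.8.0_191 (the path this article uses later in hadoop-env.sh):

    export JAVA_HOME=/home/hp/jdk1.8.0_191
    export PATH=$PATH:$JAVA_HOME/bin

    Run source ~/.bashrc afterwards so the new variables take effect in the current shell.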

    2. Install ssh (not needed for standalone mode);
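
    For the pseudo-distributed mode, Hadoop also needs passwordless SSH to localhost. A typical setup on Ubuntu:

     sudo apt-get install openssh-server
     ssh-keygen -t rsa                                  # accept the defaults
     cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
     ssh localhost                                      # should now log in without a password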

    3. Download the Hadoop package and extract it;
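
    For example, downloading and extracting into /home/hp/hadoop_env (the archive URL is an assumption; any Apache mirror carrying hadoop-2.8.5 will do):

     wget https://archive.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz
     tar -xzf hadoop-2.8.5.tar.gz -C /home/hp/hadoop_env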

    4. Configure Hadoop:

    4.1. Standalone mode: if the extraction succeeded, the standalone installation is already complete.

    Browse Hadoop's built-in examples; among them is wordcount, the word-count example.
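
    Running the examples jar with no arguments prints the list of available example programs:

     ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar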

    Create an input directory under the hadoop directory, put the files to be tested into input, and run the following command to do a word-frequency count with MapReduce:

     ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount ./input/ ./output

    The results are written to the output directory and can be viewed with:
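
     cat ./output/*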

    4.2. Pseudo-distributed mode:

    The Hadoop installation path is /home/hp/hadoop_env/hadoop-2.8.5.

    Create a tmp directory for temporary files under the installation directory:
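
     mkdir /home/hp/hadoop_env/hadoop-2.8.5/tmp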

    Edit the configuration file core-site.xml under etc/hadoop in the Hadoop installation directory:

    <configuration>
            <property>
                    <name>hadoop.tmp.dir</name>
                    <value>/home/hp/hadoop_env/hadoop-2.8.5/tmp</value>
                    <description>A base for other temporary directories.</description>
            </property>
            <property>
                    <name>fs.defaultFS</name>
                    <value>hdfs://localhost:9000</value>
            </property>
    </configuration>

    Edit the configuration file hdfs-site.xml:

    <configuration>
            <property>
                    <name>dfs.replication</name>
                    <value>1</value>
            </property>
            <property>
                    <name>dfs.namenode.name.dir</name>
                    <value>/home/hp/hadoop_env/hadoop-2.8.5/tmp/dfs/name</value>
            </property>
            <property>
                    <name>dfs.datanode.data.dir</name>
                    <value>/home/hp/hadoop_env/hadoop-2.8.5/tmp/dfs/data</value>
            </property>
    </configuration>

    Edit the configuration file mapred-site.xml (if it does not exist, copy it from mapred-site.xml.template in the same directory):

    <configuration>
            <property>
                    <name>mapreduce.framework.name</name>
                    <value>yarn</value>
            </property>
    </configuration>

    Edit the configuration file yarn-site.xml:

    <configuration>
            <property>
                    <name>yarn.resourcemanager.hostname</name>
                    <value>localhost</value>
            </property>
            <property>
                    <name>yarn.nodemanager.aux-services</name>
                    <value>mapreduce_shuffle</value>
            </property>
    </configuration>

    Set JAVA_HOME explicitly by editing hadoop-env.sh under etc/hadoop in the installation directory:

    Add the line: export JAVA_HOME=/home/hp/jdk1.8.0_191

    Format the NameNode, running from the installation directory (do this only once; reformatting wipes the existing HDFS metadata):

    ./bin/hdfs namenode -format

    Start the HDFS daemons (NameNode, DataNode, and SecondaryNameNode):

     ./sbin/start-dfs.sh

    Start the YARN daemons (ResourceManager and NodeManager):

    ./sbin/start-yarn.sh

    Start the job history server:

     ./sbin/mr-jobhistory-daemon.sh start historyserver  

    Verify with jps: if NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and JobHistoryServer all appear in the output, everything started successfully.

    Open "127.0.0.1:50070" in a browser to view the HDFS file system; the YARN ResourceManager UI is at "127.0.0.1:8088".

    Run the wordcount program:

    1. Create a user directory in HDFS:

    ./bin/hdfs dfs -mkdir -p /user/hadoop

    2. Create the input directory, upload the text file, and list it:

    ./bin/hdfs dfs -mkdir /user/hadoop/input

    ./bin/hdfs dfs -put ./input/inputWords /user/hadoop/input

    ./bin/hdfs dfs -ls /user/hadoop/input

    3. Run the wordcount program (the output directory must not already exist):

    ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /user/hadoop/input /user/hadoop/output

    4. View the results:

    ./bin/hdfs dfs -ls /user/hadoop/output

    ./bin/hdfs dfs -cat /user/hadoop/output/part-r-00000
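
    The output lists one word per line with its count, tab-separated. For a hypothetical input containing "hello world hello", it would read:

     hello   2
     world   1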

    Shut down Hadoop:

    ./sbin/mr-jobhistory-daemon.sh stop historyserver

    ./sbin/stop-yarn.sh

    ./sbin/stop-dfs.sh

    Spark installation

    1. Download the Scala and Spark packages:

    https://www.scala-lang.org/download/

    http://spark.apache.org/downloads.html

    Notes:

    1) With the current Spark 2.4.0, the Scala 2.12 package cannot be used; Scala 2.11 is used here;

    2) Since Hadoop is already installed, use the Spark package that does not bundle Hadoop: spark-2.4.0-bin-without-hadoop.

    2. Configure the environment;

    2.1. Edit .bashrc and add:

    #set spark environment
    export SPARK_HOME=/home/hp/hadoop_env/spark-2.4.0-bin-without-hadoop
    export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
    #set scala environment
    export SCALA_HOME=/home/hp/hadoop_env/scala-2.11.12
    export PATH=$PATH:$SCALA_HOME/bin
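
    Apply the changes to the current shell:

     source ~/.bashrc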

    2.2. Edit conf/spark-env.sh in the Spark directory and add the following (SPARK_DIST_CLASSPATH is what lets the "without-hadoop" Spark build find the Hadoop client jars):

    export SCALA_HOME=/home/hp/hadoop_env/scala-2.11.12
    export SPARK_WORKER_MEMORY=2g
    export SPARK_MASTER_IP=hp-notebook          # the machine's hostname
    export MASTER=spark://hp-notebook:7077
    export JAVA_HOME=/home/hp/jdk1.8.0_191
    export HADOOP_HOME=/home/hp/hadoop_env/hadoop-2.8.5
    export SPARK_DIST_CLASSPATH=$CLASSPATH:$($HADOOP_HOME/bin/hadoop classpath)
    export HADOOP_CONF_DIR=/home/hp/hadoop_env/hadoop-2.8.5/etc/hadoop

    3. Run Spark's sbin/start-all.sh (from the Spark directory, to avoid confusion with Hadoop's script of the same name) to start the Master and Worker processes, then open "hostname:8080" in a browser to view the Spark web UI.

    4. Test the wordcount program.

    4.1 Create the input file and upload it to HDFS (run the hdfs commands from the Hadoop installation directory):

    ./bin/hdfs dfs -mkdir -p /spark

    vim spark.txt

    ./bin/hdfs dfs -put spark.txt /spark
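
    Instead of vim, the local file can also be created non-interactively; the sample content here is arbitrary:

     echo "hello spark hello hadoop" > spark.txt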

    4.2 Start spark-shell and run wordcount:

    spark-shell

    scala> sc.textFile("/spark/spark.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).saveAsTextFile("/spark/out")

    ./bin/hdfs dfs -cat /spark/out/p*
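
    The chain splits each line into words (flatMap), pairs each word with a count of 1 (map), sums the counts per word (reduceByKey), and saves the result to /spark/out on HDFS. To print the counts directly in the shell instead of writing to HDFS, an equivalent sketch is:

    scala> sc.textFile("/spark/spark.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).collect().foreach(println)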
