
[Learning] Hadoop 3.1.2 + Hive 3.1.1 + Sqoop on Mac

Author: X_Ran_0a11 | Published 2019-07-16 23:46

    I. Preparation

    1. Common shell commands
    https://www.cnblogs.com/gsliuruigang/p/6487084.html
    2. Install Homebrew on Mac
    https://blog.csdn.net/liaoningxinmin/article/details/85992752
    3. Configure passwordless SSH login (see the sketch after this list)
    https://blog.csdn.net/liaoningxinmin/article/details/85992752
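
    A minimal sketch of the passwordless SSH setup those links describe (assuming the default key path and no existing key; enable Remote Login in System Preferences > Sharing first):

    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa        # generate a key pair with an empty passphrase
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys
    ssh localhost                                   # should log in without prompting for a password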

    II. Install the JDK

    Be sure to install JDK 8, nothing newer! (Otherwise you will hit errors with no workable fix!!)
    https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
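
    A quick way to confirm which JDK will be picked up (standard macOS commands; the exact build on your machine will differ):

    /usr/libexec/java_home -V       # list all installed JDKs
    /usr/libexec/java_home -v 1.8   # print the JDK 8 home directory
    java -version                   # should report 1.8.0_xxx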

    III. Install Hadoop and configure a pseudo-distributed environment

    https://blog.csdn.net/liaoningxinmin/article/details/85992752
    https://blog.csdn.net/vbirdbest/article/details/88189753

    1. Install Hadoop with brew

    You can run brew list from the home directory to see which files brew has installed.
    Installation command:

    $ brew install hadoop
    
    2. Configure the Hadoop files (pseudo-distributed mode here; there are also standalone and fully distributed modes)
    • a. Environment variables:

    Find the Java installation path:

    /usr/libexec/java_home
    

    vim ~/.bash_profile

    export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_211.jdk/Contents/Home    # Java installation path
    export HADOOP_HOME=/usr/local/Cellar/hadoop/3.1.2/libexec
    export HADOOP_ROOT_LOGGER=DEBUG,console
    export PATH=$PATH:${HADOOP_HOME}/bin
    
    #Esc + :q! quits without saving     Esc + :wq saves and quits
    

    Run source ~/.bash_profile to apply the changes immediately.
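
    A quick check that the variables took effect (paths and version follow this article's setup):

    echo $JAVA_HOME
    echo $HADOOP_HOME
    hadoop version      # should print Hadoop 3.1.2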

    • b. core-site.xml
    cd /usr/local/Cellar/hadoop/3.1.2/libexec/etc/hadoop
    open -e core-site.xml
    

    Change the contents of core-site.xml to:

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>
    
    • c. hadoop-env.sh

    Find the Java installation path:

    /usr/libexec/java_home
    

    Add the Java path you found to hadoop-env.sh:

    cd /usr/local/Cellar/hadoop/3.1.2/libexec/etc/hadoop
    ls
    open -e hadoop-env.sh
    

    In the opened hadoop-env.sh file, add the Java path:

    export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_211.jdk/Contents/Home
    
    
    • d. hdfs-site.xml
      Change the contents of hdfs-site.xml to:
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>  
    </configuration>
    
    • e. mapred-site.xml
      Change the contents of mapred-site.xml to:
    <configuration>
        <property>
             <name>mapreduce.framework.name</name>
             <value>yarn</value>
         </property>
    </configuration>
    

    If a file's suffix is .xml.example, rename it to .xml (a rename sketch appears after this configuration list).

    • f. yarn-site.xml
      Change the contents of yarn-site.xml to:
    <configuration>
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>ranmodeiMac.local</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    </configuration>
    

    ranmodeiMac.local is this machine's hostname; you can check yours with $ hostname
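
    The .xml.example renaming mentioned above, as a small sketch (assuming you are in the config directory; it does nothing if no such files exist):

    cd /usr/local/Cellar/hadoop/3.1.2/libexec/etc/hadoop
    for f in *.xml.example; do
        [ -e "$f" ] && mv "$f" "${f%.example}"
    done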

    3. Run Hadoop

    Go to /usr/local/Cellar/hadoop/3.1.2/libexec/bin and format the file system:

    cd /usr/local/Cellar/hadoop/3.1.2/libexec/bin
    hdfs namenode -format
    

    Go to /usr/local/Cellar/hadoop/3.1.2/libexec/sbin and start the NameNode and DataNode:

    cd /usr/local/Cellar/hadoop/3.1.2/libexec/sbin
    ./start-all.sh
    

    At this point the NameNode and DataNode have started successfully, and the Overview page can be seen in the browser:
    NameNode - http://localhost:9870
    The All Applications page can also be viewed in the browser:
    ResourceManager - http://localhost:8088

    Checking the processes with jps shows the DataNode is missing:

    (base) ranmodeiMac:~ ranmo$ jps
    55057 NameNode
    55665 Jps
    2548 
    53429 SecondaryNameNode
    55579 NodeManager
    55484 ResourceManager
    

    Fix: https://www.cnblogs.com/mtime2004/p/10008325.html
    (the last step in that post is written backwards: it is the datanode's clusterID value that you copy into the VERSION file)

    Check the log:

    cd /usr/local/Cellar/hadoop/3.1.2/libexec/logs
    open -e hadoop-ranmo-datanode-ranmodeiMac.local.log
    

    In the log you will find that the namenode clusterID and the datanode clusterID are different; copy the datanode clusterID.
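
    A small sketch for comparing the two clusterIDs directly (the paths assume the default hadoop.tmp.dir of /tmp/hadoop-<user>, as used in this walkthrough):

    grep clusterID /tmp/hadoop-ranmo/dfs/name/current/VERSION
    grep clusterID /tmp/hadoop-ranmo/dfs/data/current/VERSION
    # the two values must match; edit one VERSION file so they agree, then restart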

    cd /tmp/hadoop-ranmo/dfs/name/current
    open -e version
    

    Change clusterID to the datanode clusterID you copied:

    #Sun Jul 14 03:53:07 CST 2019
    namespaceID=333721495
    clusterID=CID-c69af5e0-abad-412f-bf0e-33711cfe47f1
    cTime=1563047587165
    storageType=NAME_NODE
    blockpoolID=BP-471837932-192.168.1.4-1563047587165
    layoutVersion=-64
    

    Then run ./stop-all.sh to shut everything down and ./start-all.sh to start again; this time the output looks right:

    (base) ranmodeiMac:sbin ranmo$ jps
    59698 ResourceManager
    59507 SecondaryNameNode
    59795 NodeManager
    2548 
    59270 NameNode
    59863 Jps
    59373 DataNode
    

    IV. Path summary

    JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_211.jdk/Contents/Home # Java installation path
    HADOOP_HOME=/usr/local/Cellar/hadoop/3.1.2/libexec

    Configuration files:
    cd /usr/local/Cellar/hadoop/3.1.2/libexec/etc/hadoop
    Formatting (hdfs namenode -format) location:
    cd /usr/local/Cellar/hadoop/3.1.2/libexec/bin
    Start/stop scripts:
    /usr/local/Cellar/hadoop/3.1.2/libexec/sbin/
    Temporary files:
    /usr/local/Cellar/hadoop/3.1.2/libexec/tmp
    Log files:
    /usr/local/Cellar/hadoop/3.1.2/libexec/logs

    V. Common commands

    hadoop fs -ls  show the current directory structure; -ls -R shows it recursively
    hadoop fs -ls /  list the files and folders under the given directory
    hadoop fs -mkdir  create a directory
    hadoop fs -rm -r -skipTrash /path_to_file/file_name  delete a file
    hadoop fs -rm -r -skipTrash /folder_name  delete a folder
    hadoop fs -put [localsrc] [dst]  upload a local file to HDFS
    hadoop fs -get [dst] [localsrc]  download a file from HDFS to local disk
    hadoop fs -copyFromLocal [localsrc] [dst]  upload a local file to HDFS, same as put
    hadoop fs -copyToLocal [dst] [localsrc]  download a file from HDFS to local disk, same as get
    hadoop fs -test -e  check whether a directory or file exists; $? is 0 if it exists, 1 if not (see the example after this list)
    hadoop fs -text  view a file's contents
    hadoop fs -du  show the size of each file under a directory, in bytes; -du -s sums them, -du -h adds readable units
    hadoop fs -tail  show the end of a file
    hadoop fs -cp [src] [dst]  copy files from the source to the destination directory
    hadoop fs -mv [src] [dst]  move files from the source to the destination directory
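
    For example, -test -e plus the $? return code can drive a shell check (the path here is just an illustration):

    hadoop fs -test -e /input/dream.txt
    if [ $? -eq 0 ]; then echo "exists"; else echo "missing"; fi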

    VI. A quick test

    Create an /input directory:

    hadoop fs -mkdir /input
    

    Output:

    ranmodeiMac:~ ranmo$ hadoop fs -ls /
    2019-07-14 19:21:30,639 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 1 items
    drwxr-xr-x   - ranmo supergroup          0 2019-07-14 18:15 /input
    
    

    Create a test folder under /usr/local/Cellar/hadoop/3.1.2/:

    cd /usr/local/Cellar/hadoop/3.1.2/
    mkdir test
    

    Create dream.txt in the test folder as a test file:

    cd /usr/local/Cellar/hadoop/3.1.2/test
    touch dream.txt
    open -e dream.txt
    hadoop fs -put dream.txt /input
    #check whether it is now in /input
    hadoop fs -ls /input
    #it is there; view its contents with cat
    hadoop fs -cat /input/dream.txt
    #2019-07-14 21:19:43,064 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    #Hello world
    
    

    If the content cannot be displayed, the http://localhost:9870 page may show that the file is corrupted, in which case it needs to be re-uploaded to HDFS.
    First delete all files in the input folder:

    hadoop fs -rmr /input/*
    

    VII. Install Hive

    1. Install with brew

    brew install hive
    

    2. Set environment variables
    Run open -e ~/.bash_profile and add:

    export HIVE_HOME=/usr/local/Cellar/hive/3.1.1/libexec
    export PATH=$PATH:${HIVE_HOME}/bin
    

    Run source ~/.bash_profile to apply.
    3. Create the configuration file
    https://www.cnblogs.com/micrari/p/7067968.html
    https://blog.csdn.net/u013185349/article/details/86691634

    cd /usr/local/Cellar/hive/3.1.1/libexec/conf
    cp hive-default.xml.template hive-site.xml  # copy the template over and then edit the properties; mainly to reuse that file's header
    
    

    hive-site.xml configures the MySQL location, username, and so on.
    The final configuration file is:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>root</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>xxxxx</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://localhost:3306/hive?useSSL=false</value>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value>
        </property>
    </configuration>
    

    Here root and xxxxx are your own MySQL username and password.
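
    One assumption behind that ConnectionURL is that a database named hive already exists in MySQL. If it does not, a minimal sketch to create it before running schematool below:

    /usr/local/mysql/bin/mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS hive;"
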
    4. Link MySQL and Hive
    Copy a mysql-connector into Hive's lib directory:

    curl -L 'http://www.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.11.tar.gz/from/http://mysql.he.net/' | tar xz
    cp mysql-connector-java-8.0.11/mysql-connector-java-8.0.11-bin.jar /usr/local/Cellar/hive/3.1.1/libexec/lib/
    
    

    You can then look inside the lib folder to check:

    cd /usr/local/Cellar/hive/3.1.1/libexec/lib
    

    It turns out the whole mysql-connector-java-8.0.11 folder ended up inside lib, but I only need the mysql-connector-java-8.0.11-bin.jar inside it, so I went into that folder and moved the jar out into lib myself. Otherwise the initialization and connection step later cannot find this jar.

    5. Initialize the metastore (create Hive's metastore tables in MySQL)

    /usr/local/Cellar/hive/3.1.1/libexec/bin
    schematool -initSchema -dbType mysql
    

    The output shows "Initialization script hive-schema-3.1.0.mysql.sql": 3.1.0 is the version of the Hive metastore schema being installed, and the script initializes Hive's metastore tables in MySQL (they are empty for now).
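
    A quick way to confirm the schema really was created (assuming schematool is on PATH via HIVE_HOME/bin):

    schematool -info -dbType mysql      # should print the metastore schema version, e.g. 3.1.0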

    Errors may occur along the way. Possible causes:
    a. the root username or password is wrong;
    b. the MySQL Connector jar does not match the database (the jar version is too old)
    https://blog.csdn.net/qq_21870555/article/details/80711187
    c. Failed to load driver, because lib does not contain the connector jar itself (a folder will not do)

    6. Run Hive

    cd $HIVE_HOME 
    cd bin
    hive
    

    Appendix: using MySQL from the Mac terminal
    Connect to (open) MySQL; after entering the password it works normally (quit exits MySQL)
    https://www.cnblogs.com/jamescr7/p/7842784.html

     /usr/local/mysql/bin/mysql -u root -p
    

    7. A quick test
    a. Create a test file student.txt on the desktop
    with the following content:

    1,zhangsan,12
    2,lisi,13
    3,wangwu,14
    

    b. Upload it to Hadoop

    hadoop fs -put student.txt /input
    hadoop fs -ls /input
    

    The file is listed, so the upload succeeded.
    c. Run Hive and turn it into a table

    hive
    create table student (id int,username string,age int) row format delimited fields terminated by ',';
    load data inpath '/input/student.txt' into table student;
    select * from student;
    

    The output is exactly as expected!

    1   zhangsan    12
    2   lisi    13
    3   wangwu  14
    
    desc formatted student;
    

    This shows the table's detailed information.

    OK
    # col_name              data_type               comment             
    id                      int                                         
    username                string                                      
    age                     int                                         
             
    # Detailed Table Information         
    Database:               default                  
    OwnerType:              USER                     
    Owner:                  ranmo                    
    CreateTime:             Mon Jul 15 02:43:11 CST 2019     
    LastAccessTime:         UNKNOWN                  
    Retention:              0                        
    Location:               hdfs://localhost:9000/user/hive/warehouse/student    
    Table Type:             MANAGED_TABLE            
    Table Parameters:        
        bucketing_version       2                   
        numFiles                1                   
        numRows                 0                   
        rawDataSize             0                   
        totalSize               35                  
        transient_lastDdlTime   1563129886          
             
    # Storage Information        
    SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe   
    InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
    OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
    Compressed:             No                       
    Num Buckets:            -1                       
    Bucket Columns:         []                       
    Sort Columns:           []                       
    Storage Desc Params:         
        field.delim             ,                   
        serialization.format    ,                   
    Time taken: 0.22 seconds, Fetched: 33 row(s)
    

    Use hadoop commands to check the warehouse location and confirm it really holds the Hive table's data:

    hadoop fs -ls /user/hive/warehouse/student
    hadoop fs -cat /user/hive/warehouse/student/student.txt
    

    The output is correct!

    Appendix: creating an external table in Hive

    create external table student_w(id int,username string,age int) row format delimited fields terminated by ',';
    

    The difference between an external table and a managed (internal) table shows up on drop table: dropping an external table does not delete its data, which can only be removed with HDFS commands from the terminal (see the sketch below).
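
    A small sketch of that difference, run from the shell (student_ext and its location are made-up names for illustration):

    hive -e "create external table student_ext (id int, username string, age int) row format delimited fields terminated by ',' location '/input/student_ext';"
    hive -e "drop table student_ext;"       # removes only the table metadata
    hadoop fs -ls /input/student_ext        # the data directory is still on HDFS
    hadoop fs -rm -r /input/student_ext     # remove it by hand if no longer needed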

    VIII. Install Sqoop

    https://blog.csdn.net/maxmao1024/article/details/79478794
    https://blog.csdn.net/scgh_fx/article/details/73522372

    1. Install sqoop with brew

    brew install sqoop
    

    2. Configure environment variables
    open -e ~/.bash_profile

    export SQOOP_HOME=/usr/local/Cellar/sqoop/1.4.6_1/libexec
    export PATH=$PATH:${SQOOP_HOME}/bin
    

    Run source ~/.bash_profile to apply.
    3. Create the configuration file

    cd $SQOOP_HOME
    cd conf
    open -e sqoop-env.sh
    

    Configure the Hadoop and Hive paths in it; the other paths can be left unset because those components are not installed yet.

    export HADOOP_HOME="/usr/local/Cellar/hadoop/3.1.2/libexec"
    export HIVE_HOME="/usr/local/Cellar/hive/3.1.1/libexec"
    

    4. Link MySQL and Hive
    Sqoop is essentially the bridge between MySQL and Hive, so its lib directory also needs a mysql-connector; just copy the jar over from Hive's lib:

    cp /usr/local/Cellar/hive/3.1.1/libexec/lib/mysql-connector-java-8.0.11.jar lib/
    
    

    You can then look inside Sqoop's lib folder to check:
    cd /usr/local/Cellar/sqoop/1.4.6_1/libexec/lib

    5. Use sqoop to import MySQL data
    Run sqoop help to see the available commands:

    Available commands:
      codegen            Generate code to interact with database records
      create-hive-table  Import a table definition into Hive
      eval               Evaluate a SQL statement and display the results
      export             Export an HDFS directory to a database table
      help               List available commands
      import             Import a table from a database to HDFS
      import-all-tables  Import tables from a database to HDFS
      import-mainframe   Import datasets from a mainframe server to HDFS
      job                Work with saved jobs
      list-databases     List available databases on a server
      list-tables        List available tables in a database
      merge              Merge results of incremental imports
      metastore          Run a standalone Sqoop metastore
      version            Display version information
    

    Do the import with import:

    sqoop import --connect jdbc:mysql://localhost:3306/hive --username root --password lingying --table food --target-dir /input/food
    

    The first attempt reports an error:

    2019-07-16 01:55:27,454 ERROR manager.SqlManager: Error executing statement: java.sql.SQLException: The connection property 'zeroDateTimeBehavior' acceptable values are: 'CONVERT_TO_NULL', 'EXCEPTION' or 'ROUND'. The value 'convertToNull' is not acceptable.
    
    

    This is a compatibility issue between the connector jar and the MySQL version (https://www.2cto.com/net/201806/757728.html).
    Adjust the command to:

    sqoop import --connect jdbc:mysql://localhost:3306/hive?zeroDateTimeBehavior=EXCEPTION --username root --password lingying --table hive_test --target-dir /input/food
    
    

    The second attempt reports an error:

    [2019-07-16 02:23:52.743]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
    Last 4096 bytes of prelaunch.err :
    Last 4096 bytes of stderr :
    错误: 找不到或无法加载主类 org.apache.hadoop.mapreduce.v2.app.MRAppMaster
    
    
    [2019-07-16 02:23:52.743]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
    Last 4096 bytes of prelaunch.err :
    Last 4096 bytes of stderr :
    错误: 找不到或无法加载主类 org.apache.hadoop.mapreduce.v2.app.MRAppMaster
    
    
    For more detailed output, check the application tracking page: http://ranmodeiMac.local:8088/cluster/app/application_1563210689881_0009 Then click on links to logs of each attempt.
    . Failing the application.
    2019-07-16 02:23:53,014 INFO mapreduce.Job: Counters: 0
    2019-07-16 02:23:53,021 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
    2019-07-16 02:23:53,023 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 5.8023 seconds (0 bytes/sec)
    2019-07-16 02:23:53,028 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
    2019-07-16 02:23:53,028 INFO mapreduce.ImportJobBase: Retrieved 0 records.
    2019-07-16 02:23:53,029 ERROR tool.ImportTool: Error during import: Import job failed!
    

    Reference: https://blog.csdn.net/hongxiao2016/article/details/88919176; reconfiguring yarn-site.xml resolves this (a sketch of the fix follows). Then rerun the import.
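
    My understanding of that post's fix, as a hedged sketch (treat the property name and approach as an assumption taken from the linked reference, not something verified here):

    hadoop classpath    # prints the full Hadoop classpath
    # then add it to yarn-site.xml:
    # <property>
    #     <name>yarn.application.classpath</name>
    #     <value>(paste the output of hadoop classpath here)</value>
    # </property>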
    
    

    It succeeds this time.
    Check the data on HDFS:

    hadoop fs -ls /input/food
    

    Output:

    Found 4 items
    -rw-r--r--   1 ranmo supergroup          0 2019-07-16 02:28 /input/food/_SUCCESS
    -rw-r--r--   1 ranmo supergroup          8 2019-07-16 02:28 /input/food/part-m-00000
    -rw-r--r--   1 ranmo supergroup          9 2019-07-16 02:28 /input/food/part-m-00001
    -rw-r--r--   1 ranmo supergroup          9 2019-07-16 02:28 /input/food/part-m-00002
    
    
    hadoop fs -cat /input/food/part-m-00000
    

    Output:

    apple,1
    

    So in effect the food table was split across several map tasks (Sqoop defaults to up to 4 mappers), each writing its own part-m file; here the table's rows ended up spread over three files.

    6. Run the import with a single map task

    sqoop import --connect jdbc:mysql://localhost:3306/hive?zeroDateTimeBehavior=EXCEPTION --username root --password lingying --table hive_test --target-dir /input/food1 -m 1
    
    

    Check the files:

    hadoop fs -ls /input/food1
    

    Output:

    Found 2 items
    -rw-r--r--   1 ranmo supergroup          0 2019-07-16 02:34 /input/food1/_SUCCESS
    -rw-r--r--   1 ranmo supergroup         26 2019-07-16 02:34 /input/food1/part-m-00000
    

    7. Import directly into Hive with sqoop
    Without this, the import takes three steps:
    a. import the MySQL data into HDFS
    b. create the table in Hive
    c. load the HDFS data into Hive
    With sqoop, all three steps can be done in one go:

    sqoop import --connect jdbc:mysql://localhost:3306/hive?zeroDateTimeBehavior=EXCEPTION --username root --password lingying --table hive_test --hive-import --hive-table food -m 1 --delete-target-dir
    

    --delete-target-dir is included because the file already exists on HDFS from the earlier run, so the duplicate must be deleted first.
    This reports an error:

    2019-07-16 23:33:42,872 INFO hive.HiveImport: FAILED: ParseException line 1:211 missing EOF at ';' near 'TEXTFILE'
    2019-07-16 23:33:43,070 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive exited with status 64
        at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:389)
        at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:339)
        at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:240)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:514)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
    
    

    It turns out the data had already been uploaded to Hadoop; the failure happened while importing into Hive. Running hadoop fs -ls shows:

    Found 1 items
    drwxr-xr-x   - ranmo supergroup          0 2019-07-16 04:02 hive_test
    

    In theory only hadoop fs -ls / lists directories at the root, so the analysis above shows:

    • hadoop fs -ls (with no path) lists the files under /user/ranmo, which effectively acts as a temporary/staging directory; so if plain hadoop fs -ls shows files sitting there, some files were not imported correctly;
    • files that were correctly imported into Hive should all be under /user/hive/warehouse (hadoop fs -ls /user/hive/warehouse).

    At first I thought the problem was "Hive exited with status 64", but after trying a round of fixes nothing helped; the real problem was "FAILED: ParseException line 1:211 missing EOF at ';' near 'TEXTFILE'". That message normally only appears when running a Hive statement, so why does it show up when moving data with sqoop? It turned out one of the table's column names was "na'me"; column names must not contain the ' character, otherwise the generated Hive statement breaks and Hive ends up reading the Hadoop data split on the default delimiter (roughly speaking). After fixing the column name the import worked.
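
    A hypothetical sketch of fixing such a column name in MySQL before re-importing (the table name, new column name, and type are assumptions; adjust to your schema):

    /usr/local/mysql/bin/mysql -u root -p hive -e "ALTER TABLE hive_test CHANGE \`na'me\` name VARCHAR(64);"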

    Appendix: common sqoop commands:
    https://www.cnblogs.com/cenyuhai/p/3306037.html

    IX. Summary

    For a beginner, getting this whole stack set up was really not easy (and there is still so much of the Hadoop ecosystem left to build, heavens), and the pitfalls along the way were endless. Hunting down material to fix each bug was quite a baptism for the spirit...
