
Importing Data from MySQL into Hive with Sqoop

Author: 大胖圆儿小姐 | Published 2021-12-07 18:18

    I. Overview

    On the MySQL side, a database named sitelight has already been created and loaded; the table is t_site_formal, with 168992 rows. On the Hive side, a database (also named sitelight) has been created; no field delimiter was specified, so tables use the default '\001'. Before running the import command, the Hadoop cluster, the HiveServer2 service, and the Hive metastore service must all be running. Sqoop itself has no daemon to start and can be used directly.
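
    For reference, the source row count and the target database can be checked and created with something like the following (a sketch; the host and credentials match the import script in section III):

    # count the source rows in MySQL (password entered at the prompt)
    mysql -h 172.16.100.19 -u queryuser -p \
        -e "SELECT COUNT(*) FROM sitelight.t_site_formal;"

    # create the target database in Hive; tables created without a ROW FORMAT
    # clause use the default '\001' field delimiter
    hive -e "CREATE DATABASE IF NOT EXISTS sitelight;"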

    II. Starting the Services

    1. Start the HDFS service
    [hadoop@hadoop01 sbin]$ pwd
    /home/hadoop/hadoop-2.10.1/sbin
    [hadoop@hadoop01 sbin]$ ./start-dfs.sh
    
    2. Start HiveServer2 (the working directory doesn't matter)
    [hadoop@hadoop01 sbin]$ nohup hive --service hiveserver2 &
    
    3. Start the metastore service (the working directory doesn't matter)
    [hadoop@hadoop01 sbin]$ nohup hive --service metastore &
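
    To confirm the services are actually up before importing, a quick check of the Java processes and listening ports helps (a sketch; 10000 is HiveServer2's default port, and 9083 matches the metastore thrift URI that appears later in the logs):

    [hadoop@hadoop01 sbin]$ jps    # expect NameNode/DataNode plus two RunJar processes for Hive
    [hadoop@hadoop01 sbin]$ netstat -tlnp | grep -E '10000|9083'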
    

    III. The Import Script

    sqoop import  \
    --connect jdbc:mysql://172.16.100.19:3306/sitelight  \
    --username queryuser  \
    --password abcde12345  \
    --table t_site_formal  \
    --hive-import  \
    --hive-overwrite  \
    --create-hive-table  \
    --delete-target-dir \
    --hive-database  sitelight \
    --hive-table t_site_formal \
    -m 1
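
    Note that the Sqoop log below warns that passing --password on the command line is insecure. Two standard Sqoop alternatives (a sketch; same connection details as above):

    # prompt for the password interactively with -P
    sqoop import --connect jdbc:mysql://172.16.100.19:3306/sitelight \
        --username queryuser -P \
        --table t_site_formal --hive-import -m 1

    # or read it from a file only the current user can access
    echo -n 'abcde12345' > /home/hadoop/.mysql.pwd && chmod 400 /home/hadoop/.mysql.pwd
    # then pass: --password-file file:///home/hadoop/.mysql.pwd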
    

    IV. Errors Encountered While Running the Import

    1. Error message 1
    [hadoop@hadoop01 sbin]$ sqoop import   \
    > --connect jdbc:mysql://172.16.100.19:3306/sitelight   \
    > --username queryuser  \
    > --password abcde12345   \
    > --table t_site_formal   \
    > --hive-import \
    > -m 1
    Warning: /home/hadoop/sqoop-1.4.7/../hbase does not exist! HBase imports will fail.
    Please set $HBASE_HOME to the root of your HBase installation.
    Warning: /home/hadoop/sqoop-1.4.7/../hcatalog does not exist! HCatalog jobs will fail.
    Please set $HCAT_HOME to the root of your HCatalog installation.
    Warning: /home/hadoop/sqoop-1.4.7/../accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    Warning: /home/hadoop/sqoop-1.4.7/../zookeeper does not exist! Accumulo imports will fail.
    Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
    21/12/07 15:32:53 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
    21/12/07 15:32:53 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    21/12/07 15:32:53 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
    21/12/07 15:32:53 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
    21/12/07 15:32:53 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    21/12/07 15:32:53 INFO tool.CodeGenTool: Beginning code generation
    21/12/07 15:32:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_site_formal` AS t LIMIT 1
    21/12/07 15:32:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_site_formal` AS t LIMIT 1
    21/12/07 15:32:53 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop-2.10.1
    Note: /tmp/sqoop-hadoop/compile/672d74685e64a0fe1025ce9b2b875e46/t_site_formal.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    21/12/07 15:32:54 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/672d74685e64a0fe1025ce9b2b875e46/t_site_formal.jar
    21/12/07 15:32:54 WARN manager.MySQLManager: It looks like you are importing from mysql.
    21/12/07 15:32:54 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
    21/12/07 15:32:54 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
    21/12/07 15:32:54 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
    21/12/07 15:32:54 INFO mapreduce.ImportJobBase: Beginning import of t_site_formal
    21/12/07 15:32:54 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    21/12/07 15:32:54 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    21/12/07 15:32:54 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/172.16.100.26:8032
    21/12/07 15:32:56 INFO ipc.Client: Retrying connect to server: hadoop01/172.16.100.26:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
    21/12/07 15:32:57 INFO ipc.Client: Retrying connect to server: hadoop01/172.16.100.26:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
    21/12/07 15:32:58 INFO ipc.Client: Retrying connect to server: hadoop01/172.16.100.26:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
    
    • Cause: the job hangs at "INFO client.RMProxy: Connecting to ResourceManager at hadoop01/172.16.100.26:8032". The ResourceManager is YARN's master daemon; the Hadoop configuration names YARN as the MapReduce framework, but the YARN services are not running.
    • Fix: either (1) if you only need HDFS storage, stop declaring YARN as the framework by renaming the config file /home/hadoop/hadoop-2.10.1/etc/hadoop/mapred-site.xml and restarting the Hadoop cluster; or (2) simply start the YARN services from the sbin directory, i.e. /home/hadoop/hadoop-2.10.1/sbin/start-yarn.sh. Both options are shown below.
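    Concretely (a sketch; paths match the installation above):

    # option 1: take mapred-site.xml out of play, then restart the cluster
    [hadoop@hadoop01 hadoop]$ mv /home/hadoop/hadoop-2.10.1/etc/hadoop/mapred-site.xml \
        /home/hadoop/hadoop-2.10.1/etc/hadoop/mapred-site.xml.bak

    # option 2: start YARN so the ResourceManager listens on 8032
    [hadoop@hadoop01 sbin]$ /home/hadoop/hadoop-2.10.1/sbin/start-yarn.sh
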
    2. Error message 2
    21/12/07 15:39:42 INFO hive.HiveImport: Loading uploaded data into Hive
    21/12/07 15:39:42 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
    21/12/07 15:39:42 ERROR tool.ImportTool: Import failed: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
        at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
        at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
        at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
        at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:337)
        at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:44)
        ... 12 more
    
    • Cause: "Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly." The ClassNotFoundException makes it clear: Hadoop cannot find Hive's classes (and configuration) on its classpath.
    • Fix: add Hive's jars to HADOOP_CLASSPATH in the current user's profile:
    [hadoop@hadoop01 sbin]$  vim ~/.bashrc
    
    export HADOOP_CLASSPATH=$HIVE_HOME/lib/*
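
    The change takes effect after reloading the profile (this assumes HIVE_HOME is already exported in the same file):

    [hadoop@hadoop01 sbin]$ source ~/.bashrc
    [hadoop@hadoop01 sbin]$ echo $HADOOP_CLASSPATH    # should expand to the jars under $HIVE_HOME/lib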
    
    3. Error message 3
    21/12/07 17:25:05 INFO mapreduce.ImportJobBase: Transferred 40.7139 MB in 11.4171 seconds (3.566 MB/sec)
    21/12/07 17:25:05 INFO mapreduce.ImportJobBase: Retrieved 168237 records.
    21/12/07 17:25:05 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table t_site_formal
    21/12/07 17:25:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_site_formal` AS t LIMIT 1
    21/12/07 17:25:05 WARN hive.TableDefWriter: Column create_date had to be cast to a less precise type in Hive
    21/12/07 17:25:05 WARN hive.TableDefWriter: Column audit_date had to be cast to a less precise type in Hive
    21/12/07 17:25:05 INFO hive.HiveImport: Loading uploaded data into Hive
    21/12/07 17:25:05 INFO conf.HiveConf: Found configuration file file:/home/hadoop/apache-hive-2.3.9-bin/conf/hive-site.xml
    2021-12-07 17:25:06,664 main ERROR Could not register mbeans java.security.AccessControlException: access denied ("javax.management.MBeanTrustPermission" "register")
        at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
        at java.lang.SecurityManager.checkPermission(SecurityManager.java:585)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.checkMBeanTrustPermission(DefaultMBeanServerInterceptor.java:1848)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:322)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
        at org.apache.logging.log4j.core.jmx.Server.register(Server.java:380)
        at org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:165)
        at org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:138)
        at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:507)
        at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:249)
        at org.apache.logging.log4j.core.async.AsyncLoggerContext.start(AsyncLoggerContext.java:86)
        at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:239)
        at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:157)
        at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:130)
        at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:100)
        at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:187)
        at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jDefault(LogUtils.java:154)
        at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:90)
        at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:82)
        at org.apache.hadoop.hive.common.LogUtils.initHiveLog4j(LogUtils.java:65)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:702)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:331)
        at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
    
    • Cause: "ERROR Could not register mbeans java.security.AccessControlException: access denied ("javax.management.MBeanTrustPermission" "register")". The message states plainly that Java's security policy forbids the MBean registration, so the JDK's security configuration must be changed.
    • Fix: locate the JDK installation and edit its security policy file. My JDK is installed system-wide, so the edit is done as root:
    [hadoop@hadoop01 sbin]$  echo $JAVA_HOME
    /usr/local/java
    [hadoop@hadoop01 sbin]$ exit
    exit
    [root@hadoop01 hadoop-2.10.1]$ cd /usr/local/java/jre/lib/security/
    [root@hadoop01 security]$ vim java.policy 
    
    // add this entry inside the grant { } block
    permission javax.management.MBeanTrustPermission "register";
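
    After the edit, the grant block ends up looking roughly like this (a sketch; the pre-existing entries in your java.policy will differ):

    grant {
        // ... existing permissions ...
        permission javax.management.MBeanTrustPermission "register";
    };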
    
    

    V. Successful Run

    LOAD DATA INPATH 'hdfs://hadoop01:8082/user/hadoop/t_site_formal' OVERWRITE INTO TABLE `sitelight`.`t_site_formal`
    21/12/07 18:14:47 INFO ql.Driver: Semantic Analysis Completed
    21/12/07 18:14:47 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
    21/12/07 18:14:47 INFO ql.Driver: Completed compiling command(queryId=hadoop_20211207181447_5c3a828a-2830-4fa6-b852-e4731b760303); Time taken: 0.115 seconds
    21/12/07 18:14:47 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
    21/12/07 18:14:47 INFO ql.Driver: Executing command(queryId=hadoop_20211207181447_5c3a828a-2830-4fa6-b852-e4731b760303): 
    LOAD DATA INPATH 'hdfs://hadoop01:8082/user/hadoop/t_site_formal' OVERWRITE INTO TABLE `sitelight`.`t_site_formal`
    21/12/07 18:14:47 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode
    21/12/07 18:14:47 INFO hive.metastore: Closed a connection to metastore, current connections: 0
    Loading data to table sitelight.t_site_formal
    21/12/07 18:14:47 INFO exec.Task: Loading data to table sitelight.t_site_formal from hdfs://hadoop01:8082/user/hadoop/t_site_formal
    21/12/07 18:14:47 INFO hive.metastore: Trying to connect to metastore with URI thrift://172.16.100.26:9083
    21/12/07 18:14:47 INFO hive.metastore: Opened a connection to metastore, current connections: 1
    21/12/07 18:14:47 INFO hive.metastore: Connected to metastore.
    21/12/07 18:14:47 INFO common.FileUtils: Creating directory if it doesn't exist: hdfs://hadoop01:8082/usr/local/warehouse/sitelight.db/t_site_formal
    21/12/07 18:14:47 INFO ql.Driver: Starting task [Stage-1:STATS] in serial mode
    21/12/07 18:14:47 INFO exec.StatsTask: Executing stats task
    21/12/07 18:14:47 INFO hive.metastore: Closed a connection to metastore, current connections: 0
    21/12/07 18:14:47 INFO hive.metastore: Trying to connect to metastore with URI thrift://172.16.100.26:9083
    21/12/07 18:14:47 INFO hive.metastore: Opened a connection to metastore, current connections: 1
    21/12/07 18:14:47 INFO hive.metastore: Connected to metastore.
    21/12/07 18:14:47 INFO hive.metastore: Closed a connection to metastore, current connections: 0
    21/12/07 18:14:47 INFO hive.metastore: Trying to connect to metastore with URI thrift://172.16.100.26:9083
    21/12/07 18:14:47 INFO hive.metastore: Opened a connection to metastore, current connections: 1
    21/12/07 18:14:47 INFO hive.metastore: Connected to metastore.
    21/12/07 18:14:47 INFO exec.StatsTask: Table sitelight.t_site_formal stats: [numFiles=1, numRows=0, totalSize=42691574, rawDataSize=0]
    21/12/07 18:14:47 INFO ql.Driver: Completed executing command(queryId=hadoop_20211207181447_5c3a828a-2830-4fa6-b852-e4731b760303); Time taken: 0.291 seconds
    OK
    21/12/07 18:14:47 INFO ql.Driver: OK
    Time taken: 0.406 seconds
    21/12/07 18:14:47 INFO CliDriver: Time taken: 0.406 seconds
    21/12/07 18:14:47 INFO conf.HiveConf: Using the default value passed in for log id: bb608b6a-b9af-4129-bae0-739683648882
    21/12/07 18:14:47 INFO session.SessionState: Resetting thread name to  main
    21/12/07 18:14:47 INFO conf.HiveConf: Using the default value passed in for log id: bb608b6a-b9af-4129-bae0-739683648882
    21/12/07 18:14:47 INFO session.SessionState: Deleted directory: /tmp/hive/hadoop/bb608b6a-b9af-4129-bae0-739683648882 on fs with scheme hdfs
    21/12/07 18:14:47 INFO session.SessionState: Deleted directory: /tmp/hadoop/bb608b6a-b9af-4129-bae0-739683648882 on fs with scheme file
    21/12/07 18:14:47 INFO hive.metastore: Closed a connection to metastore, current connections: 0
    21/12/07 18:14:47 INFO hive.HiveImport: Hive import complete.
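
    Finally, the result can be verified from Hive; the count should match the 168237 records Sqoop reported (the stats task above shows numRows=0 simply because LOAD DATA moves files without computing row-level statistics):

    [hadoop@hadoop01 sbin]$ hive -e "SELECT COUNT(*) FROM sitelight.t_site_formal;"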
    

    These are just the problems I happened to hit, recorded while they were fresh. The same error message can stem from different causes, so if these fixes don't work for you, keep digging!
