1. Overview
A database named sitelight has already been created in MySQL and loaded with data; the table is t_site_formal and holds 168992 rows. On the Hive side a database, also named sitelight, was created with no field delimiter specified, so Hive's default '\001' applies. Before running the import, the Hadoop cluster, the HiveServer2 service, and the Hive metastore service must all be started; Sqoop itself has no daemon and can be invoked directly.
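For reference, the Hive database can be created from the shell like this (a minimal sketch; the exact DDL originally used is not shown here):
[hadoop@hadoop01 sbin]$ hive -e "CREATE DATABASE IF NOT EXISTS sitelight;"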
2. Starting the services
- Start the HDFS service
[hadoop@hadoop01 sbin]$ pwd
/home/hadoop/hadoop-2.10.1/sbin
[hadoop@hadoop01 sbin]$ ./start-dfs.sh
- Start HiveServer2 (it does not matter which directory you run this from)
[hadoop@hadoop01 sbin]$ nohup hive --service hiveserver2 &
- Start the metastore service (again, any directory works)
[hadoop@hadoop01 sbin]$ nohup hive --service metastore &
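To confirm everything came up, you can check the Java processes and the listening ports (a sketch, assuming the stock ports: 9083 for the metastore, 10000 for HiveServer2):
[hadoop@hadoop01 sbin]$ jps    # HDFS daemons, plus two RunJar entries for the Hive services
[hadoop@hadoop01 sbin]$ netstat -tnlp 2>/dev/null | grep -E ':(9083|10000)'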
3. The import script
sqoop import \
--connect jdbc:mysql://172.16.100.19:3306/sitelight \
--username queryuser \
--password abcde12345 \
--table t_site_formal \
--hive-import \
--hive-overwrite \
--create-hive-table \
--delete-target-dir \
--hive-database sitelight \
--hive-table t_site_formal \
-m 1
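One note on this script: as the log in the next section warns, putting --password on the command line is insecure. Sqoop also accepts --password-file; a sketch of that variant (the path ~/.sqoop.pwd is my own choice):
[hadoop@hadoop01 sbin]$ echo -n 'abcde12345' > ~/.sqoop.pwd    # -n: the file must not end with a newline
[hadoop@hadoop01 sbin]$ chmod 400 ~/.sqoop.pwd
Then replace the --password line with: --password-file file:///home/hadoop/.sqoop.pwd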
4. Errors encountered while running the import
- Error message 1
[hadoop@hadoop01 sbin]$ sqoop import \
> --connect jdbc:mysql://172.16.100.19:3306/sitelight \
> --username queryuser \
> --password abcde12345 \
> --table t_site_formal \
> --hive-import \
> -m 1
Warning: /home/hadoop/sqoop-1.4.7/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop-1.4.7/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop-1.4.7/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop-1.4.7/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
21/12/07 15:32:53 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
21/12/07 15:32:53 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
21/12/07 15:32:53 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
21/12/07 15:32:53 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
21/12/07 15:32:53 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
21/12/07 15:32:53 INFO tool.CodeGenTool: Beginning code generation
21/12/07 15:32:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_site_formal` AS t LIMIT 1
21/12/07 15:32:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_site_formal` AS t LIMIT 1
21/12/07 15:32:53 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop-2.10.1
Note: /tmp/sqoop-hadoop/compile/672d74685e64a0fe1025ce9b2b875e46/t_site_formal.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
21/12/07 15:32:54 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/672d74685e64a0fe1025ce9b2b875e46/t_site_formal.jar
21/12/07 15:32:54 WARN manager.MySQLManager: It looks like you are importing from mysql.
21/12/07 15:32:54 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
21/12/07 15:32:54 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
21/12/07 15:32:54 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
21/12/07 15:32:54 INFO mapreduce.ImportJobBase: Beginning import of t_site_formal
21/12/07 15:32:54 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
21/12/07 15:32:54 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
21/12/07 15:32:54 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/172.16.100.26:8032
21/12/07 15:32:56 INFO ipc.Client: Retrying connect to server: hadoop01/172.16.100.26:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
21/12/07 15:32:57 INFO ipc.Client: Retrying connect to server: hadoop01/172.16.100.26:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
21/12/07 15:32:58 INFO ipc.Client: Retrying connect to server: hadoop01/172.16.100.26:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
- Cause: the job stalls at INFO client.RMProxy: Connecting to ResourceManager at hadoop01/172.16.100.26:8032. The ResourceManager is YARN's master daemon; the Hadoop configuration selects YARN as the MapReduce framework (mapreduce.framework.name=yarn in mapred-site.xml), but the YARN services were not running.
- Fix: either of the following works. One, if you only want HDFS as storage and do not need YARN, rename the file that selects the framework, /home/hadoop/hadoop-2.10.1/etc/hadoop/mapred-site.xml, so Hadoop no longer picks it up, then restart the Hadoop cluster (MapReduce then falls back to local mode). Two, simply start the YARN services from the sbin directory: /home/hadoop/hadoop-2.10.1/sbin/start-yarn.sh. Both options are shown as concrete commands below.
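A sketch of the two options (the .bak suffix is my own convention):
# Option 1: sideline the framework config, then restart the cluster
[hadoop@hadoop01 ~]$ mv /home/hadoop/hadoop-2.10.1/etc/hadoop/mapred-site.xml /home/hadoop/hadoop-2.10.1/etc/hadoop/mapred-site.xml.bak
# Option 2: start the YARN daemons
[hadoop@hadoop01 ~]$ /home/hadoop/hadoop-2.10.1/sbin/start-yarn.sh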
- Error message 2
21/12/07 15:39:42 INFO hive.HiveImport: Loading uploaded data into Hive
21/12/07 15:39:42 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
21/12/07 15:39:42 ERROR tool.ImportTool: Import failed: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:337)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:44)
... 12 more
- Cause: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly. Despite the wording, this is a ClassNotFoundException: the Hive jars are not on the classpath that Sqoop hands to Hadoop, so the HiveConf class cannot be loaded.
- Fix: append the Hive libraries to HADOOP_CLASSPATH in the current user's environment
[hadoop@hadoop01 sbin]$ vim ~/.bashrc
export HADOOP_CLASSPATH=$HIVE_HOME/lib/*
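Reload the file so the variable takes effect in the current shell, and optionally verify that the Hive jars now appear on the Hadoop classpath:
[hadoop@hadoop01 sbin]$ source ~/.bashrc
[hadoop@hadoop01 sbin]$ hadoop classpath | tr ':' '\n' | grep -i hive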
- Error message 3
21/12/07 17:25:05 INFO mapreduce.ImportJobBase: Transferred 40.7139 MB in 11.4171 seconds (3.566 MB/sec)
21/12/07 17:25:05 INFO mapreduce.ImportJobBase: Retrieved 168237 records.
21/12/07 17:25:05 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table t_site_formal
21/12/07 17:25:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_site_formal` AS t LIMIT 1
21/12/07 17:25:05 WARN hive.TableDefWriter: Column create_date had to be cast to a less precise type in Hive
21/12/07 17:25:05 WARN hive.TableDefWriter: Column audit_date had to be cast to a less precise type in Hive
21/12/07 17:25:05 INFO hive.HiveImport: Loading uploaded data into Hive
21/12/07 17:25:05 INFO conf.HiveConf: Found configuration file file:/home/hadoop/apache-hive-2.3.9-bin/conf/hive-site.xml
2021-12-07 17:25:06,664 main ERROR Could not register mbeans java.security.AccessControlException: access denied ("javax.management.MBeanTrustPermission" "register")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:585)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.checkMBeanTrustPermission(DefaultMBeanServerInterceptor.java:1848)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:322)
at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
at org.apache.logging.log4j.core.jmx.Server.register(Server.java:380)
at org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:165)
at org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:138)
at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:507)
at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:249)
at org.apache.logging.log4j.core.async.AsyncLoggerContext.start(AsyncLoggerContext.java:86)
at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:239)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:157)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:130)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:100)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:187)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jDefault(LogUtils.java:154)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:90)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:82)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4j(LogUtils.java:65)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:702)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:331)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:537)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
- Cause: ERROR Could not register mbeans java.security.AccessControlException: access denied ("javax.management.MBeanTrustPermission" "register"). The message states plainly that the JVM's security policy is blocking log4j2 from registering its MBeans, so the JDK's security policy needs to be amended.
- Fix: locate the JDK installation and edit its security policy file. My JDK is installed under a system-wide path, so the change is made as root.
[hadoop@hadoop01 sbin]$ echo $JAVA_HOME
/usr/local/java
[hadoop@hadoop01 sbin]$ exit
exit
[root@hadoop01 hadoop-2.10.1]$ cd /usr/local/java/jre/lib/security/
[root@hadoop01 security]$ vim java.policy
// add this line inside the grant { } block
permission javax.management.MBeanTrustPermission "register";
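After the edit, the default grant block in java.policy should look roughly like this (the stock permissions are elided here):
grant {
        ...
        permission javax.management.MBeanTrustPermission "register";
};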
5. Successful run
LOAD DATA INPATH 'hdfs://hadoop01:8082/user/hadoop/t_site_formal' OVERWRITE INTO TABLE `sitelight`.`t_site_formal`
21/12/07 18:14:47 INFO ql.Driver: Semantic Analysis Completed
21/12/07 18:14:47 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
21/12/07 18:14:47 INFO ql.Driver: Completed compiling command(queryId=hadoop_20211207181447_5c3a828a-2830-4fa6-b852-e4731b760303); Time taken: 0.115 seconds
21/12/07 18:14:47 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
21/12/07 18:14:47 INFO ql.Driver: Executing command(queryId=hadoop_20211207181447_5c3a828a-2830-4fa6-b852-e4731b760303):
LOAD DATA INPATH 'hdfs://hadoop01:8082/user/hadoop/t_site_formal' OVERWRITE INTO TABLE `sitelight`.`t_site_formal`
21/12/07 18:14:47 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode
21/12/07 18:14:47 INFO hive.metastore: Closed a connection to metastore, current connections: 0
Loading data to table sitelight.t_site_formal
21/12/07 18:14:47 INFO exec.Task: Loading data to table sitelight.t_site_formal from hdfs://hadoop01:8082/user/hadoop/t_site_formal
21/12/07 18:14:47 INFO hive.metastore: Trying to connect to metastore with URI thrift://172.16.100.26:9083
21/12/07 18:14:47 INFO hive.metastore: Opened a connection to metastore, current connections: 1
21/12/07 18:14:47 INFO hive.metastore: Connected to metastore.
21/12/07 18:14:47 INFO common.FileUtils: Creating directory if it doesn't exist: hdfs://hadoop01:8082/usr/local/warehouse/sitelight.db/t_site_formal
21/12/07 18:14:47 INFO ql.Driver: Starting task [Stage-1:STATS] in serial mode
21/12/07 18:14:47 INFO exec.StatsTask: Executing stats task
21/12/07 18:14:47 INFO hive.metastore: Closed a connection to metastore, current connections: 0
21/12/07 18:14:47 INFO hive.metastore: Trying to connect to metastore with URI thrift://172.16.100.26:9083
21/12/07 18:14:47 INFO hive.metastore: Opened a connection to metastore, current connections: 1
21/12/07 18:14:47 INFO hive.metastore: Connected to metastore.
21/12/07 18:14:47 INFO hive.metastore: Closed a connection to metastore, current connections: 0
21/12/07 18:14:47 INFO hive.metastore: Trying to connect to metastore with URI thrift://172.16.100.26:9083
21/12/07 18:14:47 INFO hive.metastore: Opened a connection to metastore, current connections: 1
21/12/07 18:14:47 INFO hive.metastore: Connected to metastore.
21/12/07 18:14:47 INFO exec.StatsTask: Table sitelight.t_site_formal stats: [numFiles=1, numRows=0, totalSize=42691574, rawDataSize=0]
21/12/07 18:14:47 INFO ql.Driver: Completed executing command(queryId=hadoop_20211207181447_5c3a828a-2830-4fa6-b852-e4731b760303); Time taken: 0.291 seconds
OK
21/12/07 18:14:47 INFO ql.Driver: OK
Time taken: 0.406 seconds
21/12/07 18:14:47 INFO CliDriver: Time taken: 0.406 seconds
21/12/07 18:14:47 INFO conf.HiveConf: Using the default value passed in for log id: bb608b6a-b9af-4129-bae0-739683648882
21/12/07 18:14:47 INFO session.SessionState: Resetting thread name to main
21/12/07 18:14:47 INFO conf.HiveConf: Using the default value passed in for log id: bb608b6a-b9af-4129-bae0-739683648882
21/12/07 18:14:47 INFO session.SessionState: Deleted directory: /tmp/hive/hadoop/bb608b6a-b9af-4129-bae0-739683648882 on fs with scheme hdfs
21/12/07 18:14:47 INFO session.SessionState: Deleted directory: /tmp/hadoop/bb608b6a-b9af-4129-bae0-739683648882 on fs with scheme file
21/12/07 18:14:47 INFO hive.metastore: Closed a connection to metastore, current connections: 0
21/12/07 18:14:47 INFO hive.HiveImport: Hive import complete.
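As a final sanity check, a row count on the Hive side should match the record count Sqoop reported retrieving (a quick sketch):
[hadoop@hadoop01 sbin]$ hive -e "SELECT COUNT(*) FROM sitelight.t_site_formal;"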
These were the only problems I ran into, and I recorded them while they were fresh. Bear in mind that the same error message can stem from different root causes; if these fixes do not work for you, keep digging!