Importing data from MySQL into Hive with Sqoop


Author: 润土1030 | Published 2018-12-13 18:03
    A project requirement called for syncing data from MySQL into Hive. I have used Sqoop for this before, so I am writing the steps down here for future reference.
    The command is as follows:

    sqoop import \
      --connect jdbc:mysql://100.98.97.156:3306/volte_eop_prod \
      --username root --password 123456 \
      --table dw_wy_drop_customized_drilldown_table_daily \
      --direct \
      --fields-terminated-by "\t" --lines-terminated-by "\n" \
      --delete-target-dir \
      --hive-import --create-hive-table \
      --hive-database test --hive-table test1 \
      --num-mappers 1

    Parameter notes
    • delete-target-dir: if you re-run an import, the job fails because the HDFS target path already exists; with this option Sqoop deletes the existing target directory before importing, so repeated imports do not error out.

    • num-mappers: the number of map tasks (i.e. the degree of parallelism); set this according to your data volume and cluster.

    • create-hive-table: create the Hive table based on the MySQL table's schema (the job fails if the Hive table already exists).

    • direct: a MySQL-specific option that uses the mysqldump fast path instead of plain JDBC, which speeds up the transfer.
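
    A note before the output: as the warning in the log below says, passing the password with --password on the command line is insecure. A safer sketch, assuming you create and protect a password file on HDFS yourself (the path here is only illustrative), is to use --password-file, or simply -P to be prompted interactively:

    # Store the password in a protected file on HDFS (example path, create your own)
    echo -n '123456' | hdfs dfs -put - /user/ericsson/mysql.pwd
    hdfs dfs -chmod 400 /user/ericsson/mysql.pwd
    # Same import as above, but the password never appears on the command line
    sqoop import --connect jdbc:mysql://100.98.97.156:3306/volte_eop_prod \
      --username root --password-file /user/ericsson/mysql.pwd \
      --table dw_wy_drop_customized_drilldown_table_daily --direct \
      --fields-terminated-by "\t" --lines-terminated-by "\n" \
      --delete-target-dir --hive-import --create-hive-table \
      --hive-database test --hive-table test1 --num-mappers 1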

    Execution output
    [ericsson@dlbdn3 runtu]$ sqoop import --connect jdbc:mysql://100.98.97.156:3306/volte_eop_prod --username root --password 123456 --table dw_wy_drop_customized_drilldown_table_daily --direct  --fields-terminated-by "\t" --lines-terminated-by "\n" --delete-target-dir --hive-import --create-hive-table --hive-database test --hive-table test1 --num-mappers 1
    Warning: /opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    18/12/13 17:55:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.11.0
    18/12/13 17:55:49 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    18/12/13 17:55:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    18/12/13 17:55:49 INFO tool.CodeGenTool: Beginning code generation
    18/12/13 17:55:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dw_wy_drop_customized_drilldown_table_daily` AS t LIMIT 1
    18/12/13 17:55:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dw_wy_drop_customized_drilldown_table_daily` AS t LIMIT 1
    18/12/13 17:55:49 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
    Note: /tmp/sqoop-ericsson/compile/0f7e6d0f0c9ff6fc9fffb7d3d6412651/dw_wy_drop_customized_drilldown_table_daily.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    18/12/13 17:55:52 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-ericsson/compile/0f7e6d0f0c9ff6fc9fffb7d3d6412651/dw_wy_drop_customized_drilldown_table_daily.jar
    18/12/13 17:55:54 INFO tool.ImportTool: Destination directory dw_wy_drop_customized_drilldown_table_daily deleted.
    18/12/13 17:55:54 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import
    18/12/13 17:55:54 INFO mapreduce.ImportJobBase: Beginning import of dw_wy_drop_customized_drilldown_table_daily
    18/12/13 17:55:54 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    18/12/13 17:55:54 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    18/12/13 17:55:54 INFO client.RMProxy: Connecting to ResourceManager at dlbdn3/192.168.123.4:8032
    18/12/13 17:55:56 INFO db.DBInputFormat: Using read commited transaction isolation
    18/12/13 17:55:56 INFO mapreduce.JobSubmitter: number of splits:1
    18/12/13 17:55:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1543800485319_1069
    18/12/13 17:55:57 INFO impl.YarnClientImpl: Submitted application application_1543800485319_1069
    18/12/13 17:55:57 INFO mapreduce.Job: The url to track the job: http://dlbdn3:8088/proxy/application_1543800485319_1069/
    18/12/13 17:55:57 INFO mapreduce.Job: Running job: job_1543800485319_1069
    18/12/13 17:56:05 INFO mapreduce.Job: Job job_1543800485319_1069 running in uber mode : false
    18/12/13 17:56:05 INFO mapreduce.Job:  map 0% reduce 0%
    18/12/13 17:56:15 INFO mapreduce.Job:  map 100% reduce 0%
    18/12/13 17:56:15 INFO mapreduce.Job: Job job_1543800485319_1069 completed successfully
    18/12/13 17:56:15 INFO mapreduce.Job: Counters: 32
        File System Counters
            FILE: Number of bytes read=0
            FILE: Number of bytes written=153436
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=87
            HDFS: Number of bytes written=562
            HDFS: Number of read operations=4
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
        Job Counters 
            Launched map tasks=1
            Other local map tasks=1
            Total time spent by all maps in occupied slots (ms)=6137
            Total time spent by all reduces in occupied slots (ms)=0
            Total time spent by all map tasks (ms)=6137
            Total vcore-milliseconds taken by all map tasks=6137
            Total megabyte-milliseconds taken by all map tasks=6284288
        Map-Reduce Framework
            Map input records=1
            Map output records=6
            Input split bytes=87
            Spilled Records=0
            Failed Shuffles=0
            Merged Map outputs=0
            GC time elapsed (ms)=48
            CPU time spent (ms)=1510
            Physical memory (bytes) snapshot=328478720
            Virtual memory (bytes) snapshot=1694789632
            Total committed heap usage (bytes)=824180736
            Peak Map Physical memory (bytes)=328478720
            Peak Map Virtual memory (bytes)=1694789632
        File Input Format Counters 
            Bytes Read=0
        File Output Format Counters 
            Bytes Written=562
    18/12/13 17:56:15 INFO mapreduce.ImportJobBase: Transferred 562 bytes in 21.2223 seconds (26.4815 bytes/sec)
    18/12/13 17:56:15 INFO mapreduce.ImportJobBase: Retrieved 6 records.
    18/12/13 17:56:15 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dw_wy_drop_customized_drilldown_table_daily` AS t LIMIT 1
    18/12/13 17:56:15 WARN hive.TableDefWriter: Column DATE_TIME had to be cast to a less precise type in Hive
    18/12/13 17:56:15 INFO hive.HiveImport: Loading uploaded data into Hive
    
    Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/jars/hive-common-1.1.0-cdh5.11.0.jar!/hive-log4j.properties
    OK
    Time taken: 3.832 seconds
    Loading data to table test.test1
    Table test.test1 stats: [numFiles=1, totalSize=562]
    OK
    Time taken: 0.691 seconds
    [ericsson@dlbdn3 runtu]$ 
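
    After the job finishes, the imported data can be sanity-checked from Hive. A quick check (assuming the hive CLI is on the path; beeline works the same way):

    # Confirm the schema Sqoop created
    hive -e "DESCRIBE test.test1;"
    # Row count should match the "Retrieved 6 records" line in the log
    hive -e "SELECT COUNT(*) FROM test.test1;"

    Note also the "Column DATE_TIME had to be cast to a less precise type" warning above: Sqoop maps MySQL DATETIME to a Hive STRING by default. If you need a different Hive type, the --map-column-hive option (e.g. DATE_TIME=timestamp) lets you override the mapping, though whether the values load cleanly depends on your data.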
    
    With the --direct parameter, the log confirms the mysqldump fast path is used:

    18/12/13 17:55:54 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import
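
    The fast path only works if the mysqldump binary is available on the nodes that run the map tasks; if it is not, dropping --direct makes Sqoop fall back to an ordinary JDBC import. A sketch, with the other options unchanged:

    # --direct needs the mysqldump binary on every node that runs a map task
    which mysqldump
    # Without --direct, Sqoop reads the table over plain JDBC instead
    sqoop import --connect jdbc:mysql://100.98.97.156:3306/volte_eop_prod \
      --username root -P \
      --table dw_wy_drop_customized_drilldown_table_daily \
      --fields-terminated-by "\t" --lines-terminated-by "\n" \
      --delete-target-dir --hive-import --create-hive-table \
      --hive-database test --hive-table test1 --num-mappers 1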

    Sqoop also generates a Java source file named after the table:
    [ericsson@dlbdn3 runtu]$ ll
    total 32
    -rw-rw-r-- 1 ericsson ericsson 32198 Dec 13 17:37 dw_wy_drop_customized_drilldown_table_daily.java
    [ericsson@dlbdn3 runtu]$
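
    This .java file is the record class Sqoop generates to serialize and deserialize rows of the table. If you only want the generated code without running an import, the sqoop codegen tool produces the same class; a sketch (the output directory is just an example):

    # Generate only the record class and jar for the table, without importing
    sqoop codegen --connect jdbc:mysql://100.98.97.156:3306/volte_eop_prod \
      --username root -P \
      --table dw_wy_drop_customized_drilldown_table_daily \
      --outdir ./generated-src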
    
