美文网首页
sqoop2 shell 数据导入导出

sqoop2 shell 数据导入导出

作者: 天堂宝宝_V | 来源:发表于2017-09-14 18:20 被阅读446次

    title: sqoop2 shell 使用
    tags: 新建,模板,小书匠
    grammar_cjkRuby: true


    从MySQL数据库中导入数据到HDFS中

    显示 可用创建link的connector

    sqoop:000> show connector
    0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    +------------------------+---------+------------------------------------------------------------+----------------------+
    |          Name          | Version |                           Class                            | Supported Directions |
    +------------------------+---------+------------------------------------------------------------+----------------------+
    | generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |
    | kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |
    | oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |
    | ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |
    | hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |
    | kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |
    | sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |
    +------------------------+---------+------------------------------------------------------------+----------------------+
    

    创建mysqllink

    sqoop:000> update link -n mysqllink
    Updating link with name mysqllink
    Please update link:
    Name: mysqllink
    
    Database connection
    
    Driver class: com.mysql.jdbc.Driver
    Connection String: jdbc:mysql://master:3306/forsqoop?useSSL=false
    Username: root
    Password: ******
    Fetch Size: 
    Connection Properties: 
    There are currently 0 values in the map:
    entry# 
    
    SQL Dialect
    
    Identifier enclose: `
    link was successfully updated with status OK
    

    创建 hdfslink

    sqoop:000> update link -n hdfslink
    Updating link with name hdfslink
    Please update link:
    Name: hdfslink
    
    HDFS cluster
    
    URI: hdfs://master:9000
    Conf directory: /usr/hadoop-2.6.4/etc/hadoop/
    Additional configs:: 
    There are currently 0 values in the map:
    entry# 
    link was successfully updated with status OK
    

    显示创建成功的link

    sqoop:000> show link
    +-----------+------------------------+---------+
    |   Name    |     Connector Name     | Enabled |
    +-----------+------------------------+---------+
    | mysqllink | generic-jdbc-connector | true    |
    | hdfslink  | hdfs-connector         | true    |
    +-----------+------------------------+---------+
    

    更新 job

    sqoop:000> update job -n frommysqljob
    Updating job with name frommysqljob
    Please update job:
    Name: frommysqljob
    
    Database source
    
    Schema name: forsqoop
    Table name: sds
    SQL statement: 
    Column names: 
    There are currently 0 values in the list:
    element# 
    Partition column: SD_ID
    Partition column nullable: 
    Boundary query: 
    
    Incremental read
    
    Check column: 
    Last value: 
    
    Target configuration
    
    Override null value: 
    Null value: 
    File format: 
      0 : TEXT_FILE
      1 : SEQUENCE_FILE
      2 : PARQUET_FILE
    Choose: 0
    Compression codec: 
      0 : NONE
      1 : DEFAULT
      2 : DEFLATE
      3 : GZIP
      4 : BZIP2
      5 : LZO
      6 : LZ4
      7 : SNAPPY
      8 : CUSTOM
    Choose: 0
    Custom codec: 
    Output directory: /user/root/sqoop/frommysql
    Append mode: 
    
    Throttling resources
    
    Extractors: 
    Loaders: 
    
    Classpath configuration
    
    Extra mapper jars: 
    There are currently 0 values in the list:
    element# 
    Job was successfully updated with status OK
    

    开启任务

    sqoop:000> start job -n frommysqljob
    Submission details
    Job Name: frommysqljob
    Server URL: http://localhost:12000/sqoop/
    Created by: root
    Creation date: 2017-06-25 14:43:47 CST
    Lastly updated by: root
    External ID: job_1498352893725_0013
            http://master:8088/proxy/application_1498352893725_0013/
    2017-06-25 14:43:47 CST: BOOTING  - Progress is not available
    

    查看执行任务的结果

    enter description hereenter description here

    易出现问题的地方

    1 mysql 没有打开远程链接,导致can't get a connector 错误的发生
    进入mysql开启远程访问
    @'192.168.137.121'可以替换为@‘%’就可任意ip访问,当然我们也可以直接用 UPDATE 更新 root 用户 Host, 但不推荐,123456是密码 SQL如下:

    GRANT ALL PRIVILEGES ON *.* TO 'root'@'192.168.137.121' IDENTIFIED BY '123456' WITH GRANT OPTION;
    flush privileges;
    

    2 配置 generic-jdbc-connector link时 Identifier enclose: `参数要设置为`或者是空格

    3 mysql导入hdfs时表结构如果没有主键需要显示指定Partition column: 参数为表中的一个字段分区,否则会因为没有主键导致导入报错。

    4 MySQL中的Connection String:参数配置路径不能使用localhost因为你不知道yarn会分配任务给那一台机器执行此操作,使用master从机后都配过master的IP地址,这样就不用担心从机获取不到master主机上的数据库数据了。

    从HDFS导出数据到MySQL数据库

    创建 tomysqllink

    sqoop:000> create link -n generic-jdbc-connector
    Invalid command invocation: Unknown option encountered: -n
    sqoop:000> create link  generic-jdbc-connector
    Invalid command invocation: Missing required option: c
    sqoop:000> create link -c  generic-jdbc-connector
    Creating link for connector with name generic-jdbc-connector
    Please fill following values to create new link object
    Name: tomysqllink
    
    Database connection
    
    Driver class: com.mysql.jdbc.Driver
    Connection String: jdbc:mysql://master:3306/forsqoop?useSSL=false
    Username: root
    Password: ******
    Fetch Size: 
    Connection Properties: 
    There are currently 0 values in the map:
    entry# 
    
    SQL Dialect
    
    Identifier enclose: `
    New link was successfully created with validation status OK and name tomysqllink
    

    创建fromhdfslink link

    sqoop:000> create link -c hdfs-connector
    Creating link for connector with name hdfs-connector
    Please fill following values to create new link object
    Name: fromhdfslink
    
    HDFS cluster
    
    URI: hdfs://master:9000
    Conf directory: /usr/hadoop-2.6.4/etc/hadoop/
    Additional configs:: 
    There are currently 0 values in the map:
    entry# 
    New link was successfully created with validation status OK and name fromhdfslink
    

    展示创建成功的link

    sqoop:000> show link
    +--------------+------------------------+---------+
    |     Name     |     Connector Name     | Enabled |
    +--------------+------------------------+---------+
    | mysqllink    | generic-jdbc-connector | true    |
    | tomysqllink  | generic-jdbc-connector | true    |
    | hdfslink     | hdfs-connector         | true    |
    | fromhdfslink | hdfs-connector         | true    |
    +--------------+------------------------+---------+
    

    创建 tomysqljob job

    sqoop:000> create job --f fromhdfslink --t tomysqllink
    Creating job for links with from name fromhdfslink and to name tomysqllink
    Please fill following values to create new job object
    Name: tomysqljob
    
    Input configuration
    
    Input directory: /user/root/sqoop/frommysql
    Override null value: N
    Null value: ture
    
    Incremental import
    
    Incremental type: 
      0 : NONE
      1 : NEW_FILES
    Choose: 0
    Last imported date: 
    
    Database target
    
    Schema name: forsqoop
    Table name: sds
    Column names: 
    There are currently 0 values in the list:
    element# 
    Staging table: 
    Clear stage table: 
    
    Throttling resources
    
    Extractors: 
    Loaders: 
    
    Classpath configuration
    
    Extra mapper jars: 
    There are currently 0 values in the list:
    element# 
    New job was successfully created with validation status OK  and name tomysqljob
    

    展示创建成功的job

    sqoop:000> show job
    +----+--------------+------------------------------------+--------------------------------------+---------+
    | Id |     Name     |           From Connector           |             To Connector             | Enabled |
    +----+--------------+------------------------------------+--------------------------------------+---------+
    | 1  | frommysqljob | mysqllink (generic-jdbc-connector) | hdfslink (hdfs-connector)            | true    |
    | 2  | tomysqljob   | fromhdfslink (hdfs-connector)      | tomysqllink (generic-jdbc-connector) | true    |
    +----+--------------+------------------------------------+--------------------------------------+---------+
    

    执行导入MySQL数据库表的job

    sqoop:000> start job -n tomysqljob
    Submission details
    Job Name: tomysqljob
    Server URL: http://localhost:12000/sqoop/
    Created by: root
    Creation date: 2017-06-25 15:03:10 CST
    Lastly updated by: root
    External ID: job_1498352893725_0014
            http://master:8088/proxy/application_1498352893725_0014/
    2017-06-25 15:03:10 CST: BOOTING  - Progress is not available
    

    查看job的执行情况

    enter description hereenter description here

    相关链接

    Command Line Shell 命令

    Connector-GenericJDBC 配置
    Connector-HDFS 配置
    其中的 FROM Job ConfigurationTO Job Configuration 分别会在创建job的时候用到里面的参数

    自定义 Connector Development 默认支持导入导出CSV

    相关文章

      网友评论

          本文标题:sqoop2 shell 数据导入导出

          本文链接:https://www.haomeiwen.com/subject/bkadsxtx.html