Integrating Flink 1.13.5 with Hudi 0.10.0


Author: wudl | Published 2022-02-07 00:19

    1. Versions

    Component   Version
    Hudi        0.10.0
    Flink       1.13.5

    2. Downloading the Hudi source code

    https://github.com/apache/hudi/releases
    

    2.1 Change the Flink version to 1.13.5

    In the pom.xml at the root of the source tree, set:

    <flink.version>1.13.5</flink.version>
    <hive.version>3.1.0</hive.version>
    <hadoop.version>3.1.1</hadoop.version>
     
    

    2.2 Build command

    mvn clean package -DskipTests
    # or specify the Scala version explicitly (see the sketch after this block)

    # The built bundle jar ends up under packaging/hudi-flink-bundle/target/
    # (here: hudi-flink-bundle_2.12-0.10.0.jar)
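    If you need Scala 2.12 artifacts (the _2.12 module names in the logs below suggest the Scala version was pinned here), it can be selected at build time; a minimal sketch, assuming the scala-2.12 profile switch exposed by Hudi's root pom:

    # Assumption: the scala-2.12 build profile exists in this Hudi checkout
    mvn clean package -DskipTests -Dscala-2.12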
    

    2.3 An error encountered during the build

    [INFO] ------------------------------------------------------------------------
    [INFO] Reactor Summary for Hudi 0.10.0:
    [INFO] 
    [INFO] Hudi ............................................... SUCCESS [  1.642 s]
    [INFO] hudi-common ........................................ SUCCESS [  9.808 s]
    [INFO] hudi-aws ........................................... SUCCESS [  1.306 s]
    [INFO] hudi-timeline-service .............................. SUCCESS [  1.623 s]
    [INFO] hudi-client ........................................ SUCCESS [  0.082 s]
    [INFO] hudi-client-common ................................. SUCCESS [  8.027 s]
    [INFO] hudi-hadoop-mr ..................................... SUCCESS [  2.825 s]
    [INFO] hudi-spark-client .................................. SUCCESS [ 13.891 s]
    [INFO] hudi-sync-common ................................... SUCCESS [  0.718 s]
    [INFO] hudi-hive-sync ..................................... SUCCESS [  3.027 s]
    [INFO] hudi-spark-datasource .............................. SUCCESS [  0.066 s]
    [INFO] hudi-spark-common_2.12 ............................. SUCCESS [  7.706 s]
    [INFO] hudi-spark2_2.12 ................................... SUCCESS [  9.535 s]
    [INFO] hudi-spark_2.12 .................................... SUCCESS [ 25.923 s]
    [INFO] hudi-utilities_2.12 ................................ FAILURE [  2.638 s]
    [INFO] hudi-utilities-bundle_2.12 ......................... SKIPPED
    [INFO] hudi-cli ........................................... SKIPPED
    [INFO] hudi-java-client ................................... SKIPPED
    [INFO] hudi-flink-client .................................. SKIPPED
    [INFO] hudi-spark3_2.12 ................................... SKIPPED
    [INFO] hudi-dla-sync ...................................... SKIPPED
    [INFO] hudi-sync .......................................... SKIPPED
    [INFO] hudi-hadoop-mr-bundle .............................. SKIPPED
    [INFO] hudi-hive-sync-bundle .............................. SKIPPED
    [INFO] hudi-spark-bundle_2.12 ............................. SKIPPED
    [INFO] hudi-presto-bundle ................................. SKIPPED
    [INFO] hudi-timeline-server-bundle ........................ SKIPPED
    [INFO] hudi-hadoop-docker ................................. SKIPPED
    [INFO] hudi-hadoop-base-docker ............................ SKIPPED
    [INFO] hudi-hadoop-namenode-docker ........................ SKIPPED
    [INFO] hudi-hadoop-datanode-docker ........................ SKIPPED
    [INFO] hudi-hadoop-history-docker ......................... SKIPPED
    [INFO] hudi-hadoop-hive-docker ............................ SKIPPED
    [INFO] hudi-hadoop-sparkbase-docker ....................... SKIPPED
    [INFO] hudi-hadoop-sparkmaster-docker ..................... SKIPPED
    [INFO] hudi-hadoop-sparkworker-docker ..................... SKIPPED
    [INFO] hudi-hadoop-sparkadhoc-docker ...................... SKIPPED
    [INFO] hudi-hadoop-presto-docker .......................... SKIPPED
    [INFO] hudi-integ-test .................................... SKIPPED
    [INFO] hudi-integ-test-bundle ............................. SKIPPED
    [INFO] hudi-examples ...................................... SKIPPED
    [INFO] hudi-flink_2.12 .................................... SKIPPED
    [INFO] hudi-kafka-connect ................................. SKIPPED
    [INFO] hudi-flink-bundle_2.12 ............................. SKIPPED
    [INFO] hudi-kafka-connect-bundle .......................... SKIPPED
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD FAILURE
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time:  01:29 min
    [INFO] Finished at: 2022-02-06T17:59:02+08:00
    [INFO] ------------------------------------------------------------------------
    [ERROR] Failed to execute goal on project hudi-utilities_2.12: Could not resolve dependencies for project org.apache.hudi:hudi-utilities_2.12:jar:0.10.0: The following artifacts could not be resolved: io.confluent:kafka-avro-serializer:jar:5.3.4, io.confluent:common-config:jar:5.3.4, io.confluent:common-utils:jar:5.3.4, io.confluent:kafka-schema-registry-client:jar:5.3.4: Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.4 in aliyunmaven (https://maven.aliyun.com/repository/public) -> [Help 1]
    [ERROR] 
    [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
    [ERROR] Re-run Maven using the -X switch to enable full debug logging.
    [ERROR] 
    [ERROR] For more information about the errors and possible solutions, please read the following articles:
    [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
    [ERROR] 
    [ERROR] After correcting the problems, you can resume the build with the command
    [ERROR]   mvn <args> -rf :hudi-utilities_2.12
    
    

    To resolve the error above, download the missing Confluent jars manually and install them into the local Maven repository. (Note: the build asks for version 5.3.4, while the commands below install 5.3.0 jars; whichever version you download, make sure it matches the version the build tries to resolve.)
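    One possible source for these jars is Confluent's public Maven repository; a download sketch (the URLs are an assumption based on the standard packages.confluent.io layout and are not from the original article; pick the version the build actually asks for):

    # Assumed download location: Confluent's public Maven repository
    cd /opt/myjar
    wget https://packages.confluent.io/maven/io/confluent/common-config/5.3.0/common-config-5.3.0.jar
    wget https://packages.confluent.io/maven/io/confluent/common-utils/5.3.0/common-utils-5.3.0.jar
    wget https://packages.confluent.io/maven/io/confluent/kafka-avro-serializer/5.3.0/kafka-avro-serializer-5.3.0.jar
    wget https://packages.confluent.io/maven/io/confluent/kafka-schema-registry-client/5.3.0/kafka-schema-registry-client-5.3.0.jar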

    mvn install:install-file -Dfile=/opt/myjar/common-config-5.3.0.jar -DgroupId=io.confluent -DartifactId=common-config -Dversion=5.3.0 -Dpackaging=jar
    
    mvn install:install-file -Dfile=/opt/myjar/common-utils-5.3.0.jar -DgroupId=io.confluent -DartifactId=common-utils -Dversion=5.3.0 -Dpackaging=jar
    
    
    mvn install:install-file -Dfile=/opt/myjar/kafka-avro-serializer-5.3.0.jar -DgroupId=io.confluent -DartifactId=kafka-avro-serializer -Dversion=5.3.0 -Dpackaging=jar
    
    mvn install:install-file -Dfile=/opt/myjar/kafka-schema-registry-client-5.3.0.jar -DgroupId=io.confluent -DartifactId=kafka-schema-registry-client -Dversion=5.3.0 -Dpackaging=jar
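    With the four jars installed into the local repository, re-run the build; the reactor then completes successfully:

    # Re-run the full build after installing the missing dependencies
    mvn clean package -DskipTests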
    
    
    [INFO] Reactor Summary for Hudi 0.10.0:
    [INFO] 
    [INFO] Hudi ............................................... SUCCESS [  1.370 s]
    [INFO] hudi-common ........................................ SUCCESS [ 10.813 s]
    [INFO] hudi-aws ........................................... SUCCESS [  1.394 s]
    [INFO] hudi-timeline-service .............................. SUCCESS [  1.404 s]
    [INFO] hudi-client ........................................ SUCCESS [  0.072 s]
    [INFO] hudi-client-common ................................. SUCCESS [  7.295 s]
    [INFO] hudi-hadoop-mr ..................................... SUCCESS [  2.848 s]
    [INFO] hudi-spark-client .................................. SUCCESS [ 15.158 s]
    [INFO] hudi-sync-common ................................... SUCCESS [  0.681 s]
    [INFO] hudi-hive-sync ..................................... SUCCESS [  2.856 s]
    [INFO] hudi-spark-datasource .............................. SUCCESS [  0.054 s]
    [INFO] hudi-spark-common_2.12 ............................. SUCCESS [  7.296 s]
    [INFO] hudi-spark2_2.12 ................................... SUCCESS [ 10.521 s]
    [INFO] hudi-spark_2.12 .................................... SUCCESS [ 26.299 s]
    [INFO] hudi-utilities_2.12 ................................ SUCCESS [ 11.262 s]
    [INFO] hudi-utilities-bundle_2.12 ......................... SUCCESS [01:39 min]
    [INFO] hudi-cli ........................................... SUCCESS [ 15.297 s]
    [INFO] hudi-java-client ................................... SUCCESS [  2.267 s]
    [INFO] hudi-flink-client .................................. SUCCESS [01:06 min]
    [INFO] hudi-spark3_2.12 ................................... SUCCESS [  6.117 s]
    [INFO] hudi-dla-sync ...................................... SUCCESS [  6.830 s]
    [INFO] hudi-sync .......................................... SUCCESS [  0.061 s]
    [INFO] hudi-hadoop-mr-bundle .............................. SUCCESS [  8.565 s]
    [INFO] hudi-hive-sync-bundle .............................. SUCCESS [  1.131 s]
    [INFO] hudi-spark-bundle_2.12 ............................. SUCCESS [ 11.139 s]
    [INFO] hudi-presto-bundle ................................. SUCCESS [ 38.706 s]
    [INFO] hudi-timeline-server-bundle ........................ SUCCESS [  8.251 s]
    [INFO] hudi-hadoop-docker ................................. SUCCESS [  1.166 s]
    [INFO] hudi-hadoop-base-docker ............................ SUCCESS [  0.649 s]
    [INFO] hudi-hadoop-namenode-docker ........................ SUCCESS [  0.649 s]
    [INFO] hudi-hadoop-datanode-docker ........................ SUCCESS [  0.627 s]
    [INFO] hudi-hadoop-history-docker ......................... SUCCESS [  0.659 s]
    [INFO] hudi-hadoop-hive-docker ............................ SUCCESS [  7.320 s]
    [INFO] hudi-hadoop-sparkbase-docker ....................... SUCCESS [  0.731 s]
    [INFO] hudi-hadoop-sparkmaster-docker ..................... SUCCESS [  0.638 s]
    [INFO] hudi-hadoop-sparkworker-docker ..................... SUCCESS [  0.667 s]
    [INFO] hudi-hadoop-sparkadhoc-docker ...................... SUCCESS [  0.671 s]
    [INFO] hudi-hadoop-presto-docker .......................... SUCCESS [  0.704 s]
    [INFO] hudi-integ-test .................................... SUCCESS [ 36.320 s]
    [INFO] hudi-integ-test-bundle ............................. SUCCESS [01:47 min]
    [INFO] hudi-examples ...................................... SUCCESS [  8.120 s]
    [INFO] hudi-flink_2.12 .................................... SUCCESS [ 38.207 s]
    [INFO] hudi-kafka-connect ................................. SUCCESS [ 19.832 s]
    [INFO] hudi-flink-bundle_2.12 ............................. SUCCESS [ 27.658 s]
    [INFO] hudi-kafka-connect-bundle .......................... SUCCESS [ 14.287 s]
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time:  10:30 min
    [INFO] Finished at: 2022-02-06T20:29:29+08:00
    [INFO] ------------------------------------------------------------------------
    [root@node01 hudi-0.10.0]# 
    
    

    3. Location of the built bundles

    [root@node01 packaging]# pwd
    /opt/module/hudi/hudi-0.10.0/packaging
    [root@node01 packaging]# ll
    total 4
    drwxr-xr-x 4 501 games   46 Feb  6 20:41 hudi-flink-bundle
    drwxr-xr-x 4 501 games   46 Feb  6 20:38 hudi-hadoop-mr-bundle
    drwxr-xr-x 4 501 games   46 Feb  6 20:38 hudi-hive-sync-bundle
    drwxr-xr-x 4 501 games   46 Feb  6 20:39 hudi-integ-test-bundle
    drwxr-xr-x 4 501 games   46 Feb  6 20:41 hudi-kafka-connect-bundle
    drwxr-xr-x 4 501 games   46 Feb  6 20:38 hudi-presto-bundle
    drwxr-xr-x 4 501 games   46 Feb  6 20:38 hudi-spark-bundle
    drwxr-xr-x 4 501 games  101 Feb  6 20:38 hudi-timeline-server-bundle
    drwxr-xr-x 4 501 games   46 Feb  6 20:37 hudi-utilities-bundle
    -rw-r--r-- 1 501 games 2206 Dec  8 10:38 README.md
    [root@node01 packaging]# 
    
    

    4. Jars required for the Flink + Hudi integration

    The two required jars are:
    hudi-flink-bundle_2.12-0.10.0.jar
    hudi-hadoop-mr-bundle-0.10.0.jar
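    Copy them from the Hudi build output into Flink's lib directory; a minimal sketch, with paths taken from the build tree and the Flink installation used in this article:

    # Copy the two bundles onto Flink's classpath
    cp /opt/module/hudi/hudi-0.10.0/packaging/hudi-flink-bundle/target/hudi-flink-bundle_2.12-0.10.0.jar /opt/module/flink/flink-1.13.5/lib/
    cp /opt/module/hudi/hudi-0.10.0/packaging/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-0.10.0.jar /opt/module/flink/flink-1.13.5/lib/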

    [root@node01 lib]# pwd
    /opt/module/flink/flink-1.13.5/lib
    [root@node01 lib]# ll
    total 316964
    -rw-r--r-- 1 root root   7802399 Jan  1 08:27 doris-flink-1.0-SNAPSHOT.jar
    -rw-r--r-- 1 root root    249571 Dec 27 23:32 flink-connector-jdbc_2.12-1.13.5.jar
    -rw-r--r-- 1 root root    359138 Jan  1 10:17 flink-connector-kafka_2.12-1.13.5.jar
    -rw-r--r-- 1 hive 1007     92315 Dec 15 08:23 flink-csv-1.13.5.jar
    -rw-r--r-- 1 hive 1007 106535830 Dec 15 08:29 flink-dist_2.12-1.13.5.jar
    -rw-r--r-- 1 hive 1007    148127 Dec 15 08:23 flink-json-1.13.5.jar
    -rw-r--r-- 1 root root  43317025 Feb  6 20:51 flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
    -rw-r--r-- 1 hive 1007   7709740 Dec 15 06:57 flink-shaded-zookeeper-3.4.14.jar
    -rw-r--r-- 1 hive 1007  35051557 Dec 15 08:28 flink-table_2.12-1.13.5.jar
    -rw-r--r-- 1 hive 1007  38613344 Dec 15 08:28 flink-table-blink_2.12-1.13.5.jar
    -rw-r--r-- 1 root root  62447468 Feb  6 20:44 hudi-flink-bundle_2.12-0.10.0.jar
    -rw-r--r-- 1 root root  17276348 Feb  6 20:51 hudi-hadoop-mr-bundle-0.10.0.jar
    -rw-r--r-- 1 root root   1893564 Jan  1 10:17 kafka-clients-2.0.0.jar
    -rw-r--r-- 1 hive 1007    207909 Dec 15 06:56 log4j-1.2-api-2.16.0.jar
    -rw-r--r-- 1 hive 1007    301892 Dec 15 06:56 log4j-api-2.16.0.jar
    -rw-r--r-- 1 hive 1007   1789565 Dec 15 06:56 log4j-core-2.16.0.jar
    -rw-r--r-- 1 hive 1007     24258 Dec 15 06:56 log4j-slf4j-impl-2.16.0.jar
    -rw-r--r-- 1 root root    724213 Dec 27 23:23 mysql-connector-java-5.1.9.jar
    [root@node01 lib]# 
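    If the Flink cluster is already running, restart it so the newly added jars are picked up; a sketch, assuming a standalone cluster started with the bundled scripts:

    # Restart the standalone cluster to reload lib/
    /opt/module/flink/flink-1.13.5/bin/stop-cluster.sh
    /opt/module/flink/flink-1.13.5/bin/start-cluster.sh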
    
    

    5. Entering the Flink SQL client

    ./sql-client.sh embedded shell
    # In the SQL CLI, set the result display mode to tableau
    set execution.result-mode=tableau;
    
    

    6. Create table statement

        CREATE TABLE t1(
          uuid VARCHAR(20), 
          name VARCHAR(10),
          age INT,
          ts TIMESTAMP(3),
          `partition` VARCHAR(20)
        )
        PARTITIONED BY (`partition`)
        WITH (
          'connector' = 'hudi',
          'path' = 'hdfs://192.168.1.161:8020/hudi-warehouse/hudi-t1',
          'write.tasks' = '1',
          'compaction.tasks' = '1', 
          'table.type' = 'MERGE_ON_READ'
        );
    

    6.1 Inserting data

    INSERT INTO t1 VALUES('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1');
    INSERT INTO t1 VALUES
    ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
    ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
    ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
    ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
    ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
    ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
    ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');
    ##  Output in the SQL client
    Flink SQL> INSERT INTO t1 VALUES
    > ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
    > ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
    > ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
    > ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
    > ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
    > ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
    > ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');
    [INFO] Submitting SQL update statement to the cluster...
    [INFO] SQL update statement has been successfully submitted to the cluster:
    Job ID: d6c70e43969b0f2b5124104468c5e065
    
    
    Flink SQL> select * from t1;
    +----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
    | op |                           uuid |                           name |         age |                      ts |                      partition |
    +----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
    | +I |                            id6 |                           Emma |          20 | 1970-01-01 00:00:06.000 |                           par3 |
    | +I |                            id5 |                         Sophia |          18 | 1970-01-01 00:00:05.000 |                           par3 |
    | +I |                            id8 |                            Han |          56 | 1970-01-01 00:00:08.000 |                           par4 |
    | +I |                            id7 |                            Bob |          44 | 1970-01-01 00:00:07.000 |                           par4 |
    | +I |                            id2 |                        Stephen |          33 | 1970-01-01 00:00:02.000 |                           par1 |
    | +I |                            id1 |                          Danny |          28 | 1970-01-01 00:00:01.000 |                           par1 |
    | +I |                            id4 |                         Fabian |          31 | 1970-01-01 00:00:04.000 |                           par2 |
    | +I |                            id3 |                         Julian |          53 | 1970-01-01 00:00:03.000 |                           par2 |
    +----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
    Received a total of 8 rows
    
    
    
    

    7. Update operation

    In Hudi, an update is simply another insert: writing a row with an existing record key replaces the previous version (here the record key defaults to the uuid field). Change Danny's age to 18:

    INSERT INTO t1 VALUES('id1','Danny',18,TIMESTAMP '1970-01-01 00:00:01','par1');

    Query again:

    Flink SQL> select * from t1;
    +----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
    | op |                           uuid |                           name |         age |                      ts |                      partition |
    +----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
    | +I |                            id8 |                            Han |          56 | 1970-01-01 00:00:08.000 |                           par4 |
    | +I |                            id7 |                            Bob |          44 | 1970-01-01 00:00:07.000 |                           par4 |
    | +I |                            id4 |                         Fabian |          31 | 1970-01-01 00:00:04.000 |                           par2 |
    | +I |                            id3 |                         Julian |          53 | 1970-01-01 00:00:03.000 |                           par2 |
    | +I |                            id2 |                        Stephen |          33 | 1970-01-01 00:00:02.000 |                           par1 |
    | +I |                            id1 |                          Danny |          18 | 1970-01-01 00:00:01.000 |                           par1 |
    | +I |                            id6 |                           Emma |          20 | 1970-01-01 00:00:06.000 |                           par3 |
    | +I |                            id5 |                         Sophia |          18 | 1970-01-01 00:00:05.000 |                           par3 |
    +----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
    Received a total of 8 rows
    
    Flink SQL> 
    
    

    8. The submitted jobs in Flink

    (Screenshot: the submitted jobs shown in the Flink Web UI.)
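    The same information is available from the command line; a sketch, assuming the standalone cluster used above:

    # List running and scheduled jobs on the cluster
    /opt/module/flink/flink-1.13.5/bin/flink list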
