一、Environment preparation
- Maven 3.3.x or later
- JDK 8 (a recent update release)
- Scala 2.11.x
- Flink 1.8.1 source package (a quick toolchain check follows below)
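Before starting the build, the following commands verify the toolchain and fetch the Flink source. This is only a sketch: it assumes mvn, java and scala are already on the PATH, and that the source URL follows the usual Apache archive layout.

mvn -version    # expect Apache Maven 3.3.x or later
java -version   # expect a JDK 8 build
scala -version  # expect Scala 2.11.x
wget https://archive.apache.org/dist/flink/flink-1.8.1/flink-1.8.1-src.tgz
tar -zxvf flink-1.8.1-src.tgz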
二、Maven settings.xml configuration
<mirror>
  <id>nexus-aliyun</id>
  <mirrorOf>*,!jeecg,!jeecg-snapshots,!mapr-releases,!cloudera-releases,!confluent</mirrorOf>
  <name>Nexus aliyun</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
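For reference, the <mirror> element belongs inside the <mirrors> section of settings.xml (typically ~/.m2/settings.xml or $MAVEN_HOME/conf/settings.xml). A minimal placement sketch, with everything else omitted:

<settings>
  <mirrors>
    <!-- the <mirror> block shown above goes here -->
  </mirrors>
</settings>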
- Explanation of the mirrorOf setting
* = everything
external:* = everything not on the localhost and not file based.
repo,repo1 = repo or repo1
*,!repo1 = everything except repo1
If <mirrorOf> is set to *, the mirror covers every repository, and all remote repository requests are sent to the mirror's URL (having it mirror all repository requests). I therefore changed mirrorOf to *,!jeecg,!jeecg-snapshots,!mapr-releases,!cloudera-releases,!confluent, so the mirror now intercepts requests for every repository except jeecg, jeecg-snapshots, mapr-releases, cloudera-releases and confluent; requests that are not intercepted are downloaded from the repositories declared in the pom file. The names in the exclusion list are repository ids; for example, cloudera-releases refers to the <id> of the Cloudera repository added to the pom below.

Other questions:
What is the difference between a repository configured in settings.xml and one configured in pom.xml?
Configuring a repository in settings.xml serves the same purpose as configuring one in pom.xml: both let you specify the use of multiple repositories. The difference is scope: a repository configured in pom.xml only applies to the current project and its child projects, while one configured in settings.xml is a global setting that applies to all projects.
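Note that settings.xml does not accept a top-level <repositories> element: a repository there has to be declared inside a <profile>, which is then activated. A minimal sketch (the profile id cdh-repos is just an illustrative name):

<profiles>
  <profile>
    <id>cdh-repos</id>
    <repositories>
      <repository>
        <id>cloudera-releases</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
      </repository>
    </repositories>
  </profile>
</profiles>
<activeProfiles>
  <activeProfile>cdh-repos</activeProfile>
</activeProfiles>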
三、Download and build the flink-shaded source
- Download the source
wget https://archive.apache.org/dist/flink/flink-shaded-7.0/flink-shaded-7.0-src.tgz
- After extracting, edit the pom file and add the Cloudera repository
<repositories>
  <repository>
    <id>cloudera-releases</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>
- Build the flink-shaded source
mvn clean install -DskipTests -Dhadoop.version=2.6.0-cdh5.15.1
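If the build succeeds, the shaded Hadoop artifact is installed into the local Maven repository. A quick check, assuming the default local repository location ~/.m2/repository:

ls ~/.m2/repository/org/apache/flink/flink-shaded-hadoop-2/2.6.0-cdh5.15.1-7.0/
# flink-shaded-hadoop-2-2.6.0-cdh5.15.1-7.0.jar should be among the files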
四、Run the Flink build command
mvn clean install -DskipTests -Pvendor-repos -Dfast -Dhadoop.version=2.6.0-cdh5.15.1
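The -Pvendor-repos profile activates the vendor repositories declared in Flink's pom (Cloudera among them), and -Dfast skips the QA plugins to speed up the build. After a successful build, the runnable distribution usually ends up under flink-dist; the paths below are a sketch assuming the Flink 1.8.1 source tree:

ls flink-dist/target/flink-1.8.1-bin/flink-1.8.1/
# the same directory is typically also reachable through the build-target symlink at the source root
ls build-target/lib/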
五、Build result

六、Testing
Run the Flink on YARN example
flink run -m yarn-cluster ./examples/batch/WordCount.jar \
--input /ruozedata/LICENSE-2.0.txt --output /ruozedata/wordcount-result.txt
The run fails with:
------------------------------------------------------------
The program finished with the following exception:
java.lang.RuntimeException: Could not identify hostname and port in 'yarn-cluster'.
at org.apache.flink.client.ClientUtils.parseHostPortAddress(ClientUtils.java:47)
at org.apache.flink.client.cli.AbstractCustomCommandLine.applyCommandLineOptionsToConfiguration(AbstractCustomCommandLine.java:83)
at org.apache.flink.client.cli.DefaultCLI.createClusterDescriptor(DefaultCLI.java:60)
at org.apache.flink.client.cli.DefaultCLI.createClusterDescriptor(DefaultCLI.java:35)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:216)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1010)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1083)
at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1083)
Solution: the YARN command line cannot be loaded because the Hadoop dependencies are missing from Flink's classpath, so -m yarn-cluster is parsed by the default CLI as a host:port address. Two steps fix it:
- Copy flink-shaded-hadoop-2-2.6.0-cdh5.15.1-7.0.jar from the compiled flink-shaded-7.0 directory into Flink's lib directory.
- Make the Hadoop dependencies visible to Flink: set HADOOP_CLASSPATH as a system environment variable, or export it before submitting to YARN:
[hadoop@hadoop001 lib]$ export HADOOP_CLASSPATH=`hadoop classpath`
[hadoop@hadoop001 flink]$ bin/flink run -m yarn-cluster -yn 3 -ys 4 examples/batch/WordCount.jar
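To avoid exporting HADOOP_CLASSPATH in every new session, it can be added to the user's shell profile, and the output of the earlier run can be checked on HDFS. A short sketch, assuming a bash login shell and the /ruozedata paths used above:

echo 'export HADOOP_CLASSPATH=`hadoop classpath`' >> ~/.bash_profile
source ~/.bash_profile
hdfs dfs -cat /ruozedata/wordcount-result.txt | head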