
Flink Experiment Issues

Author: 大大大大大大大熊 | Published 2018-10-29 19:37

Parameter mismatch issue

Description: (error screenshot)

Fix: carefully check that the input/output type parameters (and their number) of the upstream and downstream functions match, and check each function's own parameter declarations.
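As an illustrative example (not the code from the screenshot above; class names are made up), the generic type parameters of chained functions have to line up: the output type of one function is the input type of the next.

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

// Upstream function: String -> Tuple2<String, Integer>
public class Tokenize implements MapFunction<String, Tuple2<String, Integer>> {
    @Override
    public Tuple2<String, Integer> map(String value) {
        return Tuple2.of(value, 1);
    }
}

// Downstream function: its input type parameter must be Tuple2<String, Integer>,
// i.e. exactly the upstream output type; declaring a different input type here
// would produce the kind of parameter mismatch described above.
class CountEmitter implements FlatMapFunction<Tuple2<String, Integer>, Integer> {
    @Override
    public void flatMap(Tuple2<String, Integer> value, Collector<Integer> out) {
        out.collect(value.f1);
    }
}
```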

Testing the relationship between global variables, parallelism, and cluster mode

(Incomplete; a minimal sketch of the test operator follows the list below.)

  1. No static in the bolt, parallelism 3, local mode.
    Test a member variable i in the bolt: print i in execute(), then increment it.

    Result: (see screenshot)
    Analysis: each bolt instance maintains its own copy of i.
  2. static variable in the bolt, parallelism 3, local mode.
    Test the static variable i in the bolt: print i in execute(), then increment it.

    Result: (see screenshot)
    Analysis: all bolt instances share and update the same static variable i.
    Reason: on the same machine, all instances run in the same JVM, so there is only one copy of the static field.
  3. static variable in the bolt, parallelism 3, cluster mode.
    Test the static variable i in the bolt: print i in execute(), then increment it.

    Result: (see screenshot)
    Analysis: each bolt instance maintains its own copy of i.
    Reason: different machines run separate JVMs, and each JVM has its own copy of the static field.
  4. public static variable in testtopology, parallelism 3, local mode.
    Test: reference testtopology.i in the bolt, print i in execute(), then increment it.

    Result: (see screenshot)
    Analysis: all bolt instances share and update the same static variable i.
    Reason: on the same machine, all instances run in the same JVM, so there is only one copy of the static field.
  5. public static variable in testtopology, parallelism 3, cluster mode.
    Test: reference testtopology.i in the bolt, print i in execute(), then increment it.

    Result: (see screenshot)
    Analysis: each bolt instance maintains its own copy of i.
    Reason: different machines run separate JVMs, and each JVM has its own copy of the static field.
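The behaviour above can be reproduced with a plain counter field. The sketch below is a minimal, illustrative Flink operator (class and field names are made up, not taken from the original test code); it prints both a static and a per-instance counter, so the difference between same-JVM and cross-JVM deployments becomes visible.

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.util.Collector;

// Each parallel subtask calls flatMap(); whether subtasks see the same counter depends on
// whether they run in the same JVM (local mode) or in separate TaskManager JVMs (cluster mode).
public class CounterBolt extends RichFlatMapFunction<String, String> {

    // One copy per JVM: shared by all subtasks on the same TaskManager,
    // but NOT shared across TaskManagers. (Not thread-safe; for illustration only.)
    private static int staticCounter = 0;

    // One copy per operator instance: never shared, regardless of deployment mode.
    private int instanceCounter = 0;

    @Override
    public void flatMap(String value, Collector<String> out) {
        staticCounter++;
        instanceCounter++;
        out.collect("static=" + staticCounter + ", instance=" + instanceCounter);
    }
}
```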

Global state

Link: https://stackoverflow.com/questions/33755697/flink-sharing-state-in-coflatmapfunction

Is the CoFlatMapFunction intended to be executed in parallel?
If yes, you need some way to deterministically assign which record goes to which parallel instance. In some way the CoFlatMapFunction does a parallel (partitions) join between the model and the result of the session windows, so you need some form of key that selects which partition the elements go to. Does that make sense?
If not, try to set it to parallelism 1 explicitly.
Greetings, Stephan
A global state that all can access read-only is doable via broadcast().
A global state that is available to all for read and update is currently not available. Consistent operations on that would be quite costly, require some form of distributed communication/consensus.
Instead, I would encourage you to go with the following:

  1. If you can partition the state, use a keyBy().mapWithState() - That localizes state operations and makes it very fast.
  2. If your state is not organized by key, your state is probably very small, and you may be able to use a non-parallel operation.
  3. If some operation updates the state and another one accesses it, you can often implement that with iterations and a CoFlatMapFunction (one side is the original input, the other the feedback input).

All approaches in the end localize state access and modifications, which is a good pattern to follow, if possible.
Greetings, Stephan

Link: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Issue-with-sharing-state-in-CoFlatMapFunction-td3529.html
Flink mailing-list discussion of global variables under multiple parallelism.
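For option 1 above (partitioning the state), the Scala API's mapWithState has a Java-side equivalent: keyed state accessed from a rich function on a keyed stream. The sketch below is a minimal, hypothetical example (names and types are illustrative); it keeps one counter per key, so all state access stays local to the partition that owns the key.

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Counts elements per key using Flink keyed (partitioned) state.
public class PerKeyCounter extends RichFlatMapFunction<Tuple2<String, Long>, Tuple2<String, Long>> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Types.LONG));
    }

    @Override
    public void flatMap(Tuple2<String, Long> in, Collector<Tuple2<String, Long>> out) throws Exception {
        Long current = count.value();
        long updated = (current == null ? 0L : current) + 1;
        count.update(updated);
        out.collect(Tuple2.of(in.f0, updated));
    }
}

// Usage (hypothetical stream): stream.keyBy(t -> t.f0).flatMap(new PerKeyCounter());
```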

Window function bug:

(error screenshot)

Fix: the key type produced by keyBy is Tuple, so you have to declare the key type as Tuple yourself in the window function (see the sketch below).


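A hedged sketch of the fix described above (class and field names are illustrative): when keying by a field position such as keyBy(0), the DataStream API exposes the key as the generic Tuple type, so the window function must declare Tuple as its key type parameter rather than the concrete field type.

```java
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// Third type parameter is Tuple, matching the key type produced by keyBy(0).
public class SumWindow implements WindowFunction<Tuple2<String, Long>, Long, Tuple, TimeWindow> {

    @Override
    public void apply(Tuple key, TimeWindow window,
                      Iterable<Tuple2<String, Long>> input, Collector<Long> out) {
        long sum = 0;
        for (Tuple2<String, Long> element : input) {
            sum += element.f1;
        }
        out.collect(sum);
    }
}

// Usage (hypothetical): stream.keyBy(0).timeWindow(Time.seconds(10)).apply(new SumWindow());
```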

Bug when reading from Kafka with 1.4.1

KafkaFetcher09/010/011 uses wrong user code classloader
链接:https://issues.apache.org/jira/browse/FLINK-8741
Fixed in 1.4.2.
Fix: upgrade to a newer version.

1.4.1 automatically added Hadoop to the classpath; the startup scripts in 1.7.0 no longer do this.

(error screenshot)

Fix: set the Hadoop dependency manually by adding ``export HADOOP_CLASSPATH=`hadoop classpath` `` to /etc/profile.

Hadoop is not in the classpath/dependencies.

If you want to use the filesystem state backend, you must use a Flink build that bundles the Hadoop dependencies.

you're right that if you want to access HDFS from the user code only it should be possible to use the Hadoop free Flink version and bundle the Hadoop dependencies with your user code. However, if you want to use Flink's file system state backend as you did, then you have to start the Flink cluster with the Hadoop dependency in its classpath. The reason is that the FsStateBackend is part of the Flink distribution and will be
loaded using the system class loader. One thing you could try out is to use the RocksDB state backend instead. Since the RocksDBStateBackend is loaded dynamically, I think it should use the Hadoop dependencies when trying to load the filesystem.

Fix: install the Flink distribution that bundles the Hadoop dependencies. Confirmed working.
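A minimal sketch of the two options discussed in the reply above (the HDFS path and class name are placeholders): FsStateBackend is part of the Flink distribution and needs Hadoop on the cluster classpath, while RocksDBStateBackend is loaded dynamically and can use Hadoop dependencies bundled in the user jar (it also requires the flink-statebackend-rocksdb dependency).

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StateBackendSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // FsStateBackend ships with the Flink distribution and is loaded by the system
        // class loader, so the cluster itself needs Hadoop on its classpath for hdfs:// paths.
        env.setStateBackend(new FsStateBackend("hdfs://namenode:8020/flink/checkpoints"));

        // Alternative from the mailing-list reply: RocksDBStateBackend is loaded dynamically,
        // so it can pick up Hadoop dependencies bundled with the user jar.
        // env.setStateBackend(new RocksDBStateBackend("hdfs://namenode:8020/flink/checkpoints"));

        // ... build and execute the job here ...
    }
}
```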
