Concepts related to logical processing
stream: a purely logical concept. A stream runs between components (spouts or bolts) of different types, and is unrelated to how many tasks a bolt has.
grouping: a physical concept. Part of defining a topology is specifying, for each bolt, which streams it should receive as input. A stream grouping defines how that stream should be partitioned among the bolt's tasks.
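To make "partitioned among the bolt's tasks" concrete, here is a minimal plain-Java sketch of two partitioning strategies. It is a simplified model and not Storm's actual implementation: the class GroupingSketch and its methods shuffleLike and fieldsLike are hypothetical names, and round-robin stands in for Storm's randomized shuffle grouping so that the output is deterministic.

```java
import java.util.Objects;

// Simplified model of stream groupings; NOT Storm's actual implementation.
class GroupingSketch {
    // Round-robin stand-in for a shuffle grouping: spreads tuples evenly over tasks.
    // (Storm's shuffle grouping is randomized; round-robin keeps this demo deterministic.)
    static int[] shuffleLike(int numTuples, int numTasks) {
        int[] assignment = new int[numTuples];
        for (int i = 0; i < numTuples; i++) {
            assignment[i] = i % numTasks;
        }
        return assignment;
    }

    // Stand-in for a fields grouping: equal field values always map to the same task.
    static int fieldsLike(Object fieldValue, int numTasks) {
        return Math.floorMod(Objects.hashCode(fieldValue), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 2;
        String[] words = {"storm", "bolt", "storm", "spout", "bolt"};
        int[] shuffled = shuffleLike(words.length, numTasks);
        for (int i = 0; i < words.length; i++) {
            System.out.println(words[i]
                    + ": shuffle-like -> task " + shuffled[i]
                    + ", fields-like -> task " + fieldsLike(words[i], numTasks));
        }
    }
}
```

In this model the stream itself stays the same; only the choice of receiving task changes, which is why the grouping is described as the physical side of the picture.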
Tuple:
The tuple is the main data structure in Storm. A tuple is a named list of values, where each value can be any type. Tuples are dynamically typed -- the types of the fields do not need to be declared. Tuples have helper methods like getInteger and getString to get field values without having to cast the result. Storm needs to know how to serialize all the values in a tuple. By default, Storm knows how to serialize the primitive types, strings, and byte arrays. If you want to use another type, you'll need to implement and register a serializer for that type. See http://github.com/nathanmarz/storm/wiki/Serialization for more info.
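The idea of a named list of dynamically typed values with casting helpers can be sketched in plain Java as follows. SimpleTuple is a hypothetical toy class written for illustration, not Storm's Tuple; its method names merely mirror the helpers mentioned above.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of a tuple as a named list of values; hypothetical, not Storm's Tuple class.
class SimpleTuple {
    private final List<String> fields;
    private final List<Object> values;

    SimpleTuple(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("fields and values must have the same length");
        }
        this.fields = fields;
        this.values = values;
    }

    // Untyped access: the caller has to cast the result.
    Object getValue(int i) { return values.get(i); }

    // Typed helpers perform the cast so the caller does not have to.
    String getString(int i) { return (String) values.get(i); }
    Integer getInteger(int i) { return (Integer) values.get(i); }

    // Look a value up by its field name.
    Object getValueByField(String field) {
        int i = fields.indexOf(field);
        if (i < 0) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        return values.get(i);
    }

    public static void main(String[] args) {
        SimpleTuple t = new SimpleTuple(
                Arrays.asList("word", "count"),
                Arrays.<Object>asList("storm", 3));
        System.out.println(t.getString(0) + " = " + t.getInteger(1)); // prints "storm = 3"
    }
}
```

Because the field types are never declared, a wrong helper call (for example getInteger on the "word" field) only fails at runtime with a ClassCastException.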
Common pitfalls
builder.setBolt("file-writer", new WriteBolt(), 2)
.shuffleGrouping("transform-character");
If you define a custom grouping, the bolt will not declare the default stream by default.
Runtime concepts
worker:
A worker is a process that executes a subset of a topology.
executor:
An executor is a thread that runs one or more tasks belonging to the same component (bolt or spout).
task:
A task performs the actual data processing: each spout or bolt that you implement in your code executes as a number of tasks across the cluster.
The number of tasks for a component is always the same throughout the lifetime of a topology, but the number of executors (threads) for a component can change over time. This means that the following condition holds true: #threads ≤ #tasks. By default, the number of tasks is set to be the same as the number of executors, i.e. Storm will run one task per thread.
Specifying the number of tasks and executors
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
.setNumTasks(4)
.shuffleGrouping("blue-spout");
Here 2 is the number of executors and 4 is the number of tasks.
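The relationship between the two numbers can be illustrated with a small plain-Java model. This is a sketch under assumptions, not Storm's scheduler code: TaskAssignmentSketch and assign are hypothetical names, and the choice of giving each executor a contiguous block of task indices is made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of spreading a component's tasks over its executors; not Storm's scheduler code.
class TaskAssignmentSketch {
    // Returns one list of task indices per executor, spread as evenly as possible.
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numExecutors > numTasks) {
            throw new IllegalArgumentException("need 1 <= #executors <= #tasks");
        }
        int base = numTasks / numExecutors;   // tasks every executor gets
        int extra = numTasks % numExecutors;  // the first 'extra' executors get one more
        int next = 0;
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> tasks = new ArrayList<>();
            for (int k = 0; k < size; k++) {
                tasks.add(next++);
            }
            executors.add(tasks);
        }
        return executors;
    }

    public static void main(String[] args) {
        // Mirrors the green-bolt example: 2 executors, 4 tasks, so 2 tasks per executor.
        System.out.println(assign(4, 2)); // prints [[0, 1], [2, 3]]
    }
}
```

With the default of one task per executor, assign(2, 2) in this model yields one task in each thread, which is the #threads = #tasks case.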