http://storm.apache.org/releases/current/Understanding-the-parallelism-of-a-Storm-topology.html
1. The three building blocks: worker processes, executors (threads), and tasks
A worker runs a subset of a topology
A worker process executes a subset of a topology.
**A worker belongs to one specific topology**
A worker process belongs to a specific topology and may run one or more executors for one or more components (spouts or bolts) of this topology.
A running topology consists of worker processes spread over many machines
A running topology consists of many such processes running on many machines within a Storm cluster.
Each spout or bolt runs as many tasks
Each spout or bolt that you implement in your code executes as many tasks across the cluster.
A task is one running instance of a component (spout or bolt); it performs the actual data processing.
The number of threads (executors) is at most the number of tasks: #threads ≤ #tasks
By default, the number of tasks is set to be the same as the number of executors, i.e. Storm will run one task per thread.
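The rule #threads ≤ #tasks can be made concrete with a small calculation. The sketch below is plain Java with no Storm dependency; `TaskSpread` and `tasksPerExecutor` are hypothetical names, and the even round-robin spread is an assumption made for illustration, not a statement about Storm's scheduler internals.

```java
import java.util.Arrays;

public class TaskSpread {
    // Hypothetical helper: spreads numTasks over numExecutors as evenly as possible.
    // Assumption: an even, round-robin spread; this only illustrates the counting rule.
    static int[] tasksPerExecutor(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numExecutors > numTasks) {
            throw new IllegalArgumentException("need 1 <= #executors <= #tasks");
        }
        int[] counts = new int[numExecutors];
        for (int task = 0; task < numTasks; task++) {
            counts[task % numExecutors]++; // assign task ids to executors in turn
        }
        return counts;
    }

    public static void main(String[] args) {
        // 4 tasks on 2 executors: each thread runs 2 tasks.
        System.out.println(Arrays.toString(tasksPerExecutor(4, 2))); // [2, 2]
        // Default case, #tasks == #executors: one task per thread.
        System.out.println(Arrays.toString(tasksPerExecutor(3, 3))); // [1, 1, 1]
    }
}
```

Asking for more executors than tasks is rejected here, mirroring the constraint that a thread without a task would have nothing to run.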
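2. What "parallelism" means here
In this document, "parallelism" is used in a general sense: you can configure not only the number of executors but also the number of worker processes and the number of tasks of a Storm topology. We will specifically call out when "parallelism" is used in the normal, narrow definition of Storm (the parallelism hint, i.e. the initial number of executors of a component).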
3. Setting the parallelism
BlueSpout sends its output to GreenBolt, which in turn sends its own output to YellowBolt.
Config conf = new Config();
conf.setNumWorkers(2); // use two worker processes
TopologyBuilder topologyBuilder = new TopologyBuilder();
topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2); // set parallelism hint to 2
// green-bolt: 2 executors sharing 4 tasks, i.e. 2 tasks per executor
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
               .setNumTasks(4)
               .shuffleGrouping("blue-spout");
// yellow-bolt: 6 executors; the task count defaults to the executor count
topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6)
               .shuffleGrouping("green-bolt");
StormSubmitter.submitTopology(
    "mytopology",
    conf,
    topologyBuilder.createTopology()
);
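To see what this configuration adds up to, the totals can be worked out by hand. The sketch below is plain Java without Storm on the classpath; `ParallelismTotals` and its methods are hypothetical names, and it assumes that executors are divided evenly among the workers, which holds for these numbers.

```java
public class ParallelismTotals {
    // Sums per-component counts (executors or tasks).
    static int total(int... perComponent) {
        int sum = 0;
        for (int count : perComponent) {
            sum += count;
        }
        return sum;
    }

    // Assumption: executors are divided evenly among the worker processes.
    static int executorsPerWorker(int totalExecutors, int numWorkers) {
        return totalExecutors / numWorkers;
    }

    public static void main(String[] args) {
        int numWorkers = 2;             // conf.setNumWorkers(2)
        int executors = total(2, 2, 6); // blue-spout, green-bolt, yellow-bolt hints
        int tasks = total(2, 4, 6);     // green-bolt overrides its task count to 4

        System.out.println("executors = " + executors);                                  // 10
        System.out.println("tasks = " + tasks);                                          // 12
        System.out.println("per worker = " + executorsPerWorker(executors, numWorkers)); // 5
    }
}
```

So the topology runs 10 threads in total, 5 in each of the two worker processes, and only the green bolt's executors carry more than one task each.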