1. webmagic 官网地址: http://webmagic.io/
2. 引入依耐
<dependency>
<groupId>us.codecraft</groupId>
<artifactId>webmagic-core</artifactId>
<version>0.9.0</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency><groupId>us.codecraft</groupId>
<artifactId>webmagic-extension</artifactId>
<version>0.9.0</version>
</dependency>
3. 官网demo
![](https://img.haomeiwen.com/i2536235/d47ab20c6c5e4c70.png)
4. 分布式爬虫
![](https://img.haomeiwen.com/i2536235/f506d35083e08bb3.png)
分布式爬虫架构:
![](https://img.haomeiwen.com/i2536235/1034dd7fd9c807d1.png)
分布式爬虫注意点 uuid 在多台机器要一致:
Spider.create(pageProcess)
.setScheduler(new RedisScheduler())
.setUUID(UUID.randomUUID().toString()).run();
网友评论