Two jar packages are needed: lucene-core and IKAnalyzer-lucene. Their version numbers must be compatible with each other; see the versions in pom.xml below.
The Maven repository addresses I use here are https://maven.aliyun.com/repository/central and https://maven.aliyun.com/repository/public.
The configuration in pom.xml is as follows:
<dependencies>
    <dependency>
        <groupId>com.jianggujin</groupId>
        <artifactId>IKAnalyzer-lucene</artifactId>
        <version>8.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
        <version>7.6.0</version>
    </dependency>
</dependencies>

<repositories>
    <repository>
        <id>Ali_central</id>
        <name>Alibaba central</name>
        <url>https://maven.aliyun.com/repository/central</url>
    </repository>
    <repository>
        <id>Ali_public</id>
        <name>Alibaba public</name>
        <url>https://maven.aliyun.com/repository/public</url>
    </repository>
</repositories>
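Note that the sample below is a JUnit test, so JUnit must also be on the test classpath. A typical addition would look like this (the version shown is an assumption, not taken from the original setup):

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
</dependency>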
The code itself is fairly simple:
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.junit.Test;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class TestFenci {

    @Test
    public void fenci() throws IOException {
        String text = "中国空间站将于今年完成在轨建造 扎实迈好每一步";
        // Create the analyzer; true enables IK's smart (coarse-grained) segmentation
        Analyzer anal = new IKAnalyzer(true);
        StringReader reader = new StringReader(text);
        // Tokenize; the field name is not used here, so an empty string is fine
        TokenStream ts = anal.tokenStream("", reader);
        ts.reset();
        CharTermAttribute term = ts.getAttribute(CharTermAttribute.class);
        // Iterate over the tokens
        while (ts.incrementToken()) {
            System.out.print(term.toString() + "|");
        }
        // Per the TokenStream contract, end() and close() must follow the loop
        ts.end();
        ts.close();
        reader.close();
        System.out.println();
    }
}
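If you also need each token's character positions in the original text (useful for things like highlighting), a minimal variation is sketched below using Lucene's standard OffsetAttribute. It assumes the same dependencies as above; the class name TestFenciOffsets is just an illustrative choice, and tokenStream is given the text directly, which Lucene's Analyzer also accepts in place of a Reader.

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class TestFenciOffsets {

    public static void main(String[] args) throws IOException {
        String text = "中国空间站将于今年完成在轨建造 扎实迈好每一步";
        Analyzer anal = new IKAnalyzer(true);
        // try-with-resources closes the TokenStream automatically
        try (TokenStream ts = anal.tokenStream("", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            OffsetAttribute offset = ts.addAttribute(OffsetAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                // Print each token with its [start, end) character offsets
                System.out.println(term + " [" + offset.startOffset() + ", " + offset.endOffset() + ")");
            }
            ts.end();
        }
    }
}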
The output is shown in the screenshot below:
[Image: 分词.png, the tokenization output]