
Lucene source code analysis - core

Author: 机器智能 | Published 2018-09-30 17:31

    Apache Lucene is a high-performance, full-featured text search engine library. Here's a simple example of how to use Lucene for indexing and searching (using JUnit to check that the results are what we expect):


       Analyzer analyzer = new StandardAnalyzer();
    
        Path indexPath = Files.createTempDirectory("tempIndex");
        Directory directory = FSDirectory.open(indexPath);
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        IndexWriter iwriter = new IndexWriter(directory, config);
        Document doc = new Document();
        String text = "This is the text to be indexed.";
        doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
        iwriter.addDocument(doc);
        iwriter.close();
        
        // Now search the index:
        DirectoryReader ireader = DirectoryReader.open(directory);
        IndexSearcher isearcher = new IndexSearcher(ireader);
        // Parse a simple query that searches for "text":
        QueryParser parser = new QueryParser("fieldname", analyzer);
        Query query = parser.parse("text");
        ScoreDoc[] hits = isearcher.search(query, 10).scoreDocs;
        assertEquals(1, hits.length);
        // Iterate through the results:
        for (int i = 0; i < hits.length; i++) {
          Document hitDoc = isearcher.doc(hits[i].doc);
          assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
        }
        ireader.close();
        directory.close();
        IOUtils.rm(indexPath);
    

    The Lucene API is divided into several packages:


    • {@link org.apache.lucene.analysis} defines an abstract {@link org.apache.lucene.analysis.Analyzer Analyzer} API for converting text from a {@link java.io.Reader} into a {@link org.apache.lucene.analysis.TokenStream TokenStream}, an enumeration of token {@link org.apache.lucene.util.Attribute Attribute}s. A TokenStream can be composed by applying {@link org.apache.lucene.analysis.TokenFilter TokenFilter}s to the output of a {@link org.apache.lucene.analysis.Tokenizer Tokenizer}. Tokenizers and TokenFilters are strung together and applied with an {@link org.apache.lucene.analysis.Analyzer Analyzer}. analyzers-common provides a number of Analyzer implementations, including StopAnalyzer and the grammar-based StandardAnalyzer. A short TokenStream sketch follows this list.

    • {@link org.apache.lucene.codecs} provides an abstraction over the encoding and decoding of the inverted index structure, as well as different implementations that can be chosen depending upon application needs.

    • {@link org.apache.lucene.document} provides a simple {@link org.apache.lucene.document.Document Document} class. A Document is simply a set of named {@link org.apache.lucene.document.Field Field}s, whose values may be strings or instances of {@link java.io.Reader}.

    • {@link org.apache.lucene.index} provides two primary classes: {@link org.apache.lucene.index.IndexWriter IndexWriter}, which creates and adds documents to indices; and {@link org.apache.lucene.index.IndexReader}, which accesses the data in the index.

    • {@link org.apache.lucene.search} provides data structures to represent queries (e.g. {@link org.apache.lucene.search.TermQuery TermQuery} for individual words, {@link org.apache.lucene.search.PhraseQuery PhraseQuery} for phrases, and {@link org.apache.lucene.search.BooleanQuery BooleanQuery} for boolean combinations of queries) and the {@link org.apache.lucene.search.IndexSearcher IndexSearcher} which turns queries into {@link org.apache.lucene.search.TopDocs TopDocs}. A number of QueryParsers are provided for producing query structures from strings or XML. A sketch of building these query objects directly follows this list.

    • {@link org.apache.lucene.store} defines an abstract class for storing persistent data, the {@link org.apache.lucene.store.Directory Directory}, which is a collection of named files written by an {@link org.apache.lucene.store.IndexOutput IndexOutput} and read by an {@link org.apache.lucene.store.IndexInput IndexInput}. Multiple implementations are provided, but {@link org.apache.lucene.store.FSDirectory FSDirectory} is generally recommended as it tries to use operating system disk buffer caches efficiently. A small Directory read/write sketch also appears below.

    • {@link org.apache.lucene.util} contains a few handy data structures and utility classes, e.g. {@link org.apache.lucene.util.FixedBitSet FixedBitSet} and {@link org.apache.lucene.util.PriorityQueue PriorityQueue}.
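
    To make the analysis bullet above concrete, here is a minimal sketch (mine, not from the Lucene javadoc) of consuming the TokenStream an Analyzer produces. The field name "fieldname" and the sample sentence are reused from the indexing example purely for illustration, and imports are omitted as in the snippet at the top:

        Analyzer analyzer = new StandardAnalyzer();
        // tokenStream() chains the Analyzer's Tokenizer and TokenFilters together
        try (TokenStream stream = analyzer.tokenStream("fieldname", "This is the text to be indexed.")) {
          CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
          stream.reset();                           // must be called before the first incrementToken()
          while (stream.incrementToken()) {
            System.out.println(termAtt.toString()); // one lowercased token per iteration
          }
          stream.end();                             // records end-of-stream attribute state
        }

    CharTermAttribute lives in org.apache.lucene.analysis.tokenattributes; other attributes (offsets, position increments) are requested the same way via addAttribute().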

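    Likewise, the query classes from the search bullet can be built directly instead of going through QueryParser. A small sketch, assuming roughly a Lucene 7-era API and reusing the isearcher opened in the first example; the field name and terms are illustrative:

        // Roughly what QueryParser produces for the string "text" on this field
        Query termQuery = new TermQuery(new Term("fieldname", "text"));
        // An exact two-word phrase
        Query phraseQuery = new PhraseQuery("fieldname", "clam", "chowder");
        // Require the term; let the phrase contribute to scoring when present
        Query booleanQuery = new BooleanQuery.Builder()
            .add(termQuery, BooleanClause.Occur.MUST)
            .add(phraseQuery, BooleanClause.Occur.SHOULD)
            .build();
        ScoreDoc[] results = isearcher.search(booleanQuery, 10).scoreDocs;
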
    Each of the packages above will be translated and discussed in its own follow-up article.
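
    In the meantime, here is a small sketch of the store-level Directory API described above. The file name "demo.bin" and the values written are invented for illustration:

        Path storePath = Files.createTempDirectory("storeDemo");
        try (Directory dir = FSDirectory.open(storePath)) {
          // An IndexOutput writes a named file inside the Directory
          try (IndexOutput out = dir.createOutput("demo.bin", IOContext.DEFAULT)) {
            out.writeString("hello");
            out.writeVInt(42);
          }
          // An IndexInput reads the same file back, in the order it was written
          try (IndexInput in = dir.openInput("demo.bin", IOContext.DEFAULT)) {
            String greeting = in.readString();
            int answer = in.readVInt();
          }
        }
        IOUtils.rm(storePath);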

    To use Lucene, an application should:


    1. Create {@link org.apache.lucene.document.Document Document}s by adding {@link org.apache.lucene.document.Field Field}s;

    2. Create an {@link org.apache.lucene.index.IndexWriter IndexWriter} and add documents to it with {@link org.apache.lucene.index.IndexWriter#addDocument(Iterable) addDocument()};

    3. Call QueryParser.parse() to build a query from a string; and

    4. Create an {@link org.apache.lucene.search.IndexSearcher IndexSearcher} and pass the query to its {@link org.apache.lucene.search.IndexSearcher#search(org.apache.lucene.search.Query, int) search()} method.


    To demonstrate these, try something like:

    > java -cp lucene-core.jar:lucene-demo.jar:lucene-analyzers-common.jar org.apache.lucene.demo.IndexFiles -index index -docs rec.food.recipes/soups 
    adding rec.food.recipes/soups/abalone-chowder 
      [ ... ]
    > java -cp lucene-core.jar:lucene-demo.jar:lucene-queryparser.jar:lucene-analyzers-common.jar org.apache.lucene.demo.SearchFiles 
    Query: chowder 
    Searching for: chowder 
    34 total matching documents 
    1. rec.food.recipes/soups/spam-chowder 
      [ ... thirty-four documents contain the word "chowder" ... ]
    
    Query: "clam chowder" AND Manhattan 
    Searching for: +"clam chowder" +manhattan 
    2 total matching documents 
    1. rec.food.recipes/soups/clam-chowder 
      [ ... two documents contain the phrase "clam chowder" and the word "manhattan" ... ] 
        [ Note: "+" and "-" are canonical, but "AND", "OR" and "NOT" may be used. ]
    
    
