Sending Documents to a Solr Server

Author: 尚亦汐 | Published 2016-08-05 13:30

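Sending XML or JSON files to a Solr server with post.jar

Indexing XML and JSON files in Solr is straightforward: post.jar does the work directly. The files must follow a specific format, however, for example:

XML documents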

Figure: example XML document to add

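The document must be wrapped in an <add> element, which tells Solr to add it to the index. The document's content goes inside a <doc> element. So that the Dynamic Field definitions in schema.xml can handle them, every field name in the example document carries a suffix.
To add the XML document to Solr, run java -jar post.jar fileName.xml.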
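
As a rough illustration of the <add>/<doc> layout described above, the following sketch builds such a file's contents in plain Java. The class XmlAddSketch and its helper methods are hypothetical names, not part of Solr; the field names are borrowed from the SolrJ example later in this article, and the suffix meanings in the comments assume the dynamic fields of the example schema.xml.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper (not part of Solr): renders one document in Solr's XML update format.
class XmlAddSketch {

    // Escape the characters that are special inside XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Wrap each field in <field>, the fields in <doc>, and the doc in <add>.
    static String toAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add>\n  <doc>\n");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("    <field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue())).append("</field>\n");
        }
        return sb.append("  </doc>\n</add>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "1");
        fields.put("screen_name_s", "@thelabdude");  // _s: assumed string dynamic field
        fields.put("favorites_count_ti", "10");      // _ti: assumed integer dynamic field
        fields.put("text_t", "Learning #Solr & XML"); // _t: assumed text dynamic field
        System.out.print(toAddXml(fields));
    }
}
```

Saving the printed output as fileName.xml would give a file of the shape that the post.jar command expects.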

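JSON documents

Figure: example JSON document

Each document to be indexed is a JSON object. When sending a JSON file to Solr, however, the command is: java -Dtype=application/json -jar post.jar fileName.json
The -Dtype option must come before -jar; it tells post.jar the content type of the file being sent. To list the options that post.jar supports, run java -jar post.jar --help.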
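
To make the "one JSON object per document" idea concrete, here is a minimal sketch that writes a list of documents as a JSON array using only the Java standard library. JsonDocsSketch, quote, and toJson are hypothetical names; the sketch handles string values only and escapes just quotes, backslashes, and the common whitespace controls, so a real JSON library would be the safer choice in practice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper (not part of Solr): renders documents as a JSON array of objects.
class JsonDocsSketch {

    // Quote a string, escaping quotes, backslashes, and \n, \r, \t.
    // Other control characters are not handled in this sketch.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Each map becomes one JSON object; the list becomes the enclosing array.
    static String toJson(List<Map<String, String>> docs) {
        StringBuilder sb = new StringBuilder("[\n");
        for (int i = 0; i < docs.size(); i++) {
            sb.append("  {");
            int n = 0;
            for (Map.Entry<String, String> e : docs.get(i).entrySet()) {
                if (n++ > 0) sb.append(", ");
                sb.append(quote(e.getKey())).append(": ").append(quote(e.getValue()));
            }
            sb.append(i < docs.size() - 1 ? "},\n" : "}\n");
        }
        return sb.append("]\n").toString();
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("screen_name_s", "@thelabdude");
        docs.add(doc);
        System.out.print(toJson(docs));
    }
}
```

The printed text could be saved as fileName.json and sent with the -Dtype=application/json command.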

SolrJ

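SolrJ is a Java-based client library for connecting to a Solr server.
The following example uses SolrJ to add documents to a Solr server and then runs a search that returns all of them: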

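import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

// Written against the SolrJ 4.x API; from Solr 5 onward SolrServer and
// HttpSolrServer were renamed SolrClient and HttpSolrClient.
public class ExampleSolrJClient {
    public static void main(String[] args) throws Exception {
        // Use the first command-line argument as the server URL, or fall back to a local core.
        String serverUrl = (args != null && args.length > 0) ? args[0] : "http://localhost:8983/solr/collection1";
        SolrServer solr = new HttpSolrServer(serverUrl);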
        SolrInputDocument doc1 = new SolrInputDocument();
        doc1.setField("id", "1");
        doc1.setField("screen_name_s", "@thelabdude");
        doc1.setField("type_s", "post");
        doc1.setField("lang_s", "en");
        doc1.setField("timestamp_tdt", "2012-05-22T09:30:22Z/HOUR");
        doc1.setField("favorites_count_ti", "10");
        doc1.setField("text_t", "#Yummm :) Drinking a latte " + "at Caffe Grecco in SF's historic North Beach... "
                + "Learning text analysis with #SolrInAction " + "by @ManningBooks on my i-Pad");
        solr.add(doc1);
        SolrInputDocument doc2 = new SolrInputDocument();
        doc2.setField("id", "2");
        doc2.setField("screen_name_s", "@thelabdude");
        doc2.setField("type_s", "post");
        doc2.setField("lang_s", "en");
        doc2.setField("timestamp_tdt", "2012-05-22T09:30:22Z/HOUR");
        doc2.setField("favorites_count_ti", "10");
        doc2.setField("text_t",
                "Just downloaded the ebook of " + "#SolrInAction from @ManningBooks http://bit.ly/T3eGYG "
                        + "to learn more about #Solr http://bit.ly/3ynriE");
        doc2.addField("link_ss", "http://manning.com/grainger/");
        doc2.addField("link_ss", "http://lucene.apache.org/solr/");
        solr.add(doc2);
        // Commit with waitFlush=true and waitSearcher=true so the query below sees both documents.
        solr.commit(true, true);
        for (SolrDocument next : simpleSolrQuery(solr, "*:*", 10)) {
            prettyPrint(System.out, next);
        }
    }

    static SolrDocumentList simpleSolrQuery(SolrServer solr, String query, int rows) throws SolrServerException {
        SolrQuery solrQuery = new SolrQuery(query);
        solrQuery.setRows(rows);
        QueryResponse resp = solr.query(solrQuery);
        SolrDocumentList hits = resp.getResults();
        return hits;
    }

    static void prettyPrint(PrintStream out, SolrDocument doc) {
        List<String> sortedFieldNames = new ArrayList<String>(doc.getFieldNames());
        Collections.sort(sortedFieldNames);
        out.println();
        for (String field : sortedFieldNames) {
            out.println(String.format("\t%s: %s", field, doc.getFieldValue(field)));
        }
        out.println();
    }
}

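The example code has two parts: the first adds documents to the Solr server, and the second searches for every document on that server.
Adding documents to the Solr server:

Create a SolrServer object and give it the address of the Solr server;
Create SolrInputDocument objects, one for each of the two documents;
Call the SolrServer object's add() method to send each document to the server, then call commit(); without the commit, the new documents cannot be found by searches.

Searching all documents on the Solr server:

Create a SolrQuery object with the query string *:*;
Call query() and receive the results in a QueryResponse object.

As the example shows, the SolrJ API makes it easy to connect to a Solr server, add documents, run queries, and receive the results.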
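
For comparison, the search half of the example corresponds roughly to a plain HTTP request against the core's /select handler. The sketch below only builds that URL with the Java standard library; it assumes the same local core URL as the example above, and SelectUrlSketch and selectUrl are hypothetical names. SolrJ itself uses its own response format, so the wt=json parameter here is an assumption chosen for readability in a browser.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative helper (not part of SolrJ): builds a /select query URL by hand.
class SelectUrlSketch {

    // q is the query string, rows caps the number of hits, wt selects the response format.
    static String selectUrl(String coreUrl, String query, int rows) {
        try {
            return coreUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&rows=" + rows + "&wt=json";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints http://localhost:8983/solr/collection1/select?q=*%3A*&rows=10&wt=json
        System.out.println(selectUrl("http://localhost:8983/solr/collection1", "*:*", 10));
    }
}
```

Building the URL by hand shows what SolrQuery takes care of: parameter naming and encoding of characters such as the colon in *:*.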

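Data Import Handler (DIH)

DIH is a Solr extension for importing data from external sources such as websites or relational databases. It works with databases including Oracle, Postgres, MySQL, and MS SQL Server. At a high level, you supply the database connection parameters and a SQL query; DIH runs the query against the database and converts the returned rows into documents.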
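
The row-to-document step can be pictured with a small conceptual sketch. This is not DIH code and uses no Solr classes: RowToDocSketch, the column names, and the column-to-field mapping are all hypothetical, and a result row is stood in for by an ordinary Map. It simplifies what DIH actually does and is only meant to show the idea of renaming columns into Solr fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual illustration only: mimics turning one SQL result row into one document.
class RowToDocSketch {

    // Copy each mapped, non-null column value into the document under its Solr field name.
    static Map<String, Object> toDocument(Map<String, Object> row, Map<String, String> columnToField) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Map.Entry<String, Object> col : row.entrySet()) {
            String field = columnToField.get(col.getKey());
            if (field != null && col.getValue() != null) { // skip unmapped columns and NULL values
                doc.put(field, col.getValue());
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();   // stands in for one row of a query result
        row.put("ID", 7);
        row.put("TITLE", "Solr in Action");
        row.put("INTERNAL_NOTE", "not indexed");

        Map<String, String> mapping = new LinkedHashMap<>(); // hypothetical column-to-field mapping
        mapping.put("ID", "id");
        mapping.put("TITLE", "title_t");

        System.out.println(toDocument(row, mapping));
    }
}
```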

Nutch

Nutch is an open-source, Java-based web crawler. It fetches web pages and parses them into a format that Solr can index.

