Solr6.3 配置整理

作者: Peter锐 | 来源:发表于2016-12-19 11:35 被阅读452次

Solr6.3 配置整理
Kafka相关文章索引（2）
MVN 配置整理
git配置整理
Chrome配置【整理】
webpack配置整理
Flask配置项说明
prometheus配置详解
在Solr6.3中创建core
Mobile Atlas Creator配置天地图数据源并下载离

根据网上的资料整理如下，对solr的入门配置很有帮助。（6.3版本）

数据源配置

支持MySQL

Solr里关于MySQL的配置主要是两个文件：data-config.xml和managed-schema
data-config.xml

<dataConfig>
 <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/test" user="root" password="******" />
   <document>
     <entity name="goods" query="select * from goods" >
       <field column="id" name="id"/>
       <field column="name" name="name"/>
       <field column="number" name="number"/>
       <field column="updateTime" name="updateTime"/>
     </entity>
   </document>
</dataConfig>

其中source1为数据源自定义名称，没什么约束，type这是固定值，表示JDBC数据源，后面的driver表示JDBC驱动类，这与具体使用的数据库有关，url即JDBC链接URL,后面的user，password分别表示链接数据库的账号密码，下面的entity映射有点类似hiberante或mybatis中的mapping映射，column即数据库表的列名称，name即managed-schema中定义的域名称
managed-schema

<field name="id" type="int" indexed="true" stored="true" required="true" omitNorms="true" />
<field name="name" type="string" indexed="true" stored="true" omitNorms="true"/>
<field name="number" type="int" indexed="true" stored="true" omitNorms="true"/>
<field name="updateTime" type="date" indexed="true" stored="true" omitNorms="true"/>

支持Oracle

Oracle的配置与MySQL类似，唯一的区别是在配置dataSource的时候要修改driver和url的属性如下：

<dataSource driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@//ip:port/database" user="****" password="*****" />

支持word、excel、ppt、pdf、txt等文件

注意，txt文件编码请保证是UTF-8编码，否则无法索引文件内容
data-config.xml配置

<dataConfig>
 <dataSource type="FileDataSource" name="fileDataSource"/>
 <dataSource name="binary" type="BinFileDataSource" />
 <document>
   <entity name="files" dataSource="null" rootEntity="false" processor="FileListEntityProcessor"      baseDir="E:/Docs/搜索引擎/source/"
     fileName=".*.*" onError="skip" recursive="true">
     <field column="fileAbsolutePath" name="filePath"/>
     <field column="file" name="fileName"/>
     <field column="fileSize" name="size"/>
     <field column="fileLastModified" name="lastModified"/>
     <entity name="tika" processor="TikaEntityProcessor"
       dataSource="binary"
       url="${files.fileAbsolutePath}" format="text">
       <field column="Author" name="author" meta="true"/>
       <field column="text" name="text"/>
     </entity>
   </entity>
 </document>
</dataConfig>

baseDir表示目标文件夹的路径，fileName支持使用正则表达式来过滤一些baseDir文件夹下不想被索引的文件，processor是用来生成Entity的处理器，而不同Entity默认会生成不同的Field域。FileListEntityProcessor处理器会根据指定的文件夹生成多个Entity,且生成的Entity会包含fileAbsolutePath, fileSize, fileLastModified, fileName这几个域，recursive表示是否递归查找子目录下的文件，onError表示当出现异常时是否跳过这个条件不处理。
managed-schema配置

<field name="author" type="text_general" indexed="true" stored="true" omitNorms="true"/>
<field name="text" type="textComplex" indexed="true" stored="true" omitNorms="false"/>
<field name="filePath" type="textComplex" indexed="true" stored="true" omitNorms="true"/>
<field name="fileName" type="textComplex" indexed="true" stored="true" omitNorms="true"/>
<field name="size" type="textComplex" indexed="true" stored="true" omitNorms="true"/>
<field name="lastModified" type="date" indexed="true" stored="true" omitNorms="true"/>

混合支持数据库和文档

配置方法与上面类似，只需把所有的entity的配置放在一个document下，并设置好相应的dataSource即可。
data-config.xml配置

<dataConfig>
 <dataSource type="FileDataSource" name="fileDataSource"/>
 <dataSource name="binary" type="BinFileDataSource" />
 <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/test" user="root" password="mysql" />
 <document>
   <entity name="files" dataSource="null" rootEntity="false" processor="FileListEntityProcessor"     baseDir="E:/Docs/搜索引擎/source/"
    fileName=".*.*" onError="skip" recursive="true">
   <field column="fileAbsolutePath" name="filePath"/>
   <field column="file" name="fileName"/>
   <field column="fileSize" name="size"/>
   <field column="fileLastModified" name="lastModified"/>
   <entity name="tika" processor="TikaEntityProcessor"
    dataSource="binary"
    url="${files.fileAbsolutePath}" format="text">
     <field column="Author" name="author" meta="true"/>
     <field column="title" name="title" meta="true"/>
     <field column="text" name="text"/>
   </entity>
   </entity>
   <entity name="goods" dataSource="source1" query="select * from goods" >
     <field column="id" name="fileName"/>
     <field column="name" name="author"/>
     <field column="number" name="size"/>
     <field column="updateTime" name="lastModified"/>
   </entity>
 </document>
</dataConfig>

索引

索引的增量更新

如果数据库中有新的数据插入，可以通过索引的增量更新查询到最新的数据。data-config.xml的配置如下：

<dataConfig>
 <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/test" user="root" password="******" />
 <document>
   <entity name="goods" pk="id" dataSource="source1"
     query="select * from goods where isDelete=0"
     deltaImportQuery="select * from goods where id='${dih.delta.id}'"
           deletedPkQuery="select id from goods where isDelete=1"

      deltaQuery="select id from goods where updateTime> '${dataimporter.last_index_time}'">
     <field column="id" name="id"/>
     <field column="name" name="name"/>
     <field column="number" name="number"/>
     <field column="updateTime" name="updateTime"/>
   </entity>
 </document>
</dataConfig>

参数说明：

pk="ID" ，增量索引查询主键ID时需要
query：用于全量导入而非增量导入
query="select * from goods WHERE isdelete=0
query查询是指查询出表里所有的符合条件的数据，因为有删除业务，所以
where后面有一个限定条件isdelete=0，意思为查询未被删除的数据
deltaQuery：用于增量导入且只返回ID
deltaQuery="select ID from goods where updateTime > '${dih.last_index_time}'"
deltaQuery的意思是，查询出所有经过修改的记录的ID，可能是修改操作，添加操作，删除操作产生的
deletedPkQuery : 用于增量导入且只返回ID
deletedPkQuery="select ID from goods where isdelete=1"
此操作只查询那些数据库里伪删除的数据的ID（即isdelete标识为1的数据）
solr通过它来删除索引里面对应的数据
deltaImportQuery：增量导入起作用，可以返回多个字段的值,一般情况下，都是返回所有字段的列
deltaImportQuery="select * from goods where ID='${dih.delta.ID}'"
deltaImportQuery查询是获取以上两步的ID，然后把其全部数据获取，根据获取的数据
对索引库进行更新操作，可能是删除，添加，修改

其中内置变量${dataimporter.last_index_time}用来记录最后一次索引的时间(包括全量与增量)，${dih.delta.id}记录本次要索引的id，并保存在core/conf/dataimport.properties文件中，该文件为自动创建。
注：该增量更新机制要求数据库表中存在updateTime字段，即记录每条数据插入的时间。如果需要增量删除机制，还需要在数据表中添加isDelete字段，用于标识记录是否需要删除。

索引定时更新机制

索引更新的配置文件dataimport.properties位于solr_home/conf目录下，具体的参数说明如下：

################################################
# dataimport scheduler properties #
#################################################

# to sync or not to sync
# 1 - active; anything else - inactive
syncEnabled=1

# which cores to schedule
# in a multi-core environment you can decide which cores you want syncronized
# leave empty or comment it out if using single-core deployment
# 多个core之间用逗号隔开
syncCores=file

# solr server name or IP address
# [defaults to localhost if empty]
server=localhost

# solr server port
# [defaults to 80 if empty]
port=8080

# application name/context
# [defaults to current ServletContextListener's context (app) name]
webapp=solr

# URL params [mandatory]
# remainder of URL
#增量
params=/dataimport?command=delta-import&clean=false&commit=true&optimize=false&wt=json&indent=true&verbose=false&debug=false

# schedule interval
# number of minutes between two runs
# [defaults to 30 if empty]
interval=2

# 重做索引的时间间隔，单位分钟，默认7200，即1天;
# 为空,为0,或者注释掉:表示永不重做索引
reBuildIndexInterval=7200

# 重做索引的参数
reBuildIndexParams=/dataimport?command=full-import&clean=true&commit=true&optimize=true&wt=json&indent=true&verbose=false&debug=false

# 重做索引时间间隔的计时开始时间，第一次真正执行的时间=reBuildIndexBeginTime+reBuildIndexInterval*60*1000；
# 两种格式：2012-04-11 03:10:00 或者 03:10:00，后一种会自动补全日期部分为服务启动时的日期
reBuildIndexBeginTime=15:10:00

搜索

关键词高亮、返回关键词的上下文

设置参数hl=on开启高亮，并可通过调整hl.fragsize的大小来设置返回上下文的字符数
例：http://localhost:8080/solr/file/select?hl=on&indent=on&q=solr&wt=json&hl.fragsize=20
hl其他参数说明：

hl.fl: 用空格或逗号隔开的字段列表。要启用某个字段的highlight功能，就得保证该字段在schema中是stored。如果该参数未被给出，那么就会高亮默认字段。可以使用星号高亮所有字段。如果使用了通配符，那么要考虑启用hl.requiredFieldMatch选项。
hl.requireFieldMatch: 如果置为true，除非该字段的查询结果不为空才会被高亮。它的默认值是false，意味着它可能匹配某个字段却高亮一个不同的字段。如果hl.fl使用了通配符，那么就要启用该参数。尽管如此，如果查询是all字段（可能是使用copy-field 指令），那么还是把它设为false，这样搜索结果能表明哪个字段的查询文本未被找到。
hl.simple.pre和hl.simple.post: 高亮内容与关键匹配的地方，默认将会被"<em>"和"</em>"包围。可以使用hl.simple.pre和hl.simple.post参数设置前后标签。

排序

使用参数sort，后面加上要排序的字段和顺序(desc或asc)
例：http://localhost:8080/solr/file/select?indent=on&q=:&wt=json&sort=lastModified desc或asc

分页

配合使用参数start和rows实现分页。

start: 指定第一条记录在所有记录的位置。
rows: 一次性取出多少条数据。
start=20&rows=10表示取20到30之间的数据

Solr6.3 配置整理
根据网上的资料整理如下，对solr的入门配置很有帮助。（6.3版本）数据源配置支持MySQL Solr里关于M...
Kafka相关文章索引（2）
基本常识 kafka主要配置 Kafka配置说明 Kafka学习整理四(Producer配置) Kafka学习整理...
MVN 配置整理
一：常用命令：清空：mvn clean编译：mvn compile打包：mvn package打包并安装到本地仓库...
git配置整理
GIT使用（第一次）一、设置Git的user name和email： $ git config --globa...
Chrome配置【整理】
转自：这 15 条实用命令，帮你打开 Chrome 浏览器的隐藏功能相关命令显示当前版本浏览器地址栏输入并打...
webpack配置整理
开始准备为了从JavaScript模块中import一个CSS文件，需要安装并添加style-loader和cs...
Flask配置项说明
Flask的配置项 Flask-SQLAlchemy配置项整理参考： http://docs.jinkan.or...
prometheus配置详解
本文按照官方文档的相关内容整理整理的配置语法以及实现功能 1.prometheus 配置文件主体 2.scrape...
在Solr6.3中创建core
一、新建core 新建的solrhome文件的位置为：D:\solrhome 创建new_core文件夹复制D...
Mobile Atlas Creator配置天地图数据源并下载离
流程整理 1. 配置文件信息框架梳理： customMultiLayerMapSource：配置多层数据源标签；...

Solr6.3 配置整理

数据源配置

支持MySQL

支持Oracle

支持word、excel、ppt、pdf、txt等文件

混合支持数据库和文档

索引

索引的增量更新

索引定时更新机制

搜索

关键词高亮、返回关键词的上下文

排序

分页

相关文章

Solr6.3 配置整理

Kafka相关文章索引（2）

MVN 配置整理

git配置整理

Chrome配置【整理】

webpack配置整理

Flask配置项说明

prometheus配置详解

在Solr6.3中创建core

Mobile Atlas Creator配置天地图数据源并下载离

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读

solr