Application of the BM25 Algorithm in ES

Author: A_You | Published 2019-06-04 13:51

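For the definition of BM25 itself, please consult other sources. This article focuses on two components of the score as reported in Elasticsearch's explain output: IDF and tfNorm.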

idf(term)

idf(term) = log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)), where log is the natural logarithm
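docCount: the number of documents being searched. It is counted per shard, not per index (more precisely, it is the number of documents in that shard that have a value for the field), which explains why the same term can get different IDF values.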
docFreq: the number of documents in that same shard that contain the term.
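The IDF formula can be checked directly. This is a minimal sketch that only re-evaluates the formula above with math.log (natural log); the function name bm25_idf is hypothetical, and the inputs are the docCount/docFreq values that appear in the example's explain output. Elasticsearch computes in 32-bit floats, so the last printed digit can differ slightly.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF as printed in the Elasticsearch BM25 explain output (natural log)."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# docCount = 3 documents in the shard, as in the example below.
print(round(bm25_idf(3, 2), 8))  # "51": in 2 of 3 docs -> 0.47000363
print(round(bm25_idf(3, 3), 8))  # "信"/"用"/"卡": in all 3 docs -> 0.13353139

# Hypothetical second shard holding 4 docs, 2 of which contain "51":
# same term, different shard statistics, different IDF.
print(round(bm25_idf(4, 2), 4))  # 0.6931
```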

tfNorm

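tfNorm = (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)), where freq is how often the term occurs in the field (the phrase frequency for a phrase query), k1 and b are the BM25 parameters (Elasticsearch defaults: k1 = 1.2, b = 0.75), fieldLength is the length of the field in this document, and avgFieldLength is the average length of that field across the shard's documents.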
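A small sketch of the same formula shows the two effects it encodes: tfNorm grows with freq but saturates toward k1 + 1, and with b > 0 a field longer than average gets a smaller tfNorm for the same freq. The function name bm25_tf_norm is hypothetical; k1 = 1.2 and b = 0.75 are the Elasticsearch defaults, and the lengths 270 and 728 are borrowed from the example below.

```python
def bm25_tf_norm(freq: float, field_length: float, avg_field_length: float,
                 k1: float = 1.2, b: float = 0.75) -> float:
    """tfNorm as printed in the Elasticsearch BM25 explain output."""
    length_norm = 1 - b + b * field_length / avg_field_length
    return (freq * (k1 + 1)) / (freq + k1 * length_norm)


avg_len = 270
# Saturation: doubling freq does not double tfNorm; the limit is k1 + 1 = 2.2.
for freq in (1, 5, 10, 100):
    print(freq, round(bm25_tf_norm(freq, avg_len, avg_len), 4))

# Length normalization (b = 0.75): same freq, longer field, lower tfNorm.
print(round(bm25_tf_norm(5, 270, avg_len), 4))  # average-length field
print(round(bm25_tf_norm(5, 728, avg_len), 4))  # field about 2.7x the average
```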

Example

  • Indexed documents (three documents with title and content fields)
{
    "title": "51信用卡(02051): 51信用卡 月报表截至2019年5月31日止之股份发行人的证券变动月报表(539KB, PDF) 网页链接 - 雪球",
    "content": "51信用卡 月报表截至2019年5月31日止之股份发行人的证券变动月报表(539KB, PDF) 网页链接"
}
{
    "title": "51信用卡管家里的投资靠谱吗?",
    "content": "谁用过,试了下还信用卡的券还是挺好用的,投资不晓得靠谱吗"
}
{
    "title": "51信用卡“费马”营销解决方案获得营销服务奖项",
    "content": "5月30日,“2019Topdigital创新盛典”在上海龙之梦大酒店顺利举行,会议邀请了众多业内企业领袖、行业专家等来到现场,围绕“新智能”、“新消费”、“新营销”三大主题展开讨论。 作为金融科技头部上市平台,51信用卡也受邀出席本次会议。同时,51信用卡旗下“费马”全生命周期营销解决方案一举摘得当天产品与服务—营销服务奖项。 随着大数据、云计算、人工智能等技术的快速发展和普及,当今世界已经进入到一个快速发展的数字化时代,银行业进入转型升级周期,但在营销获客层面仍面临较多难题。因此,银行业亟需利用大数据技术进行获客、用户运营、经营模式的改造,以实现降本增效的目标。 据介绍,由51信用卡自主研发的“费马”全生命周期营销解决方案可以凭借技术的力量,准确预测用户的全生命周期的需求,实现用户金融需求的深度挖掘,并提供从原点到全生命周期的全链路营销方案。 在第一阶段,“费马”主要围绕渠道投放进行用户的首次转化。依托大数据技术展开渠道价值量化和精准实时调控,并通过渠道追踪、意图识别、实时计算等步骤,帮助完成用户的首次转化; 在用户完成注册激活过程中,采用自研渠道追踪、意图识别引擎对用户进行首次转化优化,在注册激活后,仰仗短文本挖掘、用户行为挖掘等51信用卡独有的人工智能技术,充分挖掘用户特征,描绘用户画像,进而实现智能触达和精准营销,完成用户全生命周期价值的挖掘。 据悉,该方案已实际作用于51信用卡的合作银行,并帮助合作银行发卡量由此前3年累计的10万张提高到了单年28万张,大约实现年发卡量8倍以上的增长。 未来,针对不同银行的获客体系和获客规律进行定制化的产品设计与配套服务,费马也将以点、线、面的方式提供多元化可供选择的产品模式,进一步扩大该产品的市场覆盖率。同时,随着5G时代的逐步到来,提前储备和加强大数据技术的应用和处理速度。"
}
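  • index settings (a custom BM25 similarity, my_bm25, with b = 0, which disables field-length normalization; since the explain output below reports b = 0, the queried fields were presumably mapped to it with "similarity": "my_bm25" in the index mapping, which is not shown)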
{
    "similarity":{
        "my_bm25":{
            "type":"BM25",
            "b":"0"
        }
    }
}
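Setting b to 0 removes the field-length term from tfNorm entirely, which is why the long document 3 (fieldLength 728 against an average of 270) is not penalized in the results below. Worked through with k1 = 1.2, the default that the explain output also reports:

```latex
\begin{aligned}
\mathrm{tfNorm}\big|_{b=0}
  &= \frac{\mathit{freq}\,(k_1 + 1)}{\mathit{freq} + k_1\,(1 - 0 + 0 \cdot \mathit{fieldLength}/\mathit{avgFieldLength})}
   = \frac{\mathit{freq}\,(k_1 + 1)}{\mathit{freq} + k_1} \\
\mathit{freq} = 5:&\quad \frac{5 \times 2.2}{5 + 1.2} = \frac{11}{6.2} \approx 1.7742 \\
\mathit{freq} = 1:&\quad \frac{1 \times 2.2}{1 + 1.2} = 1
\end{aligned}
```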
  • single field query
query:
{
    "match_phrase":{
        "content":{
            "query":"51信用卡",
            "slop":0,
            "boost":1
        }
    }
}
response:
{
    "took":10,
    "timed_out":false,
    "_shards":{
        "total":1,
        "successful":1,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":2,
        "max_score":1.5446091,
        "hits":[
            {
                "_shard":"[mf_index_tf_idf][0]",
                "_node":"K2hhx2UGTdWw6mXlYZtNjQ",
                "_index":"mf_index_tf_idf",
                "_type":"docs",
                "_id":"3",
                "_score":1.5446091,
                "_source":{
                    "title":"51信用卡“费马”营销解决方案获得营销服务奖项",
                    "content":"5月30日,“2019Topdigital创新盛典”在上海龙之梦大酒店顺利举行,会议邀请了众多业内企业领袖、行业专家等来到现场,围绕“新智能”、“新消费”、“新营销”三大主题展开讨论。 作为金融科技头部上市平台,51信用卡也受邀出席本次会议。同时,51信用卡旗下“费马”全生命周期营销解决方案一举摘得当天产品与服务—营销服务奖项。 随着大数据、云计算、人工智能等技术的快速发展和普及,当今世界已经进入到一个快速发展的数字化时代,银行业进入转型升级周期,但在营销获客层面仍面临较多难题。因此,银行业亟需利用大数据技术进行获客、用户运营、经营模式的改造,以实现降本增效的目标。 据介绍,由51信用卡自主研发的“费马”全生命周期营销解决方案可以凭借技术的力量,准确预测用户的全生命周期的需求,实现用户金融需求的深度挖掘,并提供从原点到全生命周期的全链路营销方案。 在第一阶段,“费马”主要围绕渠道投放进行用户的首次转化。依托大数据技术展开渠道价值量化和精准实时调控,并通过渠道追踪、意图识别、实时计算等步骤,帮助完成用户的首次转化; 在用户完成注册激活过程中,采用自研渠道追踪、意图识别引擎对用户进行首次转化优化,在注册激活后,仰仗短文本挖掘、用户行为挖掘等51信用卡独有的人工智能技术,充分挖掘用户特征,描绘用户画像,进而实现智能触达和精准营销,完成用户全生命周期价值的挖掘。 据悉,该方案已实际作用于51信用卡的合作银行,并帮助合作银行发卡量由此前3年累计的10万张提高到了单年28万张,大约实现年发卡量8倍以上的增长。 未来,针对不同银行的获客体系和获客规律进行定制化的产品设计与配套服务,费马也将以点、线、面的方式提供多元化可供选择的产品模式,进一步扩大该产品的市场覆盖率。同时,随着5G时代的逐步到来,提前储备和加强大数据技术的应用和处理速度。"
                },
                "_explanation":{
                    "value":1.5446092,
                    "description":"sum of:",
                    "details":[
                        {
                            "value":1.5446092,
                            "description":"weight(content:\"51 信 用 卡\" in 0) [PerFieldSimilarity], result of:",
                            "details":[
                                {
                                    "value":1.5446092,
                                    "description":"score(doc=0,freq=5.0 = phraseFreq=5.0 ), product of:",
                                    "details":[
                                        {
                                            "value":0.87059784,
                                            "description":"idf(), sum of:",
                                            "details":[
                                                {
                                                    "value":0.47000363,
                                                    "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                    "details":[
                                                        {
                                                            "value":2,
                                                            "description":"docFreq",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":3,
                                                            "description":"docCount",
                                                            "details":[

                                                            ]
                                                        }
                                                    ]
                                                },
                                                {
                                                    "value":0.13353139,
                                                    "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                    "details":[
                                                        {
                                                            "value":3,
                                                            "description":"docFreq",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":3,
                                                            "description":"docCount",
                                                            "details":[

                                                            ]
                                                        }
                                                    ]
                                                },
                                                {
                                                    "value":0.13353139,
                                                    "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                    "details":[
                                                        {
                                                            "value":3,
                                                            "description":"docFreq",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":3,
                                                            "description":"docCount",
                                                            "details":[

                                                            ]
                                                        }
                                                    ]
                                                },
                                                {
                                                    "value":0.13353139,
                                                    "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                    "details":[
                                                        {
                                                            "value":3,
                                                            "description":"docFreq",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":3,
                                                            "description":"docCount",
                                                            "details":[

                                                            ]
                                                        }
                                                    ]
                                                }
                                            ]
                                        },
                                        {
                                            "value":1.7741936,
                                            "description":"tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                                            "details":[
                                                {
                                                    "value":5,
                                                    "description":"phraseFreq=5.0",
                                                    "details":[

                                                    ]
                                                },
                                                {
                                                    "value":1.2,
                                                    "description":"parameter k1",
                                                    "details":[

                                                    ]
                                                },
                                                {
                                                    "value":0,
                                                    "description":"parameter b",
                                                    "details":[

                                                    ]
                                                },
                                                {
                                                    "value":270,
                                                    "description":"avgFieldLength",
                                                    "details":[

                                                    ]
                                                },
                                                {
                                                    "value":728,
                                                    "description":"fieldLength",
                                                    "details":[

                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        },
                        {
                            "value":0,
                            "description":"match on required clause, product of:",
                            "details":[
                                {
                                    "value":0,
                                    "description":"# clause",
                                    "details":[

                                    ]
                                },
                                {
                                    "value":1,
                                    "description":"DocValuesFieldExistsQuery [field=_primary_term]",
                                    "details":[

                                    ]
                                }
                            ]
                        }
                    ]
                }
            },
            {
                "_shard":"[mf_index_tf_idf][0]",
                "_node":"K2hhx2UGTdWw6mXlYZtNjQ",
                "_index":"mf_index_tf_idf",
                "_type":"docs",
                "_id":"1",
                "_score":0.87059784,
                "_source":{
                    "title":"51信用卡(02051): 51信用卡 月报表截至2019年5月31日止之股份发行人的证券变动月报表(539KB, PDF) 网页链接 - 雪球",
                    "content":"51信用卡 月报表截至2019年5月31日止之股份发行人的证券变动月报表(539KB, PDF) 网页链接"
                },
                "_explanation":{
                    "value":0.87059784,
                    "description":"sum of:",
                    "details":[
                        {
                            "value":0.87059784,
                            "description":"weight(content:\"51 信 用 卡\" in 0) [PerFieldSimilarity], result of:",
                            "details":[
                                {
                                    "value":0.87059784,
                                    "description":"score(doc=0,freq=1.0 = phraseFreq=1.0 ), product of:",
                                    "details":[
                                        {
                                            "value":0.87059784,
                                            "description":"idf(), sum of:",
                                            "details":[
                                                {
                                                    "value":0.47000363,
                                                    "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                    "details":[
                                                        {
                                                            "value":2,
                                                            "description":"docFreq",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":3,
                                                            "description":"docCount",
                                                            "details":[

                                                            ]
                                                        }
                                                    ]
                                                },
                                                {
                                                    "value":0.13353139,
                                                    "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                    "details":[
                                                        {
                                                            "value":3,
                                                            "description":"docFreq",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":3,
                                                            "description":"docCount",
                                                            "details":[

                                                            ]
                                                        }
                                                    ]
                                                },
                                                {
                                                    "value":0.13353139,
                                                    "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                    "details":[
                                                        {
                                                            "value":3,
                                                            "description":"docFreq",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":3,
                                                            "description":"docCount",
                                                            "details":[

                                                            ]
                                                        }
                                                    ]
                                                },
                                                {
                                                    "value":0.13353139,
                                                    "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                    "details":[
                                                        {
                                                            "value":3,
                                                            "description":"docFreq",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":3,
                                                            "description":"docCount",
                                                            "details":[

                                                            ]
                                                        }
                                                    ]
                                                }
                                            ]
                                        },
                                        {
                                            "value":1,
                                            "description":"tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                                            "details":[
                                                {
                                                    "value":1,
                                                    "description":"phraseFreq=1.0",
                                                    "details":[

                                                    ]
                                                },
                                                {
                                                    "value":1.2,
                                                    "description":"parameter k1",
                                                    "details":[

                                                    ]
                                                },
                                                {
                                                    "value":0,
                                                    "description":"parameter b",
                                                    "details":[

                                                    ]
                                                },
                                                {
                                                    "value":270,
                                                    "description":"avgFieldLength",
                                                    "details":[

                                                    ]
                                                },
                                                {
                                                    "value":39,
                                                    "description":"fieldLength",
                                                    "details":[

                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        },
                        {
                            "value":0,
                            "description":"match on required clause, product of:",
                            "details":[
                                {
                                    "value":0,
                                    "description":"# clause",
                                    "details":[

                                    ]
                                },
                                {
                                    "value":1,
                                    "description":"DocValuesFieldExistsQuery [field=_primary_term]",
                                    "details":[

                                    ]
                                }
                            ]
                        }
                    ]
                }
            }
        ]
    }
}
  • Explanation
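As shown, docCount is 3 and docFreq is 2 for "51" (two documents contain it in the content field), and both matching documents get the same IDF score. For a match_phrase query the IDF is the sum of the IDFs of the phrase's terms: 0.47000363 for "51" plus 0.13353139 for each of "信", "用" and "卡", which occur in all three documents, for a total of 0.87059784. Because b = 0, tfNorm ignores field length and depends only on the phrase frequency (5 in document 3, 1 in document 1), so the scores are 0.87059784 × 1.7741936 ≈ 1.5446 and 0.87059784 × 1 = 0.87059784.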
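The whole single-field score can be recomputed from the statistics in the explain output. This is a minimal sketch: the per-term docFreq values, docCount, phrase frequencies and field lengths are copied from the response above rather than derived from the text (that would require reproducing the analyzer's tokenization), k1 = 1.2 and b = 0 come from the similarity settings, and the helper names are hypothetical. Following the explain output, it treats the phrase IDF as the sum of per-term IDFs.

```python
import math

K1, B = 1.2, 0.0          # my_bm25: default k1, b overridden to 0
DOC_COUNT = 3             # docCount reported for the content field
AVG_FIELD_LENGTH = 270    # avgFieldLength reported for the content field

# docFreq per phrase token, copied from the explain output
DOC_FREQ = {"51": 2, "信": 3, "用": 3, "卡": 3}


def idf(doc_freq: int) -> float:
    return math.log(1 + (DOC_COUNT - doc_freq + 0.5) / (doc_freq + 0.5))


def tf_norm(freq: float, field_length: float) -> float:
    length_norm = 1 - B + B * field_length / AVG_FIELD_LENGTH
    return freq * (K1 + 1) / (freq + K1 * length_norm)


def phrase_score(phrase_freq: float, field_length: float) -> float:
    # Phrase IDF = sum of the IDFs of the phrase's terms; score = IDF * tfNorm.
    phrase_idf = sum(idf(df) for df in DOC_FREQ.values())
    return phrase_idf * tf_norm(phrase_freq, field_length)


print(round(phrase_score(5, 728), 4))  # _id 3: 1.5446
print(round(phrase_score(1, 39), 4))   # _id 1: 0.8706
```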
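  • multi-field query (two match_phrase clauses; the nested "sum of" in the explanation and the hit total of 3, where document 2 contains "51信用卡" only in its title, suggest they are combined in a bool should clause)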
query:
[
    {
        "match_phrase":{
            "content":{
                "query":"51信用卡",
                "slop":0,
                "boost":1
            }
        }
    },
    {
        "match_phrase":{
            "title":{
                "query":"51信用卡",
                "slop":0,
                "boost":1
            }
        }
    }
]
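Assuming the two clauses sit in a bool should (consistent with the "sum of" wrapper in the explanation that follows), a document's score is the sum of the scores of the clauses it matches. A minimal check for document 3: its content clause contributes 1.5446092, the same value as in the single-field query, and its title clause, reported as 0.53412557 with phraseFreq 1, equals four times the IDF of a token found in all three documents. That per-token breakdown of the title IDF is an inference from the numbers, not something the output shows in full; bm25_idf is a hypothetical helper name.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Content clause for _id 3, copied from the explain output.
content_clause = 1.5446092

# Title clause for _id 3. Assumption: all four tokens ("51", "信", "用", "卡")
# occur in all three titles (docFreq = docCount = 3), and tfNorm = 1
# (the explain output reports score == idf for this clause).
title_clause = 4 * bm25_idf(3, 3) * 1.0

print(round(title_clause, 6))                   # 0.534126
print(round(content_clause + title_clause, 4))  # 2.0787, vs _score 2.0787346
```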
response:
{
    "took":15,
    "timed_out":false,
    "_shards":{
        "total":1,
        "successful":1,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":3,
        "max_score":2.0787346,
        "hits":[
            {
                "_shard":"[mf_index_tf_idf][0]",
                "_node":"K2hhx2UGTdWw6mXlYZtNjQ",
                "_index":"mf_index_tf_idf",
                "_type":"docs",
                "_id":"3",
                "_score":2.0787346,
                "_source":{
                    "title":"51信用卡“费马”营销解决方案获得营销服务奖项",
                    "content":"5月30日,“2019Topdigital创新盛典”在上海龙之梦大酒店顺利举行,会议邀请了众多业内企业领袖、行业专家等来到现场,围绕“新智能”、“新消费”、“新营销”三大主题展开讨论。 作为金融科技头部上市平台,51信用卡也受邀出席本次会议。同时,51信用卡旗下“费马”全生命周期营销解决方案一举摘得当天产品与服务—营销服务奖项。 随着大数据、云计算、人工智能等技术的快速发展和普及,当今世界已经进入到一个快速发展的数字化时代,银行业进入转型升级周期,但在营销获客层面仍面临较多难题。因此,银行业亟需利用大数据技术进行获客、用户运营、经营模式的改造,以实现降本增效的目标。 据介绍,由51信用卡自主研发的“费马”全生命周期营销解决方案可以凭借技术的力量,准确预测用户的全生命周期的需求,实现用户金融需求的深度挖掘,并提供从原点到全生命周期的全链路营销方案。 在第一阶段,“费马”主要围绕渠道投放进行用户的首次转化。依托大数据技术展开渠道价值量化和精准实时调控,并通过渠道追踪、意图识别、实时计算等步骤,帮助完成用户的首次转化; 在用户完成注册激活过程中,采用自研渠道追踪、意图识别引擎对用户进行首次转化优化,在注册激活后,仰仗短文本挖掘、用户行为挖掘等51信用卡独有的人工智能技术,充分挖掘用户特征,描绘用户画像,进而实现智能触达和精准营销,完成用户全生命周期价值的挖掘。 据悉,该方案已实际作用于51信用卡的合作银行,并帮助合作银行发卡量由此前3年累计的10万张提高到了单年28万张,大约实现年发卡量8倍以上的增长。 未来,针对不同银行的获客体系和获客规律进行定制化的产品设计与配套服务,费马也将以点、线、面的方式提供多元化可供选择的产品模式,进一步扩大该产品的市场覆盖率。同时,随着5G时代的逐步到来,提前储备和加强大数据技术的应用和处理速度。"
                },
                "_explanation":{
                    "value":2.0787349,
                    "description":"sum of:",
                    "details":[
                        {
                            "value":2.0787349,
                            "description":"sum of:",
                            "details":[
                                {
                                    "value":1.5446092,
                                    "description":"weight(content:\"51 信 用 卡\" in 0) [PerFieldSimilarity], result of:",
                                    "details":[
                                        {
                                            "value":1.5446092,
                                            "description":"score(doc=0,freq=5.0 = phraseFreq=5.0 ), product of:",
                                            "details":[
                                                {
                                                    "value":0.87059784,
                                                    "description":"idf(), sum of:",
                                                    "details":[
                                                        {
                                                            "value":0.47000363,
                                                            "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                            "details":[
                                                                {
                                                                    "value":2,
                                                                    "description":"docFreq",
                                                                    "details":[

                                                                    ]
                                                                },
                                                                {
                                                                    "value":3,
                                                                    "description":"docCount",
                                                                    "details":[

                                                                    ]
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value":0.13353139,
                                                            "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                            "details":[
                                                                {
                                                                    "value":3,
                                                                    "description":"docFreq",
                                                                    "details":[

                                                                    ]
                                                                },
                                                                {
                                                                    "value":3,
                                                                    "description":"docCount",
                                                                    "details":[

                                                                    ]
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value":0.13353139,
                                                            "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                            "details":[
                                                                {
                                                                    "value":3,
                                                                    "description":"docFreq",
                                                                    "details":[

                                                                    ]
                                                                },
                                                                {
                                                                    "value":3,
                                                                    "description":"docCount",
                                                                    "details":[

                                                                    ]
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value":0.13353139,
                                                            "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                            "details":[
                                                                {
                                                                    "value":3,
                                                                    "description":"docFreq",
                                                                    "details":[

                                                                    ]
                                                                },
                                                                {
                                                                    "value":3,
                                                                    "description":"docCount",
                                                                    "details":[

                                                                    ]
                                                                }
                                                            ]
                                                        }
                                                    ]
                                                },
                                                {
                                                    "value":1.7741936,
                                                    "description":"tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                                                    "details":[
                                                        {
                                                            "value":5,
                                                            "description":"phraseFreq=5.0",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":1.2,
                                                            "description":"parameter k1",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":0,
                                                            "description":"parameter b",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":270,
                                                            "description":"avgFieldLength",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":728,
                                                            "description":"fieldLength",
                                                            "details":[

                                                            ]
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                },
                                {
                                    "value":0.53412557,
                                    "description":"weight(title:\"51 信 用 卡\" in 0) [PerFieldSimilarity], result of:",
                                    "details":[
                                        {
                                            "value":0.53412557,
                                            "description":"score(doc=0,freq=1.0 = phraseFreq=1.0 ), product of:",
                                            "details":[
                                                {
                                                    "value":0.53412557,
                                                    "description":"idf(), sum of:",
                                                    "details":[
                                                        {
                                                            "value":0.13353139,
                                                            "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                            "details":[
                                                                {
                                                                    "value":3,
                                                                    "description":"docFreq",
                                                                    "details":[

                                                                    ]
                                                                },
                                                                {
                                                                    "value":3,
                                                                    "description":"docCount",
                                                                    "details":[

                                                                    ]
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value":0.13353139,
                                                            "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                            "details":[
                                                                {
                                                                    "value":3,
                                                                    "description":"docFreq",
                                                                    "details":[

                                                                    ]
                                                                },
                                                                {
                                                                    "value":3,
                                                                    "description":"docCount",
                                                                    "details":[

                                                                    ]
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value":0.13353139,
                                                            "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                            "details":[
                                                                {
                                                                    "value":3,
                                                                    "description":"docFreq",
                                                                    "details":[

                                                                    ]
                                                                },
                                                                {
                                                                    "value":3,
                                                                    "description":"docCount",
                                                                    "details":[

                                                                    ]
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value":0.13353139,
                                                            "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                            "details":[
                                                                {
                                                                    "value":3,
                                                                    "description":"docFreq",
                                                                    "details":[

                                                                    ]
                                                                },
                                                                {
                                                                    "value":3,
                                                                    "description":"docCount",
                                                                    "details":[

                                                                    ]
                                                                }
                                                            ]
                                                        }
                                                    ]
                                                },
                                                {
                                                    "value":1,
                                                    "description":"tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                                                    "details":[
                                                        {
                                                            "value":1,
                                                            "description":"phraseFreq=1.0",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":1.2,
                                                            "description":"parameter k1",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":0,
                                                            "description":"parameter b",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":28.666666,
                                                            "description":"avgFieldLength",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":22,
                                                            "description":"fieldLength",
                                                            "details":[

                                                            ]
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        },
                        {
                            "value":0,
                            "description":"match on required clause, product of:",
                            "details":[
                                {
                                    "value":0,
                                    "description":"# clause",
                                    "details":[

                                    ]
                                },
                                {
                                    "value":1,
                                    "description":"DocValuesFieldExistsQuery [field=_primary_term]",
                                    "details":[

                                    ]
                                }
                            ]
                        }
                    ]
                }
            },
            {
                "_shard":"[mf_index_tf_idf][0]",
                "_node":"K2hhx2UGTdWw6mXlYZtNjQ",
                "_index":"mf_index_tf_idf",
                "_type":"docs",
                "_id":"1",
                "_score":1.6050205,
                "_source":{
                    "title":"51信用卡(02051): 51信用卡 月报表截至2019年5月31日止之股份发行人的证券变动月报表(539KB, PDF) 网页链接 - 雪球",
                    "content":"51信用卡 月报表截至2019年5月31日止之股份发行人的证券变动月报表(539KB, PDF) 网页链接"
                },
                "_explanation":{
                    "value":1.6050205,
                    "description":"sum of:",
                    "details":[
                        {
                            "value":1.6050205,
                            "description":"sum of:",
                            "details":[
                                {
                                    "value":0.87059784,
                                    "description":"weight(content:\"51 信 用 卡\" in 0) [PerFieldSimilarity], result of:",
                                    "details":[
                                        {
                                            "value":0.87059784,
                                            "description":"score(doc=0,freq=1.0 = phraseFreq=1.0 ), product of:",
                                            "details":[
                                                {
                                                    "value":0.87059784,
                                                    "description":"idf(), sum of:",
                                                    "details":[
                                                        {
                                                            "value":0.47000363,
                                                            "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                            "details":[
                                                                {
                                                                    "value":2,
                                                                    "description":"docFreq",
                                                                    "details":[

                                                                    ]
                                                                },
                                                                {
                                                                    "value":3,
                                                                    "description":"docCount",
                                                                    "details":[

                                                                    ]
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value":0.13353139,
                                                            "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                            "details":[
                                                                {
                                                                    "value":3,
                                                                    "description":"docFreq",
                                                                    "details":[

                                                                    ]
                                                                },
                                                                {
                                                                    "value":3,
                                                                    "description":"docCount",
                                                                    "details":[

                                                                    ]
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value":0.13353139,
                                                            "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                            "details":[
                                                                {
                                                                    "value":3,
                                                                    "description":"docFreq",
                                                                    "details":[

                                                                    ]
                                                                },
                                                                {
                                                                    "value":3,
                                                                    "description":"docCount",
                                                                    "details":[

                                                                    ]
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value":0.13353139,
                                                            "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                            "details":[
                                                                {
                                                                    "value":3,
                                                                    "description":"docFreq",
                                                                    "details":[

                                                                    ]
                                                                },
                                                                {
                                                                    "value":3,
                                                                    "description":"docCount",
                                                                    "details":[

                                                                    ]
                                                                }
                                                            ]
                                                        }
                                                    ]
                                                },
                                                {
                                                    "value":1,
                                                    "description":"tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                                                    "details":[
                                                        {
                                                            "value":1,
                                                            "description":"phraseFreq=1.0",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":1.2,
                                                            "description":"parameter k1",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":0,
                                                            "description":"parameter b",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":270,
                                                            "description":"avgFieldLength",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":39,
                                                            "description":"fieldLength",
                                                            "details":[

                                                            ]
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                },
                                {
                                    "value":0.7344227,
                                    "description":"weight(title:\"51 信 用 卡\" in 0) [PerFieldSimilarity], result of:",
                                    "details":[
                                        {
                                            "value":0.7344227,
                                            "description":"score(doc=0,freq=2.0 = phraseFreq=2.0 ), product of:",
                                            "details":[
                                                {
                                                    "value":0.53412557,
                                                    "description":"idf(), sum of:",
                                                    "details":[
                                                        {
                                                            "value":0.13353139,
                                                            "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                            "details":[
                                                                {
                                                                    "value":3,
                                                                    "description":"docFreq",
                                                                    "details":[

                                                                    ]
                                                                },
                                                                {
                                                                    "value":3,
                                                                    "description":"docCount",
                                                                    "details":[

                                                                    ]
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value":0.13353139,
                                                            "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                            "details":[
                                                                {
                                                                    "value":3,
                                                                    "description":"docFreq",
                                                                    "details":[

                                                                    ]
                                                                },
                                                                {
                                                                    "value":3,
                                                                    "description":"docCount",
                                                                    "details":[

                                                                    ]
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value":0.13353139,
                                                            "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                            "details":[
                                                                {
                                                                    "value":3,
                                                                    "description":"docFreq",
                                                                    "details":[

                                                                    ]
                                                                },
                                                                {
                                                                    "value":3,
                                                                    "description":"docCount",
                                                                    "details":[

                                                                    ]
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value":0.13353139,
                                                            "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                            "details":[
                                                                {
                                                                    "value":3,
                                                                    "description":"docFreq",
                                                                    "details":[

                                                                    ]
                                                                },
                                                                {
                                                                    "value":3,
                                                                    "description":"docCount",
                                                                    "details":[

                                                                    ]
                                                                }
                                                            ]
                                                        }
                                                    ]
                                                },
                                                {
                                                    "value":1.375,
                                                    "description":"tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                                                    "details":[
                                                        {
                                                            "value":2,
                                                            "description":"phraseFreq=2.0",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":1.2,
                                                            "description":"parameter k1",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":0,
                                                            "description":"parameter b",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":28.666666,
                                                            "description":"avgFieldLength",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":50,
                                                            "description":"fieldLength",
                                                            "details":[

                                                            ]
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        },
                        {
                            "value":0,
                            "description":"match on required clause, product of:",
                            "details":[
                                {
                                    "value":0,
                                    "description":"# clause",
                                    "details":[

                                    ]
                                },
                                {
                                    "value":1,
                                    "description":"DocValuesFieldExistsQuery [field=_primary_term]",
                                    "details":[

                                    ]
                                }
                            ]
                        }
                    ]
                }
            },
            {
                "_shard":"[mf_index_tf_idf][0]",
                "_node":"K2hhx2UGTdWw6mXlYZtNjQ",
                "_index":"mf_index_tf_idf",
                "_type":"docs",
                "_id":"2",
                "_score":0.53412557,
                "_source":{
                    "title":"51信用卡管家里的投资靠谱吗?",
                    "content":"谁用过,试了下还信用卡的券还是挺好用的,投资不晓得靠谱吗"
                },
                "_explanation":{
                    "value":0.53412557,
                    "description":"sum of:",
                    "details":[
                        {
                            "value":0.53412557,
                            "description":"sum of:",
                            "details":[
                                {
                                    "value":0.53412557,
                                    "description":"weight(title:\"51 信 用 卡\" in 0) [PerFieldSimilarity], result of:",
                                    "details":[
                                        {
                                            "value":0.53412557,
                                            "description":"score(doc=0,freq=1.0 = phraseFreq=1.0 ), product of:",
                                            "details":[
                                                {
                                                    "value":0.53412557,
                                                    "description":"idf(), sum of:",
                                                    "details":[
                                                        {
                                                            "value":0.13353139,
                                                            "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                            "details":[
                                                                {
                                                                    "value":3,
                                                                    "description":"docFreq",
                                                                    "details":[

                                                                    ]
                                                                },
                                                                {
                                                                    "value":3,
                                                                    "description":"docCount",
                                                                    "details":[

                                                                    ]
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value":0.13353139,
                                                            "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                            "details":[
                                                                {
                                                                    "value":3,
                                                                    "description":"docFreq",
                                                                    "details":[

                                                                    ]
                                                                },
                                                                {
                                                                    "value":3,
                                                                    "description":"docCount",
                                                                    "details":[

                                                                    ]
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value":0.13353139,
                                                            "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                            "details":[
                                                                {
                                                                    "value":3,
                                                                    "description":"docFreq",
                                                                    "details":[

                                                                    ]
                                                                },
                                                                {
                                                                    "value":3,
                                                                    "description":"docCount",
                                                                    "details":[

                                                                    ]
                                                                }
                                                            ]
                                                        },
                                                        {
                                                            "value":0.13353139,
                                                            "description":"idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                                                            "details":[
                                                                {
                                                                    "value":3,
                                                                    "description":"docFreq",
                                                                    "details":[

                                                                    ]
                                                                },
                                                                {
                                                                    "value":3,
                                                                    "description":"docCount",
                                                                    "details":[

                                                                    ]
                                                                }
                                                            ]
                                                        }
                                                    ]
                                                },
                                                {
                                                    "value":1,
                                                    "description":"tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                                                    "details":[
                                                        {
                                                            "value":1,
                                                            "description":"phraseFreq=1.0",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":1.2,
                                                            "description":"parameter k1",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":0,
                                                            "description":"parameter b",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":28.666666,
                                                            "description":"avgFieldLength",
                                                            "details":[

                                                            ]
                                                        },
                                                        {
                                                            "value":14,
                                                            "description":"fieldLength",
                                                            "details":[

                                                            ]
                                                        }
                                                    ]
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        },
                        {
                            "value":0,
                            "description":"match on required clause, product of:",
                            "details":[
                                {
                                    "value":0,
                                    "description":"# clause",
                                    "details":[

                                    ]
                                },
                                {
                                    "value":1,
                                    "description":"DocValuesFieldExistsQuery [field=_primary_term]",
                                    "details":[

                                    ]
                                }
                            ]
                        }
                    ]
                }
            }
        ]
    }
}
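The tfNorm entry above reports b = 0, which comes from the my_bm25 similarity in the index settings. With b = 0 the length factor (1 - b + b * fieldLength / avgFieldLength) collapses to 1, so fieldLength (14) and avgFieldLength (28.666666) have no effect, and for freq = 1 tfNorm is exactly 1. Below is a minimal Python sketch that re-evaluates the printed formula to show this. The helper name `tf_norm` is hypothetical. The b = 0.75 comparison is an assumption added for illustration, since 0.75 is Lucene's default b; it is not part of the original example.

```python
def tf_norm(freq, k1, b, field_length, avg_field_length):
    """BM25 tfNorm, re-implemented from the formula printed in the explain output."""
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * field_length / avg_field_length))

# Values copied from the tfNorm entry of _id 2 (title field)
freq, k1, field_length, avg_field_length = 1.0, 1.2, 14, 28.666666

# b = 0 as configured in my_bm25: field length is ignored
with_b0 = tf_norm(freq, k1, 0.0, field_length, avg_field_length)
# Assumed b = 0.75 (Lucene default), for comparison only
with_default_b = tf_norm(freq, k1, 0.75, field_length, avg_field_length)

print(with_b0)         # 1.0
print(with_default_b)  # roughly 1.26: a shorter-than-average field is boosted
```

With the default b, a title shorter than the average would get a tfNorm above 1. Setting b = 0 removes that length bias.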
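  • Explanation
_score = weight(content:"51 信 用 卡" in 0) [PerFieldSimilarity] + weight(title:"51 信 用 卡" in 0) [PerFieldSimilarity]
The score is the sum of the two per-field weights. A field that does not contain the phrase contributes nothing: for _id 2 only the title matches, so its _score of 0.53412557 is the title weight alone, i.e. the summed idf of the four terms (4 × 0.13353139) multiplied by a tfNorm of 1.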
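To check the arithmetic, here is a minimal Python sketch that recomputes the _score of _id 2 from the numbers in the explain output. It re-implements the printed formulas rather than calling Elasticsearch or Lucene, and it ignores the float32 rounding Lucene performs internally. The helper names (`idf`, `tf_norm`, `phrase_weight`) are hypothetical. It follows the "idf(), sum of:" node: a phrase's idf is the sum of its terms' idf values, which is then multiplied by tfNorm.

```python
import math

def idf(doc_count, doc_freq):
    # BM25 idf as printed in the explain output (natural log)
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def tf_norm(freq, k1, b, field_length, avg_field_length):
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * field_length / avg_field_length))

def phrase_weight(term_stats, freq, k1, b, field_length, avg_field_length):
    # Sum the idf of every term in the phrase, then multiply by tfNorm
    total_idf = sum(idf(doc_count, doc_freq) for doc_count, doc_freq in term_stats)
    return total_idf * tf_norm(freq, k1, b, field_length, avg_field_length)

# _id 2, title field: "51 信 用 卡" has 4 terms, each with docFreq=3 and docCount=3
title_weight = phrase_weight([(3, 3)] * 4, freq=1.0, k1=1.2, b=0.0,
                             field_length=14, avg_field_length=28.666666)
# The content of _id 2 does not contain the phrase, so that field adds nothing
content_weight = 0.0

score = title_weight + content_weight
print(round(score, 6))  # 0.534126, matching the _score 0.53412557 of _id 2
```

Each term's idf is ln(1 + 0.5 / 3.5) = ln(8/7) ≈ 0.13353139 because every document in the shard contains every term. That is why the idf contribution is small even though the phrase matches.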

      Article link: https://www.haomeiwen.com/subject/xnplxctx.html