IR-chapter7: computing scores in

Author: woodsouthmmm | Published: 2017-05-01 18:48

    efficient scoring and ranking

    FastCosineScore
    • constructing a heap to pick out top K components
    • Inexact top K document retrieval
    • index elimination
      only consider documents containing terms whose idf exceeds a preset threshold
      only consider documents containing many (or all) of the query terms
    • champion list
      precompute r documents with the highest weights for each term.
      r does not need to be the same for every term (a rarer term can get a larger r).
    • static quality scoring and ordering
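      net-score(q, d) = g(d) + cosine similarity of q and d, where g(d) is the static quality score of d
      global champion list: for each term, the r documents with the highest g(d) + tf-idf
      extension: keep two postings lists per term (high and low), each sorted by g(d) value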
    • impact ordering
      postings sorted by a common ordering (e.g. docID or g(d)): document-at-a-time scoring
      postings not sorted by a common ordering: term-at-a-time scoring
      postings ordered by decreasing tf value; two ideas reduce the work:
      1. stop after considering a prefix of each postings list
      2. consider query terms in decreasing order of idf
    • cluster pruning
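      preprocessing: pick √N documents at random as leaders; for every other document compute its nearest leader and attach it as a follower, forming clusters
      query time: compute the cosine similarity from q to each leader, then score only the closest leader L and its followers
      variation: attach each follower to its b1 closest leaders; at query time consider the b2 leaders closest to q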
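
The exact top-K path at the start of this list (FastCosineScore followed by heap selection) can be sketched roughly as below. This is only an illustration under assumptions: the `postings` and `doc_length` layouts are hypothetical, each query term is given weight 1, and `heapq.nlargest` stands in for building a heap and extracting the K best documents.

```python
import heapq
from collections import defaultdict


def fast_cosine_score(query_terms, postings, doc_length, k):
    """Return the k highest-scoring (doc_id, score) pairs.

    postings:   dict term -> list of (doc_id, weight); weight is the
                precomputed term weight of the term in that document
                (hypothetical in-memory layout).
    doc_length: dict doc_id -> vector length used for normalization.
    """
    scores = defaultdict(float)
    # Term-at-a-time accumulation: every query term counts with weight 1,
    # so only the document-side weight is added.
    for term in query_terms:
        for doc_id, weight in postings.get(term, []):
            scores[doc_id] += weight
    # Length-normalize every accumulated score.
    for doc_id in scores:
        scores[doc_id] /= doc_length[doc_id]
    # Heap-based selection of the k best documents instead of a full sort.
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

Only documents that share at least one term with the query ever get an accumulator, so the selection step works on those documents rather than on the whole collection.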

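The cluster pruning steps listed above can be sketched in a similar spirit. This is a minimal version assuming sparse vectors stored as `dict term -> weight` and b1 = b2 = 1; the function names and the optional `leaders` argument (used to make runs reproducible) are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two sparse vectors (dict term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_clusters(docs, leaders=None, seed=0):
    """Pick about sqrt(N) random leaders; attach each other doc to its nearest leader."""
    if leaders is None:
        rng = random.Random(seed)
        n_leaders = max(1, round(math.sqrt(len(docs))))
        leaders = rng.sample(sorted(docs), n_leaders)
    # Each cluster starts with the leader itself.
    clusters = {leader: [leader] for leader in leaders}
    for doc_id, vec in docs.items():
        if doc_id in clusters:
            continue
        nearest = max(leaders, key=lambda l: cosine(vec, docs[l]))
        clusters[nearest].append(doc_id)
    return clusters


def search(query_vec, docs, clusters, k):
    """Score only the closest leader and its followers; return the top k doc ids."""
    leader = max(clusters, key=lambda l: cosine(query_vec, docs[l]))
    candidates = clusters[leader]
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]
```

Because only one cluster is scored, the result is an inexact top K: a good document attached to a different leader can be missed, which is what the b1/b2 variation tries to reduce.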
    components of an information retrieval system

    • tiered indexes
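      motivation: after inexact retrieval, the candidate set A may contain fewer than K documents
      solution: split the index into tiers and fall back to lower tiers when a higher tier yields too few results. For example, we set a tf threshold of 20 for tier 1 and 10 for tier 2, meaning that the tier 1 index only has postings entries with tf values exceeding 20, and the tier 2 index only has postings entries with tf values exceeding 10.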
    [figure: tiered indexes]
    • designing parsing and scoring functions
      query parser - translate the user-specified keywords into a query with various operators
      scoring function - manual configuration or machine-learned scoring
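
The tiered-index fallback from the first bullet above can be illustrated with a small sketch. It assumes postings stored as `dict term -> list of (doc_id, tf)`, disjoint tiers, one extra lowest tier for all remaining postings, and a candidate set formed from documents containing any query term; all of these choices and the function names are hypothetical.

```python
def build_tiers(postings, thresholds=(20, 10)):
    """Split postings into tiers by tf.

    Tier 0 holds entries with tf > thresholds[0], tier 1 those with
    tf > thresholds[1] that are not already in tier 0, and so on.
    A final tier collects everything else (an assumption of this sketch).
    """
    tiers = [dict() for _ in range(len(thresholds) + 1)]
    for term, plist in postings.items():
        for doc_id, tf in plist:
            # Index of the first threshold that tf exceeds, else the last tier.
            level = next(
                (i for i, cut in enumerate(thresholds) if tf > cut),
                len(thresholds),
            )
            tiers[level].setdefault(term, []).append((doc_id, tf))
    return tiers


def candidates_from_tiers(query_terms, tiers, k):
    """Walk the tiers from best to worst until at least k candidates are found."""
    found = set()
    for tier in tiers:
        for term in query_terms:
            found.update(doc_id for doc_id, _ in tier.get(term, []))
        if len(found) >= k:
            break
    return found
```

The candidates returned here would still be passed to a scoring function; the tiers only decide how far down the index the system has to look.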

    • putting it all together

    a complete search system

    result snippets: short excerpts of text accompanying each document in the results list for a query.

    Vector space scoring and query operator interaction

    Google-style free text queries: interpreted with the semantics of a conjunctive query, retrieving only documents that contain all or most of the query terms.

    • Boolean retrieval
    • wildcard queries
    • phrase queries
