IR-chapter6: sorting, term weigh

Author: woodsouthmmm | Published: 2017-04-25 23:58
  • motivation: to rank-order the documents matching a query by giving a score to each (query,document) pair

parametric and zone indexes

  • index and retrieve documents by metadata.
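  • parametric index vs zone index: a parametric field (e.g. date) draws on a fixed vocabulary, while a zone (e.g. title, abstract) holds arbitrary free text, so its vocabulary is whatever appears in the text of that zone.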
[Figures: parametric search; zone index]
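  • weighted zone scoring: each zone gets a weight g_i (the weights sum to 1); a document's score is the sum of g_i over the zones that match the query
  • learning weights: the weights are learned from training examples, i.e. (query, document) pairs judged relevant or non-relevant
  • the optimal weight g
    chosen by a machine learning algorithm to minimize the total error on the training examples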
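
The two ideas above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names, the zone names and the sample weights are all hypothetical. `optimal_g` assumes the two-zone case (title and body) with Boolean zone matches and squared error, where n10r counts training examples that match the title only and are relevant, n01n those that match the body only and are non-relevant, and so on.

```python
from typing import Dict, Iterable, Set


def weighted_zone_score(query_terms: Iterable[str],
                        doc_zones: Dict[str, Set[str]],
                        weights: Dict[str, float]) -> float:
    """Sum the weight g_i of every zone that contains all query terms."""
    terms = list(query_terms)
    score = 0.0
    for zone, g in weights.items():
        tokens = doc_zones.get(zone, set())
        # Boolean AND match: s_i is 1 if every query term occurs in the zone.
        s = 1.0 if all(t in tokens for t in terms) else 0.0
        score += g * s
    return score


def optimal_g(n10r: int, n10n: int, n01r: int, n01n: int) -> float:
    """Weight of the title zone that minimizes total squared error
    (two zones only; the body zone then gets weight 1 - g)."""
    denom = n10r + n10n + n01r + n01n
    if denom == 0:
        raise ValueError("no training example distinguishes the two zones")
    return (n10r + n01n) / denom


# Hypothetical document and weights, for illustration only.
doc = {
    "title": {"vector", "space"},
    "body": {"vector", "space", "model", "tf", "idf"},
}
weights = {"title": 0.3, "body": 0.7}

print(weighted_zone_score(["tf", "idf"], doc, weights))   # only the body matches
print(optimal_g(n10r=3, n10n=1, n01r=1, n01n=3))
```

Examples that match both zones or neither zone do not appear in `optimal_g` because their error does not depend on g.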

term frequency and weighting

  • intuition: scores relate to term frequency, but are all words equally important?
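  • free text query: each document is viewed as a set of term weights (bag of words model: word order is ignored, only counts matter)
    score = the sum, over the query terms, of each term's weight in the document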
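  • inverse document frequency: idf_t = log(N / df_t), where N is the number of documents in the collection and df_t is the number of documents containing term t
  • tf-idf weighting: tf-idf_{t,d} = tf_{t,d} × idf_t
    terms with lower document frequency weigh higher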
[Figure: tf-idf]
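
As a rough illustration of the weighting described above, the sketch below computes idf, tf-idf and the simple overlap score (the sum of tf-idf weights over the query terms). The toy collection and the function names are made up for this example, and base-10 logarithms are an assumption; any base gives the same ranking.

```python
import math
from collections import Counter
from typing import List


def idf(term: str, docs: List[List[str]]) -> float:
    """log10(N / df_t); 0.0 for a term that occurs in no document."""
    df = sum(1 for d in docs if term in d)
    return math.log10(len(docs) / df) if df else 0.0


def tf_idf(term: str, doc: List[str], docs: List[List[str]]) -> float:
    """Raw term frequency in the document times the term's idf."""
    return Counter(doc)[term] * idf(term, docs)


def overlap_score(query: List[str], doc: List[str],
                  docs: List[List[str]]) -> float:
    """Bag-of-words score: sum of tf-idf weights over the distinct query terms."""
    return sum(tf_idf(t, doc, docs) for t in set(query))


# Hypothetical four-document collection.
docs = [
    ["car", "insurance", "auto", "car"],
    ["auto", "insurance", "best"],
    ["car", "best", "car", "car"],
    ["insurance", "best"],
]

for i, d in enumerate(docs):
    print(i, round(overlap_score(["car", "insurance"], d, docs), 4))
```

Here "insurance" occurs in three of the four documents and "car" in only two, so "car" gets the higher idf and contributes more per occurrence.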

the vector space for scoring

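  • dot products: similarity between two documents
    why not the magnitude of the vector difference? because of the effect of document length: two documents with very similar content can be far apart simply because one is much longer than the other.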
[Figures: cosine similarity; length-normalized cosine similarity]
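  • queries as vectors: the query is treated as a very short document and scored against each document by cosine similarity
    computing the similarity against every document in the collection is expensive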
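
The length effect mentioned above is easy to see in a small sketch. Vectors are represented here as sparse dicts from term to weight, which is a choice made for this example; the sample documents are hypothetical.

```python
import math
from typing import Dict

Vector = Dict[str, float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def norm(u: Vector) -> float:
    """Euclidean length of a sparse vector."""
    return math.sqrt(sum(w * w for w in u.values()))


def cosine(u: Vector, v: Vector) -> float:
    """Dot product of the length-normalized vectors; 0.0 if either is empty."""
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot(u, v) / (nu * nv)


def euclidean(u: Vector, v: Vector) -> float:
    """Magnitude of the vector difference."""
    terms = set(u) | set(v)
    return math.sqrt(sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in terms))


d1 = {"jealous": 2.0, "gossip": 1.0}
d2 = {t: 3.0 * w for t, w in d1.items()}   # same content, three times as long
query = {"gossip": 1.0}                    # a query is just a short vector

print(euclidean(d1, d2))   # large, although the content is identical
print(cosine(d1, d2))      # essentially 1: length no longer matters
print(cosine(query, d1))
```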

  • computing vector scores

[Figure: basic algorithm for computing vector scores]
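
A term-at-a-time version of this scoring can be sketched as follows: walk the postings list of each query term, accumulate a partial score per document, divide by the document's vector length, and keep the top k. This is one possible reading of the basic algorithm rather than a reference implementation; the index layout, the function names and the tf-idf weighting with base-10 logarithms are assumptions made for this example.

```python
import heapq
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def build_index(docs: Dict[int, List[str]]):
    """Build postings (term -> [(doc_id, tf)]), idf values and document lengths."""
    postings = defaultdict(list)
    for doc_id, tokens in docs.items():
        for term, tf in Counter(tokens).items():
            postings[term].append((doc_id, tf))
    n_docs = len(docs)
    idf = {t: math.log10(n_docs / len(p)) for t, p in postings.items()}
    squared = defaultdict(float)
    for term, plist in postings.items():
        for doc_id, tf in plist:
            squared[doc_id] += (tf * idf[term]) ** 2
    lengths = {d: math.sqrt(s) for d, s in squared.items()}
    return postings, idf, lengths


def cosine_score(query: List[str], postings, idf, lengths,
                 k: int = 10) -> List[Tuple[int, float]]:
    """Return the k highest-scoring (doc_id, score) pairs for the query."""
    scores = defaultdict(float)
    for term, qtf in Counter(query).items():
        if term not in postings:
            continue                      # term does not occur in the collection
        w_tq = qtf * idf[term]            # query-side weight
        for doc_id, tf in postings[term]:
            scores[doc_id] += (tf * idf[term]) * w_tq
    for doc_id in scores:
        if lengths[doc_id] > 0.0:
            scores[doc_id] /= lengths[doc_id]   # length-normalize the document
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])


# Hypothetical three-document collection.
docs = {
    1: ["car", "insurance", "auto"],
    2: ["car", "car", "best"],
    3: ["insurance", "best", "best"],
}
postings, idf, lengths = build_index(docs)
result = cosine_score(["car", "auto"], postings, idf, lengths, k=2)
print(result)
```

The query vector's own length is left out because it is the same for every document and does not change the ranking. Only documents that share at least one term with the query are ever touched, which is what keeps the cost manageable.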

Variant tf–idf functions

[Figure: SMART notation for tf–idf variants]
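
In SMART notation a weighting scheme is written as three letters: term frequency, document frequency, normalization. A document/query pair is written ddd.qqq, for example lnc.ltc. The sketch below covers only a subset of the letters (tf: n, l, a, b; df: n, t, p; normalization: n, c). The function names and the collection statistics are hypothetical, and base-10 logarithms are assumed.

```python
import math
from typing import Dict


def tf_weight(code: str, tf: int, max_tf: int) -> float:
    """Term frequency component: n(atural), l(ogarithm), a(ugmented), b(oolean)."""
    if tf <= 0:
        return 0.0
    if code == "n":
        return float(tf)
    if code == "l":
        return 1.0 + math.log10(tf)
    if code == "a":
        return 0.5 + 0.5 * tf / max_tf
    if code == "b":
        return 1.0
    raise ValueError(f"unsupported tf code: {code}")


def df_weight(code: str, df: int, n_docs: int) -> float:
    """Document frequency component: n(o), t (idf), p(rob idf)."""
    if code == "n":
        return 1.0
    if code == "t":
        return math.log10(n_docs / df)
    if code == "p":
        return max(0.0, math.log10((n_docs - df) / df)) if df < n_docs else 0.0
    raise ValueError(f"unsupported df code: {code}")


def weight_vector(tf_counts: Dict[str, int], df: Dict[str, int],
                  n_docs: int, scheme: str) -> Dict[str, float]:
    """Weight a bag of words with a three-letter scheme such as 'lnc' or 'ltc'."""
    tf_code, df_code, norm_code = scheme
    max_tf = max(tf_counts.values(), default=0)
    weights = {t: tf_weight(tf_code, tf, max_tf) * df_weight(df_code, df[t], n_docs)
               for t, tf in tf_counts.items() if tf > 0}
    if norm_code == "c":                       # cosine normalization
        length = math.sqrt(sum(w * w for w in weights.values()))
        if length > 0.0:
            weights = {t: w / length for t, w in weights.items()}
    elif norm_code != "n":
        raise ValueError(f"unsupported normalization code: {norm_code}")
    return weights


# Hypothetical collection statistics.
N = 1000
df = {"auto": 50, "best": 500, "car": 100, "insurance": 10}

doc_vec = weight_vector({"car": 1, "insurance": 2, "auto": 1}, df, N, "lnc")
query_vec = weight_vector({"best": 1, "car": 1, "insurance": 1}, df, N, "ltc")
score = sum(w * query_vec.get(t, 0.0) for t, w in doc_vec.items())
print(score)
```

With lnc.ltc, idf is applied on the query side only, so it is not counted twice in the dot product.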
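  • Pivoted normalized document length
    the relationship between document length and relevance: cosine normalization tends to favor short documents over long ones compared with their true relevance, and pivoted normalization corrects for this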
[Figure: pivoted normalized document length]

linear model
machine learning techniques
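
A common linear form of pivoted normalization replaces the cosine length |V(d)| by slope × |V(d)| + (1 − slope) × pivot. The sketch below assumes that form. The `pivot` and `slope` values are hypothetical; in practice they would be tuned against relevance judgments.

```python
def pivoted_norm(length: float, pivot: float, slope: float) -> float:
    """Linear pivoted normalization factor: equals `length` at the pivot,
    is larger than it below the pivot and smaller above it (0 < slope < 1)."""
    return (1.0 - slope) * pivot + slope * length


pivot, slope = 10.0, 0.75          # illustrative values only
for length in (4.0, 10.0, 20.0):
    print(length, pivoted_norm(length, pivot, slope))
```

Dividing scores by this factor instead of the raw length lowers the scores of documents shorter than the pivot and raises those of longer documents, which counters the bias of plain cosine normalization toward short documents.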


Article link: https://www.haomeiwen.com/subject/jknvzttx.html