H. Pang & K. Mouratidis
VLDB 2008
Motication
- If a search engine is compromised:
- Incomplete results: omitting some legitimate documents;
- Altered ranking: deviating form the correct similarity ranking;
- Spurious results: includes fake documents
Contribution:
A framework for inverted index authentication that can generate an integrity proof for any search result.
Simularity score computation
![](https://img.haomeiwen.com/i4186036/a0f61326cdbb08bb.png)
- Here w(d,t) and w(Q,t) are normalized frequency of term t in document d and query Q, respectively.
- Execution time proportional to the document size n
Inverted index
Frequency-ordered inverted index.
![](https://img.haomeiwen.com/i4186036/41a6ebf6f1cd066d.png)
PSCAN algorithm
![](https://img.haomeiwen.com/i4186036/5b9c9cfc99cf9f02.png)
Modified PSCAN algorithm (TRA)
- Fagin's algorithm (extended)
- Alogirhtm repeatedly reads off the next entry <di, fi> from the inverted list entry Li, with the largest term score.
- The similarity score of document di is immediately computed (by Random Acess) fro mthe document MHT
- Entry <d,s> is entered into the result list R ordered in non-incresing order
- Threshold thres is computed which is an upper bound on the similarity score for any non-encountered document down the inverted list.
- So when thres <= R.sr then the top r matching documents have been found.
TRA Algorithm
- To prove that R satisfies the correctness criterion
- For each result document d, the VO includes the query term frequencies in d so that the user can compute the similarity score.
- For each non-result document d' that occurs upo to the cut-off threshold the VO includes the query term frequencies in d' so that the user can verify that the score is lower than those of the result documents.
- The inverted list entries that correspond to the cut-off threshold.
Term MHT
![](https://img.haomeiwen.com/i4186036/3b32e87ff2972970.png)
Document MHT
![](https://img.haomeiwen.com/i4186036/e5677d6370300023.png)
Drawbacks
- Searh engine has to retrieve entire inverted lists in order to regenerate the complimentary digest of the term-MHTs.
- Leaves of the term-MHTs and document-MHTs are smaller than the upper level digest.
Authentication with Chain MHTs
![](https://img.haomeiwen.com/i4186036/4231d8defd6405f8.png)
- Digest of each block is included in the digest computation of the block immediately ahead of it
- Can be used to verify any j leading blocks of the inverted list by supplying the digest of the j+1 block
网友评论