in previous lecture, we learn symbol table implemented by bst. and have lg n time complexity in worst case.
Can we do better?
yes, but with different access to the data.
it is the famous data structure, hash table.
the key of hash table is the hash function, the issue of hash function is the hash collision. so when implement hash table, we need to handle two keys that hash to the same array index.
so i will introduce hash function.
hash function
hash function is a function convert any object to a int.
idealistic goal is that scramble the keys uniformly to produce a table index
image.png image.png image.png
image.png
uniform hash assumption
image.pngseparate chainning
image.pngimage.png
linear probing
open addressing, when a new key collides, find next empty slot, and put it there
image.png
knuth's parking problem
cars arrive at one-way street with M parking spaces. each desire a random space i: if space i is taken, try i+1, i+2 etc
what is mean displacement of a car?
image.png
image.png
algorithmic complexity attack on java
goal: find family of strings with the same hash code.
image.png
so we need a one-way hash function, it is hard to find a key that will hash to a desired value(or two keys that hash to some value)
spearate chaining vs linear probing
image.pngQUIZ
Q1
image.pngwe assume we have uniform hashing assumption, then we can use hashmap to record all the n^2 pair , then we can use O(N) to find the result, be care find the two pair with some same indices.
Q2
image.pngif you override hashCode() but not equals(), only same reference will be dedup in hashmap.
if you override equals() but not hashCode().
only same reference will be dedup, but equal may return false on same reference
if you override hashCode() but implement wrong equals signature
public boolean equals(OlympicAthletethat)
have no use in dedup of hashmap.
Searching application
exception filter application
read in a list of words from one file, print out all words from standard input that are {in, not in} the list
image.png
and dictionary lookup
image.png
File indexing
goal: given a list of files specified, create an index so that you can efficiently find all files containing a given query string
image.png
Concordance
goal: preprocess a text corpus to support concordance queries: given a word, find all occurrences with their immediate contexts
image.png
Sparse matrix-vector multiplication
assumption: matrix dimension is 10,000; average nonzeros per row ~10
image.png image.png
image.png
course quiz and project : https://github.com/yixuaz/algorithm4-princeton-cos226
网友评论