basic-index
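The faiss library implements linear (brute-force) search optimized with BLAS, a hashing method (LSH), and vector quantization methods (PQ, IVFPQ). A major advantage of faiss is that indexes support dynamic addition and removal of vectors.
A survey of tools for nearest-neighbor computation on large datasets is available here: http://wiki.baidu.com/pages/viewpage.action?pageId=479989012
Every index can be built through the unified index_factory API by passing a string argument.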
Method | Class name | index_factory | Main parameters | Bytes/vector | Exhaustive | Comments |
---|---|---|---|---|---|---|
Exact Search for L2 | IndexFlatL2 | "Flat" | d | 4*d | yes | brute-force |
Exact Search for Inner Product | IndexFlatIP | "Flat" | d | 4*d | yes | also for cosine (normalize vectors beforehand) |
Hierarchical Navigable Small World graph exploration | IndexHNSWFlat | "HNSWx,Flat" | d, M | 4*d + 8 * M | no | |
Inverted file with exact post-verification | IndexIVFFlat | "IVFx,Flat" | quantizer, d, nlists, metric | 4*d | no | Take another index to assign vectors to inverted lists |
Locality-Sensitive Hashing (binary flat index) | IndexLSH | - | d, nbits | nbits/8 | yes | optimized by using random rotation instead of random projections |
Scalar quantizer (SQ) in flat mode | IndexScalarQuantizer | "SQ8" | d | d | yes | 4 bits per component is also implemented, but the impact on accuracy may be unacceptable |
Product quantizer (PQ) in flat mode | IndexPQ | "PQx" | d, M, nbits | M (if nbits=8) | yes | |
IVF and scalar quantizer | IndexIVFScalarQuantizer | "IVFx,SQ4" "IVFx,SQ8" | quantizer, d, nlists, qtype | d or d/2 | no | there are 2 encodings: 4 bit per dimension and 8 bit per dimension |
IVFADC (coarse quantizer+PQ on residuals) | IndexIVFPQ | "IVFx,PQy" | quantizer, d, nlists, M, nbits | M+4 or M+8 | no | the memory cost depends on the data type used to represent ids (int or long), currently supports only nbits <= 8 |
IVFADC+R (same as IVFADC with re-ranking based on codes) | IndexIVFPQR | "IVFx,PQy+z" | quantizer, d, nlists, M, nbits, M_refine, nbits_refine | M+M_refine+4 or M+M_refine+8 | no | |
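The Bytes/vector column can be turned into a rough memory estimate. The sketch below covers four rows of the table; the function name `bytes_per_vector`, its `kind` labels, and the collection size are hypothetical choices for this illustration, and the figures ignore centroids, codebooks, and other per-index overhead.

```python
def bytes_per_vector(kind, d, M=0, id_bytes=8):
    """Per-vector storage in bytes, following the Bytes/vector column above.

    `kind` is a hypothetical label for this sketch, not a faiss identifier.
    """
    if kind == "Flat":        # IndexFlatL2 / IndexFlatIP: d floats of 4 bytes each
        return 4 * d
    if kind == "SQ8":         # IndexScalarQuantizer, 8 bits per component
        return d
    if kind == "PQ":          # IndexPQ with nbits = 8: one byte per sub-quantizer
        return M
    if kind == "IVFPQ":       # IndexIVFPQ: PQ code plus a 4- or 8-byte id
        return M + id_bytes
    raise ValueError(f"unknown index kind: {kind}")


n, d = 1_000_000, 128         # illustrative collection size and dimensionality
flat_mb = n * bytes_per_vector("Flat", d) / 1e6
ivfpq_mb = n * bytes_per_vector("IVFPQ", d, M=16) / 1e6
print(flat_mb, ivfpq_mb)
```

Under these assumptions, one million 128-dimensional vectors take about 512 MB in a flat index and about 24 MB as IVFPQ codes with M = 16 and 8-byte ids.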
composite-index
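As in the examples above, a composite index is built in two stages: first a coarse index based on L2 or another similarity measure assigns vectors to inverted lists, then PQ or Flat encodes the vectors within each list.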
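import faiss
coarse_quantizer = faiss.IndexFlatL2(d)  # exact L2 index used as the coarse quantizer
index = faiss.IndexIVFPQ(coarse_quantizer, d, ncentroids, code_size, 8)  # 8 bits per PQ sub-quantizer
index.nprobe = 5  # number of inverted lists visited per query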
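nbits_mi = 12  # c: bits per sub-quantizer
M_mi = 2  # m: number of sub-quantizers
coarse_quantizer_mi = faiss.MultiIndexQuantizer(d, M_mi, nbits_mi)
ncentroids_mi = 2 ** (M_mi * nbits_mi)  # total number of cells in the multi-index
index = faiss.IndexIVFFlat(coarse_quantizer_mi, d, ncentroids_mi)
index.nprobe = 2048
index.quantizer_trains_alone = True  # the training set is passed directly to the quantizer's own train()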
Compared with IndexFlat as the coarse quantizer, MultiIndexQuantizer is faster but less precise.
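The arithmetic behind the multi-index sizes can be worked through directly. This sketch assumes the usual multi-index construction, in which the vector is split into M_mi sub-vectors and each is quantized independently with 2 ** nbits_mi centroids; the variable names other than M_mi and nbits_mi are introduced here for illustration.

```python
M_mi = 2                                   # number of sub-quantizers (from the snippet above)
nbits_mi = 12                              # bits per sub-quantizer (from the snippet above)

sub_centroids = 2 ** nbits_mi              # centroids per sub-quantizer
cells = sub_centroids ** M_mi              # every combination of sub-centroids is one cell
stored_centroids = M_mi * sub_centroids    # only the sub-centroids need to be stored

print(sub_centroids, cells, stored_centroids)
```

With these values there are about 16.8 million cells backed by only 8192 stored sub-centroids, which is why a large nprobe such as 2048 is still a tiny fraction of the cells, and why assignment is cheap but coarser than with a flat quantizer of the same size.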