faiss-index

Author: 点点渔火 | Published 2018-04-12 22:56

    basic-index

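    The faiss library implements linear (brute-force) search optimized with BLAS, a hashing method (LSH), and vector quantization methods (PQ, IVFPQ). A major advantage of faiss is that its indexes support adding and removing vectors dynamically.
    A survey of tools for nearest-neighbor computation on large datasets is available here: http://wiki.baidu.com/pages/viewpage.action?pageId=479989012
    The indexes can be built through the unified index_factory API by passing a description string; the strings are listed in the index_factory column below.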

    | Method | Class name | index_factory | Main parameters | Bytes/vector | Exhaustive | Comments |
    |---|---|---|---|---|---|---|
    | Exact Search for L2 | IndexFlatL2 | "Flat" | d | 4*d | yes | brute-force |
    | Exact Search for Inner Product | IndexFlatIP | "Flat" | d | 4*d | yes | also for cosine (normalize vectors beforehand) |
    | Hierarchical Navigable Small World graph exploration | IndexHNSWFlat | "HNSWx,Flat" | d, M | 4*d + 8 * M | no | |
    | Inverted file with exact post-verification | IndexIVFFlat | "IVFx,Flat" | quantizer, d, nlists, metric | 4*d | no | Take another index to assign vectors to inverted lists |
    | Locality-Sensitive Hashing (binary flat index) | IndexLSH | - | d, nbits | nbits/8 | yes | optimized by using random rotation instead of random projections |
    | Scalar quantizer (SQ) in flat mode | IndexScalarQuantizer | "SQ8" | d | d | yes | 4 bit per component is also implemented, but the impact on accuracy may be unacceptable |
    | Product quantizer (PQ) in flat mode | IndexPQ | "PQx" | d, M, nbits | M (if nbits=8) | yes | |
    | IVF and scalar quantizer | IndexIVFScalarQuantizer | "IVFx,SQ4" "IVFx,SQ8" | quantizer, d, nlists, qtype | d or d/2 | no | there are 2 encodings: 4 bit per dimension and 8 bit per dimension |
    | IVFADC (coarse quantizer+PQ on residuals) | IndexIVFPQ | "IVFx,PQy" | quantizer, d, nlists, M, nbits | M+4 or M+8 | no | the memory cost depends on the data type used to represent ids (int or long), currently supports only nbits <= 8 |
    | IVFADC+R (same as IVFADC with re-ranking based on codes) | IndexIVFPQR | "IVFx,PQy+z" | quantizer, d, nlists, M, nbits, M_refine, nbits_refine | M+M_refine+4 or M+M_refine+8 | no | |
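The Bytes/vector column can be turned into a rough memory estimate. The sketch below is plain Python arithmetic over the formulas in the table; the values of `d`, `M`, `id_bytes` and `n` are example assumptions, and the estimate ignores centroid tables and other per-index overhead.

```python
# Bytes per stored vector, following the "Bytes/vector" column of the table.
d = 128        # vector dimension (example value, an assumption)
M = 16         # number of PQ sub-quantizers (example value, an assumption)
id_bytes = 8   # 8 if ids are stored as long, 4 if stored as int

bytes_per_vector = {
    "Flat": 4 * d,               # one float32 per component
    "SQ8": d,                    # one byte per component
    "PQ16": M,                   # one byte per sub-quantizer when nbits = 8
    "IVFx,PQ16": M + id_bytes,   # PQ code plus the stored id
}

n = 1_000_000  # number of vectors to index (example value)
for name, size in bytes_per_vector.items():
    print(f"{name:10s} {size:4d} bytes/vector, about {n * size / 2**20:.0f} MiB in total")
```

With these example values a flat index needs 32 times more memory per vector than the PQ codes, which is the main reason to accept the accuracy loss of quantization on large collections.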

    composite-index

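    As in the table above, a composite index is built in two stages: first an L2 index (or another index) acts as the coarse quantizer that assigns vectors to inverted lists, then the vectors are stored either as PQ codes or as flat (uncompressed) vectors.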

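    import faiss

    d = 64             # vector dimension (example value, not from the original)
    ncentroids = 1024  # number of inverted lists (example value)
    code_size = 8      # PQ sub-quantizers, one byte each at 8 bits (example value)

    # IVFPQ: a flat L2 index picks the inverted list, PQ encodes the vector
    coarse_quantizer = faiss.IndexFlatL2(d)
    index = faiss.IndexIVFPQ(coarse_quantizer, d,
                             ncentroids, code_size, 8)
    index.nprobe = 5   # inverted lists visited per query

    # IVFFlat with a multi-index coarse quantizer
    nbits_mi = 12  # c
    M_mi = 2       # m
    coarse_quantizer_mi = faiss.MultiIndexQuantizer(d, M_mi, nbits_mi)
    ncentroids_mi = 2 ** (M_mi * nbits_mi)

    index = faiss.IndexIVFFlat(coarse_quantizer_mi, d, ncentroids_mi)
    index.nprobe = 2048
    index.quantizer_trains_alone = True  # the quantizer runs its own training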
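
To make the roles of the coarse quantizer and `nprobe` concrete, here is a minimal plain-Python sketch of an inverted file with flat storage. It is not how faiss implements it: the helper names `build_ivf` and `search_ivf` are hypothetical, the data is random, and the centroids are sampled from the data as a stand-in for k-means training.

```python
import random

def l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Put every vector id into the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vid, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(vid)
    return lists

def search_ivf(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [vid for c in order[:nprobe] for vid in lists[c]]
    return sorted(candidates, key=lambda vid: l2(query, vectors[vid]))[:k]

random.seed(0)
d, n, ncentroids = 4, 200, 8   # toy sizes, chosen only for illustration
vectors = [[random.random() for _ in range(d)] for _ in range(n)]
centroids = random.sample(vectors, ncentroids)  # stand-in for k-means training
lists = build_ivf(vectors, centroids)

query = vectors[17]
approx = search_ivf(query, vectors, centroids, lists, k=5, nprobe=2)
exact = search_ivf(query, vectors, centroids, lists, k=5, nprobe=ncentroids)
print("nprobe=2:", approx)
print("nprobe=all:", exact)
```

Raising `nprobe` scans more lists, so results approach the exhaustive answer at the cost of more distance computations; with `nprobe` equal to the number of lists the search is exhaustive.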
    

    Compared with IndexFlat as the coarse quantizer, MultiIndexQuantizer is fast but low-precision.
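
One plausible reading of this remark, sketched in plain Python rather than with faiss: a multi-index quantizer splits each vector into `M_mi` sub-vectors and quantizes each one independently, so it addresses `2 ** (M_mi * nbits_mi)` cells while comparing against only `M_mi * 2 ** nbits_mi` sub-centroids. The cells are restricted to combinations of sub-centroids, which is where precision can be lost. The function `assign_cell` and the tiny codebooks are hypothetical illustrations.

```python
nbits_mi = 12  # bits per sub-quantizer, as in the snippet above
M_mi = 2       # number of sub-quantizers

sub_centroids = 2 ** nbits_mi         # centroids in each sub-quantizer
total_cells = sub_centroids ** M_mi   # equals 2 ** (M_mi * nbits_mi)
mi_cost = M_mi * sub_centroids        # sub-vector comparisons per assignment
flat_cost = total_cells               # full comparisons for a flat quantizer of equal size
print(total_cells, mi_cost, flat_cost)

def assign_cell(vector, codebooks):
    """Quantize each sub-vector on its own; the cell is the tuple of sub-centroid ids."""
    sub_d = len(vector) // len(codebooks)
    cell = []
    for m, codebook in enumerate(codebooks):
        sub = vector[m * sub_d:(m + 1) * sub_d]
        best = min(range(len(codebook)),
                   key=lambda c: sum((x - y) ** 2 for x, y in zip(sub, codebook[c])))
        cell.append(best)
    return tuple(cell)

# Tiny hypothetical codebooks: 2 sub-quantizers with 2 centroids each, d = 4.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [1.0, 1.0]],
]
cell = assign_cell([0.9, 0.8, 0.1, 0.2], codebooks)
print(cell)
```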
