Viewing simhash and minhash values in Python

Author: 丙吉 | Published 2022-01-27 17:42

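    I read up on how the simhash and minhash algorithms work.
    Most of what I found uses them directly for similarity computations, but I wanted to see what the hashed values actually look like.
    Reference: https://leons.im/posts/a-python-implementation-of-simhash-algorithm/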

    For simhash, read the hashed value from the .value attribute.

    import re

    from simhash import Simhash

    def get_features(s):
        # Split the normalized string into overlapping 3-character shingles.
        width = 3
        s = s.lower()
        s = re.sub(r'[^\w]+', '', s)  # keep word characters only
        return [s[i:i + width] for i in range(max(len(s) - width + 1, 1))]

    # .value is the fingerprint as an integer; '%x' prints it in hexadecimal.
    print('%x' % Simhash(get_features('How are you? I am fine. Thanks.')).value)
    print('%x' % Simhash(get_features('How are u? I am fine.     Thanks.')).value)
    print('%x' % Simhash(get_features('How r you?I    am fine. Thanks.')).value)
    

    The result, one hexadecimal fingerprint per sentence:

    [Screenshot: image.png]
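
    To make it clearer what the integer behind .value represents, here is a minimal pure-Python sketch of the simhash idea. It is an illustration under assumptions, not the simhash package's implementation: toy_simhash and hamming_distance are hypothetical helper names, SHA-256 is an arbitrary choice of per-feature hash, and every feature gets weight 1.

```python
import hashlib
import re


def get_features(s, width=3):
    """Overlapping character shingles, same idea as in the post."""
    s = re.sub(r'[^\w]+', '', s.lower())
    return [s[i:i + width] for i in range(max(len(s) - width + 1, 1))]


def toy_simhash(features, bits=64):
    """Hypothetical helper: a simplified simhash with unit feature weights."""
    counts = [0] * bits
    for feature in features:
        # Hash the feature and keep only the low `bits` bits.
        h = int.from_bytes(hashlib.sha256(feature.encode('utf8')).digest(), 'big')
        h &= (1 << bits) - 1
        for i in range(bits):
            # Each feature votes +1 for a set bit and -1 for a clear bit.
            counts[i] += 1 if (h >> i) & 1 else -1
    # Bit i of the fingerprint is 1 when its vote total is positive.
    value = 0
    for i in range(bits):
        if counts[i] > 0:
            value |= 1 << i
    return value


def hamming_distance(a, b):
    """Hypothetical helper: number of bit positions where a and b differ."""
    return bin(a ^ b).count('1')


a = toy_simhash(get_features('How are you? I am fine. Thanks.'))
b = toy_simhash(get_features('How are u? I am fine.     Thanks.'))
print('%x' % a)
print('%x' % b)
print(hamming_distance(a, b))  # similar texts usually differ in only a few bits
```

    The fingerprint is a single 64-bit integer, which is why the post prints it with '%x'. Two fingerprints are compared by counting differing bits rather than by testing equality.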

    For minhash, read the hashed value with digest().

    from datasketch import MinHash

    m1 = MinHash()
    m2 = MinHash()
    # Each update() call adds one element to the set being hashed,
    # so here each MinHash holds a single element: the whole sentence.
    m1.update('How are you? I am fine. Thanks.'.encode('utf8'))
    m2.update('How r you?I am fine. Thanks.'.encode('utf8'))
    print(m1.digest())
    print(m2.digest())
    

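    digest() returns a 128-dimensional vector: one hash value per permutation, and num_perm defaults to 128.


    [Screenshot: image.png]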
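
    The 128 values are easier to interpret with a small pure-Python sketch of the minhash idea. It is an illustration under assumptions, not datasketch's implementation: toy_minhash and estimate_jaccard are hypothetical helper names, and salting SHA-1 with a seed stands in for the library's random permutations. Unlike the snippet above, it feeds in one token at a time, so each text is treated as a set of words.

```python
import hashlib


def toy_minhash(tokens, num_perm=128):
    """Hypothetical helper: a simplified minhash signature for a set of tokens."""
    signature = []
    for seed in range(num_perm):
        # Simulate the seed-th hash function by salting the input with the seed.
        salt = str(seed).encode('utf8') + b':'
        signature.append(min(
            int.from_bytes(hashlib.sha1(salt + t.encode('utf8')).digest()[:8], 'big')
            for t in tokens
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Hypothetical helper: fraction of slots where two signatures agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


s1 = set('how are you i am fine thanks'.split())
s2 = set('how r you i am fine thanks'.split())
sig1 = toy_minhash(s1)
sig2 = toy_minhash(s2)

print(len(sig1))                     # one minimum per hash function
print(sig1[:4])                      # the first few slots of the signature
print(estimate_jaccard(sig1, sig2))  # estimated Jaccard similarity
print(len(s1 & s2) / len(s1 | s2))   # exact Jaccard similarity for comparison
```

    Each slot holds the smallest hash seen under one hash function, so two sets agree in a slot with probability equal to their Jaccard similarity. That is why the signature is a vector rather than a single number.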

    Listing the hash algorithms available in hashlib

    https://docs.python.org/3.5/library/hashlib.html

    import hashlib
    # The set of algorithm names guaranteed to be supported on every platform.
    print(hashlib.algorithms_guaranteed)
    
    [Screenshot: image.png]
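
    For comparison with the simhash and minhash values, a short sketch of what an ordinary hashlib digest looks like. The choice of SHA-1 and SHA-256 and the sample sentence are only for illustration.

```python
import hashlib

data = 'How are you? I am fine. Thanks.'.encode('utf8')

# Algorithm names guaranteed to be supported on every platform.
print(sorted(hashlib.algorithms_guaranteed))

# Hex digests of the same input under two of the guaranteed algorithms.
sha1_hex = hashlib.sha1(data).hexdigest()
sha256_hex = hashlib.sha256(data).hexdigest()
print(sha1_hex)     # 40 hex characters (160 bits)
print(sha256_hex)   # 64 hex characters (256 bits)

# A one-character change produces a different digest.
other_hex = hashlib.sha256('How are u? I am fine. Thanks.'.encode('utf8')).hexdigest()
print(other_hex == sha256_hex)
```

    These cryptographic hashes are designed so that similar inputs give unrelated digests, whereas simhash and minhash are designed so that similar inputs give similar values.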
