美文网首页
2018-09-03

2018-09-03

作者: 赵轩_2a45 | 来源:发表于2018-09-03 13:41 被阅读0次

    Hash Tables

    When we talk about hash tables, we're actually talking about dictionary. While an array can be used to construct hash tables, array indexes its elements using integers. However, if we want to store data and use keys other than integer, such as 'string', we may want to use dictionary.

    Dictionaries in Python are implemented using hash tables. It is an array whose indexes are obtained using a hash function on the keys.

    We declare an empty dictionary like this:

    We declare an empty dictionary like this:

    >>> D = {}

    Then, we can add its elements:

    >>> D['a'] = 1

    >>> D['b'] = 2

    >>> D['c'] = 3

    >>> D

    {'a': 1, 'c': 3, 'b': 2}

    It's a structure with (key, value) pair:

    D[key] = value

    The string used to "index" the hash table D is called the "key". To access the data stored in the table, we need to know the key:

    >>> D['b']

    2

    How we loop through the hash table?

    >>> for k in D.keys():

    ...    print D[k]

    ...

    1

    3

    2

    If we want to print the (key, value) pair:

    >>> for k,v in D.items():

    ...    print k,':',v

    ...

    a : 1

    c : 3

    b : 2

    Hashing from two arrays

    Using two Arrays of equal length, create a Hash object where the elements from one array (the keys) are associated with the elements of the other (the values):

    >>> keys = ['a', 'b', 'c']

    >>> values = [1, 2, 3]

    >>> hash = {k:v for k, v in zip(keys, values)}

    >>> hash

    {'a': 1, 'c': 3, 'b': 2}

    Hashing

    Here are some hashing samples using built-in hash function:

    >>> map(hash, [0, 1, 2, 3])

    [0, 1, 2, 3]

    >>> map(hash, ['0','1','2','3'])

    [6144018481, 6272018864, 6400019251, 6528019634]

    >>> hash('0')

    6144018481

    As we can see from the example, Python is using different hash() function depending on the type of data.

    Python provides hashlib for secure hashes and message digests:

    md5(), sha*():

    >>> import hashlib>>> hashlib.md5('a')>>> hashlib.md5('a').digest()'\x0c\xc1u\xb9\xc0\xf1\xb6\xa81\xc3\x99\xe2iw&a;'>>> hashlib.md5('a').hexdigest()'0cc175b9c0f1b6a831c399e269772661'>>> hashlib.sha512('a')>>> hashlib.sha512('a').digest()

    '\x1f@\xfc\x92\xda$\x16\x94u\ty\xeel\xf5\x82\xf2\xd5\xd7\xd2\x8e\x183]\xe0Z\xbcT\xd0V\x0e\x0fS\x02\x86\x0ce+\xf0\x8dV\x02R\xaa^t!\x05F\xf3i\xfb\xbb\xce\x8c\x12\xcf\xc7\x95{&R;\xfe\x9au'

    >>> hashlib.sha512('a').hexdigest()

    '1f40fc92da241694750979ee6cf582f2d5d7d28e18335de05abc54d0560e0f5302860c652bf08d560252aa5e74210546f369fbbbce8c12cfc7957b2652fe9a75'

    >>>

    Hashing example code

    Hashing example code

    The following code is a revision from Sets (union/intersection) and itertools - Jaccard coefficient & shingling to check plagiarism. In this section, we used 64 bit integer (hash value from hash()) for the comparison of shingles instead of directly working on the string.

    Output is exactly the same as the one we got using string comparison:

    combinations=[(0, 1), (0, 2), (1, 2)]

    (0, 1) : jaccard=0.0196078431373

    (0, 2) : jaccard=0.0

    (1, 2) : jaccard=0.0

    相关文章

      网友评论

          本文标题:2018-09-03

          本文链接:https://www.haomeiwen.com/subject/ehbbwftx.html