美文网首页
架构师训练营第五周作业 一致性Hash算法

架构师训练营第五周作业 一致性Hash算法

作者: 浩哥有料 | 来源:发表于2020-07-08 23:59 被阅读0次

    用你熟悉的编程语言实现一致性 hash 算法。
    编写测试用例测试这个算法,测试 100 万 KV 数据,10 个服务器节点的情况下,计算这些 KV 数据在服务器上分布数量的标准差,以评估算法的存储负载不均衡性。

    # coding:utf-8
    
    # Import built-in modules
    import hashlib
    
    # Import third-party modules
    from numpy import std
    
    
    def calculate_hash(key):
        """Function to calculate hash code from a string.
    
        Different algorithm will affect much on final result.
    
        Args:
            key (str): String to be used to calculate hash code from.
    
        """
        # Calculate SHA-256 code from the string and convert it to int.
        return int(hashlib.sha256(key).hexdigest(), 16)
    
    
    class CacheSystem(object):
        """Class that simulates a distributed cache system."""
    
        def __init__(self, key_count):
            """Initialize the cache system."""
    
            # List to store all the physical cache nodes.
            self.nodes = []
    
            # Hash circle.
            self.hash_circle = {}
    
            # Default virtual nodes count for each cache node is 200.
            self.virtual_nodes_count = 200
    
            # Total count of the data keys
            self._key_count = key_count
    
        def add_node(self, node):
            """Add a physical cache node into this system.
    
            Use Name of the node as argument.
    
            Args:
                node (str): Name of the physical node.
            """
            for i in range(self.virtual_nodes_count):
                hashcode = calculate_hash('{}->vn{}'.format(node, i))
                self.hash_circle[hashcode] = node
            if node not in self.nodes:
                self.nodes.append(node)
    
        def _fill_data(self):
            """Fill the hash circle with data keys."""
            for i in range(self._key_count):
                key = 'key{}'.format(i)
                hashcode = calculate_hash(key)
                self.hash_circle[hashcode] = key
    
        def setup(self, count):
            """Setup/reset the cache system.
    
            Args:
                count (int): Specific virtual nodes count for 
                    each physical node.
            """
            self.virtual_nodes_count = count
            self.hash_circle = {}
            self.nodes = []
            self._fill_data()
    
    
    class TestCacheSystem(object):
        """Class to test the cache system and hash algorithm."""
    
        def __init__(self, key_count=1000000):
            """Initialize the test handler.
    
            Args:
                key_count (int): Number of the data keys.
            """
            self.cache_system = CacheSystem(key_count)
    
        def _add_nodes(self, nodes_count):
            """Add specified amount of physical nodes into cache system.
    
            Args:
                nodes_count (int): Number of the physical nodes.
            """
            for i in range(nodes_count):
                self.cache_system.add_node('Node-{}'.format(i))
    
        def calculate(self, nodes_count, virtual_nodes_count):
            """Calculate the standard deviation of the keys each node stores.
            
            Args:
                nodes_count (int): Number of the physical nodes.
                virtual_nodes_count (int): Number of virtual nodes for each 
                    physical node.
    
            """
            self.cache_system.setup(virtual_nodes_count)
            self._add_nodes(nodes_count)
    
            hashcodes = sorted(self.cache_system.hash_circle.keys())
            nodes = {node:0 for node in self.cache_system.nodes}
            i = 0
            first_node = None
            for hc in hashcodes:
                if self.cache_system.hash_circle[hc].startswith('key'):
                    i += 1
                    continue
                node = self.cache_system.hash_circle[hc].split('->')[0]
                if not first_node:
                    first_node = node
                nodes[node] = nodes[node] + i
                i = 0
    
            if i != 0:
                nodes[first_node] = nodes[first_node] + i
    
            print "物理节点数:", nodes_count
            print "每个物理节点分配的虚拟节点数量:", self.cache_system.virtual_nodes_count
            print "每个物理节点存储的KV数据数量:\n"
    
            node_keys = []
            for node in sorted(nodes.keys()):
                print node, nodes[node]
                node_keys.append(nodes[node])
    
            print "\n当前配置下1000000万KV数据的分布标准差:", std(node_keys, ddof=1)
    
    
    test = TestCacheSystem()
    test.calculate(10, 500)
    

    运行结果:

    物理节点数: 10
    每个物理节点分配的虚拟节点数量: 500
    每个物理节点存储的KV数据数量:
    
    Node-0 101833
    Node-1 99428
    Node-2 95766
    Node-3 97259
    Node-4 100356
    Node-5 99638
    Node-6 102234
    Node-7 104631
    Node-8 102526
    Node-9 96329
    
    当前配置下1000000万KV数据的分布标准差: 2899.7801449228677
    
    物理节点数: 10
    每个物理节点分配的虚拟节点数量: 200
    每个物理节点存储的KV数据数量:
    
    Node-0 104533
    Node-1 94639
    Node-2 103166
    Node-3 102795
    Node-4 89517
    Node-5 99433
    Node-6 104803
    Node-7 110528
    Node-8 95187
    Node-9 95399
    
    当前配置下1000000万KV数据的分布标准差: 6285.614369335745
    

    结论

    不同的Hash算法、不同的虚拟节点数量、不同的虚拟节点命名方式都会影响到标准差结果。

    相关文章

      网友评论

          本文标题:架构师训练营第五周作业 一致性Hash算法

          本文链接:https://www.haomeiwen.com/subject/gdklcktx.html