C# 容器——字典

作者: 太刀 | 来源:发表于2021-03-02 16:15 被阅读0次

Day7-字典&集合
Day7 字典和集合
第006篇：字典
python字典与集合
python零基础不要错过，python字典的所有类型！
Python字典操作及方法总结
重温 C# 字典Dictionary类
容器类型-字典
容器类型：字典
2018-08-28 Day7-容器02-字典和集合

字典 Dictionary 是最常见的 键值对 存储容器，Dictionary源码

1. 字典常见用法

        // 声明
        Dictionary<int, string> dict = new Dictionary<int, string>();

        // 添加元素
        dict.Add(1, "abc");
        dict.Add(2, "cde");

        // 下标访问
        dict[1] = "newABc";
        string v = dict[1];

        // 遍历
        foreach(KeyValuePair<int, string> pair in dict)
        {
            Debug.Log(string.Format("key:{0},value:{1}", pair.Key, pair.Value));
        }

        Dictionary<int, string>.Enumerator enumerator = dict.GetEnumerator();
        while (enumerator.MoveNext())
        {
            Debug.Log(string.Format("key:{0},value:{1}", enumerator.Current.Key, enumerator.Current.Value));
        }

        // 查找key
        bool contains = dict.ContainsKey(1);

        // 查找 Value
        bool containsValue = dict.ContainsValue("abc");

        // 取出 Value
        string str = string.Empty;
        contains = dict.TryGetValue(1, out str);

        // 容器长度
        int count = dict.Count;

        // 移除元素
        dict.Remove(1);

        // 情况容器
        dict.Clear();

2. Dictionary 的存储结构

Dictionary的底层存储结构是哈希表，主要是 buckets 和 entries 这两个数组

    private struct Entry
    {
        public int hashCode;    // Lower 31 bits of hash code, -1 if unused
        public int next;        // Index of next entry, -1 if last
        public TKey key;           // Key of entry
        public TValue value;         // Value of entry
    }

    private int[] buckets;
    private Entry[] entries;

数据存储在 entries 数组中，而 buckets 数组是哈希桶数组，在添加元素时，计算元素的哈希值，映射到 buckets 相应位置，如果相同位置有多个元素，会在该位置行程链表，通过 entries 元素的 next 指针串联起来。

3. Dictionary 的扩容机制

3.1 扩容时机

Dictionary 在调用 Add 方法或 [] 添加元素时都会调用到 Insert，如果当前的 entries 数组已满，会调用 Resize 方法进行扩容。

3.2 扩容算法

扩容的逻辑是调用 ExpandPrime 方法确定新的容量，然后重新分配 entries 和 buckets 数组，并将数据复制过去。ExpandPrime当前的实现是返回与当前容量2倍最接近的素数。

public static readonly int[] primes = {
    3, 7, 11, 17, 23, 29, 37, 47, 59, 71, 89, 107, 131, 163, 197, 239, 293, 353, 431, 521, 631, 761, 919,
    1103, 1327, 1597, 1931, 2333, 2801, 3371, 4049, 4861, 5839, 7013, 8419, 10103, 12143, 14591,
    17519, 21023, 25229, 30293, 36353, 43627, 52361, 62851, 75431, 90523, 108631, 130363, 156437,
    187751, 225307, 270371, 324449, 389357, 467237, 560689, 672827, 807403, 968897, 1162687, 1395263,
    1674319, 2009191, 2411033, 2893249, 3471899, 4166287, 4999559, 5999471, 7199369};

/*......Other Code......*/

// Returns size of hashtable to grow to.
public static int ExpandPrime(int oldSize)
{
    int newSize = 2 * oldSize;
 
    // Allow the hashtables to grow to maximum possible size (~2G elements) before encoutering capacity overflow.
    // Note that this check works even when _items.Length overflowed thanks to the (uint) cast
    if ((uint)newSize > MaxPrimeArrayLength &amp;&amp; MaxPrimeArrayLength > oldSize)
    {
        Contract.Assert( MaxPrimeArrayLength == GetPrime(MaxPrimeArrayLength), "Invalid MaxPrimeArrayLength");
        return MaxPrimeArrayLength;
    }
 
    return GetPrime(newSize);
}

public static int GetPrime(int min) 
{
    if (min < 0)
        throw new ArgumentException(Environment.GetResourceString("Arg_HTCapacityOverflow"));
    Contract.EndContractBlock();
 
    for (int i = 0; i < primes.Length; i++) 
    {
        int prime = primes[i];
        if (prime >= min) return prime;
    }
 
    //outside of our predefined table. 
    //compute the hard way. 
    for (int i = (min | 1); i < Int32.MaxValue;i+=2) 
    {
        if (IsPrime(i) &amp;&amp; ((i - 1) % Hashtable.HashPrime != 0))
            return i;
    }
    return min;
}

为什么要选择素数？
关于这个问题可以参考stackoverflow上的讨论以及这篇博客。
结论就是: 在大多数情况下，对于一组数列 a(n) ，如果它数列长度的公因数数量越少，那么该数列中的数字经散列后形成的新数字在哈希表上分布就越均匀，即发生哈希冲突的概率就越小。而素数的公因数为 1 和它本身，公因数最少，所以每次扩容都尽可能地选择素数作为新容量大小。简单说就是降低哈希冲突发生的概率。

3.3 哈希冲突过多的优化

若 Insert 时发生的哈希冲突次数过多（当前实现是100次），会调用 Resize来重新计算所有元素的 hashCode

if(collisionCount > HashHelpers.HashCollisionThreshold &amp;&amp; HashHelpers.IsWellKnownEqualityComparer(comparer)) 
{
    comparer = (IEqualityComparer<TKey>) HashHelpers.GetRandomizedEqualityComparer(comparer);
    Resize(entries.Length, true);
}

4. Dictionary 操作实现

4.1 添加元素

可使用 Add 或 [] 赋值的方法添加元素，根据 key 计算 hashCode，在 buckets 中查找到对应的位置，存放到 entries 数组中，如果发生冲突，则采用 拉链法 进行链表扩展，未触发扩容时时间复杂度 O(1)，触发扩容则是 O(n)

4.2 删除元素

Remove 方法，计算 key 的 hashCode，在 entries 中找到对应的元素，遍历对应位置的链表，移除元素，移除后空间保留，freeCount + 1，提高内存复用率，移除的时间复杂度 O(1)

4.3 查找元素

根据 Key 查找
ContainsKey，TryGetValue 都会调用 FindEntry 来查找元素，对应实现

private int FindEntry(TKey key) {
            if( key == null) {
                ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
            }
 
            if (buckets != null) {
                int hashCode = comparer.GetHashCode(key) &amp; 0x7FFFFFFF;
                for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next) {
                    if (entries[i].hashCode == hashCode &amp;&amp; comparer.Equals(entries[i].key, key)) return i;
                }
            }
            return -1;
        }

根据 key 计算 hashCode，在对应位置遍历链表，时间复杂度 O(1)

根据 Value 查找
ContainsValue 参数没有 key，所以只能进行遍历对比来查找，时间复杂度 O(n)

public bool ContainsValue(TValue value) {
    if (value == null) {
        for (int i = 0; i < count; i++) {
            if (entries[i].hashCode >= 0 &amp;&amp; entries[i].value == null) return true;
        }
    }
    else {
        EqualityComparer<TValue> c = EqualityComparer<TValue>.Default;
        for (int i = 0; i < count; i++) {
            if (entries[i].hashCode >= 0 &amp;&amp; c.Equals(entries[i].value, value)) return true;
        }
    }
    return false;
}

Day7-字典&集合
字典 1.什么是字典(dict) 1)字典字典是容器型数据类型(序列), 将{}作为容器的标志，里面多个元素用逗...
Day7 字典和集合
一、字典 1.什么是字典(dict) 1)字典是容器性数据类型(序列), 将{}作为容器的标志, 里面多个元素用逗...
第006篇：字典
1、什么是字典(dict) 字典是容器型数据类型,将大括号({})作为容器的标志，里面多个元素用逗号隔开，字典中的...
python字典与集合
字典 1.什么是字典 1）字典是容器型数据类型（序列），将{}作为容器的标志，里面的元素用逗号隔开特点：可变（支...
python零基础不要错过，python字典的所有类型！
python字典类型分类: python 字典类型简介字典(dict)是存储key/value数据的容器，也就...
Python字典操作及方法总结
一、字典概念字典是另一种可变容器模型，且可存储任意类型对象，如字符串、数字、元组等其他容器模型。二、创建字典 ...
重温 C# 字典Dictionary类
C# 中使用字典Dictionary来存储键值对的数据。创建字典时需要定义键值对的类型，再添加字典元素时需要符合定...
容器类型-字典
一贴标签的数据二创建字典三访问数据项四。在字典中查找
容器类型：字典
容器类型：字典贴标签的数据 “标签收纳盒” 给数据贴上标签，就可以通过具有特定含义的名字或者别的记号来获取数据 ...
2018-08-28 Day7-容器02-字典和集合
字典和集合一、字典--dict 字典：容器类型，以键值对作为元素。（字典里面存的数据全是以键值对出现的）字典是...