Java HashMap

Author: bowen_wu | Published 2022-06-21 15:52

    Hash Table

    • Core structure: hash-indexed buckets plus linked lists
    • O(1) average time for lookup, insertion, and deletion
    • Fatal flaw: hash value collisions
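    The bucket-plus-chain design above can be sketched minimally. This is a hypothetical toy class for illustration only; the real HashMap adds resizing, hash spreading, treeification, and much more:

```java
import java.util.LinkedList;
import java.util.Objects;

// Toy chained hash table: an array of buckets, each bucket a linked list of entries.
@SuppressWarnings("unchecked")
class ToyHashMap<K, V> {
    private static final int CAPACITY = 16; // must be a power of two
    private final LinkedList<Object[]>[] buckets = new LinkedList[CAPACITY];

    // Map a hash code onto a bucket index with a bit mask.
    private int indexFor(Object key) {
        return (key == null ? 0 : key.hashCode()) & (CAPACITY - 1);
    }

    V put(K key, V value) {
        int i = indexFor(key);
        if (buckets[i] == null) buckets[i] = new LinkedList<>();
        for (Object[] entry : buckets[i]) {       // O(1) on average, O(n) worst case
            if (Objects.equals(entry[0], key)) {
                V old = (V) entry[1];
                entry[1] = value;                 // overwrite an existing mapping
                return old;
            }
        }
        buckets[i].add(new Object[]{key, value}); // no match: append a new entry
        return null;
    }

    V get(K key) {
        int i = indexFor(key);
        if (buckets[i] == null) return null;
        for (Object[] entry : buckets[i])
            if (Objects.equals(entry[0], key)) return (V) entry[1];
        return null;
    }
}
```

    All the collisions end up chained in one bucket, which is exactly the "fatal flaw" noted above.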

    java.lang.Object

    Object.equals(Object obj)

    The equals contract, for non-null references => "Indicates whether some other object is "equal to" this one. The equals method implements an equivalence relation on non-null object references:"

    • Reflexive => x.equals(x) == true
    • Symmetric => if x.equals(y) == true then y.equals(x) == true
    • Transitive => if x.equals(y) == true && y.equals(z) == true then x.equals(z) == true
    • Consistent => as long as no information used in equals comparisons is modified, repeated calls of x.equals(y) return the same result
    • x.equals(null) == false

    Object.hashCode()

    The hashCode contract => "Returns a hash code value for the object. This method is supported for the benefit of hash tables
    such as those provided by HashMap. The general contract of hashCode is:"

    • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
    • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
    • If two objects are unequal according to the equals(Object) method, it is NOT required that their hashCode results differ. However, producing distinct results for unequal objects may improve hash table performance. => hashCode returns an int, roughly 4.3 billion possible values, so collisions occur with small probability.
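    For instance, a value class that honors both contracts might look like the following (a hypothetical Point class, not taken from the article):

```java
import java.util.Objects;

// A value class that keeps the equals/hashCode contract:
// equal objects (same x, y) always produce the same hash code.
final class Point {
    final int x, y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;              // reflexive
        if (!(obj instanceof Point)) return false; // also yields false for null
        Point other = (Point) obj;
        return x == other.x && y == other.y;       // symmetric and transitive
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);                 // consistent with equals
    }
}
```

    Overriding equals without hashCode (or vice versa) silently breaks HashMap lookups for such keys.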

    Java 7 HashMap

    The classic hash table implementation: array + linked lists

    • In Java 7 the bucket array is not allocated when the HashMap is created; the space is only allocated on the first put. If the table is still EMPTY_TABLE, inflateTable is called.
    • HashMap's hash is derived from Object.hashCode(), whose return type is int, ranging from -2^31 to 2^31 - 1, about 4.3 billion values.
    public class HashMap<K, V>
            extends AbstractMap<K, V>
            implements Map<K, V>, Cloneable, Serializable {
    
        /**
         * <p>As a general rule, the default load factor (.75) offers a good tradeoff
         * between time and space costs.  Higher values decrease the space overhead
         * but increase the lookup cost (reflected in most of the operations of the
         * <tt>HashMap</tt> class, including <tt>get</tt> and <tt>put</tt>).  The
         * expected number of entries in the map and its load factor should be taken
         * into account when setting its initial capacity, so as to minimize the
         * number of rehash operations.  If the initial capacity is greater
         * than the maximum number of entries divided by the load factor, no
         * rehash operations will ever occur.
         */
        static final float DEFAULT_LOAD_FACTOR = 0.75f;
    
        /**
         * The default initial capacity - MUST be a power of two. => the default HashMap capacity is 16
         */
        static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
    
        /**
         * The next size value at which to resize (capacity * load factor).
         * @serial
         */
        // If table == EMPTY_TABLE then this is the initial capacity at which the
        // table will be created when inflated.
        int threshold;
    
        /**
         * The table, resized as necessary. Length MUST Always be a power of two.
         */
        transient Entry<K, V>[] table = (Entry<K, V>[]) EMPTY_TABLE;
    
        /**
         * An empty table instance to share when the table is not inflated.
         */
        static final Entry<?, ?>[] EMPTY_TABLE = {};
    
        /**
         * Constructs an empty <tt>HashMap</tt> with the default initial capacity
         * (16) and the default load factor (0.75).
         */
        public HashMap() {
            this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
        }
    
        /**
         * Constructs an empty <tt>HashMap</tt> with the specified initial
         * capacity and load factor.
         *
         * @param  initialCapacity the initial capacity
         * @param  loadFactor      the load factor
         * @throws IllegalArgumentException if the initial capacity is negative
         *         or the load factor is nonpositive
         */
        public HashMap(int initialCapacity, float loadFactor) {
            if (initialCapacity < 0)
                throw new IllegalArgumentException("Illegal initial capacity: " +
                        initialCapacity);
            if (initialCapacity > MAXIMUM_CAPACITY)
                initialCapacity = MAXIMUM_CAPACITY;
            if (loadFactor <= 0 || Float.isNaN(loadFactor))
                throw new IllegalArgumentException("Illegal load factor: " +
                        loadFactor);
    
            this.loadFactor = loadFactor;
            threshold = initialCapacity;
            init();
        }
    
        /**
         * Associates the specified value with the specified key in this map.
         * If the map previously contained a mapping for the key, the old
         * value is replaced.
         *
         * @param key key with which the specified value is to be associated
         * @param value value to be associated with the specified key
         * @return the previous value associated with <tt>key</tt>, or
         *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
         *         (A <tt>null</tt> return can also indicate that the map
         *         previously associated <tt>null</tt> with <tt>key</tt>.)
         */
        public V put(K key, V value) {
            if (table == EMPTY_TABLE) {
                inflateTable(threshold);
            }
            if (key == null)
                return putForNullKey(value);
            int hash = hash(key);
            int i = indexFor(hash, table.length);
    
            // If an equal key already exists, overwrite its value
            for (Entry<K, V> e = table[i]; e != null; e = e.next) {
                Object k;
                if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                    V oldValue = e.value;
                    e.value = value;
                    e.recordAccess(this);
                    return oldValue;
                }
            }
    
            modCount++;
    
            // No matching key found: add a new entry to this bucket's list
            addEntry(hash, key, value, i);
            return null;
        }
    
        /**
         * Inflates the table.
         */
        private void inflateTable(int toSize) {
            // Find a power of 2 >= toSize
            int capacity = roundUpToPowerOf2(toSize);
    
            threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    
            // allocate the bucket array
            table = new Entry[capacity];
            initHashSeedAsNeeded(capacity);
        }
    
        /*
         * Round up to the next power of two
         */
        private static int roundUpToPowerOf2(int number) {
            // assert number >= 0 : "number must be non-negative";
            return number >= MAXIMUM_CAPACITY
                    ? MAXIMUM_CAPACITY
                    : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
        }
    
        /**
         * Returns index for hash code h.
         */
        static int indexFor(int h, int length) {
            // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
            return h & (length-1);
        }
    
        /**
         * Adds a new entry with the specified key, value and hash code to
         * the specified bucket.  It is the responsibility of this
         * method to resize the table if appropriate.
         *
         * Subclass overrides this to alter the behavior of put method.
         */
        void addEntry(int hash, K key, V value, int bucketIndex) {
            if ((size >= threshold) && (null != table[bucketIndex])) {
                resize(2 * table.length);
                hash = (null != key) ? hash(key) : 0;
                bucketIndex = indexFor(hash, table.length);
            }
    
            createEntry(hash, key, value, bucketIndex);
        }
    
        /**
         * Rehashes the contents of this map into a new array with a
         * larger capacity.  This method is called automatically when the
         * number of keys in this map reaches its threshold.
         *
         * If current capacity is MAXIMUM_CAPACITY, this method does not
         * resize the map, but sets threshold to Integer.MAX_VALUE.
         * This has the effect of preventing future calls.
         *
         * @param newCapacity the new capacity, MUST be a power of two;
         *        must be greater than current capacity unless current
         *        capacity is MAXIMUM_CAPACITY (in which case value
         *        is irrelevant).
         */
        void resize(int newCapacity) {
            Entry[] oldTable = table;
            int oldCapacity = oldTable.length;
            if (oldCapacity == MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return;
            }
    
            Entry[] newTable = new Entry[newCapacity];
            transfer(newTable, initHashSeedAsNeeded(newCapacity));
            table = newTable;
            threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
        }
    
        /**
         * Transfers all entries from current table to newTable.
         */
        void transfer(Entry[] newTable, boolean rehash) {
            int newCapacity = newTable.length;
            for (Entry<K, V> e : table) {
                while (null != e) {
                    Entry<K, V> next = e.next;
    
                    // The hash may be recomputed during resize (when the hash seed changed)
                    if (rehash) {
                        e.hash = null == e.key ? 0 : hash(e.key);
                    }
                    int i = indexFor(e.hash, newCapacity);
                    e.next = newTable[i];
                    newTable[i] = e;
                    e = next;
                }
            }
        }
    
        /**
         * Retrieve object hash code and applies a supplemental hash function to the
         * result hash, which defends against poor quality hash functions.  This is
         * critical because HashMap uses power-of-two length hash tables, that
         * otherwise encounter collisions for hashCodes that do not differ
         * in lower bits. Note: Null keys always map to hash 0, thus index 0.
         */
        final int hash(Object k) {
            int h = hashSeed;
            if (0 != h && k instanceof String) {
                return sun.misc.Hashing.stringHash32((String) k);
            }
    
            h ^= k.hashCode();
    
            // This function ensures that hashCodes that differ only by
            // constant multiples at each bit position have a bounded
            // number of collisions (approximately 8 at default load factor).
            h ^= (h >>> 20) ^ (h >>> 12);
            return h ^ (h >>> 7) ^ (h >>> 4);
        }
    }
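    The capacity and index math in the listing can be exercised in isolation. The two helpers below are free-standing copies of the JDK 7 methods shown above, wrapped in a small hypothetical demo class:

```java
public class Java7HashMathDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of JDK 7's roundUpToPowerOf2: smallest power of two >= number.
    static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    // Copy of JDK 7's indexFor: a bit mask instead of a modulo operation.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOf2(17));  // 32: 17 is rounded up, not kept
        System.out.println(roundUpToPowerOf2(16));  // 16: already a power of two
        System.out.println(indexFor(0b11101, 16));  // 13: only the low 4 bits survive
    }
}
```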
    

    Problems with the Java 7 HashMap

    1. Breaks easily under concurrency => the well-known HashMap infinite loop (a concurrent resize can corrupt the linked lists)
    2. A potential security hole => carefully crafted requests can trigger a DoS (hash buckets degenerate into long linked lists) => CVE-2011-4858 and the Tomcat mailing-list discussion => mitigated by computing String hashes with an alternative algorithm
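    As a concrete illustration of the attack: "Aa" and "BB" are a well-known colliding pair under String.hashCode() (both hash to 2112), and because String's hash is a polynomial, colliding blocks compose into arbitrarily many colliding keys:

```java
import java.util.ArrayList;
import java.util.List;

public class CollisionDemo {
    public static void main(String[] args) {
        // The base pair collides...
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true

        // ...so every concatenation of these blocks collides too.
        // Build 2^10 = 1024 distinct keys sharing one hash code.
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int i = 0; i < 10; i++) {
            List<String> next = new ArrayList<>();
            for (String k : keys) {
                next.add(k + "Aa");
                next.add(k + "BB");
            }
            keys = next;
        }
        System.out.println(keys.size()); // 1024
        System.out.println(keys.get(0).hashCode() == keys.get(1023).hashCode()); // true
    }
}
```

    Sending such keys as request parameters forces every insert into one bucket, turning each lookup into an O(n) list scan.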

    Java 8 HashMap

    public class HashMap<K, V> extends AbstractMap<K, V>
            implements Map<K, V>, Cloneable, Serializable {
    
        /*
         * Implementation notes.
         *
         * This map usually acts as a binned (bucketed) hash table, but
         * when bins get too large, they are transformed into bins of
         * TreeNodes, each structured similarly to those in
         * java.util.TreeMap. Most methods try to use normal bins, but
         * relay to TreeNode methods when applicable (simply by checking
         * instanceof a node).  Bins of TreeNodes may be traversed and
         * used like any others, but additionally support faster lookup
         * when overpopulated. However, since the vast majority of bins in
         * normal use are not overpopulated, checking for existence of
         * tree bins may be delayed in the course of table methods.
         *
         * Tree bins (i.e., bins whose elements are all TreeNodes) are
         * ordered primarily by hashCode, but in the case of ties, if two
         * elements are of the same "class C implements Comparable<C>",
         * type then their compareTo method is used for ordering. (We
         * conservatively check generic types via reflection to validate
         * this -- see method comparableClassFor).  The added complexity
         * of tree bins is worthwhile in providing worst-case O(log n)
         * operations when keys either have distinct hashes or are
         * orderable, Thus, performance degrades gracefully under
         * accidental or malicious usages in which hashCode() methods
         * return values that are poorly distributed, as well as those in
         * which many keys share a hashCode, so long as they are also
         * Comparable. (If neither of these apply, we may waste about a
         * factor of two in time and space compared to taking no
         * precautions. But the only known cases stem from poor user
         * programming practices that are already so slow that this makes
         * little difference.)
         *
         * Because TreeNodes are about twice the size of regular nodes, we
         * use them only when bins contain enough nodes to warrant use
         * (see TREEIFY_THRESHOLD). And when they become too small (due to
         * removal or resizing) they are converted back to plain bins.  In
         * usages with well-distributed user hashCodes, tree bins are
         * rarely used.  Ideally, under random hashCodes, the frequency of
         * nodes in bins follows a Poisson distribution
         * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
         * parameter of about 0.5 on average for the default resizing
         * threshold of 0.75, although with a large variance because of
         * resizing granularity. Ignoring variance, the expected
         * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
         * factorial(k)). The first values are:
         *
         * 0:    0.60653066
         * 1:    0.30326533
         * 2:    0.07581633
         * 3:    0.01263606
         * 4:    0.00157952
         * 5:    0.00015795
         * 6:    0.00001316
         * 7:    0.00000094
         * 8:    0.00000006
         * more: less than 1 in ten million
         *
         * The root of a tree bin is normally its first node.  However,
         * sometimes (currently only upon Iterator.remove), the root might
         * be elsewhere, but can be recovered following parent links
         * (method TreeNode.root()).
         *
         * All applicable internal methods accept a hash code as an
         * argument (as normally supplied from a public method), allowing
         * them to call each other without recomputing user hashCodes.
         * Most internal methods also accept a "tab" argument, that is
         * normally the current table, but may be a new or old one when
         * resizing or converting.
         *
         * When bin lists are treeified, split, or untreeified, we keep
         * them in the same relative access/traversal order (i.e., field
         * Node.next) to better preserve locality, and to slightly
         * simplify handling of splits and traversals that invoke
         * iterator.remove. When using comparators on insertion, to keep a
         * total ordering (or as close as is required here) across
         * rebalancings, we compare classes and identityHashCodes as
         * tie-breakers.
         * 
         * The use and transitions among plain vs tree modes is
         * complicated by the existence of subclass LinkedHashMap. See
         * below for hook methods defined to be invoked upon insertion,
         * removal and access that allow LinkedHashMap internals to
         * otherwise remain independent of these mechanics. (This also
         * requires that a map instance be passed to some utility methods
         * that may create new nodes.)
         *
         * The concurrent-programming-like SSA-based coding style helps
         * avoid aliasing errors amid all of the twisty pointer operations.
         */
    
        /**
         * The bin count threshold for using a tree rather than list for a
         * bin.  Bins are converted to trees when adding an element to a
         * bin with at least this many nodes. The value must be greater
         * than 2 and should be at least 8 to mesh with assumptions in
         * tree removal about conversion back to plain bins upon
         * shrinkage.
         */
        // Threshold at which a linked-list bin is converted into a tree bin
        static final int TREEIFY_THRESHOLD = 8;
    
        /**
         * Associates the specified value with the specified key in this map.
         * If the map previously contained a mapping for the key, the old
         * value is replaced.
         *
         * @param key key with which the specified value is to be associated
         * @param value value to be associated with the specified key
         * @return the previous value associated with <tt>key</tt>, or
         *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
         *         (A <tt>null</tt> return can also indicate that the map
         *         previously associated <tt>null</tt> with <tt>key</tt>.)
         */
        public V put(K key, V value) {
            return putVal(hash(key), key, value, false, true);
        }
    
        /**
         * Implements Map.put and related methods.
         *
         * @param hash hash for key
         * @param key the key
         * @param value the value to put
         * @param onlyIfAbsent if true, don't change existing value
         * @param evict if false, the table is in creation mode.
         * @return previous value, or null if none
         */
        final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                       boolean evict) {
            Node<K, V>[] tab;
            Node<K, V> p;
            int n, i;
    
            // Lazily allocate the table on first use (resize doubles as initialization)
            if ((tab = table) == null || (n = tab.length) == 0)
                n = (tab = resize()).length;
    
            // Bucket is empty: the new node becomes its first node
            if ((p = tab[i = (n - 1) & hash]) == null)
                tab[i] = newNode(hash, key, value, null);
            else {
                Node<K, V> e;
                K k;
    
                // The first node matches the key: remember it so its value is overwritten below
                if (p.hash == hash &&
                        ((k = p.key) == key || (key != null && key.equals(k))))
                    e = p;
    
                // Tree bin: delegate to the tree node insertion
                else if (p instanceof TreeNode)
                    e = ((TreeNode<K, V>) p).putTreeVal(this, tab, hash, key, value);
                else {
                    // Linked-list bin: walk the chain
                    for (int binCount = 0; ; ++binCount) {
                        if ((e = p.next) == null) {
                            p.next = newNode(hash, key, value, null);
    
                            // Chain length reached the treeify threshold: convert the list to a tree
                            if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                                treeifyBin(tab, hash);
                            break;
                        }
                        if (e.hash == hash &&
                                ((k = e.key) == key || (key != null && key.equals(k))))
                            break;
                        p = e;
                    }
                }
                if (e != null) { // existing mapping for key
                    V oldValue = e.value;
                    if (!onlyIfAbsent || oldValue == null)
                        e.value = value;
                    afterNodeAccess(e);
                    return oldValue;
                }
            }
            ++modCount;
            if (++size > threshold)
                resize();
            afterNodeInsertion(evict);
            return null;
        }
    
        /**
         * Initializes or doubles table size.  If null, allocates in
         * accord with initial capacity target held in field threshold.
         * Otherwise, because we are using power-of-two expansion, the
         * elements from each bin must either stay at same index, or move
         * with a power of two offset in the new table.
         *
         * @return the table
         */
        final Node<K, V>[] resize() {
            Node<K, V>[] oldTab = table;
            int oldCap = (oldTab == null) ? 0 : oldTab.length;
            int oldThr = threshold;
            int newCap, newThr = 0;
            if (oldCap > 0) {
                if (oldCap >= MAXIMUM_CAPACITY) {
                    threshold = Integer.MAX_VALUE;
                    return oldTab;
                } else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                        oldCap >= DEFAULT_INITIAL_CAPACITY)
                    newThr = oldThr << 1; // double threshold
            } else if (oldThr > 0) // initial capacity was placed in threshold
                newCap = oldThr;
            else {               // zero initial threshold signifies using defaults
                newCap = DEFAULT_INITIAL_CAPACITY;
                newThr = (int) (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
            }
            if (newThr == 0) {
                float ft = (float) newCap * loadFactor;
                newThr = (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY ?
                        (int) ft : Integer.MAX_VALUE);
            }
            threshold = newThr;
            @SuppressWarnings({"rawtypes", "unchecked"})
            Node<K, V>[] newTab = (Node<K, V>[]) new Node[newCap];
            table = newTab;
            if (oldTab != null) {
                for (int j = 0; j < oldCap; ++j) {
                    Node<K, V> e;
                    if ((e = oldTab[j]) != null) {
                        oldTab[j] = null;
                        if (e.next == null)
                            newTab[e.hash & (newCap - 1)] = e;
                        else if (e instanceof TreeNode)
                            ((TreeNode<K, V>) e).split(this, newTab, j, oldCap);
                        else { // preserve order
                            // On resize 16 -> 32 the index mask grows from 1111 to 11111: the low bits of
                            // each hash stay the same and the extra high bit is 0 or 1, so a hash ending in
                            // 1101 lands at 01101 (the old index) or 11101 (old index + oldCap).
                            // Each bucket is therefore split into two lists: one stays at the old index,
                            // the other moves to the corresponding higher slot.
                            Node<K, V> loHead = null, loTail = null;
                            Node<K, V> hiHead = null, hiTail = null;
                            Node<K, V> next;
                            do {
                                next = e.next;
                                if ((e.hash & oldCap) == 0) {
                                    if (loTail == null)
                                        loHead = e;
                                    else
                                        loTail.next = e;
                                    loTail = e;
                                } else {
                                    if (hiTail == null)
                                        hiHead = e;
                                    else
                                        hiTail.next = e;
                                    hiTail = e;
                                }
                            } while ((e = next) != null);
                            if (loTail != null) {
                                loTail.next = null;
                                newTab[j] = loHead;
                            }
                            if (hiTail != null) {
                                hiTail.next = null;
                                newTab[j + oldCap] = hiHead;
                            }
                        }
                    }
                }
            }
            return newTab;
        }
    
        /**
         * Computes key.hashCode() and spreads (XORs) higher bits of hash
         * to lower.  Because the table uses power-of-two masking, sets of
         * hashes that vary only in bits above the current mask will
         * always collide. (Among known examples are sets of Float keys
         * holding consecutive whole numbers in small tables.)  So we
         * apply a transform that spreads the impact of higher bits
         * downward. There is a tradeoff between speed, utility, and
         * quality of bit-spreading. Because many common sets of hashes
         * are already reasonably distributed (so don't benefit from
         * spreading), and because we use trees to handle large sets of
         * collisions in bins, we just XOR some shifted bits in the
         * cheapest possible way to reduce systematic lossage, as well as
         * to incorporate impact of the highest bits that would otherwise
         * never be used in index calculations because of table bounds.
         */
        static final int hash(Object key) {
            int h;
            return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
        }
    
        /**
         * Returns the value to which the specified key is mapped,
         * or {@code null} if this map contains no mapping for the key.
         *
         * <p>More formally, if this map contains a mapping from a key
         * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
         * key.equals(k))}, then this method returns {@code v}; otherwise
         * it returns {@code null}.  (There can be at most one such mapping.)
         *
         * <p>A return value of {@code null} does not <i>necessarily</i>
         * indicate that the map contains no mapping for the key; it's also
         * possible that the map explicitly maps the key to {@code null}.
         * The {@link #containsKey containsKey} operation may be used to
         * distinguish these two cases.
         *
         * @see #put(Object, Object)
         */
        public V get(Object key) {
            Node<K, V> e;
            return (e = getNode(hash(key), key)) == null ? null : e.value;
        }
    
        /**
         * Implements Map.get and related methods.
         *
         * @param hash hash for key
         * @param key the key
         * @return the node, or null if none
         */
        final Node<K, V> getNode(int hash, Object key) {
            Node<K, V>[] tab;
            Node<K, V> first, e;
            int n;
            K k;
            if ((tab = table) != null && (n = tab.length) > 0 &&
                    (first = tab[(n - 1) & hash]) != null) {
                if (first.hash == hash && // always check first node
                        ((k = first.key) == key || (key != null && key.equals(k))))
                    return first;
                if ((e = first.next) != null) {
                    if (first instanceof TreeNode)
                    // Tree bin lookup
                        return ((TreeNode<K, V>) first).getTreeNode(hash, key);
    
                // Linked-list lookup
                    do {
                        if (e.hash == hash &&
                                ((k = e.key) == key || (key != null && key.equals(k))))
                            return e;
                    } while ((e = e.next) != null);
                }
            }
            return null;
        }
    }
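    Two details from the listing can be checked directly: the high-bit spreading in hash(), and the fact that the (e.hash & oldCap) test used during resize reproduces re-masking with the doubled capacity. The demo class below is illustrative; hash() is a free-standing copy of the JDK 8 method:

```java
public class Java8HashDemo {
    // Copy of JDK 8's hash(): XOR the high 16 bits into the low 16 bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int h = hash("example");
        int oldIndex = h & (oldCap - 1);

        // During resize, the bit tested by (h & oldCap) decides whether the node
        // stays at oldIndex (the "lo" list) or moves to oldIndex + oldCap ("hi")...
        int splitIndex = (h & oldCap) == 0 ? oldIndex : oldIndex + oldCap;

        // ...which is exactly what re-masking with the doubled capacity gives.
        System.out.println(splitIndex == (h & (2 * oldCap - 1))); // true
    }
}
```

    This is why the resize loop never recomputes indices from scratch: one bit of the existing hash already decides each node's new slot.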
    

    Improvements in the Java 8 HashMap

    1. Array + linked list changed to array + linked list / red-black tree
    2. Improved insertion order during resize => relative order is preserved
    3. Functional methods
      • forEach
      • compute
    4. New Map APIs
      • merge
      • replace
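    A quick sketch of the Java 8 additions listed above (a hypothetical word-count example):

```java
import java.util.HashMap;
import java.util.Map;

public class Java8MapApiDemo {
    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();

        // merge: insert 1, or combine with the existing value -- a word counter.
        for (String word : new String[]{"a", "b", "a"}) {
            counts.merge(word, 1, Integer::sum);
        }
        System.out.println(counts.get("a")); // 2

        // compute: remap the current value (the old value is null when absent).
        counts.compute("b", (k, v) -> v == null ? 10 : v * 10);
        System.out.println(counts.get("b")); // 10

        // replace: only touches keys that are already present.
        counts.replace("missing", 99);                     // no effect
        System.out.println(counts.containsKey("missing")); // false

        // forEach: iterate entries without an explicit iterator.
        counts.forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```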

    Hash Function

    A hash function is a way to create a small numeric "fingerprint" from any kind of data: it scrambles and mixes the input and produces a fingerprint called the hash value, typically represented as a short string of seemingly random letters and digits. The hash value computed by a hash algorithm is irreversible.

    All hash functions share one basic property: if two hash values differ, their original inputs must also differ. This follows from the hash function being deterministic; a function with this property is called a one-way hash function. On the other hand, inputs and outputs are not in one-to-one correspondence: if two hash values are equal, the inputs may well be the same, but they may also differ. This situation is called a **hash collision**: two different inputs (possibly of different lengths) producing the same output.
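    Java's standard library exposes such one-way fingerprints through java.security.MessageDigest. A small sketch using SHA-256 (the helper name sha256Hex is hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FingerprintDemo {
    // Hash arbitrary input down to a short hex fingerprint.
    static String sha256Hex(String input) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(input.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Deterministic: the same input always yields the same fingerprint.
        System.out.println(sha256Hex("hello").equals(sha256Hex("hello"))); // true
        // A tiny change in the input produces a completely different fingerprint.
        System.out.println(sha256Hex("hello").equals(sha256Hex("hello!"))); // false
    }
}
```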

    HashMap Q&A

    Why must the HashMap array size be a power of two?

    1. In Java 7 the bucket array is not allocated when the HashMap is created; the space is only allocated on the first put. If the table is still empty, inflateTable is called, which calls roundUpToPowerOf2, so a requested toSize of 17 is rounded up to the next power of two, 32.
    2. The hash value is an int, with about 4.3 billion possibilities (-2^31 to 2^31 - 1). How do we map that many values onto n buckets (i.e., onto 0 ~ n - 1)?
      1. Remainder => i % n => drawbacks:
        • A. The remainder of a negative number is negative => it must first be turned positive
        • B. Slower => somewhat slower than what HashMap actually does
      2. HashMap has 2^n buckets; when an element is added, indexFor(hash, table.length) determines which bucket it goes into. The index is hashCode & (length - 1). Only when the size is a power of two does length - 1 consist entirely of 1 bits, so the bitwise AND yields the array index quickly and distributes keys uniformly. If the size were not a power of two, length - 1 would contain 0 bits, and the corresponding index bits would always be 0, giving an uneven distribution.
        • 2^n == 1000..
        • 2^n - 1 == 1111..
        • hash (a 32-bit int) & 1111 => keeps as many low bits of the hash as the mask has 1s (e.g. 01010...10011 & 11111 == 10011)
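    Both drawbacks of %, and the mask trick, are easy to verify:

```java
public class IndexDemo {
    public static void main(String[] args) {
        int n = 16;         // power-of-two capacity
        int h = -123456789; // a negative hash code

        // Drawback A: % keeps the sign, so a negative hash gives a negative index.
        System.out.println(h % n);               // -5
        // Math.floorMod fixes the sign...
        System.out.println(Math.floorMod(h, n)); // 11
        // ...but the mask gives the same result with a single AND (drawback B: speed).
        System.out.println(h & (n - 1));         // 11
    }
}
```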

    Key Points

    1. aka => also known as
    2. Why array + linked list => the textbook, classic hash table implementation
    3. Resolving hash collisions => chain the colliding entries inside the bucket (even nesting another hash structure there is possible)
    4. Can a LinkedList replace the array? => No. Random access into an array is O(1); LinkedList does not support random access, so reading the value at a given index would take O(n)
    5. Why is TREEIFY_THRESHOLD 8?
       Answer: bucket sizes follow a Poisson distribution, and the probability of a bucket exceeding 8 nodes is vanishingly small
    6. The hash() bit-spreading avoids severe collisions between keys whose hash codes differ only in the high bits
    7. Red-black tree: a self-balancing binary search tree
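    The probabilities quoted in the Java 8 source comment (a Poisson distribution with parameter about 0.5) can be recomputed directly; the demo class is hypothetical:

```java
public class PoissonDemo {
    // P(k) = exp(-lambda) * lambda^k / k! -- expected frequency of buckets with k nodes.
    static double poisson(double lambda, int k) {
        double factorial = 1;
        for (int i = 2; i <= k; i++) factorial *= i;
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        // Reproduces the table in the JDK source: 0: 0.60653066 ... 8: 0.00000006
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, poisson(0.5, k));
        }
        // k = 8 is already below one in ten million, which is why
        // TREEIFY_THRESHOLD = 8 is almost never reached with good hash codes.
    }
}
```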

    Original article: Java HashMap, https://www.haomeiwen.com/subject/nfojvrtx.html