你懂HashCode？你不懂！

作者: Crazy丶Mark | 来源:发表于2020-11-17 13:52 被阅读0次

你懂HashCode？你不懂！
2017-09-29
学会提问，一层一层剥开自己的心
你不懂我
女神
你
懂
2017-08-15
羊的孤独
懂是什么东西

在讨论hashCode之前，我先提出几个问题，后面我会一一解答

什么是hashCode?
hashCode有什么用？
为什么会发生哈希冲突，可以避免吗？
为什么重写equals()方法一定要重写hashCode()方法？
hashCode在Java中是如何生成的？

在讨论上述问题之前，我们先看看java里面是如何对hashCode定义的，下面我摘抄一段java8 Object对hashCode()的一段注释

    /**
     * Returns a hash code value for the object. This method is
     * supported for the benefit of hash tables such as those provided by
     * {@link java.util.HashMap}.
     * <p>
     * The general contract of {@code hashCode} is:
     * <ul>
     * <li>Whenever it is invoked on the same object more than once during
     *     an execution of a Java application, the {@code hashCode} method
     *     must consistently return the same integer, provided no information
     *     used in {@code equals} comparisons on the object is modified.
     *     This integer need not remain consistent from one execution of an
     *     application to another execution of the same application.
     * <li>If two objects are equal according to the {@code equals(Object)}
     *     method, then calling the {@code hashCode} method on each of
     *     the two objects must produce the same integer result.
     * <li>It is <em>not</em> required that if two objects are unequal
     *     according to the {@link java.lang.Object#equals(java.lang.Object)}
     *     method, then calling the {@code hashCode} method on each of the
     *     two objects must produce distinct integer results.  However, the
     *     programmer should be aware that producing distinct integer results
     *     for unequal objects may improve the performance of hash tables.
     * </ul>
     */

我把他的意思大致归纳于以下几点

每个对象都能返回一个hashCode，hashCode是用来加速哈希表的
在同一java应用程序执行期间，对同一个对象调用超过一次时，如果equals()比较的内容没有发生变化，hashCode()的返回应该也不应该发生变化。（特别需要注意的是同一java程序，不同java程序可以允许hashCode发生变化）
根据equals()方法，两个对象相等，那么hashCode也必须要相等
根据equals()方法，两个对象不相等，那么hashCode可能相等，但是要尽可能的让他们不相等，这样可以提升哈希表的性能

什么是hashCode?

hashCode是用来加速哈希表的，在同一个java应用程序中，hashCode是对java对象的一个签名，相等的对象hashCode一定相等。

hashCode有什么用？

上面也已经提到了，是用来加速哈希表的，采取了空间换时间的方式。

这一篇博客不是说哈希表的，因此这里不做展开

大致思路是先通过hashcode去哈希表里面找到对应的哈希桶，然后调用equals()方法去判断哈希桶里面挂载的元素有没有相等的。

为什么相等的元素会出现在同一个哈希桶里面？

前文已经说过了，相等的对象hashcode一定相等，hashcode相等的，对象却不一定相等，因此需要再调用equals()方法做判断

为什么会发生哈希冲突，可以避免吗？

先来看一下java里面hashCode的方法签名

public native int hashCode();

可以看到hashCode本质上面来说是一个int，int的表示的数据范围是有限的，而对象是无限的，把一个无限的集合压缩进一个有限的集合中，发生哈希冲突就成了一个必然事件，哈希冲突是无法避免的。

当然，在我们的应用中，对象不可能是无穷无尽的，因此选择好的哈希算法是可以降低哈希冲突的概率的

为什么重写equals()方法一定要重写hashCode()方法？

这个问题实际上前面已经多次提到了，根据java规范，equals()相等的对象，hashCode()必须相等

重写equals()必然导致对象是否相等的结果发生变化，因此也就需要修改hashCode()方法

如果只修改equals()不修改hashCode()方法，就会导致使用哈希表的时候不能准确定位到哈希桶，导致哈希表工作异常

hashCode在Java中是如何生成的？

这个问题其实很复杂，我再摘抄一段java8 Object里面的注释

    /**
     * As much as is reasonably practical, the hashCode method defined by
     * class {@code Object} does return distinct integers for distinct
     * objects. (This is typically implemented by converting the internal
     * address of the object into an integer, but this implementation
     * technique is not required by the
     * Java&trade; programming language.)
     */

这句话的意思是说，很多时候hashcode是对象内存地址的一个映射，但是java里面不是的。

那么java里面的hashCode是如何生成的呢？

实际上java默认的hashcode是由一套随机算法生成的，只与对象生成的顺序和线程有关。

下面我用代码证明下

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            Object o = new Object();
            System.out.println("hashCode:" + o.hashCode());
            System.out.println("address:" + VM.current().addressOf(o));
        }
    }

可以直接用Unsafe类拿到内存地址，不过操作比较繁琐，因此我用了别人封装的工具包

        <dependency>
            <groupId>org.openjdk.jol</groupId>
            <artifactId>jol-core</artifactId>
            <version>0.10</version>
        </dependency>

反复执行上面代码，结果如下

第一次	第二次	第三次
hashCode:531885035 address:31856529176 hashCode:705265961 address:31874395832 hashCode:428746855 address:31874396520 hashCode:317983781 address:31874397208 hashCode:987405879 address:31874397896	hashCode:531885035 address:31856529344 hashCode:705265961 address:31874395856 hashCode:428746855 address:31874396544 hashCode:317983781 address:31874397232 hashCode:987405879 address:31874397920	hashCode:531885035 address:31856529920 hashCode:705265961 address:31874407200 hashCode:428746855 address:31874407888 hashCode:317983781 address:31874408576 hashCode:987405879 address:31874409264

我们可以清楚的看到，相同的顺序hashCode是一样的，内存地址是不一样的，因此可以肯定java的hashCode和内存是无关的，看起来与创建顺序有关。

接下来我们用代码去查看发生哈希冲突时的情况

    public static void main(String[] args) {
        HashSet<Integer> set = new HashSet<Integer>();
        int count = 0, num = 0;
        while (true) {
            count++;
            Object o = new Object();
            if (set.contains(o.hashCode())) {
                num++;
                System.out.println("count:" + count);
                System.out.println("hashCode:" + o.hashCode());
                if (num == 5) break;
            }
            set.add(o.hashCode());
        }
    }

重复运行上面代码

第一次	第二次	第三次
count:105730 hashCode:2134400190 count:111177 hashCode:651156501 count:121838 hashCode:1867750575 count:145361 hashCode:2038112324 count:146294 hashCode:1164664992	count:105730 hashCode:2134400190 count:111177 hashCode:651156501 count:121838 hashCode:1867750575 count:145361 hashCode:2038112324 count:146294 hashCode:1164664992	count:105730 hashCode:2134400190 count:111177 hashCode:651156501 count:121838 hashCode:1867750575 count:145361 hashCode:2038112324 count:146294 hashCode:1164664992

我们可以看到，发生哈希冲突的位置是一样的，发生哈希冲突时的hashCode也是一样的，因此我们可以断定hashcode的生成与对象生成的顺序有关。

接下来我们用不同的线程去重复上述操作

    public static void main(String[] args) throws InterruptedException {
        new Thread(() ->{
            HashSet<Integer> set = new HashSet<Integer>();
            int count = 0, num = 0;
            while (true) {
                count++;
                Object o = new Object();
                if (set.contains(o.hashCode())) {
                    num++;
                    System.out.println("count:" + count);
                    System.out.println("hashCode:" + o.hashCode());
                    if (num == 5) break;
                }
                set.add(o.hashCode());
            }
        }).start();
        Thread.sleep(1000L);
    }

结果如下

第一次	第二次	第三次
count:53869 hashCode:820543451 count:87947 hashCode:951498229 count:99859 hashCode:175788428 count:100940 hashCode:1979813773 count:123438 hashCode:916565907	count:51579 hashCode:1573800482 count:124111 hashCode:1834341042 count:139761 hashCode:2021276643 count:142276 hashCode:1807321828 count:143515 hashCode:1522017140	count:69186 hashCode:742597489 count:102967 hashCode:2084692455 count:144290 hashCode:424089733 count:176112 hashCode:53258565 count:178639 hashCode:1046986547

看起来没有任何规律了，这就需要去扒java的源码才能解答这个现象了

Object源码地址

在Object中我们可以看到hashCode实际上是调用了IHashCode

static JNINativeMethod methods[] = {
    {"hashCode",    "()I",                    (void *)&JVM_IHashCode},
    {"wait",        "(J)V",                   (void *)&JVM_MonitorWait},
    {"notify",      "()V",                    (void *)&JVM_MonitorNotify},
    {"notifyAll",   "()V",                    (void *)&JVM_MonitorNotifyAll},
    {"clone",       "()Ljava/lang/Object;",   (void *)&JVM_Clone},
};

然后在jvm.cpp找到了IHashCode实际调用的是FastHashCode

JVM_ENTRY(jint, JVM_IHashCode(JNIEnv* env, jobject handle))
  JVMWrapper("JVM_IHashCode");
  // as implemented in the classic virtual machine; return 0 if object is NULL
  return handle == NULL ? 0 : ObjectSynchronizer::FastHashCode (THREAD, JNIHandles::resolve_non_null(handle)) ;
JVM_END

接着扒FastHashCode,FastHashCode在synchronizer.cpp中

intptr_t ObjectSynchronizer::FastHashCode (Thread * Self, oop obj) {
  if (UseBiasedLocking) {
    // NOTE: many places throughout the JVM do not expect a safepoint
    // to be taken here, in particular most operations on perm gen
    // objects. However, we only ever bias Java instances and all of
    // the call sites of identity_hash that might revoke biases have
    // been checked to make sure they can handle a safepoint. The
    // added check of the bias pattern is to avoid useless calls to
    // thread-local storage.
    if (obj->mark()->has_bias_pattern()) {
      // Box and unbox the raw reference just in case we cause a STW safepoint.
      Handle hobj (Self, obj) ;
      // Relaxing assertion for bug 6320749.
      assert (Universe::verify_in_progress() ||
              !SafepointSynchronize::is_at_safepoint(),
             "biases should not be seen by VM thread here");
      BiasedLocking::revoke_and_rebias(hobj, false, JavaThread::current());
      obj = hobj() ;
      assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");
    }
  }

  // hashCode() is a heap mutator ...
  // Relaxing assertion for bug 6320749.
  assert (Universe::verify_in_progress() ||
          !SafepointSynchronize::is_at_safepoint(), "invariant") ;
  assert (Universe::verify_in_progress() ||
          Self->is_Java_thread() , "invariant") ;
  assert (Universe::verify_in_progress() ||
         ((JavaThread *)Self)->thread_state() != _thread_blocked, "invariant") ;

  ObjectMonitor* monitor = NULL;
  markOop temp, test;
  intptr_t hash;
  markOop mark = ReadStableMark (obj);

  // object should remain ineligible for biased locking
  assert (!mark->has_bias_pattern(), "invariant") ;
    //mark是对象头
  if (mark->is_neutral()) {
    hash = mark->hash();              // 取出hash值（实际上可以理解为缓存，有就直接返回，没有就生成一个新的）
    if (hash) {                       // if it has hash, just return it
      return hash;
    }
    hash = get_next_hash(Self, obj);  // 这是生成hashcode的核心方法
    temp = mark->copy_set_hash(hash); // merge the hash code into header
    // use (machine word version) atomic operation to install the hash
    test = (markOop) Atomic::cmpxchg_ptr(temp, obj->mark_addr(), mark);
    if (test == mark) {
      return hash;
    }
    // If atomic operation failed, we must inflate the header
    // into heavy weight monitor. We could add more code here
    // for fast path, but it does not worth the complexity.
  } else if (mark->has_monitor()) {
    monitor = mark->monitor();
    temp = monitor->header();
    assert (temp->is_neutral(), "invariant") ;
    hash = temp->hash();
    if (hash) {
      return hash;
    }
    // Skip to the following code to reduce code size
  } else if (Self->is_lock_owned((address)mark->locker())) {
    temp = mark->displaced_mark_helper(); // this is a lightweight monitor owned
    assert (temp->is_neutral(), "invariant") ;
    hash = temp->hash();              // by current thread, check if the displaced
    if (hash) {                       // header contains hash code
      return hash;
    }
    // WARNING:
    //   The displaced header is strictly immutable.
    // It can NOT be changed in ANY cases. So we have
    // to inflate the header into heavyweight monitor
    // even the current thread owns the lock. The reason
    // is the BasicLock (stack slot) will be asynchronously
    // read by other threads during the inflate() function.
    // Any change to stack may not propagate to other threads
    // correctly.
  }

  // Inflate the monitor to set hash code
  monitor = ObjectSynchronizer::inflate(Self, obj);
  // Load displaced header and check it has hash code
  mark = monitor->header();
  assert (mark->is_neutral(), "invariant") ;
  hash = mark->hash();
  if (hash == 0) {
    hash = get_next_hash(Self, obj);
    temp = mark->copy_set_hash(hash); // merge hash code into header
    assert (temp->is_neutral(), "invariant") ;
    test = (markOop) Atomic::cmpxchg_ptr(temp, monitor, mark);
    if (test != mark) {
      // The only update to the header in the monitor (outside GC)
      // is install the hash code. If someone add new usage of
      // displaced header, please update this code
      hash = test->hash();
      assert (test->is_neutral(), "invariant") ;
      assert (hash != 0, "Trivial unexpected object/monitor header usage.");
    }
  }
  // We finally get the hash
  return hash;
}

get_next_hash

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  //随机获取一个
  if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
  } else
  //根据内存地址计算一个
  if (hashCode == 1) {
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.
     intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  //返回1
  if (hashCode == 2) {
     value = 1 ;            // for sensitivity testing
  } else
  //不太清楚什么意思 求大佬解释
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  //直接返回内存地址
  if (hashCode == 4) {
     value = cast_from_oop<intptr_t>(obj) ;
  }
  //这是一种生成随机数的一种算法
   else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, "invariant") ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}

到了这里我们基本上就知道了hashcode是怎么生成的了，实际上在jdk1.8在是用的第五种生成方式，我们可以在Linux系统下输入：java -XX:+PrintFlagsFinal -version|grep hashCode命令查看

image

ok,接下来我们来分析一下第5种方式，看到这个代码我的第一反应是懵逼的，那个什么_hashStateX是个什么鬼？
我们来看一下thead.cpp里面是怎样定义的：

// thread-specific hashCode stream generator state - Marsaglia shift-xor form
  _hashStateX = os::random() ;
  _hashStateY = 842502087 ;
  _hashStateZ = 0x8767 ;    // (int)(3579807591LL & 0xffff) ;
  _hashStateW = 273326509 ;

在thread里面定义了一个随机数，三个常数，通过这四个数根据上述的算法来生成hashcode。

具体原理请参考论文：Xorshift RNGs

因为在上述算法在，需要得到线程里面的一个随机数作为一个初始值，上述算法在前后具有因果关系，后面的结果是根据前面的结果推算而来的，因此对于相同的线程来说，在某种意义上，对象的hashcode的生成是和顺序有关的。

为什么主函数里面的hashcode的生成一直是有序的呢？因为主线程里面的是固定值。

可见hashCode在1.8中和内存地址是无关的，与所在线程（或者说生成的一个随机数）以及生成顺序有关

你懂HashCode？你不懂！
在讨论hashCode之前，我先提出几个问题，后面我会一一解答什么是hashCode? hashCode有什么用...
2017-09-29
你懂！你不懂？
学会提问，一层一层剥开自己的心
你不懂我，我不怪你。你若懂我，该有多好。 ----题记你真的懂我吗？我连自己都弄不懂自己，你说你懂我？或许，...
你不懂我
你不懂我你不敢懂我因为懂我太累你要的恰恰是轻松而我注定是你的累赘你不懂我因为你不想懂我因为你不会懂我...
女神
你不懂我，我不怪你，你若懂我，亦不怪你；懂与不懂，皆不怪你，问其缘由，只因爱你！如不懂我，何需怪你，如若...
你
你笑了我懂你哭了我懂你说，分手我不懂
懂
你懂我懂，似懂非懂。若懂真懂，不懂别懂。
2017-08-15
——这是好酒！ ——你懂酒？ ——不懂，我爱喝 ——这是好琴曲！ ——你懂琴？ ——不懂，但我爱听！
羊的孤独
你懂我吗？你不懂我！
懂是什么东西
你懂我的时候我不懂你我懂你的时候你不懂我我懂你不懂我的时候懂你你不懂我懂你的时候不懂我懂与不懂是个问...

你懂HashCode？你不懂！

什么是hashCode?

hashCode有什么用？

为什么会发生哈希冲突，可以避免吗？

为什么重写equals()方法一定要重写hashCode()方法？

hashCode在Java中是如何生成的？

相关文章

你懂HashCode？你不懂！

2017-09-29

学会提问，一层一层剥开自己的心

你不懂我

女神

你

懂

2017-08-15

羊的孤独

懂是什么东西

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读

CrazyMark