美文网首页
当重写equals方法时需要重写hashcode

当重写equals方法时需要重写hashcode

作者: hello_kd | 来源:发表于2019-03-03 17:24 被阅读0次

    下文为effective Java3中item11的原文以及本人阅读时做的笔记。

    You must override hashCode in every class that overrides equals. If you fail to do so, your class will violate the general contract for hashCode, which will prevent it from functioning properly in collections such as HashMap and HashSet. Here is the contract, adapted from the Object specification
    每个重写了equals方法的类都必须要重写hashCode方法,如果没这做,那么你的类会违反hashCode的通用契约,这将会阻止类在集合的正常运行,比如HashMap和HashSet。这里是根据Object对象规范改编的

    • When the hashCode method is invoked on an object repeatedly during an execution of an application, it must consistently return the same value,
      provided no information used in equals comparisons is modified. This value need not remain consistent from one execution of an application to another.
      当一个对象的hashcode方法被重复调用在应用程序执行期间,必须始终返回相同的值。前提是equals方法比较的信息没有被修改。在应用程序的一次执行到另一次执行时,这个值无需保持一致(多次执行应用程序)

    • If two objects are equal according to the equals(Object) method, then calling hashCode on the two objects must produce the same integer result.
      如果两个对象的equals比较后相等,那么这两个对象的hashCode的值必须是相同的整型

    • If two objects are unequal according to the equals(Object) method, it is not required that calling hashCode on each of the objects must produce distinct results. However, the programmer should be aware that producing distinct results for unequal objects may improve the performance of hash tables.
      如果两个对象equals比较后不相等,并不要求这两个对象调用hashCode方法产生不同的值。然后,程序员应该意思到对不相等的对象产生不一样的哈希值可能提高哈希表的性能。

    The key provision that is violated when you fail to override hashCode is
    the second one: equal objects must have equal hash codes. Two distinct
    instances may be logically equal according to a class’s equals method, but to Object’s hashCode method, they’re just two objects with nothing much in common. Therefore, Object’s hashCode method returns two seemingly random numbers instead of two equal numbers as required by the contract.
    当无法覆盖hashCode时打破上述第二个关键契约:相等的对象必须有相同的哈希值。两个不同的实例根据equals方法可能是逻辑上相等的,但对于对象的hashCode方法,他们只是没有太多相同点的两个对象。因此,对象的hashCode方法返回两个看似随即的数字,而不是契约要求的两个相等的数字。

    For example, suppose you attempt to use instances of the PhoneNumber class from Item 10 as keys in a HashMap:
    比如,假设你试图将一个PhoneNumber的实例放到HashMap中
    Map<PhoneNumber, String> m = new HashMap<>();
    m.put(new PhoneNumber(707, 867, 5309), "Jenny");

    At this point, you might expect m.get(new PhoneNumber(707, 867, 5309)) to return "Jenny", but instead, it returns null. Notice that two PhoneNumber instances are involved: one is used for insertion into the HashMap, and a second, equal instance is used for (attempted) retrieval. The PhoneNumber class’s failure to override hashCode causes the two equal instances to have unequal hash codes, in violation of the hashCode contract. Therefore, the get method is likely to look for the phone number in a different hash bucket from the one in which it was stored by the put method. Even if the two instances happen to hash to the same bucket, the get method will almost certainly return null, because HashMap has an optimization that caches the hash code associated with each entry and doesn’t bother checking for object equality if the hash codes don’t match
    这个时候,你可能希望m.get(new PhoneNumber(707, 867, 5309))返回“Jenny”,但是,它返回null。注意,这包含了两个PhoneNumber实例:一个之前插入到HashMap的,一个用于试图获取在哈希表中相等的对象。PhoneNumber 类没有覆盖hashCode方法导致两个相等的实例有着不同的哈希值,违反了hashCode方法的契约。因此,get方法好像是查找一个与之前通过put方法放到哈希表中不同位置的PhoneNumber对象。即使两个实例通过hash函数映射到相同的哈希槽,get方法也会总是返回null,因为HashMap有一个优化机制,缓存每一个对象的哈希码,如果哈希码不匹配,那么不会检查对象是否相等。

    Fixing this problem is as simple as writing a proper hashCode method for
    PhoneNumber. So what should a hashCode method look like? It’s trivial to write a bad one. This one, for example, is always legal but should never be used:
    解决这个问题一个比较简单的方式是在PhoneNumber类重写一个正确的hashCode方法,那么hashCode应该怎么写?下面是一个糟糕的方式,它是合理的但不应该被使用:
    @Override public int hashCode() { return 42; }

    It’s legal because it ensures that equal objects have the same hash code. It’s atrocious because it ensures that every object has the same hash code. Therefore, every object hashes to the same bucket, and hash tables degenerate to linked lists. Programs that should run in linear time instead run in quadratic time. For large hash tables, this is the difference between working and not working.
    这是合理的,因为它确保了相等的对象有相同的哈希码。但这也是极不好的,因此它确保了所有的对象都有相同的哈希码。一年春,每个对象都会映射到相同的哈希槽,并且哈希表会退化成一个链表。程序应该以线性时间运行变为以二次方时间运行。对于比较大的哈希表,这是工作和不工作的区别。

    A good hash function tends to produce unequal hash codes for unequal
    instances. This is exactly what is meant by the third part of the hashCode contract. Ideally, a hash function should distribute any reasonable collection of unequal instances uniformly across all int values. Achieving this ideal can be difficult. Luckily it’s not too hard to achieve a fair approximation. Here is a simple recipe
    一个好的哈希函数趋向于产生不同的哈希码对于不相等的实例, 从hasCode契约的第三条的意思来看这是准确的。理想的情况下,一个哈希函数应该将任何不相等的实例分布到合理的集合中统一通过所有的int值,达到这样的理想效果是很难的,幸运的是并不会太难达到一个近似公平的结果,这里是一个简单的方式:

    1. Declare an int variable named result, and initialize it to the hash code c for the first significant field in your object, as computed in step 2.a. (Recall from Item 10 that a significant field is a field that affects equals comparisons.)
      声明一个int类型的变量result,并且初始化一个哈希码C对于对象的第一个重要属性。如2.a计算的(回忆条款10,一个重要的属性会影响equals方法的比较)

    2. For every remaining significant field f in your object, do the following:
      对于对象剩下的每一个重要属性,按如下操作
      a. Compute an int hash code c for the field:
      i. If the field is of a primitive type, compute Type.hashCode(f), where Type is the boxed primitive class corresponding to f’s type
      如果属性是原生数据类型,计算Type.hashCode(f),其中Type是f的包装类型
      ii. If the field is an object reference and this class’s equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If a more complex comparison is required, compute a “canonical representation” for this field and invoke hashCode on the canonical representation. If the value of the field is null, use 0 (or some other constant, but 0 is traditional).
      如果属性是一个对象引用且这个类的equals方法比较属性是通过递归调用equals方法,递归调用属性的hashCode方法。如果需要更负责的比较,计算这个属性的一个规范表示并且在规范表示上调用hashCode方法。如果属性的值为null,那么使用0(或其他常量,但传统方式一般为0)
      iii. If the field is an array, treat it as if each significant element were a separate field. That is, compute a hash code for each significant element by applying these rules recursively, and combine the values per step 2.b. If the array has no significant elements, use a constant, preferably not 0. If all elements are significant, use Arrays.hashCode.
      如果属性是一个数组,可以将数组的每个元素当成一个分开的属性处理。即,计算每个重要元素的哈希码通过递归应用这些规则,并且将通过步骤2.b将这计算结果合并。如果数组没有重要的元素,那么可以使用一个常量,最好不要用0,如果数组的所有元素都是重要的,那么使用Arrays.hashCode
      b. Combine the hash code c computed in step 2.a into result as follows:
      合并步骤2.a计算后的哈希码c到一个result:
      result = 31 * result + c;

    3. 返回result

    When you are finished writing the hashCode method, ask yourself whether equal instances have equal hash codes. Write unit tests to verify your intuition (unless you used AutoValue to generate your equals and hashCode methods, in which case you can safely omit these tests). If equal instances have unequal hash codes, figure out why and fix the problem
    当写完hashCode方法时,问问你自己,是否equals相等的实例有相同的哈希码,写一个单元测试来检查你的直觉(除非你使用AutoValue框架来自动生成equals和hashCode方法,这种情况下你可以安全的忽略这些测试)如果相等的实例哟不同的哈希码,想想为什么和如何解决问题

    You may exclude derived fields from the hash code computation. In other words, you may ignore any field whose value can be computed from fields included in the computation. You must exclude any fields that are not used in equals comparisons, or you risk violating the second provision of the hashCode contract
    你可能排除衍生字段的哈希码计算,换句话说,你可以忽略任何衍生属性(这种属性的值可以从已经包含到哈希码计算的属性得出),你必须排除任何没用在equals方法比较的属性,或者你可以冒险打破hashCode契约的第二条。

    The multiplication in step 2.b makes the result depend on the order of the fields, yielding a much better hash function if the class has multiple similar fields. For example, if the multiplication were omitted from a String hash function, all anagrams would have identical hash codes. The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, because multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance on some architectures: 31 * i == (i << 5) - i. Modern VMs do this sort of optimization automatically.
    步骤2.b的乘法产生的结果依赖于属性的顺序,如果类有多个相似的属性会产生一个更好的哈希函数。比如,如果String的哈希函数中忽略乘法,那么所有的字谜都有相同的哈希码,选择31是它是一个奇素数,如果是一个偶数,那么乘法将会溢出,信息将会丢失。因此乘以2相当于移位,使用素数的优势不太明显,但这是传统方式。31的一个很好特性是乘法可以被以为代替且减法在某些结构上有更好的性能:31 * i == (i << 5) - i现在的VM自动执行这种优化。

    Let’s apply the previous recipe to the PhoneNumber class:

    // Typical hashCode method
        @Override public int hashCode() {
            int result = Short.hashCode(areaCode);
            result = 31 * result + Short.hashCode(prefix);
            result = 31 * result + Short.hashCode(lineNum);
            return result;
        }
    

    Because this method returns the result of a simple deterministic computation whose only inputs are the three significant fields in a PhoneNumber instance, it is clear that equal PhoneNumber instances have equal hash codes. This method is, in fact, a perfectly good hashCode implementation for PhoneNumber, on par with those in the Java platform libraries. It is simple, is reasonably fast, and does a reasonable job of dispersing unequal phone numbers into different hash buckets.
    因为这个方法返回的结果是一个简单确定性的计算。它的输入是PhoneNumber实例三个重要的属性。这很明显,相等的PhoneNumber实例有相同的哈希码。事实上,一个更好的hashCode方法实现对于PhoneNumber,与Java平台的类库一样,它是简单的,速度相当快,合理的将不同的电话号码分散到不同的哈希槽中。

    While the recipe in this item yields reasonably good hash functions, they are not state-of-the-art. They are comparable in quality to the hash functions found in the Java platform libraries’ value types and are adequate for most uses. If you have a bona fide need for hash functions less likely to produce collisions, see Guava’s com.google.common.hash.Hashing [Guava].
    然而,这个条款所提到的方法会产生一个相当好的哈希函数,但并不是最先进的。他们在质量上可以与Java平台类库的值类型的哈希函数进行比较,并且也足够在大多数场景下使用。如果你需要有一个产生更少冲突的哈希函数,看guava的com.google.common.hash.Hashing

    The Objects class has a static method that takes an arbitrary number of objects and returns a hash code for them. This method, named hash, lets you write one-line hashCode methods whose quality is comparable to those written according to the recipe in this item. Unfortunately, they run more slowly because they entail array creation to pass a variable number of arguments, as well as boxing and unboxing if any of the arguments are of primitive type. This style of hash function is recommended for use only in situations where performance is not critical. Here is a hash function for PhoneNumber written using this technique:
    Objects类有一个静态方法hash,用于计算其传入参数的哈希码。让你写一个单行的hashCode方法,其质量与根据此条款的方法所写的相当。不幸的是,他们运行的更慢,因为它们需要创建数组来传递可变参数(从Object.hash源码可知)还有自动装箱与拆箱如果参数的任何一个类型是原生数据类型的话。这种风格的哈希函数推荐在对性能要求不高的场景下使用。这里有一个通过此技术重写的PhoneNumber 的哈希函数:

    // One-line hashCode method - mediocre performance
        @Override public int hashCode() {
            return Objects.hash(lineNum, prefix, areaCode);
        }
    

    If a class is immutable and the cost of computing the hash code is significant, you might consider caching the hash code in the object rather than recalculating it each time it is requested. If you believe that most objects of this type will be used as hash keys, then you should calculate the hash code when the instance is created. Otherwise, you might choose to lazily initialize the hash code the first time hashCode is invoked. Some care is required to ensure that the class remains thread safe in the presence of a lazily initialized field (Item 83). Our PhoneNumber class does not merit this treatment, but just to show you how it’s done, here it is. Note that the initial value for the hashCode field (in this case, 0) should not be the hash code of a commonly created instance:
    如果一个类是不可变的且计算哈希码需要花费比较大的代价时,你可以思考将哈希码放在缓存中而不是每次需要使用的使用再去计算。如果你相信这个类型的大部分对象将会被用作哈希键时,那么你可以在对象创建的时候就计算哈希码。或者,你可以选择延迟初始化哈希码在hashCode方法第一次被调用时。一些需要确保类保持线程安全当延迟初始化属性时。PhoneNumber 类不需要这样处理。但下面是显示如何这样做。注意,hashCode属性的初始值为0,不能被作为哈希码在普通的对象创建时

    // hashCode method with lazily initialized cached hash code
        private int hashCode; // Automatically initialized to 0
        @Override public int hashCode() {
            int result = hashCode;
            if (result == 0) {
                result = Short.hashCode(areaCode);
                result = 31 * result + Short.hashCode(prefix);
                result = 31 * result + Short.hashCode(lineNum);
                hashCode = result;
            }
            return result;
        }
    

    Do not be tempted to exclude significant fields from the hash code computation to improve performance. While the resulting hash function may run faster, its poor quality may degrade hash tables’ performance to the point where they become unusable. In particular, the hash function may be confronted with a large collection of instances that differ mainly in regions you’ve chosen to ignore. If this happens, the hash function will map all these instances to a few hash codes, and programs that should run in linear time will instead run in quadratic time
    不要试图用排除重要属性在计算哈希码时来提升性能,虽然哈希函数可能运行得更快,但是不可靠的质量可能降低哈希表的性能,以至于在某个点变得不可用。特别是,哈希函数可能面临着一个大量实例,这些实例
    主要区别于你选择的区域。如果这种情况发生,哈希函数将会映射所有这些实例到少数的哈希码中,且程序本该以线性时间运行会变为以二次方时间运行。

    This is not just a theoretical problem. Prior to Java 2, the String hash function used at most sixteen characters evenly spaced throughout the string, starting with the first character. For large collections of hierarchical names, such as URLs, this function displayed exactly the pathological behavior described earlier.
    这并不仅是理论上的问题,在Java2之前,字符串的哈希函数最多使用16个字符,这些字符在整个字符串中的间隔大致一样。对于分级较多的名称,比如URL,这个哈希函数就会显示出上面提到的问题。

    Don’t provide a detailed specification for the value returned by hashCode, so clients can’t reasonably depend on it; this gives you the flexibility to change it. Many classes in the Java libraries, such as String and Integer, specify the exact value returned by their hashCode method as a function of the instance value. This is not a good idea but a mistake that we’re forced to live with: It impedes the ability to improve the hash function in future releases. If you leave the details unspecified and a flaw is found in the hash function or a better hash function is discovered, you can change it in a subsequent release.
    不要提供一个详细规格对于hashCode方法返回的值,因此客户端不能合理的依赖它,这会给你的改变带来更大的灵活性。很多Java类库中的类,比如String和Integer,hashCode方法返回的值与实例值一致,这不是一个好的注意,我们会被强迫面临一个错误:它阻碍哈希函数性能的提升在未来的版本中。如果你在哈希函数中没有详细的指定规格和发现缺陷或者发现更好的哈希函数,你可以在后续的版本中改变它。

    In summary, you must override hashCode every time you override equals, or your program will not run correctly. Your hashCode method must obey the general contract specified in Object and must do a reasonable job assigning unequal hash codes to unequal instances. This is easy to achieve, if slightly tedious, using the recipe on page 51. As mentioned in Item 10, the AutoValue framework provides a fine alternative to writing equals and hashCode methods manually, and IDEs also provide some of this functionality.
    总结,你在重写equals方法时必须重写hashCode方法,或者你的程序运行不正确(不重写),你重写的hashCode方法必须遵守Object规定的契约并且做一个合理的工作来分配不同的哈希码到不相等的对象。这是很容易实现的,如果你有些厌烦,可以使用AutoValue框架来自动生成equals和hashCode方法,IDE也有提供这些功能。

    相关文章

      网友评论

          本文标题:当重写equals方法时需要重写hashcode

          本文链接:https://www.haomeiwen.com/subject/nglpuqtx.html