hashcode的作用,与equal()有什么区别?
此文章结构:
1.英文文档关于hashmap工作原理、hashcode对hashmap的效率影响。
也是google “hashcode作用”,然后就是罗列的1.2.3,看完后的赶脚就是,像没有看过一样,没有任何印象。
偶然从一篇讲hashmap的外文网站看到一篇讲How does a HashMap work in JAVA(hashmap工作原理)里说明hashcode()的实现会对hashmap的效率影响,我一下子记住了。所以,这里我就把hashcode对hashmap效率影响的英文搬运过来了,并有图帮助理解。
英文阅读能力强的,可以直接去阅读原文,链接如下:
http://coding-geek.com/how-does-a-hashmap-work-in-java/
Performance issues
Skewed HashMap vs well balanced HashMap
In the best case scenario, the get() and put() methods have a O(1) cost in time complexity. But, if you don’t take care of the hash function of the key, you might end up with very slow put() and get() calls. The good performance of the put() and get depends on the repartition of the data into the different indexes of the inner array (the buckets). If the hash function of your key is ill-designed, you’ll have a skew repartition (no matter how big the capacity of the inner array is). All the put() and get() that use the biggest linked lists of entry will be slow because they’ll need to iterate the entire lists. In the worst case scenario (if most of the data are in the same buckets), you could end up with a O(n) time complexity.
Here is a visual example. The first picture shows a skewed HashMap and the second picture a well balanced one.
skewed_java_hashmap.jpg well_balanced_java_hashmap.jpg
In the case of this skewed HashMap the get()/put() operations on the bucket 0 are costly. Getting the Entry K will cost 6 iterations
skewed_java_hashmap.jpg
In the case of this well balanced HashMap, getting the Entry K will cost 3 iterations. Both HashMaps store the same amount of data and have the same inner array size. The only difference is the hash (of the key) function that distributes the entries in the buckets.
(下面是 hashCode 作用的重点)
Here is an extreme example in JAVA where I create a hash function that puts all the data in the same bucket then I add 2 million elements.
public class Test {
public static void main(String[] args) {
class MyKey {
Integer i;
public MyKey(Integer i){
this.i =i;
}
@Override
public int hashCode() {
return 1;
}
@Override
public boolean equals(Object obj) {
…
}
}
Date begin = new Date();
Map <MyKey,String> myMap= new HashMap<>(2_500_000,1);
for (int i=0;i<2_000_000;i++){
myMap.put( new MyKey(i), "test "+i);
}
Date end = new Date();
System.out.println("Duration (ms) "+ (end.getTime()-begin.getTime()));
}
}
On my core i5-2500k @ 3.6Ghz it takes** more than 45 minutes** with java 8u40 (I stopped the process after 45 minutes).
Now, If I run the same code but this time I use the following hash function
@Override
public int hashCode() {
int key = 2097152-1;
return key+2097152*i;
}
it takes 46 seconds, which is way better! This hash function has a better repartition than the previous one so the put() calls are faster.
And If I run the same code with the following hash function that provides an even better hash repartition
@Override
public int hashCode() {
return i;
}
it now takes 2 seconds
I hope you realize how important the hash function is. If a ran the same test on JAVA 7, the results would have been worse for the first and second cases (since the time complexity of put is O(n) in JAVA 7 vs O(log(n)) in JAVA 8)
When using a HashMap, you need to find a hash function for your keys that spreads the keys into the most possible buckets. To do so, you need to avoid hash collisions. The String Object is a good key because of it has good hash function. Integers are also good because their hashcode is their own value.
以上意思就是:在hashmap中,你需要找到一个hash函数,可以尽可能地使得key散列分布到bucket中,以避免hash冲突。
关于equals():
equals():比较的是2个对象的content是否等。
例如:
String str1=new String ("1");
String str2 =new String ("1");
Log.e("----str1.equals()str2", "" + (str1.equals(str2)));
Log.e("----str1==()str2", "" + (str1==(str2)));
上述输出
----str1.equals()str2 true
----str1==()str2 false
使用equals方法,内部实现分为三个步骤:
- 先比较引用是否相同(是否为同一对象)
- 再 判断类型是否一致(是否为同一类型)
- 最后 比较内容是否一致
Java 中所有内置的类的 equals 方法的实现步骤均是如此,特别是诸如 Integer,Double 等包装器类。
关于“==”:
- == : 该操作符生成的是一个boolean结果,它计算的是操作数的值之间的关系
- equals : Object 的 实例方法,比较两个对象的content是否相同
- hashCode : Object 的 native方法 , 获取对象的哈希值,用于确定该对象在哈希表中的索引位置,它实际上是一个int型整数
“==” 操作数的值
-
基本数据类型变量
在Java中有八种基本数据类型:
浮点型:float(4 byte), double(8 byte)
整型:byte(1 byte), short(2 byte), int(4 byte) , long(8 byte)
字符型: char(2 byte)
布尔型: boolean(JVM规范没有明确规定其所占的空间大小,仅规定其只能够取字面值”true”和”false”)
对于这八种基本数据类型的变量,变量直接存储的是“值”。因此,在使用关系操作符 == 来进行比较时,比较的就是“值”本身。**要注意的是,**浮点型和整型都是有符号类型的(最高位仅用表示正负,不参与计算【以 byte 为例,其范围为 -2^7 ~ 2^7 - 1,-0即-128】),而char是无符号类型的(所有位均参与计算,所以char类型取值范围为0~2^16-1)**。
- 引用类型变量
在Java中,引用类型的变量存储的并不是“值”本身,而是与其关联的对象在内存中的地址。比如下面这行代码,
String str1;
这句话声明了一个引用类型的变量,此时它并没有和任何对象关联。而通过 new 来产生一个对象,并将这个对象和str1进行绑定:
str1= new String("hello");
那么 str1 就指向了这个对象,此时引用变量str1中存储的是它指向的对象在内存中的存储地址,并不是“值”本身,也就是说并不是直接存储的字符串”hello”。这里面的引用和 C/C++ 中的指针很类似。
2、小结
因此,对于关系操作符 ==:
- 若操作数的类型是基本数据类型,则该关系操作符判断的是左右两边操作数的值是否相等
- 若操作数的类型是引用数据类型,则该关系操作符判断的是左右两边操作数的内存地址是否相同。也就是说,若此时返回true,则该操作符作用的一定是同一个对象。
。
网友评论