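The string constant pool in the JVM is a somewhat mysterious thing, and books and websites disagree about its details. This article tries to rely on the most authoritative material available and to find an entry point for untangling the mess. All referenced documents are linked.
The JVM discussed here is HotSpot. Unless stated otherwise, the JDK version is 1.8; 1.6 and 1.7 are used where a comparison is needed.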
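String Interning
String interning is the fundamental reason the string constant pool exists. The English Wikipedia gives a very good explanation, roughly as follows:
String interning means that the system keeps only one copy of each distinct string value, called an "intern", and that these copies are immutable. The distinct strings are stored in the string constant pool.
Each programming language has its own way to obtain an intern from the pool or to intern a string, such as String.intern() in Java. In Java, all strings whose values can be determined at compile time are also interned automatically.
Strings are not the only values that can be interned. In Java, for example, Integer objects in the range [-128, 127] are cached in the inner class IntegerCache, which effectively acts as an integer constant pool. Two equal int values in this range point to the same Integer object on the heap (that is, the intern) after autoboxing; see the source of Integer.valueOf().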
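The Integer cache can be observed with a reference comparison. The following is only a minimal sketch; the class and method names (IntegerCacheDemo, sameBoxedInstance) are made up for illustration. Values outside [-128, 127] are not guaranteed to be cached, so the last line usually prints false with default settings, but that depends on the JVM configuration.

```java
// Sketch: observe the Integer cache by comparing references, not values.
final class IntegerCacheDemo {
    // True when boxing the same int twice yields the same Integer object.
    static boolean sameBoxedInstance(int value) {
        Integer a = Integer.valueOf(value); // the call that autoboxing compiles to
        Integer b = Integer.valueOf(value);
        return a == b;                      // reference comparison on purpose
    }

    public static void main(String[] args) {
        System.out.println(sameBoxedInstance(127));  // inside the cached range
        System.out.println(sameBoxedInstance(-128)); // inside the cached range
        System.out.println(sameBoxedInstance(128));  // outside the guaranteed range
    }
}
```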
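String interning is a typical implementation of the flyweight pattern from design patterns, which is not elaborated here.
String Literals
The concept of a string literal was mentioned above. The Java Language Specification says:
A string literal consists of zero or more characters enclosed in double quotes. It is a reference to an instance of class String.
A string literal always refers to the same instance of class String. This is because string literals, and more generally strings that are the values of constant expressions, are interned using the method String.intern() so that they share unique instances.
Both string literals and string constant expressions belong to the "strings that can be determined at compile time" mentioned above. Here is the example from the Java Language Specification: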
package testPackage;
class Test {
public static void main(String[] args) {
String hello = "Hello", lo = "lo";
System.out.print((hello == "Hello") + " ");
System.out.print((Other.hello == hello) + " ");
System.out.print((other.Other.hello == hello) + " ");
System.out.print((hello == ("Hel"+"lo")) + " ");
System.out.print((hello == ("Hel"+lo)) + " ");
System.out.println(hello == ("Hel"+lo).intern());
}
}
class Other { static String hello = "Hello"; }
package other;
public class Other { public static String hello = "Hello"; }
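The output is true true true true false true. This shows that:
- The string constant pool is global within the JVM and is independent of class and package scope;
- A string that cannot be determined at compile time (such as "Hel"+lo above) produces a new String object at run time (decompiling shows that it is concatenated through StringBuilder).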
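Whether "Hel"+lo counts as a compile-time constant depends only on how lo is declared. The sketch below is my own variation on the specification's example, with made-up names (ConstantFoldingDemo and its methods): when lo is a final variable initialized with a literal, the concatenation becomes a constant expression and is interned; without final, it is evaluated at run time.

```java
// Sketch: a final local initialized with a literal is a constant variable,
// so "Hel" + lo is a constant expression and refers to the interned "Hello".
final class ConstantFoldingDemo {
    static boolean withFinalLocal() {
        final String lo = "lo";        // constant variable
        String hello = "Hello";
        return hello == ("Hel" + lo);  // evaluated by the compiler
    }

    static boolean withPlainLocal() {
        String lo = "lo";              // not final, so not a constant variable
        String hello = "Hello";
        return hello == ("Hel" + lo);  // concatenated at run time into a new object
    }

    public static void main(String[] args) {
        System.out.println(withFinalLocal()); // true
        System.out.println(withPlainLocal()); // false
    }
}
```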
String.intern()
In the JDK, String.intern() is a native method:
/**
* Returns a canonical representation for the string object.
* <p>
* A pool of strings, initially empty, is maintained privately by the
* class {@code String}.
* <p>
* When the intern method is invoked, if the pool already contains a
* string equal to this {@code String} object as determined by
* the {@link #equals(Object)} method, then the string from the pool is
* returned. Otherwise, this {@code String} object is added to the
* pool and a reference to this {@code String} object is returned.
* <p>
* It follows that for any two strings {@code s} and {@code t},
* {@code s.intern() == t.intern()} is {@code true}
* if and only if {@code s.equals(t)} is {@code true}.
* <p>
* All literal strings and string-valued constant expressions are
* interned. String literals are defined in section 3.10.5 of the
* <cite>The Java™ Language Specification</cite>.
*
* @return a string that has the same contents as this string, but is
* guaranteed to be from a pool of unique strings.
*/
public native String intern();
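Paraphrased, this JavaDoc says:
The String class privately maintains a pool of strings that is initially empty.
When intern() is invoked, if the pool already contains a string equal to this string (as determined by equals()), the string from the pool is returned. Otherwise, this string is added to the pool (interned) and a reference to it is returned.
For any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.
All string literals and string constant expressions are interned.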
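The contract described in the JavaDoc can be checked in a few lines. This is a minimal sketch with hypothetical names (InternContractDemo, sameIntern), not code from the JDK: two distinct String objects with equal content intern to the same reference, and strings with different content never do.

```java
// Sketch of the intern() contract: s.intern() == t.intern() iff s.equals(t).
final class InternContractDemo {
    static boolean sameIntern(String s, String t) {
        return s.intern() == t.intern();
    }

    public static void main(String[] args) {
        String s = new String("pool"); // two separate objects on the heap
        String t = new String("pool");
        System.out.println(s == t);                 // false: different objects
        System.out.println(sameIntern(s, t));       // true: equal content
        System.out.println(sameIntern(s, "other")); // false: different content
    }
}
```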
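As this shows, the string interning and constant pool mechanism cannot be found in the JDK's Java source; it is implemented at a lower level, inside the JVM.
Things are not that simple, though. The following questions need answers:
- Where in the JVM memory space is the string constant pool located?
- Does it store String objects, references to String objects, or both?
- How is it implemented internally, and how can it be tuned?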
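Location of the String Constant Pool
Since JVM memory space is involved, the classic diagram comes first.
[Figure: JVM runtime data areas]
The official JDK 7 Release Notes contain the following passage: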
Area: HotSpot
Synopsis: In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation...... (remainder omitted)
RFE: 6962931
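In short: in JDK 7, interned strings are no longer allocated in the permanent generation but in the main part of the Java heap (the young and old generations).
It follows that in JDK 6 the string constant pool is located in the permanent generation (HotSpot's implementation of the method area). Starting with JDK 7, the string constant pool lives directly in the heap.
The classic example from 《深入理解Java虚拟机(第二版)》 (Understanding the Java Virtual Machine, 2nd edition) demonstrates this. It generates an endlessly increasing sequence of numeric strings and puts each one into the string constant pool.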
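import java.util.ArrayList;
import java.util.List;

public class OOMExample {
    public static void main(String[] args) {
        // Keep references in a List so the interned strings cannot be garbage collected
        List<String> list = new ArrayList<String>();
        int i = 0;
        while (true) {
            list.add(String.valueOf(i++).intern());
        }
    }
}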
The JVM options are the same in all cases:
-Xms8m -Xmx8m -XX:PermSize=8m -XX:MaxPermSize=8m -XX:+UseParallelGC -XX:+PrintGCDetails
The program is then run under JDK 6, 7, and 8, and the output is observed.
- JDK6
[GC [PSYoungGen: 2012K->304K(2368K)] 2012K->420K(7872K), 0.0014317 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
[GC [PSYoungGen: 2352K->320K(2368K)] 2468K->705K(7872K), 0.0013064 secs] [Times: user=0.00 sys=0.01, real=0.00 secs]
[GC [PSYoungGen: 1331K->288K(2368K)] 1717K->697K(7872K), 0.0007446 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
[Full GC [PSYoungGen: 288K->0K(2368K)] [PSOldGen: 409K->617K(5504K)] 697K->617K(7872K) [PSPermGen: 8191K->8191K(8192K)], 0.0130018 secs] [Times: user=0.01 sys=0.00, real=0.02 secs]
[GC [PSYoungGen: 0K->0K(2368K)] 617K->617K(7872K), 0.0001804 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
[Full GC [PSYoungGen: 0K->0K(2368K)] [PSOldGen: 617K->471K(5504K)] 617K->471K(7872K) [PSPermGen: 8191K->8180K(8192K)], 0.0134341 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
......
Exception in thread "main" java.lang.OutOfMemoryError: PermGen space
at java.lang.String.intern(Native Method)
at me.lmagics.OOMExample.main(OOMExample.java:16)
- JDK7
[GC [PSYoungGen: 2048K->507K(2560K)] 2048K->1651K(8192K), 0.0026340 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
[GC [PSYoungGen: 2555K->501K(2560K)] 3699K->3389K(8192K), 0.0028820 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
[GC [PSYoungGen: 2549K->496K(2560K)] 5437K->5192K(8192K), 0.0038110 secs] [Times: user=0.01 sys=0.01, real=0.01 secs]
[Full GC [PSYoungGen: 496K->0K(2560K)] [ParOldGen: 4696K->5101K(5632K)] 5192K->5101K(8192K) [PSPermGen: 2603K->2602K(8192K)], 0.0622090 secs] [Times: user=0.27 sys=0.00, real=0.06 secs]
[Full GC [PSYoungGen: 2048K->1535K(2560K)] [ParOldGen: 5101K->5180K(5632K)] 7149K->6716K(8192K) [PSPermGen: 2602K->2602K(8192K)], 0.0550730 secs] [Times: user=0.28 sys=0.01, real=0.05 secs]
[Full GC [PSYoungGen: 2048K->2047K(2560K)] [ParOldGen: 5180K->5180K(5632K)] 7228K->7228K(8192K) [PSPermGen: 2602K->2602K(8192K)], 0.0287170 secs] [Times: user=0.14 sys=0.00, real=0.03 secs]
......
[Full GC [PSYoungGen: 2047K->2047K(2560K)] [ParOldGen: 5543K->5543K(5632K)] 7591K->7591K(8192K) [PSPermGen: 2602K->2602K(8192K)], 0.0285530 secs] [Times: user=0.16 sys=0.00, real=0.03 secs]
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
[Full GC [PSYoungGen: 2047K->0K(2560K)] [ParOldGen: 5546K->220K(5632K)] 7594K->220K(8192K) [PSPermGen: 2627K->2627K(8192K)], 0.0052340 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
at java.lang.Integer.toString(Integer.java:331)
at java.lang.String.valueOf(String.java:2954)
at me.lmagics.OOMExample.main(OOMExample.java:16)
- JDK8
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=8m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=8m; support was removed in 8.0
[GC (Allocation Failure) [PSYoungGen: 1536K->482K(2048K)] 1536K->1210K(7680K), 0.0017302 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
[GC (Allocation Failure) [PSYoungGen: 2018K->505K(2048K)] 2746K->2581K(7680K), 0.0021425 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
[GC (Allocation Failure) [PSYoungGen: 2041K->501K(2048K)] 4117K->3969K(7680K), 0.0021064 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
[GC (Allocation Failure) [PSYoungGen: 2037K->496K(2048K)] 5505K->5276K(7680K), 0.0025973 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
[Full GC (Ergonomics) [PSYoungGen: 496K->0K(2048K)] [ParOldGen: 4780K->5090K(5632K)] 5276K->5090K(7680K), [Metaspace: 2652K->2652K(1056768K)], 0.0587041 secs] [Times: user=0.30 sys=0.01, real=0.05 secs]
[Full GC (Ergonomics) [PSYoungGen: 1412K->880K(2048K)] [ParOldGen: 5090K->5570K(5632K)] 6503K->6451K(7680K), [Metaspace: 2652K->2652K(1056768K)], 0.0334546 secs] [Times: user=0.17 sys=0.00, real=0.03 secs]
[Full GC (Ergonomics) [PSYoungGen: 1536K->1535K(2048K)] [ParOldGen: 5570K->5154K(5632K)] 7106K->6690K(7680K), [Metaspace: 2652K->2652K(1056768K)], 0.0320396 secs] [Times: user=0.15 sys=0.00, real=0.04 secs]
......
[Full GC (Ergonomics) [PSYoungGen: 1535K->1535K(2048K)] [ParOldGen: 5542K->5542K(5632K)] 7078K->7078K(7680K), [Metaspace: 2652K->2652K(1056768K)], 0.0273170 secs] [Times: user=0.17 sys=0.00, real=0.03 secs]
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
[Full GC (Ergonomics) [PSYoungGen: 1536K->0K(2048K)] [ParOldGen: 5545K->267K(5632K)] 7081K->267K(7680K), [Metaspace: 2677K->2677K(1056768K)], 0.0039194 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
at java.lang.Integer.toString(Integer.java:401)
at java.lang.String.valueOf(String.java:3099)
at me.lmagics.OOMExample.main(OOMExample.java:16)
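The output above shows that:
- JDK 6 reports a PermGen OOM, which proves that the string constant pool is indeed in the permanent generation;
- JDK 7 and 8 both report that the GC overhead limit was exceeded. In HotSpot, this error is raised once the JVM detects that more than 98% of the time is spent in GC while less than 2% of the heap is recovered. If the check is disabled with -XX:-UseGCOverheadLimit, the familiar "java.lang.OutOfMemoryError: Java heap space" is thrown after a while. This proves that the string constant pool has indeed moved to the heap;
- JDK 8 also warns that the permanent generation options are ignored. This is because JDK 8 removed the permanent generation completely and uses Metaspace to implement the method area instead. The Metaspace figures also appear in the GC log.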
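Besides reading GC logs, the memory areas a running JVM exposes can be listed through the standard java.lang.management API. The following is a rough sketch with a made-up class name (MemoryPoolNames). Pool names are implementation-specific: on HotSpot one would expect a permanent generation pool on JDK 6/7 and a Metaspace pool on JDK 8, but the exact names depend on the JVM and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

// Sketch: list the memory pools reported by the current JVM.
final class MemoryPoolNames {
    static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // getType() distinguishes heap pools from non-heap pools
            names.add(pool.getName() + " (" + pool.getType() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : poolNames()) {
            System.out.println(name);
        }
    }
}
```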
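Q: Why was the string constant pool moved from the permanent generation to the heap, and why was the permanent generation later replaced by Metaspace?
A: The permanent generation was an awkward implementation of HotSpot's method area, and no other JVM implementation has one.
According to the Java Virtual Machine Specification:
The method area stores per-class structures such as the run-time constant pool, field and method data, and the code for methods and constructors.
Although the method area is logically part of the heap, simple implementations may choose not to garbage collect or compact it.
In HotSpot, the method area is garbage collected, by directly extending the heap's generational GC to it. Because the data in the method area is more "static" than the data in the young and old generations, it was named the "permanent generation" for naming consistency.
The permanent generation is awkward mainly because it is hard to tune. Its size is fixed by the two options -XX:PermSize and -XX:MaxPermSize. If it is set too small, an OOM occurs as soon as the constant pool grows too large or too much metadata for dynamically loaded classes accumulates. If it is set too large, it takes space that could otherwise be used by the heap and increases GC pressure.
In addition, the effort to merge the HotSpot and JRockit virtual machines began in the JDK 7 era. JRockit has no permanent generation, so HotSpot eventually dropped it as well. The newly added Metaspace resides in native memory; unless -XX:MaxMetaspaceSize is set, it is no longer bound by a fixed size limit, which makes it more flexible. Metaspace is not covered in more detail here; see the linked reference.
What the String Constant Pool Stores
Because this question is hard to verify, it often causes arguments online.
Consider the following code: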
public class StringPoolExample {
public static void main(String[] args) {
String s1 = new String("a"); // #1
s1.intern(); // #2
String s2 = "a"; // #3
System.out.println(s1 == s2);
String s3 = s2 + s2; // #4
s3.intern(); // #5
String s4 = "aa"; // #6
System.out.println(s3 == s4);
}
}
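Run on JDK 6, this code prints false false; on JDK 7/8 it prints false true. The difference suggests that what the string constant pool stores has changed as well. A detailed analysis follows, with diagrams drawn in ProcessOn:
- JDK 6
How many objects does statement #1 create? This is an extremely common interview question, and the answer is two: one on the heap and one in the string constant pool. Because "a" is a literal, it is interned automatically, so by the time statement #2 calls intern(), it already exists in the string constant pool. Statement #3 finds "a" in the pool directly, so s1 and s2 refer to different addresses.
In statement #4, the value of the string referenced by s3 cannot be determined at compile time, so a new String object is created. When statement #5 calls intern(), "aa" is not yet in the pool, so a copy of it is added there. Statement #6 also finds "aa" in the pool directly, so s3 and s4 refer to different addresses as well.
- JDK 7/8
Statements #1 to #3 behave as described above.
Why do statements #4 to #6 yield true? Since == compares addresses for reference types, s3 and s4 must refer to the same address. The diagram above therefore needs one change:
When statement #5 executes, the String object "aa" exists on the heap but not in the pool. Instead of adding an object to the pool as in JDK 6, a reference to "aa" is added. That reference and s3 point to the same String object on the heap. So when statement #6 finds "aa" in the pool, it actually finds the same reference as s3, and s3 == s4 holds.
- Conclusion:
In JDK 6, the string constant pool holds only String objects.
In JDK 7/8, for string literals (and, of course, constant expressions), the pool holds the String object directly. For a string that cannot be determined at compile time, calling intern() makes the pool hold a reference to the String object on the heap, without creating another object in the pool. The change was probably made because the pool had moved into the heap, so there is no need to keep one object inside the pool and another outside it, which saves space.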
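The JDK 7/8 behavior can be isolated in a small test. This is only a sketch under two assumptions: it runs on JDK 7 or later, and the run-time string it builds is not already in the pool (the System.nanoTime() suffix is there to make that very likely). All names (InternIdentityDemo and its methods) are made up for illustration.

```java
// Sketch: on JDK 7+, interning a string that is not yet pooled returns
// the very same heap object rather than a copy.
final class InternIdentityDemo {
    // Builds a string at run time that is very unlikely to be pooled already.
    static String freshString() {
        return new StringBuilder("pool-demo-").append(System.nanoTime()).toString();
    }

    static boolean firstInternReturnsSameObject() {
        String s = freshString();
        return s.intern() == s;
    }

    static boolean laterInternReturnsFirstObject() {
        String first = freshString();
        String copy = new String(first); // equal content, different object
        first.intern();                  // the pool now refers to 'first'
        return copy.intern() == first && copy.intern() != copy;
    }

    public static void main(String[] args) {
        System.out.println(firstInternReturnsSameObject());
        System.out.println(laterInternReturnsFirstObject());
    }
}
```

On JDK 6 the first check would be expected to fail, because intern() returns a separate copy living in the permanent generation.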
The disassembled bytecode of the code above is attached below, together with the contents of the class file's constant pool:
Constant pool:
#1 = Methodref #14.#33 // java/lang/Object."<init>":()V
#2 = Class #34 // java/lang/String
#3 = String #35 // a
#4 = Methodref #2.#36 // java/lang/String."<init>":(Ljava/lang/String;)V
#5 = Methodref #2.#37 // java/lang/String.intern:()Ljava/lang/String;
#6 = Fieldref #38.#39 // java/lang/System.out:Ljava/io/PrintStream;
#7 = Methodref #40.#41 // java/io/PrintStream.println:(Z)V
#8 = Class #42 // java/lang/StringBuilder
#9 = Methodref #8.#33 // java/lang/StringBuilder."<init>":()V
#10 = Methodref #8.#43 // java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
#11 = Methodref #8.#44 // java/lang/StringBuilder.toString:()Ljava/lang/String;
#12 = String #45 // aa
#13 = Class #46 // me/lmagics/StringPoolExample
#14 = Class #47 // java/lang/Object
#15 = Utf8 <init>
#16 = Utf8 ()V
#17 = Utf8 Code
#18 = Utf8 LineNumberTable
#19 = Utf8 LocalVariableTable
#20 = Utf8 this
#21 = Utf8 Lme/lmagics/StringPoolExample;
#22 = Utf8 main
#23 = Utf8 ([Ljava/lang/String;)V
#24 = Utf8 args
#25 = Utf8 [Ljava/lang/String;
#26 = Utf8 s1
#27 = Utf8 Ljava/lang/String;
#28 = Utf8 s2
#29 = Utf8 s3
#30 = Utf8 s4
#31 = Utf8 SourceFile
#32 = Utf8 StringPoolExample.java
#33 = NameAndType #15:#16 // "<init>":()V
#34 = Utf8 java/lang/String
#35 = Utf8 a
#36 = NameAndType #15:#48 // "<init>":(Ljava/lang/String;)V
#37 = NameAndType #49:#50 // intern:()Ljava/lang/String;
#38 = Class #51 // java/lang/System
#39 = NameAndType #52:#53 // out:Ljava/io/PrintStream;
#40 = Class #54 // java/io/PrintStream
#41 = NameAndType #55:#56 // println:(Z)V
#42 = Utf8 java/lang/StringBuilder
#43 = NameAndType #57:#58 // append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
#44 = NameAndType #59:#50 // toString:()Ljava/lang/String;
#45 = Utf8 aa
#46 = Utf8 me/lmagics/StringPoolExample
#47 = Utf8 java/lang/Object
#48 = Utf8 (Ljava/lang/String;)V
#49 = Utf8 intern
#50 = Utf8 ()Ljava/lang/String;
#51 = Utf8 java/lang/System
#52 = Utf8 out
#53 = Utf8 Ljava/io/PrintStream;
#54 = Utf8 java/io/PrintStream
#55 = Utf8 println
#56 = Utf8 (Z)V
#57 = Utf8 append
#58 = Utf8 (Ljava/lang/String;)Ljava/lang/StringBuilder;
#59 = Utf8 toString
{
public me.lmagics.StringPoolExample();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=1, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
LineNumberTable:
line 3: 0
LocalVariableTable:
Start Length Slot Name Signature
0 5 0 this Lme/lmagics/StringPoolExample;
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=3, locals=5, args_size=1
0: new #2 // class java/lang/String
3: dup
4: ldc #3 // String a
6: invokespecial #4 // Method java/lang/String."<init>":(Ljava/lang/String;)V
9: astore_1
10: aload_1
11: invokevirtual #5 // Method java/lang/String.intern:()Ljava/lang/String;
14: pop
15: ldc #3 // String a
17: astore_2
18: getstatic #6 // Field java/lang/System.out:Ljava/io/PrintStream;
21: aload_1
22: aload_2
23: if_acmpne 30
26: iconst_1
27: goto 31
30: iconst_0
31: invokevirtual #7 // Method java/io/PrintStream.println:(Z)V
34: new #8 // class java/lang/StringBuilder
37: dup
38: invokespecial #9 // Method java/lang/StringBuilder."<init>":()V
41: aload_2
42: invokevirtual #10 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
45: aload_2
46: invokevirtual #10 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
49: invokevirtual #11 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
52: astore_3
53: aload_3
54: invokevirtual #5 // Method java/lang/String.intern:()Ljava/lang/String;
57: pop
58: ldc #12 // String aa
60: astore 4
62: getstatic #6 // Field java/lang/System.out:Ljava/io/PrintStream;
65: aload_3
66: aload 4
68: if_acmpne 75
71: iconst_1
72: goto 76
75: iconst_0
76: invokevirtual #7 // Method java/io/PrintStream.println:(Z)V
79: return
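Implementation of the String Constant Pool
This information is unlikely to be available from the Oracle/Sun JDK, so the native code of OpenJDK is the way to get a rough picture of how the string constant pool is implemented. Source mirrors of the various OpenJDK versions can be found on GitHub. Here the OpenJDK 7u branch is used, and reading starts from the String class.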
- openjdk/jdk/src/share/native/java/lang/String.c
#include "jvm.h"
#include "java_lang_String.h"
JNIEXPORT jobject JNICALL
Java_java_lang_String_intern(JNIEnv *env, jobject this)
{
return JVM_InternString(env, this);
}
- openjdk/hotspot/src/share/vm/prims/jvm.h
/*
* java.lang.String
*/
JNIEXPORT jstring JNICALL
JVM_InternString(JNIEnv *env, jstring str);
- openjdk/hotspot/src/share/vm/prims/jvm.cpp
JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))
JVMWrapper("JVM_InternString");
JvmtiVMObjectAllocEventCollector oam;
if (str == NULL) return NULL;
oop string = JNIHandles::resolve_non_null(str);
oop result = StringTable::intern(string, CHECK_NULL);
return (jstring) JNIHandles::make_local(env, result);
JVM_END
- openjdk/hotspot/src/share/vm/classfile/symbolTable.hpp
class StringTable : public Hashtable<oop, mtSymbol> {
friend class VMStructs;
private:
// The string table
static StringTable* _the_table;
// Set if one bucket is out of balance due to hash algorithm deficiency
static bool _needs_rehashing;
// Claimed high water mark for parallel chunked scanning
static volatile int _parallel_claimed_idx;
static oop intern(Handle string_or_null, jchar* chars, int length, TRAPS);
oop basic_add(int index, Handle string_or_null, jchar* name, int len,
unsigned int hashValue, TRAPS);
oop lookup(int index, jchar* chars, int length, unsigned int hashValue);
// Apply the give oop closure to the entries to the buckets
// in the range [start_idx, end_idx).
static void buckets_oops_do(OopClosure* f, int start_idx, int end_idx);
// Unlink the entries to the buckets in the range [start_idx, end_idx).
static void buckets_unlink(BoolObjectClosure* is_alive, int start_idx, int end_idx, int* processed, int* removed);
StringTable() : Hashtable<oop, mtSymbol>((int)StringTableSize,
sizeof (HashtableEntry<oop, mtSymbol>)) {}
StringTable(HashtableBucket<mtSymbol>* t, int number_of_entries)
: Hashtable<oop, mtSymbol>((int)StringTableSize, sizeof (HashtableEntry<oop, mtSymbol>), t,
number_of_entries) {}
public:
// The string table
static StringTable* the_table() { return _the_table; }
static void create_table() {
assert(_the_table == NULL, "One string table allowed.");
_the_table = new StringTable();
}
static void create_table(HashtableBucket<mtSymbol>* t, int length,
int number_of_entries) {
assert(_the_table == NULL, "One string table allowed.");
assert((size_t)length == StringTableSize * sizeof(HashtableBucket<mtSymbol>),
"bad shared string size.");
_the_table = new StringTable(t, number_of_entries);
}
// GC support
// Delete pointers to otherwise-unreachable objects.
static void unlink(BoolObjectClosure* cl) {
int processed = 0;
int removed = 0;
unlink(cl, &processed, &removed);
}
static void unlink(BoolObjectClosure* cl, int* processed, int* removed);
// Serially invoke "f->do_oop" on the locations of all oops in the table.
static void oops_do(OopClosure* f);
// Possibly parallel version of the above
static void possibly_parallel_oops_do(OopClosure* f);
static void possibly_parallel_unlink(BoolObjectClosure* cl, int* processed, int* removed);
// Hashing algorithm, used as the hash value used by the
// StringTable for bucket selection and comparison (stored in the
// HashtableEntry structures). This is used in the String.intern() method.
static unsigned int hash_string(const jchar* s, int len);
// Internal test.
static void test_alt_hash() PRODUCT_RETURN;
// Probing
static oop lookup(Symbol* symbol);
// Interning
static oop intern(Symbol* symbol, TRAPS);
static oop intern(oop string, TRAPS);
static oop intern(const char *utf8_string, TRAPS);
// Debugging
static void verify();
static void dump(outputStream* st);
// Sharing
static void copy_buckets(char** top, char*end) {
the_table()->Hashtable<oop, mtSymbol>::copy_buckets(top, end);
}
static void copy_table(char** top, char*end) {
the_table()->Hashtable<oop, mtSymbol>::copy_table(top, end);
}
static void reverse() {
the_table()->Hashtable<oop, mtSymbol>::reverse();
}
// Rehash the symbol table if it gets out of balance
static void rehash_table();
static bool needs_rehashing() { return _needs_rehashing; }
// Parallel chunked scanning
static void clear_parallel_claimed_index() { _parallel_claimed_idx = 0; }
static int parallel_claimed_index() { return _parallel_claimed_idx; }
};
- openjdk/hotspot/src/share/vm/classfile/symbolTable.cpp
StringTable* StringTable::_the_table = NULL;
oop StringTable::intern(Handle string_or_null, jchar* name,
int len, TRAPS) {
unsigned int hashValue = hash_string(name, len);
int index = the_table()->hash_to_index(hashValue);
oop found_string = the_table()->lookup(index, name, len, hashValue);
// Found
if (found_string != NULL) return found_string;
debug_only(StableMemoryChecker smc(name, len * sizeof(name[0])));
assert(!Universe::heap()->is_in_reserved(name) || GC_locker::is_active(),
"proposed name of symbol must be stable");
Handle string;
// try to reuse the string if possible
if (!string_or_null.is_null() && (!JavaObjectsInPerm || string_or_null()->is_perm())) {
string = string_or_null;
} else {
string = java_lang_String::create_tenured_from_unicode(name, len, CHECK_NULL);
}
// Grab the StringTable_lock before getting the_table() because it could
// change at safepoint.
MutexLocker ml(StringTable_lock, THREAD);
// Otherwise, add to symbol to table
return the_table()->basic_add(index, string, name, len,
hashValue, CHECK_NULL);
}
oop StringTable::lookup(int index, jchar* name,
int len, unsigned int hash) {
int count = 0;
for (HashtableEntry<oop, mtSymbol>* l = bucket(index); l != NULL; l = l->next()) {
count++;
if (l->hash() == hash) {
if (java_lang_String::equals(l->literal(), name, len)) {
return l->literal();
}
}
}
// If the bucket size is too deep check if this hash code is insufficient.
if (count >= BasicHashtable<mtSymbol>::rehash_count && !needs_rehashing()) {
_needs_rehashing = check_rehash_table(count);
}
return NULL;
}
oop StringTable::basic_add(int index_arg, Handle string, jchar* name,
int len, unsigned int hashValue_arg, TRAPS) {
assert(java_lang_String::equals(string(), name, len),
"string must be properly initialized");
// Cannot hit a safepoint in this function because the "this" pointer can move.
No_Safepoint_Verifier nsv;
// Check if the symbol table has been rehashed, if so, need to recalculate
// the hash value and index before second lookup.
unsigned int hashValue;
int index;
if (use_alternate_hashcode()) {
hashValue = hash_string(name, len);
index = hash_to_index(hashValue);
} else {
hashValue = hashValue_arg;
index = index_arg;
}
// Since look-up was done lock-free, we need to check if another
// thread beat us in the race to insert the symbol.
oop test = lookup(index, name, len, hashValue); // calls lookup(u1*, int)
if (test != NULL) {
// Entry already added
return test;
}
HashtableEntry<oop, mtSymbol>* entry = new_entry(hashValue, string());
add_entry(index, entry);
return string();
}
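The code is very long, and I am no expert in C/C++, but it is still clear that the string constant pool is maintained in a data structure similar to HashMap/Hashtable, named StringTable.
StringTable::intern() also shows clearly that if the target string is found in the StringTable, it is returned directly. Otherwise, the code checks whether a String object was passed in (the handle string_or_null is not null) and whether it may be reused (JavaObjectsInPerm is off, or the object is already in the permanent generation). If so, that object is kept through a Handle; if not, a new String object is created. Finally, the string is added to the StringTable.
The size of the StringTable (that is, its number of hash buckets) is fixed at StringTableSize and never grows, and collisions are resolved by separate chaining. This means that if too many Strings enter the string constant pool, hash collisions become severe and subsequent calls to String.intern() take longer.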
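To make the effect of a fixed bucket count concrete, here is a toy model in Java. It is not HotSpot's implementation, only a sketch of the same idea (a fixed number of buckets with chained entries); the class ToyStringTable and its methods are hypothetical, and it ignores locking and rehashing entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: a fixed number of buckets, collisions resolved by chaining.
final class ToyStringTable {
    private final List<List<String>> buckets;

    ToyStringTable(int bucketCount) {
        buckets = new ArrayList<>(bucketCount);
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Returns the canonical instance for s, adding s itself if none exists yet.
    String intern(String s) {
        List<String> chain = buckets.get(Math.floorMod(s.hashCode(), buckets.size()));
        for (String candidate : chain) { // linear scan: cost grows with chain length
            if (candidate.equals(s)) {
                return candidate;
            }
        }
        chain.add(s);
        return s;
    }

    int longestChain() {
        int max = 0;
        for (List<String> chain : buckets) {
            max = Math.max(max, chain.size());
        }
        return max;
    }

    public static void main(String[] args) {
        ToyStringTable small = new ToyStringTable(7);
        ToyStringTable large = new ToyStringTable(1009);
        for (int i = 0; i < 1000; i++) {
            small.intern(String.valueOf(i));
            large.intern(String.valueOf(i));
        }
        System.out.println("7 buckets, longest chain: " + small.longestChain());
        System.out.println("1009 buckets, longest chain: " + large.longestChain());
    }
}
```

With 1000 strings in 7 buckets, at least one chain must hold 143 or more entries, and every lookup in that bucket has to walk it. That is the slowdown described above.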
The size of the StringTable can be adjusted. First, the -XX:+PrintFlagsFinal option reveals its default size. In recent JDK 7 builds and in JDK 8, the value is 60013:
uintx StringTableSize = 60013 {product}
In JDK 6 and in older JDK 7 builds, the default is 1009. 60013 is clearly more suitable.
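The same flag can also be read from inside a running program. The sketch below assumes a HotSpot-based JVM that exposes com.sun.management.HotSpotDiagnosticMXBean and still has a flag named StringTableSize; the class name StringTableSizeProbe is made up, and the method returns null when the value cannot be read.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Sketch: read the StringTableSize flag of the current JVM (HotSpot only).
final class StringTableSizeProbe {
    // Returns the flag value as text, or null when it cannot be read.
    static String stringTableSize() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean == null ? null : bean.getVMOption("StringTableSize").getValue();
        } catch (RuntimeException e) {
            // For example, the flag or the bean does not exist on this JVM
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("StringTableSize = " + stringTableSize());
    }
}
```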
The -XX:StringTableSize option changes the size of the StringTable, for example:
-XX:StringTableSize=75979
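When changing it manually, the usual advice is to estimate roughly how many strings the whole program needs to intern and then choose a prime about twice that number (which reduces collisions).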
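The rule of thumb is easy to turn into a helper. The following is a minimal sketch with hypothetical names (StringTableSizing, suggestedSize); it uses plain trial division and assumes the estimate is small enough that doubling it does not overflow an int.

```java
// Sketch: suggest a StringTableSize as the smallest prime >= 2 * estimate.
final class StringTableSizing {
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) { // trial division up to sqrt(n)
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Assumes expectedStrings is below 2^30 so that doubling cannot overflow.
    static int suggestedSize(int expectedStrings) {
        int candidate = Math.max(2, 2 * expectedStrings);
        while (!isPrime(candidate)) {
            candidate++;
        }
        return candidate;
    }

    public static void main(String[] args) {
        // For example, an estimate of 30000 interned strings
        System.out.println("-XX:StringTableSize=" + suggestedSize(30000));
    }
}
```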
In addition, the -XX:+PrintStringTableStatistics option prints statistics about the StringTable in the current JVM, for example:
StringTable statistics:
Number of buckets : 60013 = 480104 bytes, avg 8.000
Number of entries : 1543 = 37032 bytes, avg 24.000
Number of literals : 1543 = 144088 bytes, avg 93.382
Total footprint : = 661224 bytes
Average bucket size : 0.026
Variance of bucket size : 0.026
Std. dev. of bucket size: 0.161
Maximum bucket size : 2
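For more tests on StringTable, see the linked reference.
Dumping the Contents of the String Constant Pool
This can be done with the HotSpot SA (Serviceability Agent). HotSpot SA is a set of internal code for debugging the HotSpot VM; common debugging tools such as jstack and jmap rely on it.
Here is the code: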
import sun.jvm.hotspot.memory.SystemDictionary;
import sun.jvm.hotspot.oops.InstanceKlass;
import sun.jvm.hotspot.oops.OopField;
import sun.jvm.hotspot.runtime.VM;
import sun.jvm.hotspot.tools.Tool;
public class StringPoolDumpTool extends Tool {
@Override
public void run() {
// Use Reflection-like API to reference String class and String.value field
SystemDictionary dict = VM.getVM().getSystemDictionary();
InstanceKlass stringKlass = (InstanceKlass)dict.find("java/lang/String", null, null);
OopField valueField = (OopField)stringKlass.findField("value", "[C");
// Counters
long[] stats = new long[2];
// Iterate through the String Pool printing out each String object
VM.getVM().getStringTable().stringsDo(s -> {
s.printValueOn(System.out);
System.out.println();
stats[0]++;
stats[1] += s.getObjectSize() + valueField.getValue(s).getObjectSize();
});
System.out.printf("%d strings in pool with total size %d\n", stats[0], stats[1]);
}
public static void main(String[] args) {
// Use default SA tool launcher
new StringPoolDumpTool().execute(args);
}
}
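Then run: java -cp $JAVA_HOME/lib/sa-jdi.jar:. StringPoolDumpTool [PID], where PID is the process ID of the JVM whose string constant pool is to be dumped. The output shows that even a very simple program holds a large number of interned strings (the StringPoolExample program above has at least 700), including strings such as "java".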
The End
The internals of the JVM keep changing, and this article almost certainly has omissions. Corrections are welcome and much appreciated.