Java Strings

Author: Cool_Pomelo | Published: 2020-04-01 11:25


Introduction

What does the following code print?


        String str1= "abc";
        String str2= new String("abc");
        String str3= str2.intern();

        System.out.println(str1==str2);

        System.out.println(str2==str3);

        System.out.println(str1==str3);

Internal implementation of String objects

Diagram:

图1.png

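In Java 6 and earlier, a String object is a wrapper around a char array and has four main fields: the char array, an offset, a character count (count), and a hash value (hash).

The String object uses offset and count to locate its characters inside the char[] array. This lets several strings share one array cheaply and saves memory, but it can easily cause a memory leak: a short substring keeps the entire array of the original string reachable.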
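
The sharing scheme and the leak it can cause are easier to see in a small model. The sketch below is an assumption-laden illustration, not JDK source: `SharedString` and `smallViewOfLargeString` are hypothetical names that only imitate the Java 6 field layout described above.

```java
// A simplified, hypothetical model of the Java 6 String layout (not JDK source).
class SharedStringDemo {

    static final class SharedString {
        final char[] value;  // backing array, possibly shared with other instances
        final int offset;    // index of the first character inside value
        final int count;     // number of characters that belong to this string

        SharedString(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        // Java 6 style substring: constant time, reuses the parent's array.
        SharedString substring(int begin, int end) {
            return new SharedString(value, offset + begin, end - begin);
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    // Builds a large string and keeps only a 2-character substring of it.
    static SharedString smallViewOfLargeString(int size) {
        char[] big = new char[size];
        java.util.Arrays.fill(big, 'x');
        SharedString large = new SharedString(big, 0, big.length);
        return large.substring(0, 2);
    }

    public static void main(String[] args) {
        SharedString small = smallViewOfLargeString(1_000_000);
        // The view has 2 characters but still pins the whole array in memory.
        System.out.println(small + " -> backing array length " + small.value.length);
    }
}
```

As long as the two-character view is reachable, the million-element array cannot be garbage collected, which is the leak the text refers to.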

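From Java 7 through Java 8, the String class changed: the offset and count fields were removed. A String object therefore takes slightly less memory, and String.substring no longer shares the char[] array, which fixes the memory leak that method could cause.
Starting with Java 9, the char[] field became a byte[] field, and a new field named coder was added to identify the encoding.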

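Why the change?

A char takes 16 bits, i.e. 2 bytes. Storing characters that fit in a single-byte encoding this way wastes half the space. To save memory, the String class in JDK 9 stores the string in a byte array, whose elements take 8 bits (1 byte) each.

The new coder field tells methods such as length() and indexOf() how to interpret the byte array, for example how many bytes make up one character. coder takes one of two values: 0 means Latin-1 (single-byte encoding) and 1 means UTF-16. If the string contains only Latin-1 characters, coder is 0; otherwise it is 1.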
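
The decision that coder records can be sketched as follows. This is only an illustration under stated assumptions: `coderFor` and `payloadBytes` are hypothetical helpers written for this article, the real check happens inside the JDK when a String is constructed, and the sketch assumes compact strings are enabled (the default).

```java
// Hypothetical sketch of the Latin-1 / UTF-16 decision; not JDK source.
class CompactStringSketch {
    // Same numeric meaning as described in the text: 0 = Latin-1, 1 = UTF-16.
    static final int LATIN1 = 0;
    static final int UTF16 = 1;

    // Which coder would a string with these contents need?
    static int coderFor(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {   // character does not fit in one byte
                return UTF16;
            }
        }
        return LATIN1;
    }

    // Bytes needed for the character data: length * 1 or length * 2.
    static int payloadBytes(String s) {
        return s.length() << coderFor(s);
    }

    public static void main(String[] args) {
        System.out.println(payloadBytes("hello"));         // 5
        System.out.println(payloadBytes("\u4f60\u597d"));  // 4
    }
}
```

Shifting the length left by coder is the same trick in reverse that lets a byte-array length be turned back into a character count.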

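Immutability of String objects

The String class is declared final, and its internal char array is declared final as well.

A final class cannot be subclassed. The char[] field is both private and final, and String exposes no method that modifies it, so the contents of a String object cannot be changed. This property is called the immutability of String objects: once a String has been created, it can never be altered.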
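
A quick way to observe immutability is to call methods that look like they modify a string. In the sketch below (the class name `ImmutabilityDemo` and the helper `transform` are made up for illustration), `replace` and `concat` return new objects and leave the original untouched.

```java
class ImmutabilityDemo {
    // Applies two "modifying" operations and returns their results.
    static String[] transform(String s) {
        return new String[] { s.replace('l', 'L'), s.concat(" world") };
    }

    public static void main(String[] args) {
        String s = "hello";
        String[] results = transform(s);

        System.out.println(s);                // hello (unchanged)
        System.out.println(results[0]);       // heLLo
        System.out.println(results[1]);       // hello world
        System.out.println(s == results[0]);  // false: a different object
    }
}
```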

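Benefits

It keeps String objects safe. If String objects were mutable, they could be modified maliciously.
It guarantees that the hash value never changes, so it can be computed once and cached; this consistency is what lets containers such as HashMap use strings as keys for key-value caching.
It makes the string constant pool possible. In Java there are usually two ways to create a string object: from a string literal, as in String str = "abc", or with new, as in String str = new String("abc").

When the first form is used, the JVM first checks whether the string is already in the string constant pool. If it is, a reference to that object is returned; otherwise a new string is created in the pool. This avoids creating duplicate objects for the same value and saves memory.

With String str = new String("abc"), three things happen. First, when the class file is compiled, the constant string "abc" is placed in the class file's constant pool structure, and when the class is loaded, "abc" is created in the string constant pool. Second, when new is executed, the JVM invokes the String constructor, which references the "abc" string in the constant pool and creates a String object on the heap. Finally, str refers to that heap String object.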
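
The difference between the two creation styles shows up when references are compared with ==. The following is a minimal sketch; `StringPoolDemo` and `sameObject` are illustrative names only.

```java
class StringPoolDemo {
    // True only when both references point to the very same object.
    static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        String a = "abc";              // taken from the constant pool
        String b = "abc";              // same pooled object as a
        String c = new String("abc");  // a separate object on the heap

        System.out.println(sameObject(a, b)); // true
        System.out.println(sameObject(a, c)); // false
        System.out.println(a.equals(c));      // true: same contents
    }
}
```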

Usage

Concatenating string constants


    public static void main(String[] args) {
        // Concatenating string constants
        String s = "a" + "b" + "c";

        System.out.println(s);
    }

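One might expect this to create the object a first, then ab, and finally abc, which in theory would make the code inefficient.

In practice, only one object is created.

The bytecode shows why: "a" + "b" + "c" is a constant expression, so the compiler folds it into the single literal "abc" at compile time.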


//  public static void main(java.lang.String[]);
//    descriptor: ([Ljava/lang/String;)V
//    flags: ACC_PUBLIC, ACC_STATIC
//    Code:
//      stack=2, locals=2, args_size=1
//         0: ldc           #2                  // String abc
//         2: astore_1
//         3: getstatic     #3                  // Field java/lang/System.out:Ljava/io/PrintStream;
//         6: aload_1
//         7: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
//        10: return
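
The folding can also be observed without reading bytecode. In this sketch (class and method names are invented for the example), an expression built only from literals or from a final local initialized with a literal is a compile-time constant and ends up as the pooled "abc", whereas one that involves an ordinary local variable is concatenated at run time.

```java
class ConstantFoldingDemo {
    // Only literals: folded by the compiler into the literal "abc".
    static boolean literalsFolded() {
        String s = "a" + "b" + "c";
        return s == "abc";
    }

    // A final local initialized with a literal is a constant variable.
    static boolean finalLocalFolded() {
        final String a = "a";
        String s = a + "b" + "c";
        return s == "abc";
    }

    // A non-final local forces concatenation at run time.
    static boolean plainLocalFolded() {
        String a = "a";
        String s = a + "b" + "c";
        return s == "abc";
    }

    public static void main(String[] args) {
        System.out.println(literalsFolded());   // true
        System.out.println(finalLocalFolded()); // true
        System.out.println(plainLocalFolded()); // false
    }
}
```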

Concatenating string variables



    public static void main(String[] args) {

        // Concatenating string variables
        String str = "abcdef";

        for(int i=0; i<1000; i++) {
            str = str + i;
        }

    }

Decompiling the class file:


//public class T5
//{
//
//    public T5()
//    {
//    }
//
//    public static void main(String args[])
//    {
//        String str = "abcdef";
//        for(int i = 0; i < 1000; i++)
//            str = (new StringBuilder()).append(str).append(i).toString();
//
//    }
//}

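The compiler rewrote this code as well: each str + i becomes a StringBuilder chain. (Compilers targeting Java 9 and later emit an invokedynamic call to StringConcatFactory instead, but the effect on this loop is similar.) Note, however, that the loop still creates a new StringBuilder and a new String on every iteration, so when concatenating inside a loop it is more efficient to declare one StringBuilder outside the loop and append to it.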
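
For comparison, here is a sketch of the loop written both ways. The method names `withPlus` and `withBuilder` are made up for this example; both produce the same text, but the second reuses a single StringBuilder instead of allocating a new one per iteration.

```java
class LoopConcatDemo {
    // The original style: one temporary builder and one new String per iteration.
    static String withPlus(int n) {
        String str = "abcdef";
        for (int i = 0; i < n; i++) {
            str = str + i;
        }
        return str;
    }

    // One builder for the whole loop, converted to a String once at the end.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder("abcdef");
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus(1000).equals(withBuilder(1000))); // true
    }
}
```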

String.intern

JDK documentation:


  /**
     * Returns a canonical representation for the string object.
     * <p>
     * A pool of strings, initially empty, is maintained privately by the
     * class {@code String}.
     * <p>
     * When the intern method is invoked, if the pool already contains a
     * string equal to this {@code String} object as determined by
     * the {@link #equals(Object)} method, then the string from the pool is
     * returned. Otherwise, this {@code String} object is added to the
     * pool and a reference to this {@code String} object is returned.
     * <p>
     * It follows that for any two strings {@code s} and {@code t},
     * {@code s.intern() == t.intern()} is {@code true}
     * if and only if {@code s.equals(t)} is {@code true}.
     * <p>
     * All literal strings and string-valued constant expressions are
     * interned. String literals are defined in section 3.10.5 of the
     * <cite>The Java&trade; Language Specification</cite>.
     *
     * @return  a string that has the same contents as this string, but is
     *          guaranteed to be from a pool of unique strings.
     */
    public native String intern();

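As the comment states, if the constant pool already contains a string equal to the current one, the pooled string is returned; otherwise the current string is added to the pool and then returned.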
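
The documented contract can be checked directly. The sketch below uses invented names (`InternDemo`, `sameAfterIntern`) and relies only on what the Javadoc above guarantees: literals are interned, and s.intern() == t.intern() exactly when s.equals(t).

```java
class InternDemo {
    // Javadoc guarantee: true if and only if s.equals(t).
    static boolean sameAfterIntern(String s, String t) {
        return s.intern() == t.intern();
    }

    public static void main(String[] args) {
        String heap = new String("abc");

        System.out.println(heap == "abc");           // false: heap object vs pooled literal
        System.out.println(heap.intern() == "abc");  // true: intern returns the pooled one
        System.out.println(sameAfterIntern(new String("xyz"), new String("xyz"))); // true
        System.out.println(sameAfterIntern("abc", "abd"));                         // false
    }
}
```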

Example



 public static void main(String[] args) {

        String s = new String("1");
        s.intern();
        String s2 = "1";
        System.out.println(s == s2);

        String s3 = new String("1") + new String("1");
        s3.intern();
        String s4 = "11";
        System.out.println(s3 == s4);
        /*
        Output on JDK 7 and later:
        false
        true
         */

    }

Diagram:

图2.png

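In the diagram, green lines show which content a String object points to; blue lines show address references.

In JDK 7, the string constant pool was moved from the Perm area to the regular Java heap.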

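The s3 and s4 strings

String s3 = new String("1") + new String("1"); This statement ultimately leaves two objects of interest: "1" in the string constant pool and the object on the Java heap that s3 refers to. The two anonymous new String("1") objects created along the way are not discussed here. At this point the object s3 refers to has the content "11", but there is no "11" entry in the constant pool yet.

Next, s3.intern(); puts the "11" string of s3 into the string constant pool. Because the pool does not contain "11" at this point, an entry for "11" is added. The key point is that in JDK 7 the constant pool no longer lives in the Perm area, so it does not need to store a separate copy of the object; it can store a reference to the object on the heap directly. That reference points to the object s3 refers to, so the two addresses are the same.

Finally, in String s4 = "11"; the "11" is declared explicitly as a literal, so the JVM looks in the constant pool. It finds that the entry already exists, and that entry is a reference to the object s3 refers to. s4 therefore points to the same object as s3, and the comparison s3 == s4 is true.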

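The s and s2 objects

String s = new String("1"); This first statement creates two objects: "1" in the constant pool and a String object on the Java heap. s.intern(); then looks in the constant pool and finds that "1" is already there, so nothing changes.

Next, String s2 = "1"; creates a reference s2 that points to the "1" object in the constant pool. As a result, s and s2 clearly refer to different addresses.

Now reorder the code slightly: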



 public static void main(String[] args) {

        String s = new String("1");
        String s2 = "1";
        s.intern();
        System.out.println(s == s2);

        String s3 = new String("1") + new String("1");
        String s4 = "11";
        s3.intern();
        System.out.println(s3 == s4);
        /*

        false
        false
         */

    }

Diagram:

图3.png

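In the diagram, green lines show which content a String object points to; blue lines show address references.

The only difference between the first and second versions is that s3.intern(); now comes after String s4 = "11";. String s4 = "11"; runs first, and at that moment the constant pool has no "11" entry, so the "11" object is a new one created by the declaration of s4. When s3.intern(); runs afterwards, "11" already exists in the pool, so s3 and s4 refer to different objects.

For s and s2 in the second version, moving s.intern(); later makes no difference, because the "1" object was already created in the pool when the first statement, String s = new String("1");, ran. The declaration of s2 simply takes its reference from the constant pool, so s and s2 never refer to the same address.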
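
The same reasoning answers the snippet from the introduction. The sketch below wraps that snippet in a hypothetical `OpeningPuzzle` class so it can be run as is; the variable names are the ones used in the introduction.

```java
class OpeningPuzzle {
    // Returns the three comparison results in the order they are printed.
    static boolean[] run() {
        String str1 = "abc";              // pooled literal
        String str2 = new String("abc");  // separate heap object
        String str3 = str2.intern();      // the pooled "abc", i.e. the same object as str1
        return new boolean[] { str1 == str2, str2 == str3, str1 == str3 };
    }

    public static void main(String[] args) {
        for (boolean result : run()) {
            System.out.println(result);   // false, false, true
        }
    }
}
```

Because "abc" is already in the pool as a literal before intern() is called, this result does not depend on where the pool lives in memory.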

References

https://tech.meituan.com/2014/03/06/in-depth-understanding-string-intern.html
