A Brief Look at the String Source Code

Author: lvlvforever | Published 2018-06-20 18:08

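The String class is one of the most heavily used classes in Java, yet its internals are easy to overlook. In the spirit of understanding not just what works but why it works, this article walks through the String source code, using JDK 1.7.0_80 as the reference version.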

The String class declaration

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence{}

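This is the declaration of the String class. The final modifier means the class cannot be subclassed, so a non-null reference of type String always points to an actual String object, never to an instance of some other class. The class implements the following three interfaces: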

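Serializable: a marker interface with no methods or fields; it only signals that instances of the class can be serialized.
Comparable<String>: provides string ordering; the class must implement its compareTo(T o) method.
CharSequence: represents a read-only sequence of characters.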
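
To make two of these interfaces a little more concrete, here is a minimal sketch. The class and method names (InterfaceDemo, countChars, sorted) are made up for illustration and do not appear in the JDK.

```java
import java.util.Arrays;

class InterfaceDemo {
    // Accepts any CharSequence, so both String and StringBuilder can be passed in.
    static int countChars(CharSequence seq, char target) {
        int count = 0;
        for (int i = 0; i < seq.length(); i++) {
            if (seq.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    // Arrays.sort(Object[]) relies on Comparable, i.e. on String.compareTo.
    static String[] sorted(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(countChars("banana", 'a'));                    // 3
        System.out.println(countChars(new StringBuilder("banana"), 'n')); // 2
        System.out.println(Arrays.toString(sorted("pear", "apple")));     // [apple, pear]
    }
}
```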

Fields

/** The value is used for character storage. */
private final char value[];

/** Cache the hash code for the string */
private int hash; // Default to 0

/** use serialVersionUID from JDK 1.0.2 for interoperability */
private static final long serialVersionUID = -6849794470754667710L;

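These three are the main fields. The value array holds the characters, so a string is stored as a char[]. The field is private, and String exposes no method that hands out the array itself (toCharArray(), for example, returns a copy). It is also final, so the reference cannot be reassigned. final does not protect the contents of the array, but since outside code cannot obtain the reference through normal means (reflection aside), it cannot change the contents either.
hash caches the string's hash code so that it does not have to be recomputed on every call.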
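
The following sketch illustrates that the array returned to callers is a copy. ImmutabilityDemo and scribbleOnCopy are hypothetical names chosen for this example.

```java
import java.util.Arrays;

class ImmutabilityDemo {
    // Overwrites the array returned by toCharArray() and returns the original string.
    static String scribbleOnCopy(String s) {
        char[] copy = s.toCharArray(); // a fresh copy, not the internal value array
        Arrays.fill(copy, '*');        // only the copy is modified
        return s;
    }

    public static void main(String[] args) {
        System.out.println(scribbleOnCopy("hello")); // prints "hello"
    }
}
```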

Constructors

Here are a few of the constructors:

public String(String original) {
    this.value = original.value;
    this.hash = original.hash;
}

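This constructor simply copies the value reference and the hash of original, so the new string shares the same underlying array. Because strings are immutable, there is rarely any need to call it.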
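
A small sketch of what the copy constructor gives you: a distinct object with equal content. The names CopyConstructorDemo and compareWithCopy are illustrative only.

```java
class CopyConstructorDemo {
    // Returns {same object?, equal content?} for a string and its copy.
    static boolean[] compareWithCopy(String original) {
        String copy = new String(original);
        return new boolean[] { copy == original, copy.equals(original) };
    }

    public static void main(String[] args) {
        boolean[] result = compareWithCopy("abc");
        System.out.println(result[0]); // false: new always creates a new object
        System.out.println(result[1]); // true: the characters are the same
    }
}
```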

public String(char value[]) {
    this.value = Arrays.copyOf(value, value.length);
}

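This constructor stores a copy of the given character array in value, so later changes to the caller's array do not affect the string.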
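
The effect of that defensive copy can be observed from the outside. This is a minimal sketch; DefensiveCopyDemo and buildThenMutate are hypothetical names, and the method assumes a non-empty array.

```java
class DefensiveCopyDemo {
    // Builds a String from the array, then modifies the array afterwards.
    // Assumes chars has at least one element.
    static String buildThenMutate(char[] chars) {
        String s = new String(chars); // the constructor copies the array
        chars[0] = 'X';               // changes only the caller's array
        return s;
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate(new char[] {'a', 'b', 'c'})); // prints "abc"
    }
}
```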

public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }

Every constructor ultimately does the same thing: it initializes value.
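
One detail in the three-argument constructor is worth a closer look: the bounds check is written as offset > value.length - count rather than offset + count > value.length. The sketch below shows why, assuming (as the constructor has already verified) that offset and count are non-negative. RangeCheckDemo and its method names are made up for this example.

```java
class RangeCheckDemo {
    // Naive form: offset + count can overflow int and wrap to a negative number.
    static boolean naiveOutOfRange(int length, int offset, int count) {
        return offset + count > length;
    }

    // Form used in the constructor: length - count cannot overflow
    // when both values are non-negative.
    static boolean safeOutOfRange(int length, int offset, int count) {
        return offset > length - count;
    }

    public static void main(String[] args) {
        int huge = Integer.MAX_VALUE; // the same value as -1 >>> 1
        System.out.println(naiveOutOfRange(10, huge, 1)); // false: the sum wrapped around
        System.out.println(safeOutOfRange(10, huge, 1));  // true: correctly rejected
    }
}
```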

Methods

public int length() {
    return value.length;
}

public boolean isEmpty() {
    return value.length == 0;
}

Both length() and isEmpty() simply read the length of the value array.
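
Because length() is just the array length, it counts char values (UTF-16 code units) rather than Unicode code points. A brief sketch, with LengthDemo and lengths as illustrative names:

```java
class LengthDemo {
    // Returns {number of chars, number of code points}.
    static int[] lengths(String s) {
        return new int[] { s.length(), s.codePointCount(0, s.length()) };
    }

    public static void main(String[] args) {
        // U+1F600 is stored as a surrogate pair, i.e. two chars.
        int[] result = lengths("\uD83D\uDE00");
        System.out.println(result[0]); // 2
        System.out.println(result[1]); // 1
    }
}
```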

public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String) anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                            return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }

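equals(Object anObject) is a textbook implementation. It first checks whether the two references point to the same object, in which case they are trivially equal. Otherwise it checks that anObject is a String, compares the lengths of the two value arrays, and only then compares the characters one by one.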
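
One consequence of the instanceof check is that a String is never equal to another kind of CharSequence, even one with the same characters. A short sketch (EqualsDemo and its methods are hypothetical names):

```java
class EqualsDemo {
    // equals() fails the instanceof String check for a StringBuilder.
    static boolean equalsBuilder(String s) {
        return s.equals(new StringBuilder(s));
    }

    // contentEquals() compares characters against any CharSequence.
    static boolean contentEqualsBuilder(String s) {
        return s.contentEquals(new StringBuilder(s));
    }

    public static void main(String[] args) {
        String a = "abc";
        String b = new String("abc");
        System.out.println(a == b);                    // false: different objects
        System.out.println(a.equals(b));               // true: same characters
        System.out.println(equalsBuilder("abc"));        // false
        System.out.println(contentEqualsBuilder("abc")); // true
    }
}
```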

 public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;
            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }

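hashCode() evaluates the polynomial s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic, where s[i] is the i-th character and n is the length of the string. The result is cached in hash. Because 0 doubles as the "not yet computed" marker, a non-empty string whose hash happens to be 0 is recomputed on every call.
Modern IDEs can generate equals(Object o) and hashCode() methods automatically.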
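
The formula can be checked by recomputing the hash by hand. This is a sketch without the caching; HashDemo and polyHash are names invented for the example.

```java
class HashDemo {
    // Evaluates s[0]*31^(n-1) + ... + s[n-1] with Horner's rule, as the loop in hashCode() does.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polyHash("ab"));                        // 3105 = 97 * 31 + 98
        System.out.println(polyHash("hello") == "hello".hashCode()); // true
    }
}
```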

public int compareTo(String anotherString) {
        int len1 = value.length;
        int len2 = anotherString.value.length;
        int lim = Math.min(len1, len2);
        char v1[] = value;
        char v2[] = anotherString.value;

        int k = 0;
        while (k < lim) {
            char c1 = v1[k];
            char c2 = v2[k];
            if (c1 != c2) {
                return c1 - c2;
            }
            k++;
        }
        return len1 - len2;
    }

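This is the string comparison method. It compares the two strings char by char using their numeric (UTF-16 code unit) values, which coincides with ASCII order for ASCII text, and returns the difference of the first mismatching pair. If one string is a prefix of the other, it returns the difference in length.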
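
The same logic, rewritten against the public API so that it can be run outside the JDK. CompareDemo and compare are illustrative names, not part of String.

```java
class CompareDemo {
    // Mirrors the logic of String.compareTo using charAt() instead of the value array.
    static int compare(String a, String b) {
        int lim = Math.min(a.length(), b.length());
        for (int k = 0; k < lim; k++) {
            char c1 = a.charAt(k);
            char c2 = b.charAt(k);
            if (c1 != c2) {
                return c1 - c2; // first mismatch decides
            }
        }
        return a.length() - b.length(); // one is a prefix of the other
    }

    public static void main(String[] args) {
        System.out.println(compare("apple", "apply")); // -20 ('e' is 101, 'y' is 121)
        System.out.println(compare("abc", "abcde"));   // -2 (length difference)
        System.out.println(compare("Z", "a"));         // -7 (uppercase sorts before lowercase)
    }
}
```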

public int compareToIgnoreCase(String str) {
    return CASE_INSENSITIVE_ORDER.compare(this, str);
}

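The comparison is delegated to CASE_INSENSITIVE_ORDER, a Comparator implemented by a static nested class inside String, which ignores differences in case.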
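
Since CASE_INSENSITIVE_ORDER is a public Comparator, it can also be passed to sorting routines directly. A small sketch (CaseInsensitiveDemo and sortIgnoringCase are hypothetical names):

```java
import java.util.Arrays;

class CaseInsensitiveDemo {
    // Sorts a copy of the input, ignoring case.
    static String[] sortIgnoringCase(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        // Natural ordering would put "Cherry" first, because 'C' < 'a'.
        System.out.println(Arrays.toString(sortIgnoringCase("banana", "apple", "Cherry")));
        // [apple, banana, Cherry]
        System.out.println("HELLO".compareToIgnoreCase("hello")); // 0
    }
}
```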

public String concat(String str) {
        int otherLen = str.length();
        if (otherLen == 0) {
            return this;
        }
        int len = value.length;
        char buf[] = Arrays.copyOf(value, len + otherLen);
        str.getChars(buf, len);
        return new String(buf, true);
    }

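concat() joins two strings. If str is empty (length 0), it returns the current string as is. Otherwise it uses Arrays.copyOf() to create a buf array large enough for both, copies the characters of str into buf after the existing ones, and wraps buf in a new String. The package-private constructor String(char[], boolean) used here adopts buf directly instead of copying it a second time.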
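
Both branches are visible from calling code. A minimal sketch, with ConcatDemo and returnsSameObject as made-up names:

```java
class ConcatDemo {
    // Concatenating an empty string returns the receiver itself, not a copy.
    static boolean returnsSameObject(String s) {
        return s.concat("") == s;
    }

    public static void main(String[] args) {
        System.out.println(returnsSameObject("abc")); // true
        System.out.println("abc".concat("def"));      // abcdef
    }
}
```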

static int indexOf(char[] source, int sourceOffset, int sourceCount,
            char[] target, int targetOffset, int targetCount,
            int fromIndex) {
        if (fromIndex >= sourceCount) {
            return (targetCount == 0 ? sourceCount : -1);
        }
        if (fromIndex < 0) {
            fromIndex = 0;
        }
        if (targetCount == 0) {
            return fromIndex;
        }

        char first = target[targetOffset];
        int max = sourceOffset + (sourceCount - targetCount);

        for (int i = sourceOffset + fromIndex; i <= max; i++) {
            /* Look for first character. */
            if (source[i] != first) {
                while (++i <= max && source[i] != first);
            }

            /* Found first character, now look at the rest of v2 */
            if (i <= max) {
                int j = i + 1;
                int end = j + targetCount - 1;
                for (int k = targetOffset + 1; j < end && source[j]
                        == target[k]; j++, k++);

                if (j == end) {
                    /* Found whole string. */
                    return i - sourceOffset;
                }
            }
        }
        return -1;
    }

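This is the substring search routine. It uses the most basic approach: find the first character of the target, then check whether the remaining characters match; on a mismatch, resume the search from the next position. Why not use the KMP algorithm? There is an explanation on Stack Overflow.
The gist is that strings are usually short, so this brute-force approach is good enough. For very long texts one would normally reach for other data structures anyway, and KMP comes with preprocessing cost and extra memory.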
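
The brute-force idea can be sketched in a few lines on top of the public API. This is a simplified illustration rather than the JDK code (it omits the offset parameters and the first-character fast path); NaiveSearchDemo and naiveIndexOf are hypothetical names.

```java
class NaiveSearchDemo {
    // Brute-force search: try every start position, compare character by character.
    static int naiveIndexOf(String source, String target, int fromIndex) {
        int n = source.length();
        int m = target.length();
        if (fromIndex < 0) {
            fromIndex = 0;
        }
        if (m == 0) {
            return Math.min(fromIndex, n); // an empty target matches immediately
        }
        for (int i = fromIndex; i <= n - m; i++) {
            int j = 0;
            while (j < m && source.charAt(i + j) == target.charAt(j)) {
                j++;
            }
            if (j == m) {
                return i; // every character of target matched
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(naiveIndexOf("hello world", "o w", 0)); // 4
        System.out.println(naiveIndexOf("hello world", "o", 5));   // 7
        System.out.println(naiveIndexOf("hello world", "xyz", 0)); // -1
    }
}
```

In the worst case this takes time proportional to the product of the two lengths, but it needs no preprocessing and no extra memory.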

public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }

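substring() returns part of the string. The implementation changed in 1.7 (starting with update 6). In earlier versions String carried offset and count fields that marked which segment of the value array the string occupied, and substring() on a long string copied nothing: the resulting string pointed at the same array, just with different offset and count values. This is a form of the flyweight pattern, with both strings sharing one underlying array. The downside is a memory leak: even after the long string becomes unreachable, the substring keeps its entire array alive. 1.7 fixes this by copying only the characters that are needed, at the cost of some extra space.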
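
The difference between the two designs can be modeled with a toy class. Everything here (SubstringModels, SharedSlice, retainedChars, copyingSubstring) is a hypothetical stand-in written for this article, not the actual pre-1.7 String code.

```java
import java.util.Arrays;

class SubstringModels {
    // Toy model of the old layout: a view (offset, count) into a shared array.
    static final class SharedSlice {
        final char[] value;
        final int offset;
        final int count;

        SharedSlice(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        // No copying: the new slice points at the same array.
        SharedSlice substring(int begin, int end) {
            return new SharedSlice(value, offset + begin, end - begin);
        }

        // Number of chars kept reachable by this slice.
        int retainedChars() {
            return value.length;
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    // Toy model of the 1.7 approach: copy only the requested range.
    static char[] copyingSubstring(char[] value, int begin, int end) {
        return Arrays.copyOfRange(value, begin, end);
    }

    public static void main(String[] args) {
        char[] big = new char[1000];
        Arrays.fill(big, 'a');
        SharedSlice small = new SharedSlice(big, 0, big.length).substring(0, 3);
        System.out.println(small);                              // aaa
        System.out.println(small.retainedChars());              // 1000: whole array stays alive
        System.out.println(copyingSubstring(big, 0, 3).length); // 3
    }
}
```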

public String trim() {
        int len = value.length;
        int st = 0;
        char[] val = value;    /* avoid getfield opcode */

        while ((st < len) && (val[st] <= ' ')) {
            st++;
        }
        while ((st < len) && (val[len - 1] <= ' ')) {
            len--;
        }
        return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
    }

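trim() strips leading and trailing whitespace, where "whitespace" means every character whose code is less than or equal to ' ' (U+0020). If nothing needs to be removed, it returns the string itself.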
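
Because the test is val[st] <= ' ', characters above U+0020 are left alone even if they look like blanks. A short sketch (TrimDemo and trimRemoves are illustrative names):

```java
class TrimDemo {
    // Returns true if trim() strips the given character from both ends of a string.
    static boolean trimRemoves(char c) {
        String s = c + "x" + c;
        return s.trim().equals("x");
    }

    public static void main(String[] args) {
        System.out.println(trimRemoves(' '));      // true
        System.out.println(trimRemoves('\t'));     // true: tab is below U+0020
        System.out.println(trimRemoves('\u00A0')); // false: no-break space is above U+0020
        System.out.println(trimRemoves('\u3000')); // false: ideographic (full-width) space
    }
}
```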

To sum up, a String is essentially a private final char[] value array, and every method revolves around operating on value.


Article link: https://www.haomeiwen.com/subject/rokxyftx.html
