String class definition and basic fields
public final class String implements java.io.Serializable, Comparable<String>, CharSequence
The final modifier means this class cannot be subclassed; why inheritance is forbidden is explained later.
private final char value[];
This shows that String is internally backed by a char array.
- The final modifier on value
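As is well known, Java's String is an immutable class: once a String object has been initialized, its contents cannot be modified. How is this guaranteed? Is the final modifier on this char array field enough?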
Clearly not. Declaring final char value[] only prevents the variable value from being re-pointed to a different array after it is initialized. For example:
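final char[] v = {'a','b','c'}; // allocates an array on the heap and assigns its address to v
v = new char[]{'a'}; // compile error: cannot assign a value to final variable v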
The reference itself cannot change, but the object it refers to can. Suppose we run the following:
v[0] = 'd'; // v is now {'d','b','c'}
Once the underlying array changes, the String's contents have effectively changed too. So how does the String class guarantee immutability?
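- Every operation in String that touches value is handled carefully. Any modification creates a new array, copies the values over, and returns a new String, and the char array itself is never exposed outside the class;
- String is declared final so that it cannot be subclassed, which protects the value array.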
Consider the following code:
public static void main(String[] args) {
String a = "abcd";
a = "efg";
System.out.println(a);
}
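This prints "efg". Doesn't that mean a has changed? In fact, only the address that a points to changed. The original string "abcd" was never modified, as the following diagram illustrates:
[Diagram: a is re-pointed from "abcd" to "efg"; the "abcd" object itself is unchanged]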
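Here is a small runnable sketch of the same point. The class and method names are made up for illustration. A second variable b keeps the old reference, which shows that the "abcd" object itself was never modified.

```java
// Reassigning a String variable re-points the reference;
// the String object it used to point to is left untouched.
class ReassignDemo {
    static String[] reassign() {
        String a = "abcd";
        String b = a;   // b refers to the same "abcd" object
        a = "efg";      // a now refers to a different object
        return new String[] {a, b};
    }

    public static void main(String[] args) {
        String[] r = reassign();
        System.out.println(r[0]); // efg
        System.out.println(r[1]); // abcd
    }
}
```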

Constructor
The String class provides 18 overloaded constructors in total. Here are some common ones:
- The no-argument constructor initializes the char array to an empty array.
public String() {
this.value = new char[0];
}
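- Constructing from another String: the array is assigned by passing the reference directly. Code outside the class can never obtain the internal char array, so sharing it does not affect immutability.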
public String(String original) {
this.value = original.value;
this.hash = original.hash;
}
- Constructing from a char array: the array is copied. Under the hood a new array is created; this is not a plain reference assignment:
public String(char value[]) {
this.value = Arrays.copyOf(value, value.length);
}
// Arrays.copyOf:
public static char[] copyOf(char[] original, int newLength) {
char[] copy = new char[newLength];
System.arraycopy(original, 0, copy, 0,
Math.min(original.length, newLength));
return copy;
}
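If the array were referenced directly instead of copied, the caller could still modify it afterwards, and String would become mutable.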
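The following sketch shows the defensive copy from the caller's side. DefensiveCopyDemo and fromChars are illustrative names only. The sketch mutates the original array after construction and shows that the String is unaffected.

```java
// Shows that String(char[]) makes a defensive copy of its argument.
class DefensiveCopyDemo {
    static String fromChars(char[] chars) {
        return new String(chars); // the constructor copies chars via Arrays.copyOf
    }

    public static void main(String[] args) {
        char[] chars = {'a', 'b', 'c'};
        String s = fromChars(chars);
        chars[0] = 'z';           // mutate the caller's array afterwards
        System.out.println(s);    // abc: the String kept its own copy
    }
}
```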
- This is similar to the previous one. Arrays.copyOfRange also creates a new array, copying only the specified range.
public String(char value[], int offset, int count) {
if (offset < 0) {
throw new StringIndexOutOfBoundsException(offset);
}
if (count < 0) {
throw new StringIndexOutOfBoundsException(count);
}
// Note: offset or count might be near -1>>>1.
if (offset > value.length - count) {
throw new StringIndexOutOfBoundsException(offset + count);
}
this.value = Arrays.copyOfRange(value, offset, offset+count);
}
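The JDK comment "offset or count might be near -1>>>1" refers to Integer.MAX_VALUE. It explains why the check is written as offset > value.length - count rather than offset + count > value.length. The sketch below compares the two forms. The helper names are hypothetical, and both helpers assume the non-negativity checks have already passed, as they do in the constructor.

```java
// Compares a naive bounds check with the overflow-safe form used in
// String(char[] value, int offset, int count). Both helpers assume
// offset >= 0 and count >= 0 were already checked, as the constructor does.
class BoundsCheckDemo {
    // Naive: offset + count can overflow int and wrap to a negative number.
    static boolean naiveOutOfBounds(int length, int offset, int count) {
        return offset + count > length;
    }

    // Safe: length - count cannot overflow when both are non-negative.
    static boolean safeOutOfBounds(int length, int offset, int count) {
        return offset > length - count;
    }

    public static void main(String[] args) {
        int length = 3, offset = Integer.MAX_VALUE, count = 1; // -1 >>> 1 == Integer.MAX_VALUE
        System.out.println(naiveOutOfBounds(length, offset, count)); // false (wrong: the sum wrapped around)
        System.out.println(safeOutOfBounds(length, offset, count));  // true
    }
}
```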
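- From an int array of code points, with a selected range. Code points come from Unicode: a code point is a number in the range U+0000 to U+10FFFF that identifies one character in the Unicode character set.
This constructor converts each code point to char values, creates a new array to hold them, and assigns that array to value. A code point in the Basic Multilingual Plane (BMP) fits in a single char. Any other valid code point needs two chars, called a surrogate pair. That is why the first pass computes the exact array size before the second pass fills it.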
public String(int[] codePoints, int offset, int count) {
if (offset < 0) {
throw new StringIndexOutOfBoundsException(offset);
}
if (count < 0) {
throw new StringIndexOutOfBoundsException(count);
}
// Note: offset or count might be near -1>>>1.
if (offset > codePoints.length - count) {
throw new StringIndexOutOfBoundsException(offset + count);
}
final int end = offset + count;
// Pass 1: Compute precise size of char[]
int n = count;
for (int i = offset; i < end; i++) {
int c = codePoints[i];
// see isBmpCodePoint and isValidCodePoint below
if (Character.isBmpCodePoint(c))
continue;
else if (Character.isValidCodePoint(c))
n++;
else throw new IllegalArgumentException(Integer.toString(c));
}
// Pass 2: Allocate and fill in char[]
final char[] v = new char[n];
for (int i = offset, j = 0; i < end; i++, j++) {
int c = codePoints[i];
if (Character.isBmpCodePoint(c))
v[j] = (char)c;
else
Character.toSurrogates(c, v, j++);
}
this.value = v;
}
/**
* Determines whether the specified character (Unicode code point)
* is in the <a href="#BMP">Basic Multilingual Plane (BMP)</a>.
* Such code points can be represented using a single {@code char}.
*
* @param codePoint the character (Unicode code point) to be tested
* @return {@code true} if the specified code point is between
* {@link #MIN_VALUE} and {@link #MAX_VALUE} inclusive;
* {@code false} otherwise.
* @since 1.7
*/
public static boolean isBmpCodePoint(int codePoint) {
return codePoint >>> 16 == 0;
// Optimized form of:
// codePoint >= MIN_VALUE && codePoint <= MAX_VALUE
// We consistently use logical shift (>>>) to facilitate
// additional runtime optimizations.
}
/**
* Determines whether the specified code point is a valid
* <a href="http://www.unicode.org/glossary/#code_point">
* Unicode code point value</a>.
*
* @param codePoint the Unicode code point to be tested
* @return {@code true} if the specified code point value is between
* {@link #MIN_CODE_POINT} and
* {@link #MAX_CODE_POINT} inclusive;
* {@code false} otherwise.
* @since 1.5
*/
public static boolean isValidCodePoint(int codePoint) {
// Optimized form of:
// codePoint >= MIN_CODE_POINT && codePoint <= MAX_CODE_POINT
int plane = codePoint >>> 16;
return plane < ((MAX_CODE_POINT + 1) >>> 16);
}
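The sketch below exercises this constructor through a helper named fromCodePoints, which is an illustrative name. It shows that a non-BMP code point such as U+1F600 ends up as two chars, and that an invalid code point is rejected with IllegalArgumentException.

```java
// Builds a String from code points and shows that a code point outside
// the BMP is stored as two chars (a surrogate pair).
class CodePointCtorDemo {
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String s = fromCodePoints('A', 0x1F600);  // U+0041 is in the BMP, U+1F600 is not
        System.out.println(s.length());                             // 3
        System.out.println(s.codePointCount(0, s.length()));        // 2
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
    }
}
```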
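- These constructors convert a byte array into a char array. Data sent over a network is made of bytes. Going from characters to bytes requires encoding, and going back requires decoding. Both steps depend on a character encoding (charset). Encodings are a large topic and are not covered here. Be aware that the overloads without a charset argument use the platform's default charset. Encoding and decoding must also use the same charset, otherwise you will not get the expected result.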
public String(byte bytes[], int offset, int length, String charsetName)
throws UnsupportedEncodingException {
if (charsetName == null)
throw new NullPointerException("charsetName");
checkBounds(bytes, offset, length);
this.value = StringCoding.decode(charsetName, bytes, offset, length);
}
public String(byte bytes[], int offset, int length, Charset charset) {
if (charset == null)
throw new NullPointerException("charset");
checkBounds(bytes, offset, length);
this.value = StringCoding.decode(charset, bytes, offset, length);
}
public String(byte bytes[], String charsetName)
throws UnsupportedEncodingException {
this(bytes, 0, bytes.length, charsetName);
}
public String(byte bytes[], Charset charset) {
this(bytes, 0, bytes.length, charset);
}
public String(byte bytes[], int offset, int length) {
checkBounds(bytes, offset, length);
this.value = StringCoding.decode(bytes, offset, length);
}
public String(byte bytes[]) {
this(bytes, 0, bytes.length);
}
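To illustrate the last point, the following sketch encodes a string with one charset and decodes it with another. CharsetRoundTripDemo and roundTrip are illustrative names. The sketch uses the real String(byte[], Charset) constructor and StandardCharsets constants.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Encodes a string with one charset and decodes it with another, to show
// why both sides must agree on the charset.
class CharsetRoundTripDemo {
    static String roundTrip(String s, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = s.getBytes(encodeWith);  // chars -> bytes (encoding)
        return new String(bytes, decodeWith);   // bytes -> chars (decoding)
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(text));      // true
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals(text)); // false
    }
}
```

With the mismatched pair, each of the 6 UTF-8 bytes is decoded as a separate ISO-8859-1 character, so the result is 6 garbled characters instead of 2.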
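- These two are usually used the other way around: code calls toString on a StringBuffer or StringBuilder to get a String.
StringBuffer and StringBuilder are typically used to build up strings by concatenation. StringBuffer is thread-safe and StringBuilder is not, but StringBuilder is generally faster.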
public String(StringBuffer buffer) {
synchronized(buffer) {
this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
}
}
public String(StringBuilder builder) {
this.value = Arrays.copyOf(builder.getValue(), builder.length());
}
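Because both constructors copy the builder's array with Arrays.copyOf, the resulting String is a snapshot. The sketch below shows this with a StringBuilder. BuilderSnapshotDemo is an illustrative name.

```java
// Shows that a String built from a StringBuilder is a snapshot:
// later appends to the builder do not affect it.
class BuilderSnapshotDemo {
    static String[] snapshot() {
        StringBuilder sb = new StringBuilder("abc");
        String viaConstructor = new String(sb); // copies the builder's current chars
        String viaToString = sb.toString();     // the usual direction
        sb.append("def");                       // modify the builder afterwards
        return new String[] {viaConstructor, viaToString, sb.toString()};
    }

    public static void main(String[] args) {
        for (String s : snapshot()) {
            System.out.println(s); // abc, abc, abcdef
        }
    }
}
```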
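- This one is similar to the third constructor above, String(char value[]). Here, however, the reference is passed directly, so two variables point to the same char array. As discussed earlier, that could make a String mutable, because the array elements can still be modified.
That is why it is package-private: it is meant for trusted internal use, where skipping the copy improves speed.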
/*
* Package private constructor which shares value array for speed.
* this constructor is always expected to be called with share==true.
* a separate constructor is needed because we already have a public
* String(char[]) constructor that makes a copy of the given char[].
*/
String(char[] value, boolean share) {
// assert share : "unshared not supported";
this.value = value;
}
String's overloading of the + operator
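Java has no operator overloading feature for programmers. The only operator overload on a class type that is built into the language is String's "+".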
Here is an example:
public static void main(String[] args) {
String a = "hello", b = "world", c;
c = a + b;
System.out.print(c);
}
Disassembling it with javap -c gives:
public static void main(java.lang.String[]);
Code:
0: ldc #16 // String hello
2: astore_1
3: ldc #18 // String world
5: astore_2
6: new #20 // class java/lang/StringBuilder (here is the StringBuilder)
9: dup
10: aload_1
11: invokestatic #22 // Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String;
14: invokespecial #28 // Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
17: aload_2
18: invokevirtual #31 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
21: invokevirtual #35 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
24: astore_3
25: aload_3
26: invokestatic #39 // Method print:(Ljava/lang/Object;)V
29: return
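In other words, + is implemented by appending the operands to a StringBuilder and then calling toString() to return a String object. This is how javac compiles + when targeting Java 8 and earlier. Since JDK 9 (JEP 280), javac by default emits an invokedynamic call bootstrapped by StringConcatFactory instead.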
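The sketch below writes out by hand what the bytecode above does for a + b. The method name concatLikeJava8 is illustrative. The sketch reflects the Java 8 compilation strategy only, not the invokedynamic form used by newer compilers.

```java
// Hand-written equivalent of what javac (targeting Java 8) emits for a + b,
// following the bytecode above: new StringBuilder(String.valueOf(a)).append(b).toString()
class ConcatDemo {
    static String concatLikeJava8(String a, String b) {
        return new StringBuilder(String.valueOf(a)).append(b).toString();
    }

    public static void main(String[] args) {
        String a = "hello", b = "world";
        System.out.println(concatLikeJava8(a, b));                 // helloworld
        System.out.println((a + b).equals(concatLikeJava8(a, b))); // true
    }
}
```

The String.valueOf call is also why a null operand turns into the text "null" instead of throwing.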
Other methods
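- length(). Without an IDE's auto-completion, it is easy to mix up whether arrays or Strings use length and which one uses length(). On reflection, String is backed by an array, and arrays already have a length field. Defining a separate length field in String would be redundant, so a length() method that returns the internal array's length is simple and concise. Note that this counts UTF-16 char units, which is not necessarily the number of Unicode characters.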
public int length() {
return value.length;
}
- codePointAt returns the code point of the character at the given index:
public int codePointAt(int index) {
if ((index < 0) || (index >= value.length)) {
throw new StringIndexOutOfBoundsException(index);
}
return Character.codePointAtImpl(value, index, value.length);
}
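The following sketch shows length() and codePointAt together on a string that contains a supplementary character. The sample value and the class name are illustrative. codePointAt combines a surrogate pair only when it is given the index of the high surrogate.

```java
// length() counts UTF-16 chars; codePointAt combines a surrogate pair
// when given the index of its high surrogate.
class CodePointAtDemo {
    static final String SAMPLE = "a\uD83D\uDE00"; // 'a' followed by U+1F600 as a surrogate pair

    public static void main(String[] args) {
        System.out.println(SAMPLE.length());                            // 3
        System.out.println(Integer.toHexString(SAMPLE.codePointAt(1))); // 1f600
        System.out.println(Integer.toHexString(SAMPLE.codePointAt(2))); // de00 (a lone low surrogate)
        System.out.println(SAMPLE.codePointCount(0, SAMPLE.length()));  // 2
    }
}
```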
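- getChars has several overloads. It copies the String's internal char array into a given array. The overload shown here is package-private; the public one is getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin).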
void getChars(char dst[], int dstBegin) {
System.arraycopy(value, 0, dst, dstBegin, value.length);
}
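Since the overload above is not accessible from outside java.lang, the sketch below uses the public getChars(srcBegin, srcEnd, dst, dstBegin) instead. copyRange is an illustrative helper name.

```java
// Uses the public overload getChars(srcBegin, srcEnd, dst, dstBegin),
// which copies chars [srcBegin, srcEnd) into dst starting at dstBegin.
class GetCharsDemo {
    static char[] copyRange(String s, int begin, int end) {
        char[] dst = new char[end - begin];
        s.getChars(begin, end, dst, 0);
        return dst;
    }

    public static void main(String[] args) {
        char[] world = copyRange("hello world", 6, 11);
        System.out.println(new String(world)); // world
    }
}
```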
- getBytes has several overloads and returns a byte array:
public byte[] getBytes(String charsetName)
throws UnsupportedEncodingException {
if (charsetName == null) throw new NullPointerException();
// encode with the specified charset
return StringCoding.encode(charsetName, value, 0, value.length);
}
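The number of bytes getBytes produces depends on the charset. The charset-name overload shown above also throws a checked UnsupportedEncodingException for names the JVM does not recognize. In the sketch below, the charset name "x-no-such-charset" is a made-up name chosen to trigger that path, and byteCount is an illustrative helper.

```java
import java.io.UnsupportedEncodingException;

// The number of bytes produced by getBytes depends on the charset.
// The charset-name overload throws a checked exception for unknown names.
class GetBytesDemo {
    static int byteCount(String s, String charsetName) throws UnsupportedEncodingException {
        return s.getBytes(charsetName).length;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String zhong = "\u4e2d";                          // one Chinese character
        System.out.println(byteCount(zhong, "UTF-8"));    // 3
        System.out.println(byteCount(zhong, "UTF-16BE")); // 2
        try {
            byteCount(zhong, "x-no-such-charset");        // hypothetical, unsupported name
        } catch (UnsupportedEncodingException e) {
            System.out.println("unsupported charset");
        }
    }
}
```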
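- equals
For ordinary reference types, equals (inherited from Object) and == both compare object identity, that is, the address. Only classes such as String that override equals can use it to compare contents.
For primitive types, == compares values; primitives have no equals method at all.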
public boolean equals(Object anObject) {
// first check whether it is the very same object
if (this == anObject) {
return true;
}
// check that the argument is a String
if (anObject instanceof String) {
String anotherString = (String)anObject;
int n = value.length;
// compare the char array lengths
if (n == anotherString.value.length) {
char v1[] = value;
char v2[] = anotherString.value;
int i = 0;
// compare the arrays element by element
while (n-- != 0) {
if (v1[i] != v2[i])
return false;
i++;
}
return true;
}
}
return false;
}
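The following sketch shows the difference between == and String.equals. The field names are illustrative. new String("abc") is used to force a distinct object with the same characters as the literal.

```java
// == compares references; String.equals compares the characters.
class EqualsDemo {
    static final String LITERAL = "abc";
    static final String CREATED = new String("abc"); // a distinct object with the same contents

    public static void main(String[] args) {
        System.out.println(LITERAL == CREATED);      // false: different objects
        System.out.println(LITERAL.equals(CREATED)); // true: same characters
    }
}
```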
- Case-insensitive comparison
public boolean equalsIgnoreCase(String anotherString) {
return (this == anotherString) ? true
: (anotherString != null)
&& (anotherString.value.length == value.length)
&& regionMatches(true, 0, anotherString, 0, value.length);
}
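Internally, the case-insensitive comparison converts characters to uppercase. For the Georgian alphabet it then converts back to lowercase as a final check. No regular expressions are involved: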
public boolean regionMatches(boolean ignoreCase, int toffset,
String other, int ooffset, int len) {
char ta[] = value;
int to = toffset;
char pa[] = other.value;
int po = ooffset;
// Note: toffset, ooffset, or len might be near -1>>>1.
if ((ooffset < 0) || (toffset < 0)
|| (toffset > (long)value.length - len)
|| (ooffset > (long)other.value.length - len)) {
return false;
}
while (len-- > 0) {
char c1 = ta[to++];
char c2 = pa[po++];
if (c1 == c2) {
continue;
}
if (ignoreCase) {
// If characters don't match but case may be ignored,
// try converting both characters to uppercase.
// If the results match, then the comparison scan should
// continue.
char u1 = Character.toUpperCase(c1);
char u2 = Character.toUpperCase(c2);
if (u1 == u2) {
continue;
}
// Unfortunately, conversion to uppercase does not work properly
// for the Georgian alphabet, which has strange rules about case
// conversion. So we need to make one last check before
// exiting.
if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
continue;
}
}
return false;
}
return true;
}
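The usage sketch below covers both methods. worldAt6 is an illustrative helper. The last call shows the early bounds check returning false when the region would run past the end of the string.

```java
// equalsIgnoreCase compares whole strings; regionMatches compares a region,
// optionally ignoring case, and returns false if the region runs out of bounds.
class IgnoreCaseDemo {
    static boolean worldAt6(boolean ignoreCase) {
        return "Hello World".regionMatches(ignoreCase, 6, "WORLD", 0, 5);
    }

    public static void main(String[] args) {
        System.out.println("Hello".equalsIgnoreCase("hELLO"));            // true
        System.out.println(worldAt6(true));                               // true
        System.out.println(worldAt6(false));                              // false
        System.out.println("Hello".regionMatches(true, 3, "LO!", 0, 3));  // false: region would run past the end
    }
}
```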
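- startsWith and endsWith work by comparing char arrays for equality. Note how the while loop is written: while (--pc >= 0) runs the body exactly once per prefix character.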
public boolean startsWith(String prefix, int toffset) {
char ta[] = value;
int to = toffset;
char pa[] = prefix.value;
int po = 0;
int pc = prefix.value.length;
// Note: toffset might be near -1>>>1.
if ((toffset < 0) || (toffset > value.length - pc)) {
return false;
}
while (--pc >= 0) {
if (ta[to++] != pa[po++]) {
return false;
}
}
return true;
}
public boolean endsWith(String suffix) {
return startsWith(suffix, value.length - suffix.value.length);
}
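To make the loop shape concrete, here is a standalone re-implementation that follows the JDK code above. It uses charAt because the private value array is not accessible from outside String. startsWithAt and endsWithSuffix are hypothetical names, not JDK methods.

```java
// A standalone sketch mirroring the loop in String.startsWith(prefix, toffset),
// using charAt because the private value array is not accessible from outside.
class PrefixDemo {
    static boolean startsWithAt(String s, String prefix, int toffset) {
        int pc = prefix.length();
        // Same bounds check as the JDK code; s.length() - pc cannot overflow here.
        if (toffset < 0 || toffset > s.length() - pc) {
            return false;
        }
        int to = toffset;
        int po = 0;
        // --pc >= 0 runs the body exactly prefix.length() times.
        while (--pc >= 0) {
            if (s.charAt(to++) != prefix.charAt(po++)) {
                return false;
            }
        }
        return true;
    }

    // endsWith is startsWith at offset length - suffix length, as in the JDK code.
    static boolean endsWithSuffix(String s, String suffix) {
        return startsWithAt(s, suffix, s.length() - suffix.length());
    }

    public static void main(String[] args) {
        System.out.println(startsWithAt("abcdef", "cd", 2));      // true
        System.out.println(endsWithSuffix("Main.java", ".java")); // true
        System.out.println(endsWithSuffix("a", "abc"));           // false
    }
}
```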
- join, added in 1.8, concatenates several strings with a delimiter.
/*
* @since 1.8
*/
public static String join(CharSequence delimiter, CharSequence... elements) {
Objects.requireNonNull(delimiter);
Objects.requireNonNull(elements);
// Number of elements not likely worth Arrays.stream overhead.
// StringJoiner is a wrapper built on StringBuilder
StringJoiner joiner = new StringJoiner(delimiter);
for (CharSequence cs: elements) {
joiner.add(cs);
}
return joiner.toString();
}
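The sketch below shows the varargs join shown above and its sibling overload that takes an Iterable, which also arrived in 1.8. It also uses StringJoiner directly, which accepts a prefix and suffix. The helper names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

class JoinDemo {
    static String joinVarargs() {
        return String.join("-", "2024", "01", "31");   // varargs overload shown above
    }

    static String joinIterable(List<String> parts) {
        return String.join(", ", parts);               // Iterable overload, also since 1.8
    }

    static String joinWithBrackets(List<String> parts) {
        StringJoiner joiner = new StringJoiner(", ", "[", "]"); // delimiter, prefix, suffix
        for (String p : parts) {
            joiner.add(p);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", "b", "c");
        System.out.println(joinVarargs());           // 2024-01-31
        System.out.println(joinIterable(parts));     // a, b, c
        System.out.println(joinWithBrackets(parts)); // [a, b, c]
    }
}
```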
- valueOf converts a value to a String.
This overload's implementation is worth a look:
public static String valueOf(char c) {
char data[] = {c};
// share the array instead of copying it: data never escapes this method, so this is safe
return new String(data, true);
}
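The sketch below calls a few valueOf overloads, including valueOf(char), which relies on the sharing constructor described earlier. samples is an illustrative helper. The null is deliberately typed as Object.

```java
// A few valueOf overloads; the null is typed as Object on purpose.
class ValueOfDemo {
    static String[] samples() {
        Object nothing = null;
        return new String[] {
            String.valueOf('x'),     // char overload shown above
            String.valueOf(42),      // int
            String.valueOf(true),    // boolean
            String.valueOf(nothing)  // Object overload: null becomes "null"
        };
    }

    public static void main(String[] args) {
        for (String s : samples()) {
            System.out.println(s); // x, 42, true, null
        }
    }
}
```

The Object typing matters. Writing String.valueOf(null) with a bare null literal resolves to the more specific valueOf(char[]) overload, which throws NullPointerException.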
- intern
public native String intern();
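The String class maintains a pool of strings that is initially empty. When intern is called, the pool may already contain a string equal to this one, as determined by equals. If it does, the pooled instance is returned. Otherwise, this String is added to the pool and a reference to it is returned.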
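The following sketch shows intern returning the pooled instance. It relies on string literals being interned automatically, which is standard Java behavior. InternDemo and check are illustrative names.

```java
// String literals are interned automatically; intern() maps an equal
// String onto the pooled instance.
class InternDemo {
    static boolean[] check() {
        String literal = "hello";
        String created = new String("hello");   // a new, non-pooled object
        String built = new StringBuilder("hel").append("lo").toString();
        return new boolean[] {
            literal == created,                 // false
            literal == created.intern(),        // true
            literal == built.intern()           // true
        };
    }

    public static void main(String[] args) {
        for (boolean b : check()) {
            System.out.println(b); // false, true, true
        }
    }
}
```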