Thoughts Prompted by an Emoji
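Take the emoji 😂, whose Unicode code point is U+1F602.
A Java char is only 2 bytes (16 bits), so a single char cannot hold it.
So how does a String represent characters like emoji, whose code points lie above U+FFFF and therefore do not fit in 16 bits?
The answer is UTF-16.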
UTF-16 uses sequences of one or two unsigned 16-bit code units to encode Unicode code points. Values U+0000 to U+FFFF are encoded in one 16-bit unit with the same value. Supplementary characters are encoded in two code units, the first from the high-surrogates range (U+D800 to U+DBFF), the second from the low-surrogates range (U+DC00 to U+DFFF). This may seem similar in concept to multi-byte encodings, but there is an important difference: The values U+D800 to U+DFFF are reserved for use in UTF-16; no characters are assigned to them as code points. This means, software can tell for each individual code unit in a string whether it represents a one-unit character or whether it is the first or second unit of a two-unit character. This is a significant improvement over some traditional multi-byte character encodings, where the byte value 0x41 could mean the letter "A" or be the second byte of a two-byte character.
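The property described above, that each code unit can be classified on its own, can be sketched in Java. This is only an illustration: the class CodeUnitKind and the method classify are hypothetical names, while Character.isHighSurrogate and Character.isLowSurrogate are standard JDK methods.

```java
// Hypothetical helper for illustration; not part of the JDK.
class CodeUnitKind {
    // Classifies one UTF-16 code unit without looking at its neighbours.
    static String classify(char unit) {
        if (Character.isHighSurrogate(unit)) {
            return "high surrogate";   // first unit of a two-unit character
        }
        if (Character.isLowSurrogate(unit)) {
            return "low surrogate";    // second unit of a two-unit character
        }
        return "single-unit character";
    }

    public static void main(String[] args) {
        // The letter A followed by the two code units of U+1F602.
        String s = "A\uD83D\uDE02";
        for (int i = 0; i < s.length(); i++) {
            char unit = s.charAt(i);
            System.out.printf("%04X -> %s%n", (int) unit, classify(unit));
        }
    }
}
```

Because the surrogate ranges are reserved, the check needs no context: a unit in U+D800 to U+DBFF can only start a pair, and a unit in U+DC00 to U+DFFF can only end one.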
In UTF-16, 😂 is encoded as the surrogate pair \uD83D\uDE02.
A String therefore stores it as two char values.
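Where the two values come from can be worked out by hand. The sketch below follows the standard UTF-16 rule (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves); the class SurrogatePairSketch and its method names are hypothetical and exist only for illustration. In real code, Character.toChars does the same job.

```java
// Hypothetical class for illustration; Character.toChars is the JDK equivalent.
class SurrogatePairSketch {
    // High surrogate: 0xD800 plus the top 10 bits of (codePoint - 0x10000).
    static char highSurrogate(int codePoint) {
        int offset = codePoint - 0x10000;
        return (char) (0xD800 + (offset >> 10));
    }

    // Low surrogate: 0xDC00 plus the bottom 10 bits of (codePoint - 0x10000).
    static char lowSurrogate(int codePoint) {
        int offset = codePoint - 0x10000;
        return (char) (0xDC00 + (offset & 0x3FF));
    }

    public static void main(String[] args) {
        int codePoint = 0x1F602;
        // Prints: D83D DE02
        System.out.printf("%04X %04X%n",
                (int) highSurrogate(codePoint), (int) lowSurrogate(codePoint));

        // Cross-check against the JDK's own conversion.
        char[] viaJdk = Character.toChars(codePoint);
        System.out.println(viaJdk[0] == highSurrogate(codePoint)
                && viaJdk[1] == lowSurrogate(codePoint));
    }
}
```

For U+1F602 the offset is 0xF602; its top 10 bits are 0x3D and its bottom 10 bits are 0x202, which gives 0xD83D and 0xDE02.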
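You can define:
String emostring = "😂";
emostring.length(); // returns 2, the number of UTF-16 code units (char values)
emostring.codePointCount(0, emostring.length()); // returns 1, the number of Unicode code points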
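Since length() counts char values and not characters, a loop that steps one char at a time would cut an emoji in half. A minimal sketch of walking a string by code point follows, assuming Java 7 or later; the class CodePointWalk and the method splitByCodePoint are hypothetical names, while codePointAt, Character.charCount, and Character.toChars are standard JDK methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration only.
class CodePointWalk {
    // Splits a string into one substring per Unicode code point.
    static List<String> splitByCodePoint(String s) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);            // reads one or two chars
            parts.add(new String(Character.toChars(codePoint)));
            i += Character.charCount(codePoint);         // advance by 1 or 2
        }
        return parts;
    }

    public static void main(String[] args) {
        // "a", then U+1F602 written as its surrogate pair, then "b".
        List<String> parts = splitByCodePoint("a\uD83D\uDE02b");
        System.out.println(parts.size());          // 3 code points
        System.out.println(parts.get(1).length()); // the emoji occupies 2 chars
    }
}
```

The same idea applies to truncating or reversing text: operate on code points, or use offsetByCodePoints to find a safe char index.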
http://www.oracle.com/us/technologies/java/supplementary-142654.html