Revisiting Java Encoding


Author: xxjacob | Published 2019-01-24 12:26

    This all started with an emoji.

    For example, the Unicode code point of the emoji 😂 is U+1F602.

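    A Java char is only 2 bytes (16 bits, so its largest value is U+FFFF), which means a single char cannot hold this code point.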
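    You can see this limit with a small program. It is only a sketch: the class name CharRange is made up for the example. It relies on the standard Character members MAX_VALUE, isSupplementaryCodePoint and charCount.

```java
// Illustrative sketch; the class name CharRange is hypothetical.
class CharRange {
    public static void main(String[] args) {
        int emoji = 0x1F602; // code point of U+1F602 FACE WITH TEARS OF JOY

        // char is an unsigned 16-bit type, so its largest value is U+FFFF (65535).
        System.out.println((int) Character.MAX_VALUE);
        // The emoji's code point is larger than any char value.
        System.out.println(emoji > Character.MAX_VALUE);
        // Java calls code points in U+10000..U+10FFFF "supplementary".
        System.out.println(Character.isSupplementaryCodePoint(emoji));
        // Number of chars (UTF-16 code units) needed to store this code point.
        System.out.println(Character.charCount(emoji));
    }
}
```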

    So how does a String represent characters like this emoji, whose code points lie above U+FFFF and therefore need more than two bytes?

    The answer is UTF-16. Java's String and char are defined in terms of UTF-16 code units.

    UTF-16 uses sequences of one or two unsigned 16-bit code units to encode Unicode code points. Values U+0000 to U+FFFF are encoded in one 16-bit unit with the same value. Supplementary characters are encoded in two code units, the first from the high-surrogates range (U+D800 to U+DBFF), the second from the low-surrogates range (U+DC00 to U+DFFF). This may seem similar in concept to multi-byte encodings, but there is an important difference: The values U+D800 to U+DFFF are reserved for use in UTF-16; no characters are assigned to them as code points. This means, software can tell for each individual code unit in a string whether it represents a one-unit character or whether it is the first or second unit of a two-unit character. This is a significant improvement over some traditional multi-byte character encodings, where the byte value 0x41 could mean the letter "A" or be the second byte of a two-byte character. 
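    The quoted passage says each code unit can be classified on its own. You can illustrate this by examining every char of a string without looking at its neighbours. The following is a rough sketch. CodeUnitKinds and kind() are hypothetical names. The emoji is written with Unicode escapes so the example does not depend on the source file's encoding.

```java
// Illustrative sketch; CodeUnitKinds and kind() are hypothetical names.
class CodeUnitKinds {
    // Decide what a single UTF-16 code unit is, using only its own value.
    static String kind(char c) {
        if (Character.isHighSurrogate(c)) return "high surrogate (U+D800..U+DBFF)";
        if (Character.isLowSurrogate(c))  return "low surrogate (U+DC00..U+DFFF)";
        return "single-unit character";
    }

    public static void main(String[] args) {
        String s = "A\uD83D\uDE02"; // the letter A followed by U+1F602
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            System.out.printf("index %d: U+%04X -> %s%n", i, (int) c, kind(c));
        }
    }
}
```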

    In UTF-16, 😂 is encoded as \uD83D\uDE02: the high surrogate 0xD83D followed by the low surrogate 0xDE02.

    A String therefore stores it as two chars.
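    You can also compute the pair by hand:

    1. Subtract 0x10000 from the code point, which leaves a 20-bit value.
    2. Add the top 10 bits of that value to 0xD800 to form the high surrogate.
    3. Add the bottom 10 bits to 0xDC00 to form the low surrogate.

    For U+1F602 the 20-bit value is 0xF602, which gives 0xD800 + 0x3D = 0xD83D and 0xDC00 + 0x202 = 0xDE02. The sketch below checks this arithmetic against the JDK's Character.toChars and Character.toCodePoint. SurrogatePair, encode and decode are illustrative names, and encode assumes its input is a supplementary code point.

```java
// Illustrative sketch; SurrogatePair, encode and decode are hypothetical names.
class SurrogatePair {
    // Assumes 0x10000 <= codePoint <= 0x10FFFF (a supplementary code point).
    static char[] encode(int codePoint) {
        int v = codePoint - 0x10000;               // 20-bit offset: 0x00000..0xFFFFF
        char high = (char) (0xD800 + (v >>> 10));  // top 10 bits
        char low  = (char) (0xDC00 + (v & 0x3FF)); // bottom 10 bits
        return new char[] { high, low };
    }

    // Inverse of encode: rebuild the code point from a valid surrogate pair.
    static int decode(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        int emoji = 0x1F602;
        char[] pair = encode(emoji);
        System.out.printf("U+%04X U+%04X%n", (int) pair[0], (int) pair[1]);

        // Cross-check against the JDK's own conversions.
        System.out.println(java.util.Arrays.equals(pair, Character.toChars(emoji)));
        System.out.println(decode(pair[0], pair[1]) == Character.toCodePoint(pair[0], pair[1]));
    }
}
```

    In real code Character.toChars and Character.toCodePoint are the usual choice. The hand-written version only makes the bit layout visible.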

    You can check this yourself:

    String emostring = "😂";

    emostring.length()                               // returns 2

    emostring.codePointCount(0, emostring.length())  // returns 1
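    length() counts chars rather than characters. Code that walks through or slices a String containing emoji therefore has to work in code points. The following is a minimal sketch, assuming Java 8 or later for String.codePoints(). The class name is made up for the example.

```java
// Illustrative sketch; the class name CodePointWalk is hypothetical.
class CodePointWalk {
    public static void main(String[] args) {
        // "a", U+1F602, "b": escapes keep the source independent of file encoding.
        String s = "a\uD83D\uDE02b";

        System.out.println(s.length());                      // 4 chars (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length())); // 3 code points

        // Iterating by code point visits the emoji once, as a single int.
        s.codePoints().forEach(cp ->
            System.out.printf("U+%04X %s%n", cp, new String(Character.toChars(cp))));

        // Indexing by char can land inside a surrogate pair: index 1 is only half the emoji.
        System.out.println(Character.isHighSurrogate(s.charAt(1)));

        // To take the first two code points, convert a code-point count into a char index.
        int end = s.offsetByCodePoints(0, 2);                // 3, not 2
        System.out.println(s.substring(0, end).equals("a\uD83D\uDE02"));
    }
}
```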

    Reference: http://www.oracle.com/us/technologies/java/supplementary-142654.html


          Article link: https://www.haomeiwen.com/subject/cheyjqtx.html