美文网首页
字符集,排序规则,Unicode

字符集,排序规则,Unicode

作者: szn好色仙人 | 来源:发表于2019-12-22 21:13 被阅读0次

    序言

    show character set;
    --查看mysql支持的字符集,部分输出如下
    
    Character Describption Default collation Maxlen
    ascii US ASCII ascii_general_ci 1
    big5 Big5 Traditional Chinese(繁体中文) big5_chinese_ci 2
    binary Binary pseudo charset binary 1
    gb18030 China National Standard GB18030 gb18030_chinese_ci 4
    gb2312 GB2312 Simplified Chinese gb2312_chinese_ci 2
    gbk GBK Simplified Chinese gbk_chinese_ci 2
    utf16 UTF-16 Unicode utf16_general_ci 4
    utf16le UTF-16LE Unicode utf16le_general_ci 4
    utf32 UTF-32 Unicode utf32_general_ci 4
    utf8 UTF-8 Unicode utf8_general_ci 3
    utf8mb4 UTF-8 Unicode utf8mb4_0900_ai_ci 4
    show collation;
    --查看mysql支持的collation
    

    指定字符集与排序规则

    服务器的字符集与排序规则

    • character_set_server的默认值为utf8mb4

      show variables like 'character_set_server';
      --输出的Value列为utf8mb4
      
    • my.ini中可以设置character_set_server的值:

      character-set-server=utf8;
      --在my.ini中添加上述行,重启服务,即可设置character_set_server的值
      
    • 在运行过程中,可以更改character_set_server的值:

      set character_set_server = gbk;
      
    • 如果未在create database语句中指定字符集和排序规则,则会将character_set_servercollation_server作为默认值

    • 若只指定了字符集没有指定排序规则,则使用字符集对应的默认的排序规则

    数据库的字符集与排序规则

    • 可以在create databasealter database时指定字符集和排序规则

      create database d character set gbk collate gbk_bin;
      alter database d character set gb2312 collate gb2312_bin;
      
      use d;
      SELECT @@character_set_database, @@collation_database;
      --此sql语句用于输出当前数据库的字符集与排序规则
      
    • 当前数据库的字符集和排序规则,可以通过查看character_set_databasecollation_database来确定。若当前没有默认数据库,则character_set_databasecollation_databasecharacter_set_servercollation_server相同

      create database d0 character set gbk collate gbk_bin;
      create database d1 character set utf8 collate utf8_bin;
      
      use d0;
      show variables like 'character_set_database';
      --输出的character_set_database值为gbk
      
      use d1;
      show variables like 'character_set_database';
      --输出的character_set_database值为utf8
      
    • 数据库的字符集和排序规则会产生以下影响:

      • create table未指定字符集和排序规则时,则采用数据库的字符集和排序规则

      • load data未指定字符集和排序规则,则使用数据库的字符集来解释文件中的信息

      • 创建的存储过程和函数未指定字符集和排序规则,则使用数据库的字符集和排序规则

        use d0; --create database d0 character set gbk collate gbk_bin;
        delimiter \
        
        create procedure p0(in str varchar(111))
        begin
                select charset(str);
        end\
        
        delimiter ;
        
        call p0("szn");
        --输出:gbk
        
        use d1; --database d1 character set utf8 collate utf8_bin;
        delimiter \
        
        create procedure p1(in str varchar(111))
        begin
                select charset(str);
        end\
        
        delimiter ;
        
        call p1("szn");
        --输出:utf8
        
        use d1; --database d1 character set utf8 collate utf8_bin;
        delimiter \
        
        create procedure p_set(in str varchar(111) character set gb2312)
        begin
                select charset(str);
        end\
        
        delimiter ;
        
        call p_set("szn");
        --输出:gb2312
        

    表的字符集与排序规则

    • 可以在create tablealter table时指定表的字符集和排序规则

      create table t0(name varchar(1024)) character set gbk collate gbk_bin;
      alter table t0 character set utf8;
      
    • 若未指定表的字符集和排序规则,则使用对应数据库的字符集和排序规则

    • 若表中的列未指定字符集和排序规则,则对应的列使用表的字符集和排序规则

    列的字符集与排序规则

    • create tablealter table时指定列的字符集和排序规则

      create table t2(name varchar(1024) character set gbk collate gbk_bin);
      alter table t2 modify column name varchar(1024) character set utf8 collate utf8_bin;
      
    • 使用alter table将列的字符集进行更改时,若字符集不兼容,则可能会丢失数据

    字符串字面量的字符集与排序规则

    • 字符串字面量的默认字符集和排序规则由character_set_connectioncollation_connection指定

    • 字符串字面量可以设定字符集与排序规则:

      show variables like 'character_set_connection';
      --输出值:utf8
      
      set @b = "szn";
      select charset(@b);
      --输出值:utf8
      
      set @c = _gbk"哈" COLLATE gbk_bin;
      --字符串字面量前指定编码方式,这种方式称为介绍器(Character Set Introducers)
      
      select charset(@c);
      --输出值:gbk
      
    • 字符串字面量指定字符集并不会更改其值,解析器最终仍按照character_set_connection进行处理

      set @c0 = _gbk"哈" COLLATE gbk_bin;
      set @c1 = _utf8"哈" COLLATE utf8_bin;
      select hex(@c0), hex(@c1);
      --两个输出均为"哈"的utf8编码 E59388
      

    Character Set Introducers

    • 字符串字面量,十六进制字面量,位字面量都有一个可选的字符集和排序规则设置,这称为介绍器

    • 介绍器告诉解析器后面字符串使用的字符集,但是不会改变字符串的值

      set @s0 = _utf8 0xE59388;
      set @s1 = _gbk 0xB9FE;
      set @s2 = _gbk X'B9FE';
      set @s3 = _utf8 b'111001011001001110001000';
      set @s4 = _utf8 0b111001011001001110001000;
      select @s0, @s1, @s2, @s3, @s4;
      --输出:哈,哈,哈,哈, 哈
      
      SET @v1 = X'000D' | X'0BC0';
      --@V1是bigint类型
      
      SET @v2 = _binary X'000D' | X'0BC0';
      --@v2是字符串
      
      SELECT HEX(@v1), HEX(@v2);
      --输出:BCD, 0BCD
      
      select @v1 + 1, @v2 + 1, "0BCD" + "1";.
      --输出:3022, 1, 1
      

    连接的字符集与排序规则

    • 每个客户端的连接都有特定且可更改的字符集和排序规则

    • 客户端发起sql查询,至结果返回到客户端:

      • character_set_client表明客户端发送过来的sql语句的字符集
      • 服务器将接收到的sql语句的字符集转换为character_set_connection
      • 服务器将sql的执行结果的字符集转换为character_set_results
    • character_set_client不允许一些字符集设置:

      ucs2
      utf16
      utf16le
      utf32
      
    • set character set会同时将character_set_clientcharacter_set_results设定为给定值,并且将character_set_connection设置为character_set_database的值

      set character_set_client = ascii;
      set character_set_results = big5;
      set character_set_database = gb2312;
      set character_set_connection = gbk;
      
      set character set latin2;
      show variables like 'character%';
      
      Variable_name Value
      character_set_client latin2
      character_set_connection gb2312
      character_set_database gb2312
      character_set_results latin2
    • my.ini中添加default-character-set=latin2,重启服务,将会同时更改character_set_client,character_set_connection,character_set_resultslatin2

    • 可以在登录mysql时:mysql -u root -p --default-character-set=latin1

    • set names gb2312;
      --等效于下面三句代码
      
      set character_set_client = gb2312;
      set character_set_results = gb2312;
      set character_set_connection = gb2312;
      

    Unicode支持

    ​ The Unicode Standard includes characters from the Basic Multilingual(使用多种语言的) Plane (BMP) and supplementary(追加的) characters that lie outside the BMP.

    ​ BMP characters have these characteristics:

    • Their code point values are between 0 and 65535 (or U+0000 and U+FFFF).
    • They can be encoded in a variable-length encoding using 8, 16, or 24 bits (1 to 3 bytes).
    • They can be encoded in a fixed-length encoding using 16 bits (2 bytes).
    • They are sufficient(充分的) for almost all characters in major(主要的) languages.

    ​ Supplementary characters lie outside the BMP:

    • Their code point values are between U+10000 and U+10FFFF).
    • 占用的空间比BMP内的字符大,最多占用4字节

    mysql支持的unicode

    字符集 单个字符占用空间 支持的字符 描述
    utf8mb4 1到4字节 BMP and supplementary A UTF-8 encoding of the Unicode character set
    utf8mb3 1到3字节 BMP only A UTF-8 encoding of the Unicode character set
    utf8 1到3字节 BMP only An alias for utf8mb3.
    ucs2 2字节 BMP and supplementary The UCS-2 encoding of the Unicode character set
    utf16 2或4字节 BMP and supplementary The UTF-16 encoding for the Unicode character set . Like ucs2 but with an extension for supplementary characters.
    utf16le 2或4字节 BMP and supplementary The UTF-16LE encoding for the Unicode character set. Like utf16 but little-endian rather than big-endian.
    utf32 4字节 BMP and supplementary The UTF-32 encoding for the Unicode character set
    • utf8当前是utf8mb3的别称,但是utf8mb3将在mysql将来的版本被删除。有时候utf8会变成utf8mb4的别称,所以为了避免模糊不清,请考虑使用utf8mb4

    • 对于BMP字符,utf8mb4utf8mb3具有相同编码值,占用的空间也相同。即utf8mb4utf8mb3的超集

    • 对于补充字符,utf8mb4需要四个字节来存储

    相关文章

      网友评论

          本文标题:字符集,排序规则,Unicode

          本文链接:https://www.haomeiwen.com/subject/upcanctx.html