字符集和字节序
什么是字符集和字符序?简单的说:
- 字符集(character set):定义了字符以及字符的编码.
- 字符序(collation):定义了字符的比较规则.
关于字符集,我们在Mysql中最容易遇到的问题就是插入的字符中含有emoji表情,如果我们设置的字符集不是utf8mb4,而是utf8我们就会遇见1366
的错误码.其中的原因就是Mysql的utf8是被阉割后只支持3个字节,而emoji表情需要4个字节.
关于字符序,例如我们在查询中的sql语句是:select * from t_content where content = 'abc'
,很可能会出现content=ABC
的记录.而这种情况发生主要就是字符序导致的.
Mysql支持的字符集和字符序
因为Mysql版本的不同,所支持的字符集合字符序也有细微差异,我们可以通过下面查询语句列出:
-- 查看字符集
SHOW CHARSET;
-- 查看字符序
SHOW COLLATION;
查询结果如下(Mysql版本为5.7.21):


字符集中的Charset
表示该字符集在Mysql中的名称,Default collation
表示该字符集默认的字符序,Maxlen
表示该字符集支持的最大长度.
字符序中的Collation
表示字符序在Mysql中的名称,Charset
表示其对应的字符集,Default
表示是不是默认的的字符序.对于字节序的命名规则,它的组成一般都是字符集_语言_后缀
.而其中的后缀代表它的比较规则,如下表所示:
后缀 | 描述 |
---|---|
ai | 不区分重音 |
as | 区分重音 |
ci | 不区分大小写 |
cs | 区分大小写 |
bin | 以二进制方式比较 |
字符集和字符序查看
Mysql支持对server,database,table,column的字符集和字符序进行设置和修改.
server
通过下面语句可以查看Mysql服务的目前的字符集和字节序:
-- 查询字符集
mysql> SHOW VARIABLES LIKE "character_set_server";
+----------------------+-------+
| Variable_name | Value |
+----------------------+-------+
| character_set_server | utf8 |
+----------------------+-------+
1 row in set
-- 查询字符序
mysql> SHOW VARIABLES LIKE "collation_server";
+------------------+-----------------+
| Variable_name | Value |
+------------------+-----------------+
| collation_server | utf8_general_ci |
+------------------+-----------------+
1 row in set
我们可以通过下面语句来修改server的字符集和字符序:
-- 设置字符集
mysql> SET character_set_server = utf8mb4;
Query OK, 0 rows affected
-- 设置字符序
mysql> SET collation_server = utf8mb4_bin;
Query OK, 0 rows affected
mysql> SHOW VARIABLES LIKE "collation_server";
+------------------+-------------+
| Variable_name | Value |
+------------------+-------------+
| collation_server | utf8mb4_bin |
+------------------+-------------+
1 row in set
mysql> SHOW VARIABLES LIKE "character_set_server";
+----------------------+---------+
| Variable_name | Value |
+----------------------+---------+
| character_set_server | utf8mb4 |
+----------------------+---------+
1 row in set
database
通过下面方式我们可以查看database的字符集和字符序:
-- 选择tests数据库
mysql> use test;
Database changed
-- 查询
mysql> select @@character_set_database,@@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8mb4 | utf8mb4_general_ci |
+--------------------------+----------------------+
1 row in set
还可以通过下面这种方式查询:
mysql> SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM information_schema.SCHEMATA WHERE schema_name="test";
+-------------+----------------------------+------------------------+
| SCHEMA_NAME | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+-------------+----------------------------+------------------------+
| test | utf8mb4 | utf8mb4_general_ci |
+-------------+----------------------------+------------------------+
在创建表时我们可以通过下面语句设置字符集和字符序:
CREATE DATABASE db_name
[[DEFAULT] CHARACTER SET charset_name]
[[DEFAULT] COLLATE collation_name]
如果表已经创建完成,后续需要修改我们可以通过下面语句修改:
ALTER DATABASE db_name
[[DEFAULT] CHARACTER SET charset_name]
[[DEFAULT] COLLATE collation_name]
下面展示如何修改test库
的字符集和字节序:
mysql> alter database test character set utf8 collate utf8_bin;
Query OK, 1 row affected
mysql> SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM information_schema.SCHEMATA WHERE schema_name="test";
+-------------+----------------------------+------------------------+
| SCHEMA_NAME | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+-------------+----------------------------+------------------------+
| test | utf8 | utf8_bin |
+-------------+----------------------------+------------------------+
在创建数据时,如果我们指定了字符集和字符序则会使用我们指定的,如果我们没有指定,则会使用server中的字符集和字符序.
table
通过下面方式我们可以查看table的字符集和字符序:
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| buydeem_order |
| data |
| employees |
| jpa |
| mysql |
| performance_schema |
| rest |
| sakila |
| sales_support |
| sell |
| sys |
| test |
| world |
+--------------------+
14 rows in set (0.00 sec)
-- 查看字符序 show table status from 库名 where name = 表名
mysql> show table status from test where name = 't_content'\G
*************************** 1. row ***************************
Name: t_content
Engine: InnoDB
Version: 10
Row_format: Dynamic
Rows: 0
Avg_row_length: 0
Data_length: 16384
Max_data_length: 0
Index_length: 0
Data_free: 0
Auto_increment: NULL
Create_time: 2020-09-15 15:22:08
Update_time: NULL
Check_time: NULL
Collation: utf8_general_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)
Collation
就是该table的字符序,该结果中为utf8_general_ci
,与之对应的字符集就是utf8
.我们可以使用下面语句修改table的字符集和字符序:
-- ALTER TABLE 表名 DEFAULT CHARACTER SET=utf8 COLLATE=utf8_bin;
mysql> use test;
Database changed
mysql> alter table t_content default character set utf8 collate utf8_bin;
Query OK, 0 rows affected
Records: 0 Duplicates: 0 Warnings: 0
mysql> show table status from test where name = 't_content'\G
*************************** 1. row ***************************
Name: t_content
Engine: InnoDB
Version: 10
Row_format: Dynamic
Rows: 0
Avg_row_length: 0
Data_length: 16384
Max_data_length: 0
Index_length: 0
Data_free: 0
Auto_increment: NULL
Create_time: 2020-09-15 17:42:59
Update_time: NULL
Check_time: NULL
Collation: utf8_bin
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)
通过上面语句我们修改了table对应的字符集和字符序.
column
通过下面语句查看column的字符集和字符序:
SELECT CHARACTER_SET_NAME, COLLATION_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA=数据库名称 AND TABLE_NAME=表名 AND COLUMN_NAME=字段名;
同样我们可以修改column属性时设置CHARACTER SET
和COLLATE
来修改原先的字符集和字符序:
-- 修改表t_content的content的字符集为utf8mb4字符序为utf8mb4_bin
ALTER TABLE t_content
MODIFY COLUMN content CHARACTER SET utf8mb4 COLLATE utf8mb4_bin
网友评论