摘要:MySQL
,JSON类型
,多值索引
, 用户画像
MySQL是结构化数据存储,JSON是非结构化格式,在MySQL中使用JSON类型可以打通关系型和非关系型数据的存储之间的界限,为业务提供更好的架构选择,以下内容包括
- (1)JSON 数据类型
- (2)JSON类型创建插入数据
- (3)提取JSON内字段
- (4)JSON类型和字符串的区别
- (5)JSON类型数据修改
- (6)JSON类型使用索引
- (7)JSON类型其他常用函数
- (8)JSON ARRAY的多值索引
- (9)基于JSON类型的用户画像设计
(1)JSON 数据类型
JSON(JavaScript Object Notation)主要用于互联网应用服务之间的数据交换。MySQL 支持JSON 对象
和JSON 数组
两种类型,JSON 类型是从 MySQL 5.7 版本开始支持的功能,MySQL中使用JSON有以下好处
- 无须预定义字段:字段可以无限拓展,避免了ALTER ADD COLUMN的操作,使用更加灵活
- 处理稀疏字段:避免了稀疏字段的NULL值,避免冗余存储
- 支持索引:相比于字符串格式的JSON,JSON类型支持索引做特定的查询优化
总体而言,JSON 类型比较适合存储一些修改较少、相对静态的数据,或者说适合存储修改较少,且容忍稀疏的聚合数据,比如存储用户的登录信息
(2)JSON类型创建插入数据
create database test default charset utf8mb4;
CREATE TABLE student (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
info JSON DEFAULT NULL
);
插入数据
mysql> INSERT student (info) VALUES ('{"sex": "F", "age": 13, "city": "beijing"}');
Query OK, 1 row affected (0.13 sec)
mysql> INSERT student (info) VALUES ('{"sex": "M", "age": 14, "city": "suzhou"}');
Query OK, 1 row affected (0.10 sec)
mysql> INSERT student (info) VALUES ('{"sex": "F", "age": 23, "city": "shenzhen"}');
Query OK, 1 row affected (0.20 sec)
查看数据
mysql> select * from student;
+----+---------------------------------------------+
| id | info |
+----+---------------------------------------------+
| 1 | {"age": 13, "sex": "F", "city": "beijing"} |
| 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
| 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+---------------------------------------------+
试一下插入JSON数组
mysql> INSERT student (info) VALUES ('[1,2,3,4]');
Query OK, 1 row affected (0.12 sec)
mysql> INSERT student (info) VALUES ('[{"sex": "M"},{"sex":"F", "city":"nanjing"}]');
Query OK, 1 row affected (0.11 sec)
mysql> select * from student;
+----+-------------------------------------------------+
| id | info |
+----+-------------------------------------------------+
| 1 | {"age": 13, "sex": "F", "city": "beijing"} |
| 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
| 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
| 4 | [1, 2, 3, 4] |
| 5 | [{"sex": "M"}, {"sex": "F", "city": "nanjing"}] |
+----+-------------------------------------------------+
5 rows in set (0.00 sec)
- 可以混合插入JSON数组和JSON对象,但是必须要符合JSON格式
- 其中整个字符串使用单引号,键值对使用双引号,
- JSON中可以允许有多余空格,MySQL会自动解析,输出的使用格式为符号后带有一个空格
(3)提取JSON内字段
因为支持了新的JSON类型,MySQL 配套提供了丰富的 JSON 字段处理函数,用于方便地操作 JSON 数据,最常见的就是函数 JSON_EXTRACT
,它用来从 JSON 数据中提取所需要的字段内容
(1)提取JSON对象
主要是JSON_UNQUOTE和JSON_EXTRACT,JSON_EXTRACT
作用是去除最外侧的双引号,JSON_EXTRACT
根据键提取值
mysql> SELECT
-> id,
-> JSON_UNQUOTE(JSON_EXTRACT(info,"$.sex")) sex,
-> JSON_UNQUOTE(JSON_EXTRACT(info,"$.age")) age,
-> JSON_UNQUOTE(JSON_EXTRACT(info,"$.city")) city
-> FROM student;
+----+------+------+----------+
| id | sex | age | city |
+----+------+------+----------+
| 1 | F | 13 | beijing |
| 2 | M | 14 | suzhou |
| 3 | F | 23 | shenzhen |
+----+------+------+----------+
3 rows in set (0.00 sec)
MySQL 还提供了 ->> 表达式,和上述 SQL 效果完全一样,->>也是去除最外面的引号,还有一种符号->,它也能得到提取结果但是不去除外面的符号,相当于->和JSON_EXTRACT对应
mysql> SELECT
-> id,
-> info->>"$.sex" sex,
-> info->>"$.age" age,
-> info->>"$.city" city
-> FROM student;
+----+------+------+----------+
| id | sex | age | city |
+----+------+------+----------+
| 1 | F | 13 | beijing |
| 2 | M | 14 | suzhou |
| 3 | F | 23 | shenzhen |
+----+------+------+----------+
3 rows in set (0.00 sec)
如果JSON对象中查询的键不存在,则返回为NULL
mysql> SELECT
-> id,
-> JSON_UNQUOTE(JSON_EXTRACT(info,"$.sex")) sex,
-> JSON_UNQUOTE(JSON_EXTRACT(info,"$.height")) height
-> FROM student;
+----+------+--------+
| id | sex | height |
+----+------+--------+
| 1 | F | NULL |
| 2 | M | NULL |
| 3 | F | NULL |
+----+------+--------+
3 rows in set (0.00 sec)
(2)提取JSON数组
先创建JSON数组类型,插入数据
mysql> CREATE TABLE student (
-> id BIGINT AUTO_INCREMENT PRIMARY KEY,
-> info JSON DEFAULT NULL
-> );
Query OK, 0 rows affected (0.42 sec)
mysql> INSERT student (info) VALUES ('[1, 2, 3, 4]');
Query OK, 1 row affected (0.09 sec)
mysql> INSERT student (info) VALUES ('[2, 3, 4]');
Query OK, 1 row affected (0.08 sec)
mysql> INSERT student (info) VALUES ('[3, 1, -1]');
Query OK, 1 row affected (0.13 sec)
mysql> select * from student;
+----+--------------+
| id | info |
+----+--------------+
| 1 | [1, 2, 3, 4] |
| 2 | [2, 3, 4] |
| 3 | [3, 1, -1] |
+----+--------------+
3 rows in set (0.00 sec)
JOSN数组通过索引取对应的值,同样是使用JSON_EXTRACT,索引从0开始
mysql> SELECT
-> JSON_EXTRACT(info, '$[0]') first
-> FROM student;
+----------------------------+
| first |
+----------------------------+
| 1 |
| 2 |
| 3 |
+----------------------------+
同样可以采用->>符号,这种方式会去除双引号,如果JSON数组内的元素是双引号字符串,显示出来的时候也会被去除引号
mysql> SELECT
-> id,
-> info->>"$[0]" first,
-> info->>"$[1]" second
-> FROM student;
+----+-------+--------+
| id | first | second |
+----+-------+--------+
| 1 | 1 | 2 |
| 2 | 2 | 3 |
| 3 | 3 | 1 |
+----+-------+--------+
可以提取数组中的多个值,此时JSON_EXTRACT后面传入多个参数,提取的顺序和输出顺序一致
mysql> SELECT id, JSON_EXTRACT(info, "$[1]", "$[0]") a FROM student;
+----+--------+
| id | a |
+----+--------+
| 1 | [2, 1] |
| 2 | [5, 2] |
| 3 | [3, 1] |
+----+--------+
3 rows in set (0.01 sec)
如果提取的索引位置不存在则返回空
mysql> SELECT
-> id,
-> info->>"$[2]" a
-> FROM student;
+----+------+
| id | a |
+----+------+
| 1 | 3 |
| 2 | 6 |
| 3 | NULL |
+----+------+
3 rows in set (0.00 sec)
(3)提取嵌套数组
嵌套数组主要是JSON数组内包含多个JSON对象或者JSON数组包含多个JSON数组,先以包含多个JSON对象为例
mysql> truncate table student;
Query OK, 0 rows affected (0.70 sec)
mysql> INSERT student (info) VALUES ('[{"no":1,"ent_name":"张家港市杨舍百桥士方园艺场","score":98.2}, {"no":2,"ent_name":"昆山市朱北苗圃有限公司","score":98.2}, {"no":3,"ent_name":"苏州市吴中区临湖现代渔业发展有限公司","score":98.2}]');
Query OK, 1 row affected (0.08 sec)
mysql> INSERT student (info) VALUES ('[{"no":1,"ent_name":"张家港市合力土石方挖掘服务部","score":98.1}, {"no":2,"ent_name":"中国石化销售有限公司江苏苏州养武加油站","score":97.6}, {"no":3,"ent_name":"中国石化销售有限公司江苏苏州太仓璜泾二站服务点","score":97.5}]');
Query OK, 1 row affected (0.09 sec)
mysql> select * from student;
+----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | info |
+----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | [{"no": 1, "score": 98.2, "ent_name": "张家港市杨舍百桥士方园艺场"}, {"no": 2, "score": 98.2, "ent_name": "昆山市朱北苗圃有限公司"}, {"no": 3, "score": 98.2, "ent_name": "苏州市吴中区临湖现代渔业发展有限公司"}] |
| 2 | [{"no": 1, "score": 98.1, "ent_name": "张家港市合力土石方挖掘服务部"}, {"no": 2, "score": 97.6, "ent_name": "中国石化销售有限公司江苏苏州养武加油站"}, {"no": 3, "score": 97.5, "ent_name": "中国石化销售有限公司江苏苏州太仓璜泾二站服务点"}] |
+----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
提取嵌套数组的值,首先要找到数组中对应的对象,然后根据对象的键找值
mysql> SELECT
-> id,
-> JSON_UNQUOTE(JSON_EXTRACT(info -> "$[0]", "$.ent_name")) first_ent_name
-> FROM student;
+----+--------------------------------------------+
| id | first_ent_name |
+----+--------------------------------------------+
| 1 | 张家港市杨舍百桥士方园艺场 |
| 2 | 张家港市合力土石方挖掘服务部 |
+----+--------------------------------------------+
2 rows in set (0.00 sec)
对比一下之前的写法则只能提取第一个数组对象,可见JSON_UNQUOTE对于内侧的引号不删除,只删除外侧的引号
mysql> SELECT
-> id,
-> JSON_UNQUOTE(JSON_EXTRACT(info, "$[0]")) first
-> FROM student;
+----+------------------------------------------------------------------------------------+
| id | first |
+----+------------------------------------------------------------------------------------+
| 1 | {"no": 1, "score": 98.2, "ent_name": "张家港市杨舍百桥士方园艺场"} |
| 2 | {"no": 1, "score": 98.1, "ent_name": "张家港市合力土石方挖掘服务部"} |
+----+------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
在看数组套数组的情况
mysql> INSERT student (info) VALUES ('[1, 2, [3, 4]]');
Query OK, 1 row affected (0.07 sec)
mysql> select * from student;
+----+----------------+
| id | info |
+----+----------------+
| 1 | [1, 2, [3, 4]] |
+----+----------------+
1 row in set (0.00 sec)
提取其中嵌套的数组,第一个表示嵌套数据的位置索引,第二个*表示嵌套数据内取所有元素
mysql> SELECT
-> id,
-> JSON_EXTRACT(info, "$[2][*]") a
-> FROM student;
+----+--------+
| id | a |
+----+--------+
| 1 | [3, 4] |
+----+--------+
1 row in set (0.00 sec)
如果第二个参数不是*,也可以选取嵌套数据内的指定位置的元素
SELECT
id,
JSON_EXTRACT(info, "$[2][1]") a
FROM student;
mysql> SELECT
-> id,
-> JSON_EXTRACT(info, "$[2][1]") a
-> FROM student;
+----+------+
| id | a |
+----+------+
| 1 | 4 |
+----+------+
1 row in set (0.00 sec)
(4)提取JSON后增加过滤 / 排序条件
提取JSON后不能用新命名的字段做筛选过滤,需要调用把JSON函数或者符号再写一遍
mysql> select * from student;
+----+---------------------------------------------+
| id | info |
+----+---------------------------------------------+
| 1 | {"age": 13, "sex": "F", "city": "beijing"} |
| 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
| 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+---------------------------------------------+
3 rows in set (0.00 sec)
筛选sex是F,age大于14的
mysql> SELECT
-> id,
-> info
-> FROM student WHERE info->>"$.age" > 14 and info->>"$.sex" = 'F';
+----+---------------------------------------------+
| id | info |
+----+---------------------------------------------+
| 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+---------------------------------------------+
1 row in set (0.01 sec)
根据age倒序排序取第一,只要city列
mysql> SELECT
-> id,
-> info->>"$.city"
-> FROM student WHERE info->>"$.age" > 14 and info->>"$.sex" = 'F';
+----+-----------------+
| id | info->>"$.city" |
+----+-----------------+
| 3 | shenzhen |
+----+-----------------+
1 row in set (0.00 sec)
(4)JSON类型和字符串的区别
除了JSON类型支持索引之外,看一下脚本语言对于JSON类型取值和字符串是否有区别,使用pymysql连接测试
>>> import pymysql
>>> config = {"user": "root", "password": "123456", "database": "test", "host": "127.0.0.1", "port": 3306}
>>> conn = pymysql.connect(**config)
>>> cursor = conn.cursor()
>>> cursor.execute("select info from student where id = 1")
1
>>> res = cursor.fetchall() # (('{"age": 13, "sex": "F", "city": "beijing"}',),)
>>> cursor.close()
>>> conn.close()
>>> json.loads(res[0][0])
{'age': 13, 'sex': 'F', 'city': 'beijing'}
可见结果是一个JSON格式的字符串,可以直接解析成JSON,所以脚本语言取出的JSON类型结果和字符串没有差别,就是JSON格式的字符串,另外指定JSON格式后,MySQL会对插入的字符串做检验,如果不符合JSON格式插入报错,这也是和传统Varchar或者TEXT的区别
- JSON格式相比于Varchar,TEXT支持索引
- JSON格式会对插入的字符串做JSON格式校验,不符合则报错
- JSON格式的输入输出都是字符串,如果使用Varchar或者TEXT格式人工保证字符串为JSON格式,效果是一致的
(5)JSON类型数据修改
修改数据主要是JSON_SET
,JSON_INSERT
,JSON_REPLACE
三个方法
- JSON_SET:替换现有key的值,插入不存在的key的值
- JSON_INSERT:插入不存在的key的值,已经存在的不修改
- JSON_REPLACE:只替换已存在的key的值,不存在的不做插入
使用的时候结合update语句
mysql> select * from student;
+----+---------------------------------------------+
| id | info |
+----+---------------------------------------------+
| 1 | {"age": 13, "sex": "F", "city": "beijing"} |
| 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
| 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+---------------------------------------------+
3 rows in set (0.01 sec)
JSON_SET,不存在则插入,有则替换
mysql> UPDATE student SET info = JSON_SET(info, "$.city", 'wuxi', "$.height", 123) WHERE id = 1;
Query OK, 1 row affected (0.87 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from student;
+----+--------------------------------------------------------+
| id | info |
+----+--------------------------------------------------------+
| 1 | {"age": 13, "sex": "F", "city": "wuxi", "height": 123} |
| 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
| 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+--------------------------------------------------------+
JSON_INSERT,只会插入不存在的值
mysql> UPDATE student SET info = JSON_INSERT(info, "$.city", 'wuxi', "$.height", 123) WHERE id = 1;
Query OK, 1 row affected (0.11 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from student;
+----+-----------------------------------------------------------+
| id | info |
+----+-----------------------------------------------------------+
| 1 | {"age": 13, "sex": "F", "city": "beijing", "height": 123} |
| 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
| 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+-----------------------------------------------------------+
3 rows in set (0.00 sec)
JSON_REPLACE,只会替换已有值
mysql> UPDATE student SET info = JSON_REPLACE(info, "$.city", 'wuxi', "$.height", 123) WHERE id = 1;
Query OK, 1 row affected (0.06 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from student;
+----+---------------------------------------------+
| id | info |
+----+---------------------------------------------+
| 1 | {"age": 13, "sex": "F", "city": "wuxi"} |
| 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
| 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+---------------------------------------------+
3 rows in set (0.00 sec)
删除使用JSON_REMOVE
,在JSON对象中指定key删除
mysql> select * from student;
+----+---------------------------------------------+
| id | info |
+----+---------------------------------------------+
| 1 | {"age": 13, "sex": "F", "city": "wuxi"} |
| 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
| 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+---------------------------------------------+
3 rows in set (0.00 sec)
mysql> UPDATE student SET info = JSON_REMOVE(info, "$.age") WHERE id = 1;
Query OK, 1 row affected (0.45 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from student;
+----+---------------------------------------------+
| id | info |
+----+---------------------------------------------+
| 1 | {"sex": "F", "city": "wuxi"} |
| 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
| 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+---------------------------------------------+
3 rows in set (0.00 sec)
在JSON数组中指定下标删除
mysql> select * from student;
+----+----------------+
| id | info |
+----+----------------+
| 1 | [1, 2, [3, 4]] |
| 2 | [2, 5, 6] |
| 3 | [1, 3] |
+----+----------------+
3 rows in set (0.00 sec)
mysql> UPDATE student SET info = JSON_REMOVE(info, "$[1]") WHERE id=2;
Query OK, 1 row affected (0.72 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from student;
+----+----------------+
| id | info |
+----+----------------+
| 1 | [1, 2, [3, 4]] |
| 2 | [2, 6] |
| 3 | [1, 3] |
+----+----------------+
3 rows in set (0.00 sec)
(6)JSON类型使用索引
当 JSON 数据量非常大,用户希望对 JSON 数据进行有效检索时,可以利用 MySQL 的函数索引功能对 JSON 中的某个字段进行索引,具体方式是先创建一个虚拟列,再对虚拟列创建索引
先看一下没有索引下,对JSON对象中某个key做条件检索的EXPLAIN计划,可见访问方式type是ALL全表扫面,使用的索引Key是NULL
mysql> select * from student;
+----+---------------------------------------------+
| id | info |
+----+---------------------------------------------+
| 1 | {"age": 13, "sex": "F", "city": "beijing"} |
| 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
| 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+---------------------------------------------+
3 rows in set (0.00 sec)
mysql> explain select * from student where info->>"$.age" = 13;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | student | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)
创建虚拟列age,并且添加普通索引,可见表新增了一列(虽然是虚拟列,但是在SELECT和脚本语言取数都能取到新增的虚拟列),重新EXPLAIN之后Type改为ref代表普通索引,使用的Key是idx_age
mysql> ALTER TABLE student ADD COLUMN age INT as (info->>"$.age");
Query OK, 0 rows affected (0.54 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> select * from student;
+----+---------------------------------------------+------+
| id | info | age |
+----+---------------------------------------------+------+
| 1 | {"age": 13, "sex": "F", "city": "beijing"} | 13 |
| 2 | {"age": 14, "sex": "M", "city": "suzhou"} | 14 |
| 3 | {"age": 23, "sex": "F", "city": "shenzhen"} | 23 |
+----+---------------------------------------------+------+
3 rows in set (0.00 sec)
mysql> ALTER TABLE student ADD INDEX idx_age(age);
Query OK, 0 rows affected (0.51 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> explain select * from student where info->>"$.age" = 13;
+----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| 1 | SIMPLE | student | NULL | ref | idx_age | idx_age | 5 | const | 1 | 100.00 | NULL |
+----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
除了普通索引,同理还可以为JSON对象的其他Key值创建唯一索引等
除了再建表之后创建虚拟列增加索引,也可以在建表的时候就为JSON类型创建索引
mysql> CREATE TABLE student (
-> id BIGINT AUTO_INCREMENT PRIMARY KEY,
-> info JSON DEFAULT NULL,
-> age Int as (info->>"$.age"),
-> index idx_age(age)
-> );
(7)JSON类型其他常用函数
除了上面的JSON_EXTRACT和JSON_UNQUOTE,常用函数如下
- MEMBER OF:只能对JSON数组使用,返回1元素存在数组中,0元素不存在数组中
- JSON_CONTAINS:可以对JSON数组和JSON对象使用,针对JSON数组检查一个元素或者多个元素是否存在,对于JSON对象检查指定路径下是否有某个值或者是否有某个路径(Key)
- JSON_OVERLAP:比较两个JSON数组是否至少有一个元素一致,如果是返回1,否则返回0,如果是JSON对象,判断是否是找有一对key value一致
- JSON_KEYS:返回JSON对象的Key,也可以是嵌套JSON对象
以上函数可以在前面加上NOT关键字就可以取反
MEMBER OF是MySQL 8.0.17新增的函数,查看元素或者子数组是否存在在JSON数组中,如果把MEMBER OF放在where条件后面则直接筛选结果是1的,可以省略写=1
mysql> select * from student;
+----+----------------+
| id | info |
+----+----------------+
| 1 | [1, 2, [3, 4]] |
| 2 | [2, 5, 6] |
| 3 | [1, 3] |
+----+----------------+
3 rows in set (0.01 sec)
mysql> SELECT * FROM student WHERE 3 MEMBER OF(info);
+----+--------+
| id | info |
+----+--------+
| 3 | [1, 3] |
+----+--------+
1 row in set (0.00 sec)
mysql> SELECT * FROM student WHERE JSON_ARRAY(3, 4) MEMBER OF(info);
+----+----------------+
| id | info |
+----+----------------+
| 1 | [1, 2, [3, 4]] |
+----+----------------+
1 row in set (0.00 sec)
JSON_CONTAINS和MEMBER OF类似,但是JSON_CONTAINS可以作用与JSON对象,对于JSON数组两者也有区别,JSON_CONTAINS可以指定多个数组内的元素,相当于是且的关系,JSON_CONTAINS放在where后面默认筛选结果值是1的
mysql> select * from student;
+----+----------------+
| id | info |
+----+----------------+
| 1 | [1, 2, [3, 4]] |
| 2 | [2, 5, 6] |
| 3 | [1, 3] |
+----+----------------+
3 rows in set (0.00 sec)
mysql> SELECT * FROM student WHERE JSON_CONTAINS(info, '[2, 6]');
+----+-----------+
| id | info |
+----+-----------+
| 2 | [2, 5, 6] |
+----+-----------+
1 row in set (0.00 sec)
mysql> SELECT * FROM student WHERE JSON_CONTAINS(info, '[2]');
+----+----------------+
| id | info |
+----+----------------+
| 1 | [1, 2, [3, 4]] |
| 2 | [2, 5, 6] |
+----+----------------+
mysql> SELECT * FROM student WHERE JSON_CONTAINS(info, '[3]');
+----+----------------+
| id | info |
+----+----------------+
| 1 | [1, 2, [3, 4]] |
| 3 | [1, 3] |
+----+----------------+
当JSON_CONTAINS作用与JSON对象时,需要判断某个key-value对是否存,格式是(target,value,key),在指定value的时候,数值用单引号,字符串还要在内加一层双引号
mysql> select * from student;
+----+---------------------------------------------+
| id | info |
+----+---------------------------------------------+
| 1 | {"age": 13, "sex": "F", "city": "beijing"} |
| 2 | {"age": 14, "sex": "M", "city": "suzhou"} |
| 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+---------------------------------------------+
mysql> SELECT * FROM student WHERE JSON_CONTAINS(info, '"F"', '$.sex');
+----+---------------------------------------------+
| id | info |
+----+---------------------------------------------+
| 1 | {"age": 13, "sex": "F", "city": "beijing"} |
| 3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+---------------------------------------------+
mysql> SELECT * FROM student WHERE JSON_CONTAINS(info, '13', '$.age');
+----+--------------------------------------------+
| id | info |
+----+--------------------------------------------+
| 1 | {"age": 13, "sex": "F", "city": "beijing"} |
+----+--------------------------------------------+
JSON_OVERLAP返回两个JSON数组或者JSON对象至少有一个/对元素一致则是1,房子啊where后面筛选出存在至少一个交集的数据
mysql> SELECT * FROM student WHERE JSON_OVERLAPS(info, '[1, 5]');
+----+--------------+
| id | info |
+----+--------------+
| 1 | [1, 2, 5] |
| 2 | [2, 5, 6] |
| 3 | [1, 3, 8] |
| 4 | [1, 2, 7, 8] |
+----+--------------+
4 rows in set (0.00 sec)
mysql> SELECT * FROM student WHERE JSON_OVERLAPS(info, '[1, 5, 6]');
+----+--------------+
| id | info |
+----+--------------+
| 1 | [1, 2, 5] |
| 2 | [2, 5, 6] |
| 3 | [1, 3, 8] |
| 4 | [1, 2, 7, 8] |
+----+--------------+
4 rows in set (0.00 sec)
mysql> SELECT * FROM student WHERE JSON_OVERLAPS(info, '[5]');
+----+-----------+
| id | info |
+----+-----------+
| 1 | [1, 2, 5] |
| 2 | [2, 5, 6] |
+----+-----------+
2 rows in set (0.00 sec)
mysql> SELECT * FROM student WHERE JSON_OVERLAPS(info, '[9]');
Empty set (0.00 sec)
(8)JSON ARRAY的多值索引
从MySQL8.0.17开始,InnoDB支持多值索引。多值索引是在存储JSON数组的列上定义的辅助索引,对于JSON ARRAY的MEMBER OF,JSON_CONTAINS,JSON_OVERLAPS可以利用多值索引进行性能优化
mysql> select * from student;
+----+--------------+
| id | info |
+----+--------------+
| 1 | [1, 2, 5, 9] |
| 2 | [2, 5, 6, 8] |
| 3 | [5, 3, 8, 9] |
| 4 | [1, 2, 7, 8] |
+----+--------------+
4 rows in set (0.00 sec)
先不创建多值索引,使用JSON_CONTAINS语句EXPLAIN查看执行计划
mysql> EXPLAIN SELECT * FROM student WHERE JSON_CONTAINS(info, '[5, 3]');
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | student | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Type为全表扫描,key为NULL,下一步给表增加多值索引,注意如果这个地方改为idx_info((cast((info->"$") as unsigned array))),则后续所有的函数都要是info->"$",否则走不了索引
mysql> ALTER TABLE student ADD INDEX idx_info((cast(info as unsigned array)));
Query OK, 0 rows affected (0.57 sec)
Records: 0 Duplicates: 0 Warnings: 0
查看建立多值索引之后的EXPLAIN计划,可见现在JSON操作函数都走了索引
mysql> EXPLAIN SELECT * FROM student WHERE 5 MEMBER OF(info);
+----+-------------+---------+------------+------+---------------+----------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+----------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | student | NULL | ref | idx_info | idx_info | 9 | const | 1 | 100.00 | Using where |
+----+-------------+---------+------------+------+---------------+----------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT * FROM student WHERE JSON_CONTAINS(info, '[5, 3]');
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
| 1 | SIMPLE | student | NULL | range | idx_info | idx_info | 9 | NULL | 4 | 100.00 | Using where |
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT * FROM student WHERE JSON_OVERLAPS(info, '[5, 3]');
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
| 1 | SIMPLE | student | NULL | range | idx_info | idx_info | 9 | NULL | 4 | 100.00 | Using where |
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
(9)基于JSON类型的用户画像设计
下面基于30万级别的数据,以MySQL的JSON做用户画像的存储和查询测试,画像值已经全部处理成枚举值,不加入数值型的字段,第一步梳理用户标签,定义一张标签表,记录标签值和标签号
mysql> CREATE TABLE tags_info (
-> tag_id bigint auto_increment primary key,
-> tag_name varchar(255) not null,
-> tag_value varchar(255) not null
-> );
Query OK, 0 rows affected (0.80 sec)
插入所有的标签,预览表如下
mysql> select * from tags_info order by rand() limit 10;
+--------+--------------+--------------------+
| tag_id | tag_name | tag_value |
+--------+--------------+--------------------+
| 24 | 渠道名称 | 自助收银 |
| 38 | 类目范围 | 类目多样 |
| 3 | 生日月份 | 1 |
| 35 | RFM | 重要保持会员 |
| 11 | 生日月份 | 9 |
| 44 | 时间偏好 | 常客 |
| 16 | 会员等级 | 员工卡 |
| 27 | 会员状态 | 沉默 |
| 25 | 渠道名称 | 闪电购 |
| 43 | 时间偏好 | 周末客 |
+--------+--------------+--------------------+
下一步构建用户画像表,标签值替换为标签ID,将一个用户的所有标签值存储为JSON ARRAY,先构建一张用户画像结果表
mysql> CREATE TABLE user_tag (
-> user_id bigint not null primary key,
-> user_tags JSON
-> );
Query OK, 0 rows affected (0.41 sec)
插入数据查看预览结果
mysql> select * from user_tag order by rand() limit 10;
+---------+--------------------------------------+
| user_id | user_tags |
+---------+--------------------------------------+
| 36978 | [14, 19, 23, 25, 32, 42] |
| 28515 | [14, 19, 23, 27, 36, 42] |
| 28683 | [14, 19, 28] |
| 39368 | [14, 19, 24, 26, 36, 43] |
| 22269 | [14, 19, 28] |
| 42160 | [14, 19, 23, 26, 32, 43] |
| 22321 | [14, 17, 27, 33, 40, 43, 44, 47, 49] |
| 20407 | [14, 19, 20, 25, 34, 38, 43] |
| 26167 | [14, 17, 23, 27, 35, 38, 43] |
| 38082 | [14, 19, 25, 32, 42] |
+---------+--------------------------------------+
10 rows in set (0.00 sec)
下一步对用户画像进行标签筛选指定的人群做营销,实际上转化为SQL就是多个条件的与或非组合,看几个案例
(1)取用户等级是白银卡(17)或者银卡(20),且用户状态是活跃的(28),逻辑是(17∪20)∩28,33万里面筛选出265个人
mysql> SELECT * FROM user_tag WHERE JSON_OVERLAPS(user_tags, '[18, 17, 20]') AND JSON_CONTAINS(user_tags, '[28]') limit 5;
+---------+------------------+
| user_id | user_tags |
+---------+------------------+
| 3201 | [14, 19, 20, 28] |
| 4183 | [14, 19, 20, 28] |
| 4554 | [14, 19, 20, 28] |
| 4890 | [14, 19, 20, 28] |
| 6334 | [14, 17, 28] |
+---------+------------------+
mysql> SELECT count(*) FROM user_tag WHERE JSON_OVERLAPS(user_tags, '[18, 17, 20]') AND JSON_CONTAINS(user_tags, '[28]');
+----------+
| count(*) |
+----------+
| 265 |
+----------+
1 row in set (0.66 sec)
(2)筛选母婴客群(47)且是周末客(43),但是过滤掉流失会员(29),逻辑是47∩43∩(not 29),最终结果筛选出11678人
mysql> SELECT * FROM user_tag WHERE JSON_CONTAINS(user_tags, '[43, 47]') AND NOT JSON_CONTAINS(user_tags, '[29]') limit 10;
+---------+----------------------------------------------+
| user_id | user_tags |
+---------+----------------------------------------------+
| 3036 | [14, 17, 24, 27, 33, 37, 40, 43, 44, 47, 48] |
| 3049 | [14, 17, 27, 33, 38, 40, 43, 44, 47, 48] |
| 3072 | [14, 19, 20, 27, 33, 37, 40, 43, 44, 47, 48] |
| 3099 | [14, 19, 21, 27, 33, 37, 40, 43, 44, 47] |
| 3110 | [14, 15, 27, 33, 37, 40, 43, 44, 46, 47, 48] |
| 3127 | [14, 19, 20, 27, 33, 37, 40, 43, 44, 46, 47] |
| 3132 | [14, 19, 27, 33, 40, 41, 43, 44, 47] |
| 3147 | [14, 17, 27, 34, 40, 43, 46, 47, 48] |
| 3157 | [14, 19, 23, 26, 34, 43, 47, 48] |
| 3160 | [14, 19, 23, 27, 34, 40, 43, 44, 47, 48] |
+---------+----------------------------------------------+
10 rows in set (0.00 sec)
mysql> SELECT count(*) FROM user_tag WHERE JSON_CONTAINS(user_tags, '[43, 47]') AND NOT JSON_CONTAINS(user_tags, '[29]');
+----------+
| count(*) |
+----------+
| 11678 |
+----------+
1 row in set (0.55 sec)
(3)筛选RFM为重要开头的高价值客户(34,35,36,37),且是微信会员(15),逻辑是(34∪35∪36∪37)∩15,最终筛选出148人
mysql> SELECT * FROM user_tag WHERE JSON_OVERLAPS(user_tags, '[34, 35, 36, 37]') AND JSON_CONTAINS(user_tags, '[15]') limit 10;
+---------+--------------------------------------------------------+
| user_id | user_tags |
+---------+--------------------------------------------------------+
| 3110 | [14, 15, 27, 33, 37, 40, 43, 44, 46, 47, 48] |
| 6990 | [14, 15, 23, 27, 34, 37, 40, 43, 44, 47, 48, 49] |
| 11214 | [14, 15, 24, 27, 33, 37, 40, 43, 44, 47] |
| 13447 | [0, 2, 14, 15, 27, 33, 37, 40, 43, 44, 46, 47, 48] |
| 18356 | [14, 15, 25, 34, 38, 43] |
| 22016 | [1, 2, 14, 15, 27, 33, 37, 40, 43, 44, 46, 47, 48, 49] |
| 22392 | [14, 15, 23, 27, 33, 37, 40, 41, 43, 44, 48, 49] |
| 22721 | [1, 7, 14, 15, 27, 33, 37, 40, 42, 44, 48] |
| 22800 | [14, 15, 23, 27, 33, 37, 40, 43, 44, 48] |
| 25122 | [14, 15, 26, 34, 37, 40, 43, 47, 48, 49] |
+---------+--------------------------------------------------------+
10 rows in set (0.06 sec)
mysql> SELECT count(*) FROM user_tag WHERE JSON_OVERLAPS(user_tags, '[34, 35, 36, 37]') AND JSON_CONTAINS(user_tags, '[15]');
+----------+
| count(*) |
+----------+
| 148 |
+----------+
1 row in set (0.75 sec)
给这张用户画像表增加一下多值索引
ALTER TABLE user_tag ADD INDEX idx_info((cast(user_tags as unsigned array)));
结果是SQL不能得到正确结果,之前能检索到人群现在检索结果为符合条件的人为0,可能是在多值索引的情况下,不能组合多个JSON函数的原因
mysql> SELECT count(*) FROM user_tag WHERE JSON_OVERLAPS(user_tags, '[34, 35, 36, 37]') AND JSON_CONTAINS(user_tags, '[15]');
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)
另外在多次测试下,就算不使用JSON函数组合,单个使用JSON_CONTAINS,JSON_OVERLAPS也可能会导致索引失效,具体原因不明,如果使用NOT条件多值索引直接失效
网友评论