美文网首页数据库大数据学习
MySQL:MySQL8.0 JSON类型使用整理,基于用户画像

MySQL:MySQL8.0 JSON类型使用整理,基于用户画像

作者: xiaogp | 来源:发表于2021-06-01 11:07 被阅读0次

摘要:MySQLJSON类型多值索引用户画像

MySQL是结构化数据存储,JSON是非结构化格式,在MySQL中使用JSON类型可以打通关系型和非关系型数据的存储之间的界限,为业务提供更好的架构选择,以下内容包括

  • (1)JSON 数据类型
  • (2)JSON类型创建插入数据
  • (3)提取JSON内字段
  • (4)JSON类型和字符串的区别
  • (5)JSON类型数据修改
  • (6)JSON类型使用索引
  • (7)JSON类型其他常用函数
  • (8)JSON ARRAY的多值索引
  • (9)基于JSON类型的用户画像设计

(1)JSON 数据类型

JSON(JavaScript Object Notation)主要用于互联网应用服务之间的数据交换。MySQL 支持JSON 对象JSON 数组两种类型,JSON 类型是从 MySQL 5.7 版本开始支持的功能,MySQL中使用JSON有以下好处

  • 无须预定义字段:字段可以无限拓展,避免了ALTER ADD COLUMN的操作,使用更加灵活
  • 处理稀疏字段:避免了稀疏字段的NULL值,避免冗余存储
  • 支持索引:相比于字符串格式的JSON,JSON类型支持索引做特定的查询优化

总体而言,JSON 类型比较适合存储一些修改较少、相对静态的数据,或者说适合存储修改较少,且容忍稀疏的聚合数据,比如存储用户的登录信息


(2)JSON类型创建插入数据

create database test default charset utf8mb4;

CREATE TABLE student (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  info JSON DEFAULT NULL
);

插入数据

mysql> INSERT student (info) VALUES ('{"sex": "F", "age": 13, "city": "beijing"}');
Query OK, 1 row affected (0.13 sec)

mysql> INSERT student (info) VALUES ('{"sex": "M", "age": 14, "city": "suzhou"}');
Query OK, 1 row affected (0.10 sec)

mysql> INSERT student (info) VALUES ('{"sex": "F", "age": 23, "city": "shenzhen"}');
Query OK, 1 row affected (0.20 sec)

查看数据

mysql> select * from student;
+----+---------------------------------------------+
| id | info                                        |
+----+---------------------------------------------+
|  1 | {"age": 13, "sex": "F", "city": "beijing"}  |
|  2 | {"age": 14, "sex": "M", "city": "suzhou"}   |
|  3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+---------------------------------------------+

试一下插入JSON数组

mysql> INSERT student (info) VALUES ('[1,2,3,4]');
Query OK, 1 row affected (0.12 sec)

mysql> INSERT student (info) VALUES ('[{"sex": "M"},{"sex":"F", "city":"nanjing"}]');
Query OK, 1 row affected (0.11 sec)

mysql> select * from student;
+----+-------------------------------------------------+
| id | info                                            |
+----+-------------------------------------------------+
|  1 | {"age": 13, "sex": "F", "city": "beijing"}      |
|  2 | {"age": 14, "sex": "M", "city": "suzhou"}       |
|  3 | {"age": 23, "sex": "F", "city": "shenzhen"}     |
|  4 | [1, 2, 3, 4]                                    |
|  5 | [{"sex": "M"}, {"sex": "F", "city": "nanjing"}] |
+----+-------------------------------------------------+
5 rows in set (0.00 sec)
  • 可以混合插入JSON数组和JSON对象,但是必须要符合JSON格式
  • 其中整个字符串使用单引号,键值对使用双引号,
  • JSON中可以允许有多余空格,MySQL会自动解析,输出的使用格式为符号后带有一个空格

(3)提取JSON内字段

因为支持了新的JSON类型,MySQL 配套提供了丰富的 JSON 字段处理函数,用于方便地操作 JSON 数据,最常见的就是函数 JSON_EXTRACT,它用来从 JSON 数据中提取所需要的字段内容

(1)提取JSON对象

主要是JSON_UNQUOTE和JSON_EXTRACT,JSON_EXTRACT作用是去除最外侧的双引号,JSON_EXTRACT根据键提取值

mysql> SELECT
    ->     id,
    ->     JSON_UNQUOTE(JSON_EXTRACT(info,"$.sex")) sex,
    ->     JSON_UNQUOTE(JSON_EXTRACT(info,"$.age")) age,
    ->     JSON_UNQUOTE(JSON_EXTRACT(info,"$.city")) city
    -> FROM student;
+----+------+------+----------+
| id | sex  | age  | city     |
+----+------+------+----------+
|  1 | F    | 13   | beijing  |
|  2 | M    | 14   | suzhou   |
|  3 | F    | 23   | shenzhen |
+----+------+------+----------+
3 rows in set (0.00 sec)

MySQL 还提供了 ->> 表达式,和上述 SQL 效果完全一样,->>也是去除最外面的引号,还有一种符号->,它也能得到提取结果但是不去除外面的符号,相当于->和JSON_EXTRACT对应

mysql> SELECT
    ->     id,
    ->     info->>"$.sex" sex,
    ->     info->>"$.age" age,
    ->     info->>"$.city" city
    -> FROM student;
+----+------+------+----------+
| id | sex  | age  | city     |
+----+------+------+----------+
|  1 | F    | 13   | beijing  |
|  2 | M    | 14   | suzhou   |
|  3 | F    | 23   | shenzhen |
+----+------+------+----------+
3 rows in set (0.00 sec)

如果JSON对象中查询的键不存在,则返回为NULL

mysql> SELECT
    ->     id,
    ->     JSON_UNQUOTE(JSON_EXTRACT(info,"$.sex")) sex,
    ->     JSON_UNQUOTE(JSON_EXTRACT(info,"$.height")) height
    -> FROM student;
+----+------+--------+
| id | sex  | height |
+----+------+--------+
|  1 | F    | NULL   |
|  2 | M    | NULL   |
|  3 | F    | NULL   |
+----+------+--------+
3 rows in set (0.00 sec)
(2)提取JSON数组

先创建JSON数组类型,插入数据

mysql> CREATE TABLE student (
    ->   id BIGINT AUTO_INCREMENT PRIMARY KEY,
    ->   info JSON DEFAULT NULL
    -> );
Query OK, 0 rows affected (0.42 sec)

mysql> INSERT student (info) VALUES ('[1, 2, 3, 4]');
Query OK, 1 row affected (0.09 sec)

mysql> INSERT student (info) VALUES ('[2, 3, 4]');
Query OK, 1 row affected (0.08 sec)

mysql> INSERT student (info) VALUES ('[3, 1, -1]');
Query OK, 1 row affected (0.13 sec)

mysql> select * from student;
+----+--------------+
| id | info         |
+----+--------------+
|  1 | [1, 2, 3, 4] |
|  2 | [2, 3, 4]    |
|  3 | [3, 1, -1]   |
+----+--------------+
3 rows in set (0.00 sec)

JOSN数组通过索引取对应的值,同样是使用JSON_EXTRACT,索引从0开始

mysql> SELECT 
    ->     JSON_EXTRACT(info, '$[0]') first
    -> FROM student;
+----------------------------+
| first                      |
+----------------------------+
| 1                          |
| 2                          |
| 3                          |
+----------------------------+

同样可以采用->>符号,这种方式会去除双引号,如果JSON数组内的元素是双引号字符串,显示出来的时候也会被去除引号

mysql> SELECT
    ->     id,
    ->     info->>"$[0]" first,
    ->     info->>"$[1]" second
    -> FROM student;
+----+-------+--------+
| id | first | second |
+----+-------+--------+
|  1 | 1     | 2      |
|  2 | 2     | 3      |
|  3 | 3     | 1      |
+----+-------+--------+

可以提取数组中的多个值,此时JSON_EXTRACT后面传入多个参数,提取的顺序和输出顺序一致

mysql> SELECT id, JSON_EXTRACT(info, "$[1]", "$[0]") a FROM student;
+----+--------+
| id | a      |
+----+--------+
|  1 | [2, 1] |
|  2 | [5, 2] |
|  3 | [3, 1] |
+----+--------+
3 rows in set (0.01 sec)

如果提取的索引位置不存在则返回空

mysql> SELECT 
    ->     id,
    ->     info->>"$[2]" a
    -> FROM student;
+----+------+
| id | a    |
+----+------+
|  1 | 3    |
|  2 | 6    |
|  3 | NULL |
+----+------+
3 rows in set (0.00 sec)
(3)提取嵌套数组

嵌套数组主要是JSON数组内包含多个JSON对象或者JSON数组包含多个JSON数组,先以包含多个JSON对象为例

mysql> truncate table student;
Query OK, 0 rows affected (0.70 sec)

mysql> INSERT student (info) VALUES ('[{"no":1,"ent_name":"张家港市杨舍百桥士方园艺场","score":98.2}, {"no":2,"ent_name":"昆山市朱北苗圃有限公司","score":98.2}, {"no":3,"ent_name":"苏州市吴中区临湖现代渔业发展有限公司","score":98.2}]');
Query OK, 1 row affected (0.08 sec)

mysql> INSERT student (info) VALUES ('[{"no":1,"ent_name":"张家港市合力土石方挖掘服务部","score":98.1}, {"no":2,"ent_name":"中国石化销售有限公司江苏苏州养武加油站","score":97.6}, {"no":3,"ent_name":"中国石化销售有限公司江苏苏州太仓璜泾二站服务点","score":97.5}]');
Query OK, 1 row affected (0.09 sec)
mysql> select * from student;
+----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | info                                                                                                                                                                                                                                                                                                   |
+----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  1 | [{"no": 1, "score": 98.2, "ent_name": "张家港市杨舍百桥士方园艺场"}, {"no": 2, "score": 98.2, "ent_name": "昆山市朱北苗圃有限公司"}, {"no": 3, "score": 98.2, "ent_name": "苏州市吴中区临湖现代渔业发展有限公司"}]                                                                                     |
|  2 | [{"no": 1, "score": 98.1, "ent_name": "张家港市合力土石方挖掘服务部"}, {"no": 2, "score": 97.6, "ent_name": "中国石化销售有限公司江苏苏州养武加油站"}, {"no": 3, "score": 97.5, "ent_name": "中国石化销售有限公司江苏苏州太仓璜泾二站服务点"}]                                                         |
+----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

提取嵌套数组的值,首先要找到数组中对应的对象,然后根据对象的键找值

mysql> SELECT 
    ->     id,
    ->     JSON_UNQUOTE(JSON_EXTRACT(info -> "$[0]", "$.ent_name")) first_ent_name
    -> FROM student;
+----+--------------------------------------------+
| id | first_ent_name                             |
+----+--------------------------------------------+
|  1 | 张家港市杨舍百桥士方园艺场                 |
|  2 | 张家港市合力土石方挖掘服务部               |
+----+--------------------------------------------+
2 rows in set (0.00 sec)

对比一下之前的写法则只能提取第一个数组对象,可见JSON_UNQUOTE对于内侧的引号不删除,只删除外侧的引号

mysql> SELECT 
    ->     id,
    ->     JSON_UNQUOTE(JSON_EXTRACT(info, "$[0]")) first
    -> FROM student;
+----+------------------------------------------------------------------------------------+
| id | first                                                                              |
+----+------------------------------------------------------------------------------------+
|  1 | {"no": 1, "score": 98.2, "ent_name": "张家港市杨舍百桥士方园艺场"}                 |
|  2 | {"no": 1, "score": 98.1, "ent_name": "张家港市合力土石方挖掘服务部"}               |
+----+------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

在看数组套数组的情况

mysql> INSERT student (info) VALUES ('[1, 2, [3, 4]]');
Query OK, 1 row affected (0.07 sec)

mysql> select * from student;
+----+----------------+
| id | info           |
+----+----------------+
|  1 | [1, 2, [3, 4]] |
+----+----------------+
1 row in set (0.00 sec)

提取其中嵌套的数组,第一个表示嵌套数据的位置索引,第二个*表示嵌套数据内取所有元素

mysql> SELECT 
    ->     id, 
    ->     JSON_EXTRACT(info, "$[2][*]") a
    -> FROM student;
+----+--------+
| id | a      |
+----+--------+
|  1 | [3, 4] |
+----+--------+
1 row in set (0.00 sec)

如果第二个参数不是*,也可以选取嵌套数据内的指定位置的元素

SELECT 
    id, 
    JSON_EXTRACT(info, "$[2][1]") a
FROM student;
mysql> SELECT 
    ->     id, 
    ->     JSON_EXTRACT(info, "$[2][1]") a
    -> FROM student;
+----+------+
| id | a    |
+----+------+
|  1 | 4    |
+----+------+
1 row in set (0.00 sec)
(4)提取JSON后增加过滤 / 排序条件

提取JSON后不能用新命名的字段做筛选过滤,需要调用把JSON函数或者符号再写一遍

mysql> select * from student;
+----+---------------------------------------------+
| id | info                                        |
+----+---------------------------------------------+
|  1 | {"age": 13, "sex": "F", "city": "beijing"}  |
|  2 | {"age": 14, "sex": "M", "city": "suzhou"}   |
|  3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+---------------------------------------------+
3 rows in set (0.00 sec)

筛选sex是F,age大于14的

mysql> SELECT 
    ->     id, 
    ->     info
    -> FROM student WHERE info->>"$.age" > 14 and info->>"$.sex" = 'F';
+----+---------------------------------------------+
| id | info                                        |
+----+---------------------------------------------+
|  3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+---------------------------------------------+
1 row in set (0.01 sec)

根据age倒序排序取第一,只要city列

mysql> SELECT 
    ->     id, 
    ->     info->>"$.city"
    -> FROM student WHERE info->>"$.age" > 14 and info->>"$.sex" = 'F';
+----+-----------------+
| id | info->>"$.city" |
+----+-----------------+
|  3 | shenzhen        |
+----+-----------------+
1 row in set (0.00 sec)

(4)JSON类型和字符串的区别

除了JSON类型支持索引之外,看一下脚本语言对于JSON类型取值和字符串是否有区别,使用pymysql连接测试

>>> import pymysql
>>> config = {"user": "root", "password": "123456", "database": "test", "host": "127.0.0.1", "port": 3306}
>>> conn = pymysql.connect(**config)
>>> cursor = conn.cursor()
>>> cursor.execute("select info from student where id = 1")
1
>>> res = cursor.fetchall()  # (('{"age": 13, "sex": "F", "city": "beijing"}',),)
>>> cursor.close()
>>> conn.close()
>>> json.loads(res[0][0])
{'age': 13, 'sex': 'F', 'city': 'beijing'}

可见结果是一个JSON格式的字符串,可以直接解析成JSON,所以脚本语言取出的JSON类型结果和字符串没有差别,就是JSON格式的字符串,另外指定JSON格式后,MySQL会对插入的字符串做检验,如果不符合JSON格式插入报错,这也是和传统Varchar或者TEXT的区别

  • JSON格式相比于Varchar,TEXT支持索引
  • JSON格式会对插入的字符串做JSON格式校验,不符合则报错
  • JSON格式的输入输出都是字符串,如果使用Varchar或者TEXT格式人工保证字符串为JSON格式,效果是一致的

(5)JSON类型数据修改

修改数据主要是JSON_SETJSON_INSERTJSON_REPLACE三个方法

  • JSON_SET:替换现有key的值,插入不存在的key的值
  • JSON_INSERT:插入不存在的key的值,已经存在的不修改
  • JSON_REPLACE:只替换已存在的key的值,不存在的不做插入

使用的时候结合update语句

mysql> select * from student;
+----+---------------------------------------------+
| id | info                                        |
+----+---------------------------------------------+
|  1 | {"age": 13, "sex": "F", "city": "beijing"}  |
|  2 | {"age": 14, "sex": "M", "city": "suzhou"}   |
|  3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+---------------------------------------------+
3 rows in set (0.01 sec)

JSON_SET,不存在则插入,有则替换

mysql> UPDATE student SET info = JSON_SET(info, "$.city", 'wuxi', "$.height", 123) WHERE id = 1;
Query OK, 1 row affected (0.87 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from student;
+----+--------------------------------------------------------+
| id | info                                                   |
+----+--------------------------------------------------------+
|  1 | {"age": 13, "sex": "F", "city": "wuxi", "height": 123} |
|  2 | {"age": 14, "sex": "M", "city": "suzhou"}              |
|  3 | {"age": 23, "sex": "F", "city": "shenzhen"}            |
+----+--------------------------------------------------------+

JSON_INSERT,只会插入不存在的值

mysql> UPDATE student SET info = JSON_INSERT(info, "$.city", 'wuxi', "$.height", 123) WHERE id = 1;
Query OK, 1 row affected (0.11 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from student;
+----+-----------------------------------------------------------+
| id | info                                                      |
+----+-----------------------------------------------------------+
|  1 | {"age": 13, "sex": "F", "city": "beijing", "height": 123} |
|  2 | {"age": 14, "sex": "M", "city": "suzhou"}                 |
|  3 | {"age": 23, "sex": "F", "city": "shenzhen"}               |
+----+-----------------------------------------------------------+
3 rows in set (0.00 sec)

JSON_REPLACE,只会替换已有值

mysql> UPDATE student SET info = JSON_REPLACE(info, "$.city", 'wuxi', "$.height", 123) WHERE id = 1;
Query OK, 1 row affected (0.06 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from student;
+----+---------------------------------------------+
| id | info                                        |
+----+---------------------------------------------+
|  1 | {"age": 13, "sex": "F", "city": "wuxi"}     |
|  2 | {"age": 14, "sex": "M", "city": "suzhou"}   |
|  3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+---------------------------------------------+
3 rows in set (0.00 sec)

删除使用JSON_REMOVE,在JSON对象中指定key删除

mysql> select * from student;
+----+---------------------------------------------+
| id | info                                        |
+----+---------------------------------------------+
|  1 | {"age": 13, "sex": "F", "city": "wuxi"}     |
|  2 | {"age": 14, "sex": "M", "city": "suzhou"}   |
|  3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+---------------------------------------------+
3 rows in set (0.00 sec)

mysql> UPDATE student SET info = JSON_REMOVE(info, "$.age") WHERE id = 1;
Query OK, 1 row affected (0.45 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from student;
+----+---------------------------------------------+
| id | info                                        |
+----+---------------------------------------------+
|  1 | {"sex": "F", "city": "wuxi"}                |
|  2 | {"age": 14, "sex": "M", "city": "suzhou"}   |
|  3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+---------------------------------------------+
3 rows in set (0.00 sec)

在JSON数组中指定下标删除

mysql> select * from student;
+----+----------------+
| id | info           |
+----+----------------+
|  1 | [1, 2, [3, 4]] |
|  2 | [2, 5, 6]      |
|  3 | [1, 3]         |
+----+----------------+
3 rows in set (0.00 sec)

mysql> UPDATE student SET info = JSON_REMOVE(info, "$[1]") WHERE id=2;
Query OK, 1 row affected (0.72 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from student;
+----+----------------+
| id | info           |
+----+----------------+
|  1 | [1, 2, [3, 4]] |
|  2 | [2, 6]         |
|  3 | [1, 3]         |
+----+----------------+
3 rows in set (0.00 sec)

(6)JSON类型使用索引

当 JSON 数据量非常大,用户希望对 JSON 数据进行有效检索时,可以利用 MySQL 的函数索引功能对 JSON 中的某个字段进行索引,具体方式是先创建一个虚拟列,再对虚拟列创建索引
先看一下没有索引下,对JSON对象中某个key做条件检索的EXPLAIN计划,可见访问方式type是ALL全表扫面,使用的索引Key是NULL

mysql> select * from student;
+----+---------------------------------------------+
| id | info                                        |
+----+---------------------------------------------+
|  1 | {"age": 13, "sex": "F", "city": "beijing"}  |
|  2 | {"age": 14, "sex": "M", "city": "suzhou"}   |
|  3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+---------------------------------------------+
3 rows in set (0.00 sec)
mysql> explain select * from student where info->>"$.age" = 13;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | student | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    3 |   100.00 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.01 sec)

创建虚拟列age,并且添加普通索引,可见表新增了一列(虽然是虚拟列,但是在SELECT和脚本语言取数都能取到新增的虚拟列),重新EXPLAIN之后Type改为ref代表普通索引,使用的Key是idx_age

mysql> ALTER TABLE student ADD COLUMN age INT as (info->>"$.age");
Query OK, 0 rows affected (0.54 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select * from student;
+----+---------------------------------------------+------+
| id | info                                        | age  |
+----+---------------------------------------------+------+
|  1 | {"age": 13, "sex": "F", "city": "beijing"}  |   13 |
|  2 | {"age": 14, "sex": "M", "city": "suzhou"}   |   14 |
|  3 | {"age": 23, "sex": "F", "city": "shenzhen"} |   23 |
+----+---------------------------------------------+------+
3 rows in set (0.00 sec)

mysql> ALTER TABLE student ADD INDEX idx_age(age);
Query OK, 0 rows affected (0.51 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> explain select * from student where info->>"$.age" = 13;
+----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table   | partitions | type | possible_keys | key     | key_len | ref   | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | student | NULL       | ref  | idx_age       | idx_age | 5       | const |    1 |   100.00 | NULL  |
+----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

除了普通索引,同理还可以为JSON对象的其他Key值创建唯一索引等
除了再建表之后创建虚拟列增加索引,也可以在建表的时候就为JSON类型创建索引

mysql> CREATE TABLE student (
    ->   id BIGINT AUTO_INCREMENT PRIMARY KEY,
    ->   info JSON DEFAULT NULL, 
    ->   age Int as (info->>"$.age"),
    ->   index idx_age(age)
    -> );

(7)JSON类型其他常用函数

除了上面的JSON_EXTRACT和JSON_UNQUOTE,常用函数如下

  • MEMBER OF:只能对JSON数组使用,返回1元素存在数组中,0元素不存在数组中
  • JSON_CONTAINS:可以对JSON数组和JSON对象使用,针对JSON数组检查一个元素或者多个元素是否存在,对于JSON对象检查指定路径下是否有某个值或者是否有某个路径(Key)
  • JSON_OVERLAP:比较两个JSON数组是否至少有一个元素一致,如果是返回1,否则返回0,如果是JSON对象,判断是否是找有一对key value一致
  • JSON_KEYS:返回JSON对象的Key,也可以是嵌套JSON对象

以上函数可以在前面加上NOT关键字就可以取反

MEMBER OF是MySQL 8.0.17新增的函数,查看元素或者子数组是否存在在JSON数组中,如果把MEMBER OF放在where条件后面则直接筛选结果是1的,可以省略写=1

mysql> select * from student;
+----+----------------+
| id | info           |
+----+----------------+
|  1 | [1, 2, [3, 4]] |
|  2 | [2, 5, 6]      |
|  3 | [1, 3]         |
+----+----------------+
3 rows in set (0.01 sec)

mysql> SELECT * FROM student WHERE 3 MEMBER OF(info);
+----+--------+
| id | info   |
+----+--------+
|  3 | [1, 3] |
+----+--------+
1 row in set (0.00 sec)

mysql> SELECT * FROM student WHERE JSON_ARRAY(3, 4) MEMBER OF(info);
+----+----------------+
| id | info           |
+----+----------------+
|  1 | [1, 2, [3, 4]] |
+----+----------------+
1 row in set (0.00 sec)

JSON_CONTAINS和MEMBER OF类似,但是JSON_CONTAINS可以作用与JSON对象,对于JSON数组两者也有区别,JSON_CONTAINS可以指定多个数组内的元素,相当于是且的关系,JSON_CONTAINS放在where后面默认筛选结果值是1的

mysql> select * from student;
+----+----------------+
| id | info           |
+----+----------------+
|  1 | [1, 2, [3, 4]] |
|  2 | [2, 5, 6]      |
|  3 | [1, 3]         |
+----+----------------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM student WHERE JSON_CONTAINS(info, '[2, 6]');
+----+-----------+
| id | info      |
+----+-----------+
|  2 | [2, 5, 6] |
+----+-----------+
1 row in set (0.00 sec)

mysql> SELECT * FROM student WHERE JSON_CONTAINS(info, '[2]');
+----+----------------+
| id | info           |
+----+----------------+
|  1 | [1, 2, [3, 4]] |
|  2 | [2, 5, 6]      |
+----+----------------+

mysql> SELECT * FROM student WHERE JSON_CONTAINS(info, '[3]');
+----+----------------+
| id | info           |
+----+----------------+
|  1 | [1, 2, [3, 4]] |
|  3 | [1, 3]         |
+----+----------------+

当JSON_CONTAINS作用与JSON对象时,需要判断某个key-value对是否存,格式是(target,value,key),在指定value的时候,数值用单引号,字符串还要在内加一层双引号

mysql> select * from student;
+----+---------------------------------------------+
| id | info                                        |
+----+---------------------------------------------+
|  1 | {"age": 13, "sex": "F", "city": "beijing"}  |
|  2 | {"age": 14, "sex": "M", "city": "suzhou"}   |
|  3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+---------------------------------------------+

mysql> SELECT * FROM student WHERE JSON_CONTAINS(info, '"F"', '$.sex');
+----+---------------------------------------------+
| id | info                                        |
+----+---------------------------------------------+
|  1 | {"age": 13, "sex": "F", "city": "beijing"}  |
|  3 | {"age": 23, "sex": "F", "city": "shenzhen"} |
+----+---------------------------------------------+

mysql> SELECT * FROM student WHERE JSON_CONTAINS(info, '13', '$.age');
+----+--------------------------------------------+
| id | info                                       |
+----+--------------------------------------------+
|  1 | {"age": 13, "sex": "F", "city": "beijing"} |
+----+--------------------------------------------+

JSON_OVERLAP返回两个JSON数组或者JSON对象至少有一个/对元素一致则是1,房子啊where后面筛选出存在至少一个交集的数据

mysql> SELECT * FROM student WHERE JSON_OVERLAPS(info, '[1, 5]');
+----+--------------+
| id | info         |
+----+--------------+
|  1 | [1, 2, 5]    |
|  2 | [2, 5, 6]    |
|  3 | [1, 3, 8]    |
|  4 | [1, 2, 7, 8] |
+----+--------------+
4 rows in set (0.00 sec)

mysql> SELECT * FROM student WHERE JSON_OVERLAPS(info, '[1, 5, 6]');
+----+--------------+
| id | info         |
+----+--------------+
|  1 | [1, 2, 5]    |
|  2 | [2, 5, 6]    |
|  3 | [1, 3, 8]    |
|  4 | [1, 2, 7, 8] |
+----+--------------+
4 rows in set (0.00 sec)

mysql> SELECT * FROM student WHERE JSON_OVERLAPS(info, '[5]');
+----+-----------+
| id | info      |
+----+-----------+
|  1 | [1, 2, 5] |
|  2 | [2, 5, 6] |
+----+-----------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM student WHERE JSON_OVERLAPS(info, '[9]');
Empty set (0.00 sec)

(8)JSON ARRAY的多值索引

从MySQL8.0.17开始,InnoDB支持多值索引。多值索引是在存储JSON数组的列上定义的辅助索引,对于JSON ARRAY的MEMBER OF,JSON_CONTAINS,JSON_OVERLAPS可以利用多值索引进行性能优化

mysql> select * from student;
+----+--------------+
| id | info         |
+----+--------------+
|  1 | [1, 2, 5, 9] |
|  2 | [2, 5, 6, 8] |
|  3 | [5, 3, 8, 9] |
|  4 | [1, 2, 7, 8] |
+----+--------------+
4 rows in set (0.00 sec)

先不创建多值索引,使用JSON_CONTAINS语句EXPLAIN查看执行计划

mysql> EXPLAIN SELECT * FROM student WHERE JSON_CONTAINS(info, '[5, 3]');
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | student | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    4 |   100.00 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

Type为全表扫描,key为NULL,下一步给表增加多值索引,注意如果这个地方改为idx_info((cast((info->"$") as unsigned array))),则后续所有的函数都要是info->"$",否则走不了索引

mysql> ALTER TABLE student ADD INDEX idx_info((cast(info as unsigned array)));
Query OK, 0 rows affected (0.57 sec)
Records: 0  Duplicates: 0  Warnings: 0

查看建立多值索引之后的EXPLAIN计划,可见现在JSON操作函数都走了索引

mysql> EXPLAIN SELECT * FROM student WHERE 5 MEMBER OF(info);
+----+-------------+---------+------------+------+---------------+----------+---------+-------+------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys | key      | key_len | ref   | rows | filtered | Extra       |
+----+-------------+---------+------------+------+---------------+----------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | student | NULL       | ref  | idx_info      | idx_info | 9       | const |    1 |   100.00 | Using where |
+----+-------------+---------+------------+------+---------------+----------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT * FROM student WHERE JSON_CONTAINS(info, '[5, 3]');
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
| id | select_type | table   | partitions | type  | possible_keys | key      | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | student | NULL       | range | idx_info      | idx_info | 9       | NULL |    4 |   100.00 | Using where |
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT * FROM student WHERE JSON_OVERLAPS(info, '[5, 3]');
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
| id | select_type | table   | partitions | type  | possible_keys | key      | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | student | NULL       | range | idx_info      | idx_info | 9       | NULL |    4 |   100.00 | Using where |
+----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

(9)基于JSON类型的用户画像设计

下面基于30万级别的数据,以MySQL的JSON做用户画像的存储和查询测试,画像值已经全部处理成枚举值,不加入数值型的字段,第一步梳理用户标签,定义一张标签表,记录标签值和标签号

mysql> CREATE TABLE tags_info (
    ->     tag_id bigint auto_increment primary key,
    ->     tag_name varchar(255) not null,
    ->     tag_value varchar(255) not null
    -> );
Query OK, 0 rows affected (0.80 sec)

插入所有的标签,预览表如下

mysql> select * from tags_info order by rand() limit 10;
+--------+--------------+--------------------+
| tag_id | tag_name     | tag_value          |
+--------+--------------+--------------------+
|     24 | 渠道名称     | 自助收银           |
|     38 | 类目范围     | 类目多样           |
|      3 | 生日月份     | 1                  |
|     35 | RFM          | 重要保持会员       |
|     11 | 生日月份     | 9                  |
|     44 | 时间偏好     | 常客               |
|     16 | 会员等级     | 员工卡             |
|     27 | 会员状态     | 沉默               |
|     25 | 渠道名称     | 闪电购             |
|     43 | 时间偏好     | 周末客             |
+--------+--------------+--------------------+

下一步构建用户画像表,标签值替换为标签ID,将一个用户的所有标签值存储为JSON ARRAY,先构建一张用户画像结果表

mysql> CREATE TABLE user_tag (
    ->     user_id bigint not null primary key,
    ->     user_tags JSON
    -> );
Query OK, 0 rows affected (0.41 sec)

插入数据查看预览结果

mysql> select * from user_tag order by rand() limit 10;
+---------+--------------------------------------+
| user_id | user_tags                            |
+---------+--------------------------------------+
|   36978 | [14, 19, 23, 25, 32, 42]             |
|   28515 | [14, 19, 23, 27, 36, 42]             |
|   28683 | [14, 19, 28]                         |
|   39368 | [14, 19, 24, 26, 36, 43]             |
|   22269 | [14, 19, 28]                         |
|   42160 | [14, 19, 23, 26, 32, 43]             |
|   22321 | [14, 17, 27, 33, 40, 43, 44, 47, 49] |
|   20407 | [14, 19, 20, 25, 34, 38, 43]         |
|   26167 | [14, 17, 23, 27, 35, 38, 43]         |
|   38082 | [14, 19, 25, 32, 42]                 |
+---------+--------------------------------------+
10 rows in set (0.00 sec)

下一步对用户画像进行标签筛选指定的人群做营销,实际上转化为SQL就是多个条件的与或非组合,看几个案例
(1)取用户等级是白银卡(17)或者银卡(20),且用户状态是活跃的(28),逻辑是(17∪20)∩28,33万里面筛选出265个人

mysql> SELECT * FROM user_tag WHERE JSON_OVERLAPS(user_tags, '[18, 17, 20]') AND JSON_CONTAINS(user_tags, '[28]') limit 5;
+---------+------------------+
| user_id | user_tags        |
+---------+------------------+
|    3201 | [14, 19, 20, 28] |
|    4183 | [14, 19, 20, 28] |
|    4554 | [14, 19, 20, 28] |
|    4890 | [14, 19, 20, 28] |
|    6334 | [14, 17, 28]     |
+---------+------------------+

mysql> SELECT count(*) FROM user_tag WHERE JSON_OVERLAPS(user_tags, '[18, 17, 20]') AND JSON_CONTAINS(user_tags, '[28]');
+----------+
| count(*) |
+----------+
|      265 |
+----------+
1 row in set (0.66 sec)

(2)筛选母婴客群(47)且是周末客(43),但是过滤掉流失会员(29),逻辑是47∩43∩(not 29),最终结果筛选出11678人

mysql> SELECT * FROM user_tag WHERE JSON_CONTAINS(user_tags, '[43, 47]') AND NOT JSON_CONTAINS(user_tags, '[29]') limit 10;
+---------+----------------------------------------------+
| user_id | user_tags                                    |
+---------+----------------------------------------------+
|    3036 | [14, 17, 24, 27, 33, 37, 40, 43, 44, 47, 48] |
|    3049 | [14, 17, 27, 33, 38, 40, 43, 44, 47, 48]     |
|    3072 | [14, 19, 20, 27, 33, 37, 40, 43, 44, 47, 48] |
|    3099 | [14, 19, 21, 27, 33, 37, 40, 43, 44, 47]     |
|    3110 | [14, 15, 27, 33, 37, 40, 43, 44, 46, 47, 48] |
|    3127 | [14, 19, 20, 27, 33, 37, 40, 43, 44, 46, 47] |
|    3132 | [14, 19, 27, 33, 40, 41, 43, 44, 47]         |
|    3147 | [14, 17, 27, 34, 40, 43, 46, 47, 48]         |
|    3157 | [14, 19, 23, 26, 34, 43, 47, 48]             |
|    3160 | [14, 19, 23, 27, 34, 40, 43, 44, 47, 48]     |
+---------+----------------------------------------------+
10 rows in set (0.00 sec)

mysql> SELECT count(*) FROM user_tag WHERE JSON_CONTAINS(user_tags, '[43, 47]') AND NOT JSON_CONTAINS(user_tags, '[29]');
+----------+
| count(*) |
+----------+
|    11678 |
+----------+
1 row in set (0.55 sec)

(3)筛选RFM为重要开头的高价值客户(34,35,36,37),且是微信会员(15),逻辑是(34∪35∪36∪37)∩15,最终筛选出148人

mysql> SELECT * FROM user_tag WHERE JSON_OVERLAPS(user_tags, '[34, 35, 36, 37]') AND JSON_CONTAINS(user_tags, '[15]') limit 10;
+---------+--------------------------------------------------------+
| user_id | user_tags                                              |
+---------+--------------------------------------------------------+
|    3110 | [14, 15, 27, 33, 37, 40, 43, 44, 46, 47, 48]           |
|    6990 | [14, 15, 23, 27, 34, 37, 40, 43, 44, 47, 48, 49]       |
|   11214 | [14, 15, 24, 27, 33, 37, 40, 43, 44, 47]               |
|   13447 | [0, 2, 14, 15, 27, 33, 37, 40, 43, 44, 46, 47, 48]     |
|   18356 | [14, 15, 25, 34, 38, 43]                               |
|   22016 | [1, 2, 14, 15, 27, 33, 37, 40, 43, 44, 46, 47, 48, 49] |
|   22392 | [14, 15, 23, 27, 33, 37, 40, 41, 43, 44, 48, 49]       |
|   22721 | [1, 7, 14, 15, 27, 33, 37, 40, 42, 44, 48]             |
|   22800 | [14, 15, 23, 27, 33, 37, 40, 43, 44, 48]               |
|   25122 | [14, 15, 26, 34, 37, 40, 43, 47, 48, 49]               |
+---------+--------------------------------------------------------+
10 rows in set (0.06 sec)

mysql> SELECT count(*) FROM user_tag WHERE JSON_OVERLAPS(user_tags, '[34, 35, 36, 37]') AND JSON_CONTAINS(user_tags, '[15]');
+----------+
| count(*) |
+----------+
|      148 |
+----------+
1 row in set (0.75 sec)

给这张用户画像表增加一下多值索引

ALTER TABLE user_tag ADD INDEX idx_info((cast(user_tags as unsigned array)));

结果是SQL不能得到正确结果,之前能检索到人群现在检索结果为符合条件的人为0,可能是在多值索引的情况下,不能组合多个JSON函数的原因

mysql> SELECT count(*) FROM user_tag WHERE JSON_OVERLAPS(user_tags, '[34, 35, 36, 37]') AND JSON_CONTAINS(user_tags, '[15]');
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

另外在多次测试下,就算不使用JSON函数组合,单个使用JSON_CONTAINS,JSON_OVERLAPS也可能会导致索引失效,具体原因不明,如果使用NOT条件多值索引直接失效

相关文章

网友评论

    本文标题:MySQL:MySQL8.0 JSON类型使用整理,基于用户画像

    本文链接:https://www.haomeiwen.com/subject/rrkwsltx.html