1、语法
子查询有IN
和EXISTS
两种,哪种速度更快呢?主要取决于两张表的大小关系。
SELECT * FROM A WHERE cc IN (SELECT cc FROM B);
SELECT * FROM A WHERE EXISTS (SELECT cc FROM B WHERE B.cc=A.cc);
IN
会先查询B,然后把查询B的结果,做为输入,再查询A,当 A 记录多时,可以很好的利用 A 的索引,时间复杂度是O(b*log a)
。
EXISTS
会先查询A,然后把查询A的结果,做为输入,再查询B,当 B 记录多时,可以很好的利用 B 的索引,时间复杂度是O(a*log b)
。
2、下面来做个实验
(1)先执行 source first.sql;
来创建 2 张数据表
DROP TABLE IF EXISTS `t_big`;
CREATE TABLE `t_big` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(30) NOT NULL,
`phone` varchar(30) NOT NULL,
PRIMARY KEY (`id`),
KEY idx_name(`name`),
KEY idx_phone(`phone`)
) ENGINE = InnoDB;
DROP TABLE IF EXISTS `t_small`;
CREATE TABLE `t_small` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(30) NOT NULL,
`nickname` varchar(30) NOT NULL,
PRIMARY KEY (`id`),
KEY idx_name(`name`),
KEY idx_nickname(`nickname`)
) ENGINE = InnoDB;
(2)执行如下 shell
脚本bash rand.sh
来生成随机数据的 sql 插入语句的 second.sql 文件,再执行 source second.sql;
来插入数据:
#!/bin/bash
small=50
big=500000
rm -f second.sql
for((i=0;i<${big};i++));
do
printf "INSERT INTO \`t_big\` (\`name\`, \`phone\`) VALUES ('zhang%d', '%08d');\n" $[$RANDOM%1000000] $[$RANDOM%100000000] >> second.sql
done
for((i=0;i<${small};i++));
do
printf "INSERT INTO \`t_small\` (\`name\`, \`nickname\`) VALUES ('zhang%d', '%s');\n" $[$RANDOM%1000000] $(tr -dc "a-z"</dev/urandom|head -c 8) >> second.sql
done
echo "create second.sql success!"
(3)此时大表 t_big 的记录数量多,小表 t_small 的记录少,可以看到使用 EXISTS
比 IN
慢很多。
mysql> set profiling = 1;
SELECT * FROM t_big WHERE `name` IN (SELECT `name` FROM t_small);
mysql> show profiles;
+----------+------------+------------------------------------------------------------------+
| Query_ID | Duration | Query |
+----------+------------+------------------------------------------------------------------+
| 1 | 0.01406125 | SELECT * FROM t_big WHERE `name` IN (SELECT `name` FROM t_small) |
+----------+------------+------------------------------------------------------------------+
+----------+------------+---------------------------------------------------------------------------------------------+
| Query_ID | Duration | Query |
+----------+------------+---------------------------------------------------------------------------------------------+
| 1 | 1.65336600 | SELECT * FROM t_big WHERE EXISTS (SELECT `name` FROM t_small WHERE t_big.name=t_small.name) |
+----------+------------+---------------------------------------------------------------------------------------------+
(4)再做一个相反的实验让两个表的位置互换一下,可以看到 EXISTS
比 IN
快。
+----------+------------+------------------------------------------------------------------+
| Query_ID | Duration | Query |
+----------+------------+------------------------------------------------------------------+
| 1 | 0.00277950 | SELECT * FROM t_small WHERE `name` IN (SELECT `name` FROM t_big) |
+----------+------------+------------------------------------------------------------------+
+----------+------------+---------------------------------------------------------------------------------------------+
| Query_ID | Duration | Query |
+----------+------------+---------------------------------------------------------------------------------------------+
| 1 | 0.00050700 | SELECT * FROM t_small WHERE EXISTS (SELECT `name` FROM t_big WHERE t_small.name=t_big.name) |
+----------+------------+---------------------------------------------------------------------------------------------+
3、结论
当 A > B,用 IN:
SELECT * FROM A WHERE cc IN (SELECT cc FROM B);
当 A < B,用 EXISTS:
SELECT * FROM A WHERE EXISTS (SELECT cc FROM B WHERE B.cc=A.cc);
网友评论