美文网首页
pt-archiver的"ascending-inde

pt-archiver的"ascending-inde

作者: 刀尖红叶 | 来源:发表于2017-08-10 10:56 被阅读225次

    最近在用pt-archiver归档一张1.4T的表,使用如下命令:

    pt-archiver --no-check-charset  --primary-key-only  --bulk-delete --commit-each --limit 1000  --statistics --progress 1000 --source h=localhost,p=xxx,D=xxx,t=xxx --where "collect_time <= \"1502072807\""  --purge
    

    但发现pt-archiver迟迟不输出结果,MySQL的process里显示如下查询运行了几百秒:

    SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `xxx`.`xxx` FORCE INDEX(`PRIMARY`) WHERE (collect_time <= "1501725651") AND (`id` < '159695169') LIMIT 1000;
    

    explain发现是个慢查询:

    +----+-------------+---------------+------------+-------+---------------+---------+---------+------+----------+----------+-------------+
    | id | select_type | table         | partitions | type  | possible_keys | key     | key_len | ref  | rows     | filtered | Extra       |
    +----+-------------+---------------+------------+-------+---------------+---------+---------+------+----------+----------+-------------+
    |  1 | SIMPLE      | xxx | NULL       | range | PRIMARY       | PRIMARY | 8       | NULL | 77493125 |    33.33 | Using where |
    +----+-------------+---------------+------------+-------+---------------+---------+---------+------+----------+----------+-------------+
    1 row in set, 1 warning (0.00 sec)
    

    先加上--dry-run选项验证下是不是pt-archiver产生的慢查询:

    > pt-archiver --no-check-charset  --primary-key-only  --bulk-delete --commit-each --limit 1000  --statistics --progress 1000 --source h=localhost,p=xxx,D=xxx,t=xxx --where "collect_time <= \"1502072807\""  --purge --dry-run
    SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `xxx`.`xxx` FORCE INDEX(`PRIMARY`) WHERE (collect_time <= "1502072807") AND (`id` < '160458402') LIMIT 1000
    SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `xxx`.`xxx` FORCE INDEX(`PRIMARY`) WHERE (collect_time <= "1502072807") AND (`id` < '160458402') AND ((`id` >= ?)) LIMIT 1000
    DELETE FROM `xxx`.`xxx` WHERE (((`id` >= ?))) AND (((`id` <= ?))) AND (collect_time <= "1502072807") LIMIT 1000
    

    果然是的

    原来pt-archiver默认开启了ascending-index optimization

    The default ascending-index optimization causes pt-archiver to optimize repeated SELECT queries so they seek into the index where the previous query ended, then scan along it, rather than scanning from the beginning of the table every time. This is enabled by default because it is generally a good strategy for repeated accesses.

    那解决办法也就清晰明了了,有2个:
    1.加上--no-ascend选项禁用ascending-index optimization

    > pt-archiver --no-check-charset  --primary-key-only  --bulk-delete --commit-each --limit 1000  --statistics --progress 1000 --source h=localhost,p=xxx,D=xxx,t=xxx --where "collect_time <= \"1502072807\""  --purge --dry-run --no-ascend
    SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `xxx`.`xxx` WHERE (collect_time <= "1502072807") LIMIT 1000
    SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `xxx`.`xxx` WHERE (collect_time <= "1502072807") LIMIT 1000
    DELETE FROM `xxx`.`xxx` WHERE (((`id` >= ?))) AND (((`id` <= ?))) AND (collect_time <= "1502072807") LIMIT 1000
    

    2.在配置source的DSN那加i=索引名强制指定我们希望用的索引

    > pt-archiver --no-check-charset  --primary-key-only  --bulk-delete --commit-each --limit 1000  --statistics --progress 1000 --source h=localhost,p=xxx,D=xxx,t=xxx,i=ix_collect_time --where "collect_time <= \"1502072807\""  --purge --dry-run 
    SELECT /*!40001 SQL_NO_CACHE */ `id`,`collect_time` FROM `xxx`.`xxx` FORCE INDEX(`ix_collect_time`) WHERE (collect_time <= "1502072807") LIMIT 1000
    SELECT /*!40001 SQL_NO_CACHE */ `id`,`collect_time` FROM `xxx`.`xxx` FORCE INDEX(`ix_collect_time`) WHERE (collect_time <= "1502072807") AND ((`collect_time` >= ?)) LIMIT 1000
    DELETE FROM `xxx`.`xxx` WHERE (((`collect_time` >= ?))) AND (((`collect_time` <= ?))) AND (collect_time <= "1502072807") LIMIT 1000
    

    相关文章

      网友评论

          本文标题:pt-archiver的"ascending-inde

          本文链接:https://www.haomeiwen.com/subject/rsocrxtx.html