Cleaning Up ClickHouse Table Data

Author: 淡淡的小番茄 | Published 2021-09-09 09:31

Background

More and more devices are being onboarded onto the platform, and the tables keep growing; early on we set no TTL on them (there was no explicit business requirement for one). Our disk is not large to begin with, only 1 TB, and it is already a bit more than half full, so the table data needs to be cleaned up.
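
As a quick check from inside ClickHouse, the free and total space of each data disk can be read from the system.disks table (a small sketch; the free_space/total_space columns are present in 20.3-era releases):

select
  name,
  path,
  formatReadableSize(free_space) as free,
  formatReadableSize(total_space) as total
from system.disks;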

Checking overall table sizes

The system table columns may differ slightly between ClickHouse versions; we are running 20.3.

select
  database,
  table,
  formatReadableSize(size) as size,
  formatReadableSize(bytes_on_disk) as bytes_on_disk,
  formatReadableSize(data_uncompressed_bytes) as data_uncompressed_bytes,
  formatReadableSize(data_compressed_bytes) as data_compressed_bytes,
  compress_rate,
  rows,
  days,
  formatReadableSize(avgDaySize) as avgDaySize
from
(
  select
    database,
    table,
    sum(bytes) as size,
    sum(rows) as rows,
    min(min_date) as min_date,
    max(max_date) as max_date,
    sum(bytes_on_disk) as bytes_on_disk,
    sum(data_uncompressed_bytes) as data_uncompressed_bytes,
    sum(data_compressed_bytes) as data_compressed_bytes,
    (data_compressed_bytes / data_uncompressed_bytes) * 100 as compress_rate,
    max_date - min_date as days,
    size / (max_date - min_date) as avgDaySize
  from system.parts
  where active
  group by
    database,
    table
  order by data_compressed_bytes desc
);

I queried the whole instance here; you can of course also restrict the query to a specific database or table.

This showed that the system log tables were taking up far too much space: we had kept the default configuration, which sets no TTL on the log tables.

Inspecting the system log tables

select
  database,
  table,
  formatReadableSize(size) as size,
  formatReadableSize(bytes_on_disk) as bytes_on_disk,
  formatReadableSize(data_uncompressed_bytes) as data_uncompressed_bytes,
  formatReadableSize(data_compressed_bytes) as data_compressed_bytes,
  compress_rate,
  rows,
  days,
  formatReadableSize(avgDaySize) as avgDaySize
from
(
  select
    database,
    table,
    sum(bytes) as size,
    sum(rows) as rows,
    min(min_date) as min_date,
    max(max_date) as max_date,
    sum(bytes_on_disk) as bytes_on_disk,
    sum(data_uncompressed_bytes) as data_uncompressed_bytes,
    sum(data_compressed_bytes) as data_compressed_bytes,
    (data_compressed_bytes / data_uncompressed_bytes) * 100 as compress_rate,
    max_date - min_date as days,
    size / (max_date - min_date) as avgDaySize
  from system.parts
  where active
    and database = 'system'
    and `table` in ('query_thread_log', 'query_log')
    and `partition` = '202108'
  group by
    database,
    table
  order by data_compressed_bytes desc
);

The log tables use the default configuration, which partitions them by month. One month of data is about 3 GB after compression, so the data can be cleaned up simply by dropping partitions.
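
To see exactly which monthly partitions exist and how much each one occupies on disk (and therefore which ones the cleanup below should drop), system.parts can be grouped by partition; a sketch along the lines of the queries above:

select
  `partition`,
  formatReadableSize(sum(bytes_on_disk)) as size_on_disk
from system.parts
where active
  and database = 'system'
  and `table` in ('query_thread_log', 'query_log')
group by `partition`
order by `partition`;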

Dropping partitions

Only the most recent partition needs to be kept, so the cleanup can be automated with a looping shell script. The statements for dropping a single partition:

alter table query_thread_log drop partition '202101';
alter table query_log drop partition '202101';

ClickHouse has a safeguard around dropping data: if a single drop would remove more than 50 GB, it is refused. We generally leave that setting alone and instead temporarily allow the drop by creating a flag file (/data/clickhouse/flags/force_drop_table). Note that once the drop succeeds, the force_drop_table file is deleted along with it; to drop another partition you must create the file again. Its ownership and permissions can simply match the other files in the data directory.

SQL Error [359]: ClickHouse exception, code: 359, host: 172.30.208.119, port: 8123; Code: 359, e.displayText() = DB::Exception: Table or Partition in system.query_thread_log was not dropped.
Reason:
1. Size (82.05 GB) is greater than max_[table/partition]_size_to_drop (50.00 GB)
2. File '/data/clickhouse/flags/force_drop_table' intended to force DROP doesn't exist
How to fix this:
1. Either increase (or set to zero) max_[table/partition]_size_to_drop in server config and restart ClickHouse
2. Either create forcing file /data/clickhouse/flags/force_drop_table and make sure that ClickHouse has write permission for it.

Example:
sudo touch '/data/clickhouse/flags/force_drop_table' && sudo chmod 666 '/data/clickhouse/flags/force_drop_table'

Setting a TTL on the log tables

To keep one month of data:

ALTER TABLE query_log MODIFY TTL event_date + toIntervalMonth(1);
ALTER TABLE query_thread_log MODIFY TTL event_date + toIntervalMonth(1);

To keep 15 days of data:

ALTER TABLE query_log MODIFY TTL event_date + toIntervalDay(15);
ALTER TABLE query_thread_log MODIFY TTL event_date + toIntervalDay(15);
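
MODIFY TTL only changes the rule; expired rows are physically removed in the background as parts are merged. A quick way to confirm the rule is in place and that old data is aging out (a sketch, assuming the 15-day TTL above):

-- the TTL clause should now appear in the table definition
SHOW CREATE TABLE system.query_log;

-- after background merges run, event_date should not go back further than the TTL window
select min(event_date) from system.query_log;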
