水平分表--使用postgresql继承

作者: 醉里挑灯看剑422 | 来源:发表于2018-10-16 22:58 被阅读2次

水平分表--使用postgresql继承
PostgreSQL 内置分区
pg inherit
使用Kettle导入数据到ADB for PostgreSQL
数据库篇
PostgreSQL配置与单表查询
postgresql----COPY之表与文件之间的拷贝
postgresql的update inner join
我去你妈的JDBC
PostgreSQL如何设置自增主键ID

介绍

自己负责的项目的一张表access_log，条目数达到千万级，sql查询时已经显现出较慢的迹象，于是开始折腾分表，优化查询时间。就从最简单的水平分表入手。access_log表结构：

勘误：季度的单词是quarter，下面拼写quater是错误的

column	type	modifiers
id	integer	not null default nextval('access_log_id_seq'::regclass)
userid	integer
wxopenid	character varying(50)	not null
url	character varying(200)	not null
useragent	character varying(500)
accessdate	timestamp(0) without time zone
createip	character varying(25)
scene	character varying(50)

思路

按创建日期accessdate将access_log表分为若干子表，开启约束排除，按日期查询时将直接进入符合条件的子表，提高查询效率。这种做法无法提高插入效率，查询条件与日期无关时也无能为力。考虑到按日期查询是经常发生的操作，该分表方案优化效果明显。

1. 创建主表

create table access_log_main (
id int4 default nextval('access_log_id_seq'::regclass) primary key,
userid int4,
wxopenid varchar(50)  NOT NULL,
url varchar(200)  NOT NULL,
useragent varchar(500) ,
accessdate timestamp(0),
createip varchar(25) ,
scene varchar(50)
);

2. 创建分区表

--继承主表
create table access_log_2017_quater4 (
check (accessdate >= date '2017-10-01' and accessdate < date '2018-01-01')
) inherits (access_log_main);
create table access_log_2018_quater1 (
    check ( accessdate >= date '2018-01-01' and accessdate < date '2018-04-01')
) inherits (access_log_main);
--other表在约束都不匹配时存入，防止丢失数据
create table access_log_other () inherits (access_log_main);

--创建索引
create index access_log_2017_quater4_accessdate on access_log_2017_quater4 (accessdate);
create index access_log_2018_quater1_accessdate on access_log_2018_quater1 (accessdate);
create index access_log_other_accessdate on access_log_other (accessdate);

3. 创建触发器，实现智能insert

create or replace function access_log_insert_trigger()
returns trigger as $$
begin 
    if(new.accessdate < date '2017-10-01') THEN
    insert into access_log values (NEW.*);
    elseif (NEW.accessdate >= date '2017-10-01' and NEW.accessdate < date '2018-01-01') THEN
    insert into access_log_2017_quater4 values (NEW.*);
elseif (NEW.accessdate >= date '2018-01-01' and NEW.accessdate < date '2018-04-01') THEN
    insert into access_log_2018_quater1 values (NEW.*);
    else 
    insert into access_log_other values (NEW.*);
end if;
return null;
END;
$$
language plpgsql;

--创建调用触发器函数的触发器
create trigger insert_on_access_log before insert on access_log_main for each row execute procedure access_log_insert_trigger();

4. 动态添加继承关系：access_log继承主表access_log_main

alter table access_log add constraint access_log_accessdate_check check (accessdate < date '2017-10-01');
create index access_log_accessdate on access_log (accessdate);
alter table access_log inherit access_log_main;

这里简单提一下，创建表时继承使用inherits关键字，而已有的表继承其他表使用inherit关键字。已创建的表继承时，要求子表必须已经包含新父表的所有字段且类型一致。脱离继承关系可以用带有no inderit的alter命令。
另外值得注意的是：继承的索引（包括唯一约束）和外键约束只能用于单个表，而不能包括它们的字表。

查看表结构

到现在准备工作基本完成了，你可查看继承表结构，或者插入记录，验证上面操作是否生效。

---查看accessl_log_main及其所有子表
\d+ access_log_main
---插入记录
insert into access_log_main(userid, wxopenid, url, accessdate, createip, scene) 
values(-1, 'test', '/page/test', '2017-10-08', '127.0.0.1', '1001');
--查询发现上面记录确实插入了表access_log_2017_quater4
select * from access_log_2017_quater4;

更新维护

继承结构非常灵活，你可以随时动态增加/删除子表；添加/解除继承关系，来扩展或者优化分表结构。当然这些操作一般需要修改触发器

--删除子表
drop table access_log_other;
--删除继承关系
alter table access_log no inherit access_log_main;

测试优化效果

1. 首先你需要大量数据，可以写一个如下的函数循环插入数据

create or replace function insertACLOG() returns void as
$$
declare 
i int :=0;
tempTime timestamp;
j int;
begin
while i < 10000000 loop
    j:=round(random()*80);
    tempTime:=date '2017-01-01' + j;
    insert into access_log(userid, wxopenid, url, accessdate, createip, scene) 
    values(round(random() * 1000000), 'test', '/page/test', tempTime, '127.0.0.1', '1001');
    i:= i+1;
raise notice 'holy shit%', i;
end loop;
end;
$$ language plpgsql;

2. 约束排除

开启约束排除后，规划器将查询条件和check约束进行比较，在查询条件与约束冲突的情况下，忽略对表的扫描。分区键和约束设置合理的情况下，将会大大提高效率。
修改{pgsql安装目录}/main/postgresql.conf中的constraint_exclusion参数:

constraint_exclusion值	含义
partition	检查仅用于继承子表和UNION ALL子查询的约束
on	检查所有约束
off	不检查约束

partition是默认设置。它往往使用继承和分表提高性能。
去掉注释，值改为partition或on，重启postgresql服务：service postgresql restart

--查看constraint_exclusion状态
show constraint_exclusion;
--开启约束排除后，你仍然可以在事务里临时关闭，观察其关闭与开启的效率改变
set constraint_exclusion = off;
explain select count(1) from access_log_main  where accessdate > '2017-10-01';

3. 优化效果

数据总量：2000w；目标分表数据量300w；查询在不同的数据库以作对比，防止干扰。

select * from access_log/access_log_main where userid = xxxx and accessdate > '2018-01-01';

分表方案	1	2	3	4	5	6	average
access_log	2.67s	2.68s	2.59s	2.70s	2.68s	2.63s	约2.6s
access_log_main	0.53s	0.51s	0.48s	0.49s	0.43s	0.47s	约0.48s

扩展

随着时间增加，分表也要不断更新。写一个脚本自动生成需要的DDL是明智的。

--access_log_parttable.sql
do $$
declare 
nexttime timestamp;
yearnum int4;
quarter int4;
starttime timestamp;
endtime timestamp;
tablename varchar;
indexname varchar;
triggerfuncname varchar;
tempsql varchar;
begin
nexttime := now() + interval '3 months';
yearnum := date_part('year', nexttime);
quarter := date_part('quarter', nexttime);
starttime := date_trunc('quarter', nexttime);
endtime := nexttime + interval '3 months';
endtime := date_trunc('quarter', endtime);
tablename := 'access_log_' || yearnum || '_quarter' || quarter ;
indexname := tablename || '_accessdate' ; 
triggerfuncname := 'access_log_other_' || yearnum || '_quarter' || quarter ;

raise notice 'nexttime: %', nexttime;
raise notice 'year: %', yearnum;
raise notice 'quarter:  %', quarter;
raise notice 'starttime:    %', starttime;
raise notice 'endtime:  %', endtime;
raise notice 'tablename:    %', tablename;
raise notice 'indexname:    %', indexname;
raise notice 'triggerfuncname:  %', triggerfuncname;

tempsql := 'create table ' || tablename || '(
check (accessdate >= date ''' || starttime || ''' and accessdate < date ''' || endtime || ''' )
) inherits (access_log_main);';
execute(tempsql);
 
tempsql := 'create index ' || indexname || ' on ' || tablename || ' (accessdate)';
execute(tempsql);

tempsql := 'create or replace function ' || triggerfuncname || '()
returns trigger as
$body$ 
begin
    if(new.accessdate >= date ''' || starttime || ''' and new.accessdate < date ''' || endtime || ''' ) then
    insert into ' || tablename || ' values(new.*);
return null;
    else
    return new;
    end if;
end;
$body$ 
language plpgsql';
execute(tempsql);

tempsql := 'drop trigger if exists inseret_on_' || triggerfuncname || ' on access_log_other';
execute(tempsql);

tempsql := 'create trigger inseret_on_' || triggerfuncname || ' before insert on access_log_other
 for each row execute procedure ' || triggerfuncname || '()';
execute(tempsql);

end 
$$ LANGUAGE plpgsql;

简单罗列一下上面脚本涉及的知识点，学到蛮多：

DO [ LANGUAGE lang_name ] code $$ 执行匿名代码块
各种时间/日期的处理函数，尤其是date_tranc，非常好用
拼接sql，并用execute执行
多触发器时，按字母顺序执行。返回null后续皆不执行，返回new后续继续执行。详细。

坑：

上面代码43行，必须用 $body$ ，而不是$$，否则一直报错：error at or near 'begin'(line 44)

接下来便是在linux上定时执行下面命令：

psql -f 'access_log_parttable.sql' "user=xxxx password=xxxx dbname=xxxx";

至于linux设置自动执行脚本，参考这里。

参考

网友评论

本文标题：水平分表--使用postgresql继承

本文链接：https://www.haomeiwen.com/subject/vfikzftx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！