美文网首页PostgreSQL技术干货
PostgreSQL DBA(4) - PG 11 New Fe

PostgreSQL DBA(4) - PG 11 New Fe

作者: EthanHe | 来源:发表于2018-09-19 17:00 被阅读8次

    PG 11即将正式发布,本节简单了PG 11的一些新特性,包括并行查询的性能提升和数据表分区的功能增强。

    一、并行查询

    Parallel Hash
    Hash Join执行时,在构造Hash表和进行Hash连接时,PG 11可使用并行的方式执行。
    测试脚本:

    testdb=# create table t1 (c1 int,c2 varchar(40),c3 varchar(40));
    CREATE TABLE
    testdb=# 
    testdb=# insert into t1 select generate_series(1,5000000),'TEST'||generate_series(1,1000000),generate_series(1,1000000)||'TEST';
    INSERT 0 5000000
    
    testdb=# drop table if exists t2;
    DROP TABLE
    testdb=# create table t2 (c1 int,c2 varchar(40),c3 varchar(40));
    CREATE TABLE
    testdb=# 
    testdb=# insert into t2 select generate_series(1,1000000),'T2'||generate_series(1,1000000),generate_series(1,1000000)||'T2';
    INSERT 0 1000000
    
    testdb=# explain verbose
    testdb-# select t1.c1,t2.c1 
    testdb-# from t1 inner join t2 on t1.c1 = t2.c1;
                                             QUERY PLAN                                          
    ---------------------------------------------------------------------------------------------
     Gather  (cost=18372.00..107975.86 rows=101100 width=8)
       Output: t1.c1, t2.c1
       Workers Planned: 2 -- 2 Workers
       ->  Parallel Hash Join  (cost=17372.00..96865.86 rows=42125 width=8) -- Parallel Hash Join
             Output: t1.c1, t2.c1
             Hash Cond: (t1.c1 = t2.c1)
             ->  Parallel Seq Scan on public.t1  (cost=0.00..45787.33 rows=2083333 width=4)
                   Output: t1.c1
             ->  Parallel Hash  (cost=10535.67..10535.67 rows=416667 width=4) -- Parallel Hash
                   Output: t2.c1
                   ->  Parallel Seq Scan on public.t2  (cost=0.00..10535.67 rows=416667 width=4)
                         Output: t2.c1
    
    

    除了Parallel Hash外,PG 11在执行Parallel Append(执行UNION ALL等集合操作)/CREATE TABLE AS SELECT/CREATE MATERIALIZED VIEW/SELECT INTO/CREATE INDEX等操作时以并行的方式执行.

    二、数据表分区

    Hash Partition
    PG 在11.x引入了Hash分区,关于Hash分区,官方文档有如下说明:

    The table is partitioned by specifying a modulus and a remainder for each partition. Each partition will hold the rows for which the hash value of the partition key divided by the specified modulus will produce the specified remainder.

    每个Hash分区需指定"模"(modulus)和"余"(remainder),数据在哪个分区(partition index)的计算公式:
    partition index = abs(hashfunc(key)) % modulus

    drop table if exists t_hash1;
    create table t_hash1 (c1 int,c2  varchar(40),c3 varchar(40)) partition by hash(c1);
    create table t_hash1_1 partition of t_hash1 for values with (modulus 6,remainder 0);
    create table t_hash1_2 partition of t_hash1 for values with (modulus 6,remainder 1);
    create table t_hash1_3 partition of t_hash1 for values with (modulus 6,remainder 2);
    create table t_hash1_4 partition of t_hash1 for values with (modulus 6,remainder 3);
    create table t_hash1_5 partition of t_hash1 for values with (modulus 6,remainder 4);
    create table t_hash1_6 partition of t_hash1 for values with (modulus 6,remainder 5);
    
    testdb=# insert into t_hash1 
    testdb-# select generate_series(1,1000000),'HASH'||generate_series(1,1000000),generate_series(1,1000000)||'HASH';
    INSERT 0 1000000
    

    数据在各分区上的分布大体均匀.
    2018-9-19 注:由于插入数据时语句出错,昨天得出的结果有误(但数据在各个分区的分布上不太均匀,t_hash1_1分区行数明显的比其他分区的要多很多),请忽略

    testdb=# select count(*) from only t_hash1;
    ; count 
    -------
         0
    (1 row)
    
    testdb=# select count(*) from only t_hash1_1;
     count  
    --------
     166480
    (1 row)
    
    testdb=# select count(*) from only t_hash1_2;
     count  
    --------
     166904
    (1 row)
    
    testdb=# select count(*) from only t_hash1_3;
     count  
    --------
     166302
    (1 row)
    
    testdb=# select count(*) from only t_hash1_4;
     count  
    --------
     166783
    (1 row)
    
    testdb=# select count(*) from only t_hash1_5;
     count  
    --------
     166593
    (1 row)
    
    testdb=# select count(*) from only t_hash1_6;
     count  
    --------
     166938
    (1 row)
    

    Hash分区键亦可以创建在字符型字段上

    testdb=# drop table if exists t_hash3;
    DROP TABLE
    testdb=# create table t_hash3 (c1 int,c2  varchar(40),c3 varchar(40)) partition by hash(c2);
    CREATE TABLE
    
    -- 需创建相应的"Partition"用于存储相应的数据
    testdb=# insert into t_hash3 
    testdb-# select generate_series(1,100000),'HASH'||generate_series(1,1000000),generate_series(1,1000000)||'HASH';
    ERROR:  no partition of relation "t_hash3" found for row
    DETAIL:  Partition key of the failing row contains (c2) = (HASH1).
    
    -- 6个分区,3个sub-table,插入数据会出错
    testdb=# 
    testdb=# create table t_hash3_1 partition of t_hash3 for values with (modulus 6,remainder 0);
    CREATE TABLE
    testdb=# create table t_hash3_2 partition of t_hash3 for values with (modulus 6,remainder 1);
    CREATE TABLE
    testdb=# create table t_hash3_3 partition of t_hash3 for values with (modulus 6,remainder 2);
    CREATE TABLE
    testdb=# insert into t_hash3 
    testdb-# select generate_series(1,10000),'HASH'||generate_series(1,10000),generate_series(1,10000)||'HASH';
    ERROR:  no partition of relation "t_hash3" found for row
    DETAIL:  Partition key of the failing row contains (c2) = (HASH1).
    
    -- 3个分区,3个sub-table,正常
    testdb=# drop table if exists t_hash3;
    DROP TABLE
    testdb=# create table t_hash3 (c1 int,c2  varchar(40),c3 varchar(40)) partition by hash(c2);
    CREATE TABLE
    testdb=# create table t_hash3_1 partition of t_hash3 for values with (modulus 3,remainder 0);
    CREATE TABLE
    testdb=# create table t_hash3_2 partition of t_hash3 for values with (modulus 3,remainder 1);
    CREATE TABLE
    testdb=# create table t_hash3_3 partition of t_hash3 for values with (modulus 3,remainder 2);
    CREATE TABLE
    testdb=# insert into t_hash3 
    testdb-# select generate_series(1,10000),'HASH'||generate_series(1,10000),generate_series(1,10000)||'HASH';
    INSERT 0 10000
    
    

    考察分区的数据分布,还比较均匀:

    testdb=# 
    testdb=# select count(*) from only t_hash3;
     count 
    -------
         0
    (1 row)
    
    testdb=# select count(*) from only t_hash3_1;
     count 
    -------
      3378
    (1 row)
    
    testdb=# select count(*) from only t_hash3_2;
     count 
    -------
      3288
    (1 row)
    
    testdb=# select count(*) from only t_hash3_3;
     count 
    -------
      3334
    (1 row)
    

    Default Partition
    List和Range分区可指定Default Partition(Hash分区不支持).

    Update partition key
    PG 11可Update分区键,这会导致数据的"迁移".

    Create unique constraint
    PG 11在分区表上创建主键和唯一索引(注:Oracle在很早的版本已支持此特性).
    在普通字段上可以创建BTree索引.

    testdb=# alter table t_hash1 add primary key(c1);
    ALTER TABLE
    testdb=# create index idx_t_hash1_c2 on t_hash1(c2);
    CREATE INDEX
    

    FOREIGN KEY support
    PG 11支持在分区上创建外键.

    除了上述几个新特性外,分区上面,PG 11在Automatic index creation/INSERT ON CONFLICT/Partition-Wise Join / Partition-Wise Aggregate/FOR EACH ROW trigger/Dynamic Partition Elimination/Control Partition Pruning上均有所增强.

    三、参考资料

    PostgreSQL 11 New Features With Examples(Beta 1)
    PostgreSQL 11 Table Partitioning

    相关文章

      网友评论

        本文标题:PostgreSQL DBA(4) - PG 11 New Fe

        本文链接:https://www.haomeiwen.com/subject/tkginftx.html