PostgreSQL 多个数组聚合为一维数组加速(array_a

作者: rewq123 | 来源:发表于2018-07-10 15:59 被阅读27次

PostgreSQL 多个数组聚合为一维数组加速(array_a
学习Java第五天
01-JS-04
10-Python-NumPy数组分割
JavaScript 数组对象原型方法（一）
PHP实用函数库
数组方法
ES6 扩展运算符常用场景
shell-数组
面试题-多线程Demo

标签

PostgreSQL , array_agg , arragg

背景

多个数组聚合为一维数组，求PC。业务背景见：

《PostgreSQL APP海量FEED LOG实时质量统计CASE(含percentile_disc)》

由于PostgreSQL内置的聚合函数array_agg支持的数组聚合实际上是将多个数组聚合为多维数组。并不是一维数组。

例如：

postgres=# select array_agg(arr) from (values(array[1,2,3]), (array[4,5,6])) t(arr); array_agg ------------------- {{1,2,3},{4,5,6}} (1row)

而实际上我们要的是一维数组的结果

{1,2,3,4,5,6}

此时需要自定义一个聚合函数

createaggregatearragg (anyarray) (sfunc = array_cat, stype=anyarray,PARALLEL=safe);

效果如下

postgres=# select arragg(arr) from (values(array[1,2,3]), (array[4,5,6])) t(arr); arragg --------------- {1,2,3,4,5,6} (1row)

但是这个新加的聚合用到了array_cat，大量的memcpy导致性能并不好。

array_agg性能对比arragg

聚合100万个元素.

1、array_agg，耗时0.14秒

postgres=# explain (analyze,verbose,timing,costs,buffers) select array_agg(array[1,2,3,4,5,6,7,8,9,10]) from generate_series(1,100000); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=12.50..12.51rows=1width=32) (actual time=113.134..113.134rows=1loops=1) Output: array_agg('{1,2,3,4,5,6,7,8,9,10}'::integer[]) ->FunctionScanonpg_catalog.generate_series(cost=0.00..10.00rows=1000width=0)(actual time=53.585..66.200rows=100000loops=1)Output:generate_seriesFunctionCall:generate_series(1,100000)Planningtime: 0.064msExecutiontime: 143.075ms(7rows)

2、arragg(use array_cat)，耗时108.15秒

postgres=# explain (analyze,verbose,timing,costs,buffers) select arragg(array[1,2,3,4,5,6,7,8,9,10]) from generate_series(1,100000); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=12.50..12.51rows=1width=32) (actual time=108081.186..108081.186rows=1loops=1) Output: arragg('{1,2,3,4,5,6,7,8,9,10}'::integer[]) ->FunctionScanonpg_catalog.generate_series(cost=0.00..10.00rows=1000width=0)(actual time=11.121..81.467rows=100000loops=1)Output:generate_seriesFunctionCall:generate_series(1,100000)Planningtime: 0.148msExecutiontime: 108154.846ms