这是Star Schema Benchmark 标准测试集优化的第三篇,前一篇我们分析了下表数据,这一篇是最后一篇了。
一、分析算法路径
更新到前一篇的时候,其实专利技术已经集成到数据库中了,这个算法路径,主要是验证一下:在测试环境中的算法路径,是否和开发环境中一致。实际结果如下,13 条SQL语句的算法路径和开发环境中的算法路径,经过验证是完全一致的。
2022-10-20 01:39:53.344 - SQL2: select sum(lo_revenue) as revenue from lineorder,dates where lo_orderdate = d_datekey and d_year = 1993 and lo_discount between 1 and 3 and lo_quantity < 25; ;
Status: PASS, Elapsed: 3.923, Affected: 1
Info: Job[67:RESTRICT, tasks:193, time:3,591, size: 3,576,178, [0|193|3,591] ]
Job[69:DIMENSION_JOIN, tasks:193, time:16, size: 3,576,178, [0|193|15] ]
Job[71:OUTPUT_HASH_GROUP_BY, tasks:6, time:166, size: 6, [0|6|167] ]
Job[73:MERGE_HASH_GROUPBY_PARTITION, tasks:1, time:3, size: 1, [0|1|3] ]
Job[75:PROJECT, tasks:1, time:2, size: 1, [0|1|1] ]
total time: 3,788, record count: 1
restrict的时间/整体时间: 3591/3788
result:
REVENUE |
- - - - - - -
6568512155417 |
2022-10-20 01:39:59.707 - SQL4: select sum(lo_revenue) as revenue from lineorder,dates where lo_orderdate = d_datekey and d_yearmonthnum = 199401 and lo_discount between 4 and 6 and lo_quantity between 26 and 35; ;
Status: PASS, Elapsed: 1.767, Affected: 1
Info: Job[99:RESTRICT, tasks:193, time:1,727, size: 126,610, [0|193|1,728] ]
Job[101:DIMENSION_JOIN, tasks:193, time:8, size: 126,610, [0|193|8] ]
Job[103:OUTPUT_HASH_GROUP_BY, tasks:2, time:22, size: 2, [0|2|22] ]
Job[105:MERGE_HASH_GROUPBY_PARTITION, tasks:1, time:1, size: 1, [0|1|1] ]
Job[107:PROJECT, tasks:1, time:1, size: 1, [0|1|1] ]
total time: 1,764, record count: 1
restrict的时间/整体时间: 1727/1764
result:
REVENUE |
- - - - - - -
550150245374 |
2022-10-20 01:40:04.552 - SQL6: select sum(lo_revenue) as revenue from lineorder,dates where lo_orderdate = d_datekey and d_weeknuminyear = 6 and d_year = 1994 and lo_discount between 5 and 7 and lo_quantity between 26 and 35; ;
Status: PASS, Elapsed: 2.138, Affected: 1
Info: Job[131:RESTRICT, tasks:193, time:2,108, size: 28,441, [0|193|2,109] ]
Job[133:DIMENSION_JOIN, tasks:193, time:7, size: 28,441, [0|193|7] ]
Job[135:OUTPUT_HASH_GROUP_BY, tasks:1, time:15, size: 1, [0|1|15] ]
Job[137:MERGE_HASH_GROUPBY_PARTITION, tasks:1, time:1, size: 1, [0|1|1] ]
Job[139:PROJECT, tasks:1, time:1, size: 1, [0|1|1] ]
total time: 2,136, record count: 1
restrict的时间/整体时间: 2108/2136
result:
REVENUE |
- - - - - - -
122223605792 |
2022-10-20 01:40:31.923 - SQL8: select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder ,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_category = 'MFGR#12' and s_region = 'AMERICA' group by d_year, p_brand order by d_year, p_brand; ;
Status: PASS, Elapsed: 4.714, Affected: 280
Info: Job[193:RESTRICT, tasks:193, time:4,342, size: 1,442,564, [0|193|4,342] ]
Job[195:DIMENSION_JOIN, tasks:193, time:9, size: 1,442,564, [0|193|10] ]
Job[197:DIMENSION_JOIN, tasks:193, time:5, size: 1,442,564, [0|193|5] ]
Job[199:DIMENSION_JOIN, tasks:193, time:5, size: 1,442,564, [0|193|5] ]
Job[202:OUTPUT_HASH_GROUP_BY, tasks:6, time:332, size: 1,680, [0|6|332] ]
Job[204:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:4, size: 280, [0|7|4] ]
Job[206:PROJECT, tasks:7, time:3, size: 280, [0|7|3] ]
Job[208:RANGE_SORT, tasks:7, time:2, size: 280, [0|7|2] ]
Job[212:PARTITION_TABLET, tasks:7, time:0, size: 280, [0|7|0] ]
Job[214:MERGE_ORDERBY_RANGES, tasks:1, time:1, size: 70, [0|1|1] ]
total time: 4,711, record count: 280
restrict的时间/整体时间: 4342/4711
result:
LO_REVENUE |D_YEAR |P_BRAND |
- - - - - - - - - - - - - - - - - -
18712903257 |1992 |MFGR#121 |
20576919851 |1992 |MFGR#1210 |
20452654696 |1992 |MFGR#1211 |
2022-10-20 01:40:37.664 - SQL10: select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_brand between 'MFGR#2221' and 'MFGR#2228' and s_region = 'ASIA' group by d_year, p_brand order by d_year, p_brand; ;
Status: PASS, Elapsed: 1.24, Affected: 56
Info: Job[292:RESTRICT, tasks:193, time:1,017, size: 287,885, [0|193|1,017] ]
Job[294:DIMENSION_JOIN, tasks:193, time:5, size: 287,885, [0|193|5] ]
Job[296:DIMENSION_JOIN, tasks:193, time:5, size: 287,885, [0|193|5] ]
Job[298:DIMENSION_JOIN, tasks:193, time:4, size: 287,885, [0|193|4] ]
Job[301:OUTPUT_HASH_GROUP_BY, tasks:3, time:193, size: 168, [0|3|193] ]
Job[303:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:2, size: 56, [0|7|2] ]
Job[305:PROJECT, tasks:7, time:1, size: 56, [0|7|1] ]
Job[307:RANGE_SORT, tasks:7, time:1, size: 56, [0|7|1] ]
Job[311:PARTITION_TABLET, tasks:7, time:0, size: 56, [0|7|0] ]
Job[313:MERGE_ORDERBY_RANGES, tasks:1, time:0, size: 12, [0|1|0] ]
total time: 1,235, record count: 56
restrict的时间/整体时间: 1017/1235
result:
LO_REVENUE |D_YEAR |P_BRAND |
- - - - - - - - - - - - - - - - - -
19803695538 |1992 |MFGR#2221 |
19639734537 |1992 |MFGR#2222 |
19945070508 |1992 |MFGR#2223 |
2022-10-20 01:40:39.888 - SQL12: select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_brand = 'MFGR#2239' and s_region = 'EUROPE' group by d_year, p_brand order by d_year, p_brand; ;
Status: PASS, Elapsed: 0.862, Affected: 7
Info: Job[397:RESTRICT, tasks:193, time:785, size: 35,599, [0|193|785] ]
Job[399:DIMENSION_JOIN, tasks:193, time:4, size: 35,599, [0|193|4] ]
Job[401:DIMENSION_JOIN, tasks:193, time:3, size: 35,599, [0|193|3] ]
Job[403:DIMENSION_JOIN, tasks:193, time:3, size: 35,599, [0|193|3] ]
Job[406:OUTPUT_HASH_GROUP_BY, tasks:1, time:49, size: 7, [0|1|49] ]
Job[408:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:8, size: 7, [0|7|8] ]
Job[410:PROJECT, tasks:7, time:0, size: 7, [0|7|0] ]
Job[412:RANGE_SORT, tasks:5, time:1, size: 7, [0|5|1] ]
Job[416:PARTITION_TABLET, tasks:5, time:0, size: 7, [0|5|0] ]
Job[418:MERGE_ORDERBY_RANGES, tasks:1, time:0, size: 2, [0|1|0] ]
total time: 859, record count: 7
restrict的时间/整体时间: 785/859
result:
LO_REVENUE |D_YEAR |P_BRAND |
- - - - - - - - - - - - - - - - - -
19700225276 |1992 |MFGR#2239 |
19306484466 |1993 |MFGR#2239 |
19398411013 |1994 |MFGR#2239 |
2022-10-20 01:41:02.935 - SQL14: select c_nation, s_nation, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and c_region = 'ASIA' and s_region = 'ASIA'and d_year >= 1992 and d_year <= 1997 group by c_nation, s_nation, d_year order by d_year asc, lo_revenue desc; ;
Status: PASS, Elapsed: 5.015, Affected: 150
Info: Job[502:RESTRICT, tasks:193, time:3,856, size: 6,570,093, [0|193|3,857] ]
Job[504:DIMENSION_JOIN, tasks:193, time:5, size: 6,570,093, [0|193|4] ]
Job[506:DIMENSION_JOIN, tasks:193, time:3, size: 6,570,093, [0|193|3] ]
Job[508:DIMENSION_JOIN, tasks:193, time:3, size: 6,570,093, [0|193|3] ]
Job[511:OUTPUT_HASH_GROUP_BY, tasks:6, time:1,132, size: 900, [0|6|1,132] ]
Job[513:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:3, size: 150, [0|7|3] ]
Job[515:PROJECT, tasks:7, time:1, size: 150, [0|7|1] ]
Job[517:RANGE_SORT, tasks:7, time:1, size: 150, [0|7|1] ]
Job[521:PARTITION_TABLET, tasks:7, time:0, size: 150, [0|7|0] ]
Job[523:MERGE_ORDERBY_RANGES, tasks:1, time:1, size: 22, [0|1|1] ]
total time: 5,012, record count: 150
restrict的时间/整体时间: 3856/5012
result:
C_NATION |S_NATION |D_YEAR |LO_REVENUE |
- - - - - - - - - - - - - - - - - - - - - - - - -
JAPAN |INDIA |1992 |163691240866 |
CHINA |INDONESIA |1992 |163434081261 |
CHINA |INDIA |1992 |163430796231 |
2022-10-20 01:41:11.469 - SQL16: select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and c_nation = 'UNITED STATES' and s_nation = 'UNITED STATES' and d_year >= 1992 and d_year <= 1997 group by c_city, s_city, d_year order by d_year asc, lo_revenue desc; ;
Status: PASS, Elapsed: 3.478, Affected: 600
Info: Job[589:RESTRICT, tasks:193, time:3,262, size: 264,531, [0|193|3,262] ]
Job[591:DIMENSION_JOIN, tasks:193, time:4, size: 264,531, [0|193|4] ]
Job[593:DIMENSION_JOIN, tasks:193, time:4, size: 264,531, [0|193|4] ]
Job[595:DIMENSION_JOIN, tasks:193, time:3, size: 264,531, [0|193|3] ]
Job[598:OUTPUT_HASH_GROUP_BY, tasks:3, time:189, size: 1,800, [0|3|189] ]
Job[600:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:5, size: 600, [0|7|5] ]
Job[602:PROJECT, tasks:7, time:1, size: 600, [0|7|1] ]
Job[604:RANGE_SORT, tasks:7, time:1, size: 600, [0|7|1] ]
Job[608:PARTITION_TABLET, tasks:7, time:0, size: 600, [0|7|0] ]
Job[610:MERGE_ORDERBY_RANGES, tasks:1, time:0, size: 100, [0|1|0] ]
total time: 3,475, record count: 600
restrict的时间/整体时间: 3262/3475
result:
C_CITY |S_CITY |D_YEAR |LO_REVENUE |
- - - - - - - - - - - - - - - - - - - - - - - -
UNITED ST3 |UNITED ST4 |1992 |1915435842 |
UNITED ST8 |UNITED ST0 |1992 |1910327375 |
UNITED ST5 |UNITED ST0 |1992 |1893024189 |
2022-10-20 01:41:16.505 - SQL18: select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and (c_city='UNITED KI1' or c_city='UNITED KI5') and (s_city='UNITED KI1' or s_city='UNITED KI5') and d_year >= 1992 and d_year <= 1997 group by c_city, s_city, d_year order by d_year asc, lo_revenue desc; ;
Status: PASS, Elapsed: 1.588, Affected: 24
Info: Job[679:RESTRICT, tasks:193, time:1,525, size: 10,616, [0|193|1,525] ]
Job[681:DIMENSION_JOIN, tasks:193, time:6, size: 10,616, [0|193|6] ]
Job[683:DIMENSION_JOIN, tasks:193, time:4, size: 10,616, [0|193|4] ]
Job[685:DIMENSION_JOIN, tasks:193, time:3, size: 10,616, [0|193|3] ]
Job[688:OUTPUT_HASH_GROUP_BY, tasks:1, time:35, size: 24, [0|1|35] ]
Job[690:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:2, size: 24, [0|7|2] ]
Job[692:PROJECT, tasks:7, time:0, size: 24, [0|7|0] ]
Job[694:RANGE_SORT, tasks:7, time:1, size: 24, [0|7|1] ]
Job[698:PARTITION_TABLET, tasks:7, time:0, size: 24, [0|7|0] ]
Job[700:MERGE_ORDERBY_RANGES, tasks:1, time:1, size: 3, [0|1|1] ]
total time: 1,585, record count: 24
restrict的时间/整体时间: 1525/1585
result:
C_CITY |S_CITY |D_YEAR |LO_REVENUE |
- - - - - - - - - - - - - - - - - - - - - - - -
UNITED KI1 |UNITED KI1 |1992 |1786080690 |
UNITED KI5 |UNITED KI1 |1992 |1705128984 |
UNITED KI1 |UNITED KI5 |1992 |1620054330 |
2022-10-20 01:41:19.166 - SQL20: select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and (c_city='UNITED KI1' or c_city='UNITED KI5') and (s_city='UNITED KI1' or s_city='UNITED KI5') and d_yearmonth = 'Dec1997' group by c_city, s_city, d_year order by d_year asc, lo_revenue desc; ;
Status: PASS, Elapsed: 1.046, Affected: 4
Info: Job[775:RESTRICT, tasks:193, time:1,012, size: 151, [0|193|1,012] ]
Job[777:DIMENSION_JOIN, tasks:109, time:4, size: 151, [0|109|4] ]
Job[779:DIMENSION_JOIN, tasks:109, time:4, size: 151, [0|109|4] ]
Job[781:DIMENSION_JOIN, tasks:109, time:3, size: 151, [0|109|3] ]
Job[784:OUTPUT_HASH_GROUP_BY, tasks:1, time:13, size: 4, [0|1|13] ]
Job[786:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:1, size: 4, [0|7|1] ]
Job[788:PROJECT, tasks:7, time:1, size: 4, [0|7|1] ]
Job[790:RANGE_SORT, tasks:2, time:0, size: 4, [0|2|0] ]
Job[794:PARTITION_TABLET, tasks:2, time:0, size: 4, [0|2|0] ]
Job[796:MERGE_ORDERBY_RANGES, tasks:1, time:1, size: 2, [0|1|1] ]
total time: 1,044, record count: 4
restrict的时间/整体时间: 1012/1044
result:
C_CITY |S_CITY |D_YEAR |LO_REVENUE |
- - - - - - - - - - - - - - - - - - - - - - - -
UNITED KI5 |UNITED KI1 |1997 |168840628 |
UNITED KI1 |UNITED KI5 |1997 |140264663 |
UNITED KI1 |UNITED KI1 |1997 |135684305 |
2022-10-20 01:41:29.956 - SQL22: select d_year, c_nation, sum(lo_revenue) - sum(lo_supplycost) as profit from lineorder,dates,customer,supplier,part where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_partkey = p_partkey and c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') group by d_year, c_nation order by d_year, c_nation; ;
Status: PASS, Elapsed: 2.833, Affected: 35
Info: Job[878:RESTRICT, tasks:193, time:2,137, size: 2,882,137, [0|193|2,138] ]
Job[880:DIMENSION_JOIN, tasks:193, time:8, size: 2,882,137, [0|193|8] ]
Job[882:DIMENSION_JOIN, tasks:193, time:3, size: 2,882,137, [0|193|3] ]
Job[884:DIMENSION_JOIN, tasks:193, time:2, size: 2,882,137, [0|193|2] ]
Job[886:DIMENSION_JOIN, tasks:193, time:3, size: 2,882,137, [0|193|2] ]
Job[889:OUTPUT_HASH_GROUP_BY, tasks:6, time:663, size: 210, [0|6|662] ]
Job[891:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:1, size: 35, [0|7|1] ]
Job[893:PROJECT, tasks:7, time:1, size: 35, [0|7|1] ]
Job[895:PROJECT, tasks:7, time:0, size: 35, [0|7|0] ]
Job[897:RANGE_SORT, tasks:7, time:1, size: 35, [0|7|1] ]
Job[901:PARTITION_TABLET, tasks:7, time:0, size: 35, [0|7|0] ]
Job[903:MERGE_ORDERBY_RANGES, tasks:1, time:0, size: 1, [0|1|0] ]
total time: 2,830, record count: 35
restrict的时间/整体时间: 2137/2830
result:
D_YEAR |C_NATION |PROFIT |
- - - - - - - - - - - - - - - - - - - -
1992 |ARGENTINA |312585625436 |
1992 |BRAZIL |312719709853 |
1992 |CANADA |307040911677 |
2022-10-20 01:41:37.244 - SQL24: select d_year, s_nation, p_category, sum(lo_revenue) - sum(lo_supplycost) as profit from lineorder,dates,customer,supplier,part where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_partkey = p_partkey and c_region = 'AMERICA'and s_region = 'AMERICA' and (d_year = 1997 or d_year = 1998) and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') group by d_year, s_nation, p_category order by d_year, s_nation, p_category; ;
Status: PASS, Elapsed: 4.462, Affected: 100
Info: Job[1005:RESTRICT, tasks:193, time:4,155, size: 694,402, [0|193|4,155] ]
Job[1007:DIMENSION_JOIN, tasks:193, time:4, size: 694,402, [0|193|4] ]
Job[1009:DIMENSION_JOIN, tasks:193, time:3, size: 694,402, [0|193|3] ]
Job[1011:DIMENSION_JOIN, tasks:193, time:3, size: 694,402, [0|193|3] ]
Job[1013:DIMENSION_JOIN, tasks:193, time:3, size: 694,402, [0|193|2] ]
Job[1016:OUTPUT_HASH_GROUP_BY, tasks:6, time:272, size: 600, [0|6|271] ]
Job[1018:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:2, size: 100, [0|7|2] ]
Job[1020:PROJECT, tasks:7, time:0, size: 100, [0|7|0] ]
Job[1022:PROJECT, tasks:7, time:1, size: 100, [0|7|0] ]
Job[1024:RANGE_SORT, tasks:7, time:2, size: 100, [0|7|1] ]
Job[1028:PARTITION_TABLET, tasks:7, time:0, size: 100, [0|7|0] ]
Job[1030:MERGE_ORDERBY_RANGES, tasks:1, time:0, size: 10, [0|1|0] ]
total time: 4,460, record count: 100
restrict的时间/整体时间: 4155/4460
result:
D_YEAR |S_NATION |P_CATEGORY |PROFIT |
- - - - - - - - - - - - - - - - - - - - - - - - - -
1997 |ARGENTINA |MFGR#11 |31668757023 |
1997 |ARGENTINA |MFGR#12 |31315629143 |
1997 |ARGENTINA |MFGR#13 |31899989093 |
我这边主要做了一下内容:
- 将数据库执行结果通过Linux脚本下载下来
cat ../conf/ssb_test.sql | ./cplus.sh > ssb30_record_result.txt
- 将数据库日志中的 job 信息通过Linux脚本下载下来
./cplus.sh <<EOF
desc history;
EOF
- 编写python脚本提取 RESTRICT 时间,并组合前两步骤的结果
a = """
{具体的 job 信息}
"""
b = """
{具体的 查询结果 信息}
"""
if __name__ == '__main__':
result_a = []
# print(a.split("\n\n"))
for item in a.split("\n\n"):
result_item = '\nrestrict的时间/整体时间: '
flag = False
for item_item in item.split("\n"):
if "RESTRICT" in item_item:
flag = True
result_item += item_item.split("time:")[1].split(", ")[0] + "/"
elif "total time" in item_item:
result_item += item_item.split("total time: ")[1].split(", ")[0] + "\n"
if not flag:
continue
result_item = result_item.replace(",", "")
result_a.append(item + result_item)
result_a.reverse()
# for item in result_a:
# print(item)
b_1 = ["\n".join(item.split("\n")[1:6]) for item in b.split("\nSQL")]
result_b = []
for item in b_1:
if "|" in item:
if "\nSelects" in item:
result_b.append("result: \n" + "\n".join(item.split("\n")[0:-2]) + "\n\n")
else:
result_b.append("result: \n" + item + "\n")
if len(result_a) != 26 or len(result_b) != 26:
raise Exception("result'len should =26 and result_a'len should =26")
result = ["\n".join(item) for item in list(zip(result_a, result_b))]
result.insert(0, "d")
result = result[0:-1:2][1:]
print("\n".join(result))
二、解决线程 bug
我们知道,在 windows 中,通过任务管理器可以看CPU信息,比如下面是我在windows上的CPU信息截图:
现在问题来了,我的数据库系统到底是启用 8 线程好呢?还是 16线程好呢?
答案是 16 线程。以逻辑处理器为准!
数据库原先启用的是8线程,这就是问题所在。这块优化完毕之后,“Star Schema Benchmark 标准测试集优化”基本已终结。
三、测试结果
左边是咱们数据库,右边是 Starrocks 数据库
相关阅读:
443.【数据库】Star Schema Benchmark 标准测试集优化(一)
463.【数据库】Star Schema Benchmark 标准测试集优化(二)
网友评论