-
-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug: 执行外连接的查询过于耗时, tianmu需要执行十四分钟, innodb执行外连接仅需要执行一分钟 #1777
Comments
ACK |
|
The problem is that tianmu executes LEFT_JOIN_ON. Changing the table order of the execution plan has little impact on the query time +----+-------------+---------------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | PRIMARY | a | NULL | ALL | NULL | NULL | NULL | NULL | 1016 | 100.00 | NULL |
| 1 | PRIMARY | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 325120 | 100.00 | Using where; Using join buffer (Block Nested Loop) |
| 1 | PRIMARY | xx | NULL | ALL | NULL | NULL | NULL | NULL | 3251200 | 100.00 | Using where; Using join buffer (Block Nested Loop) |
| 3 | DERIVED | c1am_acct_day | NULL | ALL | NULL | NULL | NULL | NULL | 3251200 | 10.00 | Using where with pushed condition (`mbs`.`c1am_acct_day`.`DELETED_FLAG` = '0')(t0) Pckrows: 50, susp. 50 (0 empty 0 full). Conditions: 1; Using temporary; Using filesort |
+----+-------------+---------------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|
But for the slow left join, we need to find a calm point. If the problem is nested loop, it needs to be redesigned |
同样是客户POC的问题, 影响客户是否使用天幕的决策,必须要尽快解决,本问题单继续对该问题做分析, 尽快处理该问题以满足客户的需求 |
对比该SQL的外连接,进行内连接则具有显著的查询性能提升,此因素需要被操作符的物理计划上的逐个分析 |
可以对比下内连接的执行计划,几乎所有的条件都被上提,而外连接却是在执行nested loop, 由于物化表没有聚集处理,导致时间复杂度暴增 |
我想想,外连接和内连接差别如此巨大,有点意思,代码写的有点屌 |
由于为客户POC拿下合同的紧迫性,在客户的现场,建议客户在自己的业务SQL中添加空值拒绝的语义以转为内连接,以在客户现场演示环节向客户展现连接查询的性能 |
在进行POC以拿下客户的同时,对外连接进行彻底的分析 |
构建的查询计划倒是符合预期,重点在于物理计划的执行 void CompiledQuery::LeftJoinOn(const TabID &temp_table, std::vector<TabID> &left_tables,
std::vector<TabID> &right_tables, const CondID &cond_id) {
CompiledQuery::CQStep s;
s.type = StepType::LEFT_JOIN_ON;
s.t1 = temp_table;
s.c1 = cond_id;
s.tables1 = left_tables;
s.tables2 = right_tables;
steps.push_back(s);
}
|
此处尤其要对比内连接的表结构见的关系, 与外连接不同,内连接对于表做了不同的处理 void CompiledQuery::InnerJoinOn(const TabID &temp_table, std::vector<TabID> &left_tables,
std::vector<TabID> &right_tables, const CondID &cond_id) {
CompiledQuery::CQStep s;
s.type = StepType::INNER_JOIN_ON;
s.t1 = temp_table;
s.c1 = cond_id;
s.tables1 = left_tables;
s.tables1.insert(s.tables1.end(), right_tables.begin(), right_tables.end());
steps.push_back(s);
}
|
这段代码写的有点水平,有点意思,虽说关系型数据库的关系不是标准的集合论,是采用的包模型,不过这块代码确实出彩,干开源的弟兄们看代码时候,可以多看看这块,有两下子 |
此连接操作在物理执行的时候在执行多重循环,更为麻烦的地方在于内连接不是一个连接操作,而是多个列组成的虚拟表的扫描过滤 |
void TempTable::AddInnerConds(Condition *cond, std::vector<TabID> &dim_aliases) {
for (uint i = 0; i < cond->Size(); i++)
for (uint j = 0; j < dim_aliases.size(); j++) (*cond)[i].left_dims[GetDimension(dim_aliases[j])] = true;
filter.AddConditions(cond, CondType::ON_INNER_FILTER);
}
|
void TempTable::AddLeftConds(Condition *cond, std::vector<TabID> &left_aliases, std::vector<TabID> &right_aliases) {
for (uint i = 0; i < cond->Size(); i++) {
for (int d = 0; d < (*cond)[i].left_dims.Size(); d++) (*cond)[i].left_dims[d] = (*cond)[i].right_dims[d] = false;
for (uint j = 0; j < left_aliases.size(); j++) (*cond)[i].left_dims[GetDimension(left_aliases[j])] = true;
for (uint j = 0; j < right_aliases.size(); j++) (*cond)[i].right_dims[GetDimension(right_aliases[j])] = true;
}
filter.AddConditions(cond, CondType::ON_LEFT_FILTER);
}
|
if (false_desc_found && outer_dims.IsEmpty()) {
all_dims.Plus(cond[0].left_dims); // for FALSE join condition
// DimensionUsed() does not mark anything
for (int i = 0; i < mind->NumOfDimensions(); i++)
if (all_dims[i])
mind->Empty(i);
return; // all done
}
|
可以看出当将T4表和T5表都与T2表做了内连接后,随后的条件过滤都只针对T2表进行 |
之所以能这么做,涉及到集合论中的语义,可以参考韦恩图中的示意 |
通过将join操作符转换为限制的做法,成功的避免了对join的耗时处理 |
traversed_dims_size: 3251200 matched_dims_size: 1016 joined_tuples: 1032256 outer_tuples_: 0 |
mysql> select count(1) from c1md_bank_acct;
+----------+
| count(1) |
+----------+
| 1016 |
+----------+
1 row in set (0.00 sec)
mysql> select count(1) from c1am_acct_day;
+----------+
| count(1) |
+----------+
| 3251200 |
+----------+
1 row in set (0.00 sec)
|
--Type for more, q to quit, c to continue without paging-- |
散列构建过程中已经出错 if (traversed_dims_.Intersects(matched_dims_) || !compatible) { |
1 similar comment
散列构建过程中已经出错 if (traversed_dims_.Intersects(matched_dims_) || !compatible) { |
(gdb) p traversed_dims_ |
[2023-05-30 03:57:03.372153] [471180] [ERROR] [parallel_hash_join.cpp:376] MSG: PrepareBeforeJoin fail, traversed_dims_[3].Intersects(matched_dims_[3]) || !compatible[true] |
对于这个问题有一些不可思议的地方, 其中一个需要引起重视的地方就是对于物化的极为苛刻的推迟,这里做下记录 |
递归处理连接的表,将所有的表都附加到一起,中间一点物化都不做 |
1 similar comment
递归处理连接的表,将所有的表都附加到一起,中间一点物化都不做 |
对浓密树做物化,再与左节点做连接 (gdb) p aliases
$6 = std::vector of length 2, capacity 2 = {-4, -4}
(gdb) p dim_aliases
$7 = std::vector of length 2, capacity 2 = {{n = -3}, {n = -4}}
|
设计的目标有这么几个:
|
(gdb) p v |
有以下几个因素被涉及:
|
对于条件的分析: #4 0x00000000030097f0 in Tianmu::core::CompiledQuery::FindSourceOfParameter (this=0x7fc7fe3aa700, tab_id=..., tmp_table=..., is_group_by=@0x7fc7fe3a7273: false)
at /root/work/stonedb-dev-202305026/storage/tianmu/optimizer/compile/compiled_query.cpp:846
#5 0x0000000002d23e5c in Tianmu::core::Query::WrapMysqlExpression (this=0x7fc7fe3aa7d0, item=0x7fc538013da0, tmp_table=..., expr=@0x7fc7fe3a7460: 0x7fc7fe3a74a0, in_where=true,
aggr_used=false) at /root/work/stonedb-dev-202305026/storage/tianmu/core/query_compile.cpp:808
#6 0x0000000002d02737 in Tianmu::core::Query::Item2CQTerm (this=0x7fc7fe3aa7d0, an_arg=0x7fc538013da0, term=..., tmp_table=..., filter_type=Tianmu::core::CondType::ON_INNER_FILTER,
negative=false, left_expr_for_subselect=0x0, oper_for_subselect=0x7fc7fe3a7a2c, base_table=...) at /root/work/stonedb-dev-202305026/storage/tianmu/core/query.cpp:1125
#7 0x0000000002d03969 in Tianmu::core::Query::ConditionNumberFromComparison (this=0x7fc7fe3aa7d0, conds=0x7fc538902590, tmp_table=..., filter_type=Tianmu::core::CondType::ON_INNER_FILTER,
and_me_filter=0x0, is_or_subtree=false, negative=false, can_cond_push=false) at /root/work/stonedb-dev-202305026/storage/tianmu/core/query.cpp:1336
#8 0x0000000002d05f31 in Tianmu::core::Query::ConditionNumber (this=0x7fc7fe3aa7d0, conds=0x7fc538902590, tmp_table=..., filter_type=Tianmu::core::CondType::ON_INNER_FILTER,
and_me_filter=0x0, is_or_subtree=false, can_cond_push=false) at /root/work/stonedb-dev-202305026/storage/tianmu/core/query.cpp:1587
#9 0x0000000002d04a2f in Tianmu::core::Query::ConditionNumber (this=0x7fc7fe3aa7d0, conds=0x7fc538006960, tmp_table=..., filter_type=Tianmu::core::CondType::ON_INNER_FILTER,
and_me_filter=0x0, is_or_subtree=false, can_cond_push=false) at /root/work/stonedb-dev-202305026/storage/tianmu/core/query.cpp:1440
#10 0x0000000002d0698c in Tianmu::core::Query::BuildConditions (this=0x7fc7fe3aa7d0, conds=0x7fc538006960, cond_id=..., cq=0x7fc7fe3aa700, tmp_table=...,
filter_type=Tianmu::core::CondType::ON_INNER_FILTER, is_zero_result=false, join_type=Tianmu::core::JoinType::JO_INNER, can_cond_push=false)
at /root/work/stonedb-dev-202305026/storage/tianmu/core/query.cpp:1635
#11 0x0000000002d07ce7 in Tianmu::core::Query::BuildCondsIfPossible (this=0x7fc7fe3aa7d0, conds=0x7fc538006960, cond_id=..., tmp_table=..., join_type=Tianmu::core::JoinType::JO_INNER)
at /root/work/stonedb-dev-202305026/storage/tianmu/core/query.cpp:1928
#12 0x0000000002d21ac9 in Tianmu::core::Query::AddJoins (this=0x7fc7fe3aa7d0, join=..., tmp_table=..., left_tables=std::vector of length 1, capacity 1 = {...},
right_tables=std::vector of length 0, capacity 0, in_subquery=false, first_table=@0x7fc7fe3aa4af: false, for_subq_in_where=false, use_tmp_when_no_join=false)
at /root/work/stonedb-dev-202305026/storage/tianmu/core/query_compile.cpp:423
#13 0x0000000002d20f9b in Tianmu::core::Query::AddJoins (this=0x7fc7fe3aa7d0, join=..., tmp_table=..., left_tables=std::vector of length 1, capacity 1 = {...},
right_tables=std::vector of length 0, capacity 0, in_subquery=false, first_table=@0x7fc7fe3aa4af: false, for_subq_in_where=false, use_tmp_when_no_join=false)
at /root/work/stonedb-dev-202305026/storage/tianmu/core/query_compile.cpp:341
#14 0x0000000002d26227 in Tianmu::core::Query::Compile (this=0x7fc7fe3aa7d0, compiled_query=0x7fc7fe3aa700, selects_list=0x7fc538005570, last_distinct=0x0, res_tab=0x0, ignore_limit=false,
left_expr_for_subselect=0x0, oper_for_subselect=0x0, ignore_minmax=false, for_subq_in_where=false) at /root/work/stonedb-dev-202305026/storage/tianmu/core/query_compile.cpp:1245
#15 0x0000000002ceb722 in Tianmu::core::Engine::Execute (this=0x61a3eb0, thd=0x7fc538000e10, lex=0x7fc538003138, result_output=0x7fc538016608, unit_for_union=0x0)
at /root/work/stonedb-dev-202305026/storage/tianmu/core/engine_execute.cpp:472
#16 0x0000000002cea8ab in Tianmu::core::Engine::HandleSelect (this=0x61a3eb0, thd=0x7fc538000e10, lex=0x7fc538003138, result=@0x7fc7fe3aadc8: 0x7fc538016608, setup_tables_done_option=0,
res=@0x7fc7fe3aadc4: 0, is_optimize_after_tianmu=@0x7fc7fe3aadbc: 1, tianmu_free_join=@0x7fc7fe3aadc0: 1, with_insert=0)
at /root/work/stonedb-dev-202305026/storage/tianmu/core/engine_execute.cpp:243
#17 0x0000000003084b08 in Tianmu::DBHandler::ha_my_tianmu_query (thd=0x7fc538000e10, lex=0x7fc538003138, result_output=@0x7fc7fe3aadc8: 0x7fc538016608, setup_tables_done_option=0,
res=@0x7fc7fe3aadc4: 0, is_optimize_after_tianmu=@0x7fc7fe3aadbc: 1, tianmu_free_join=@0x7fc7fe3aadc0: 1, with_insert=0)
at /root/work/stonedb-dev-202305026/storage/tianmu/sql/ha_my_tianmu.cpp:95
|
可以理解为在执行操作序列前对表和条件之间进行重新构建 |
对比下固化的执行序列 | 36 | X_1=0:int := sql.mvc(); |
| 27 | C_2=[3]:bat[:oid] := sql.tid(X_1=0:int, "sys":str, "b":str); |
| 22 | C_4=[4]:bat[:oid] := sql.tid(X_1=0:int, "sys":str, "c":str); |
| 5803 | X_5=[2]:bat[:int] := bat.pack(32:int, 32:int); |
| 21 | X_6=[2]:bat[:str] := bat.pack(".b":str, ".x":str); |
| 3902 | X_3=[2]:bat[:int] := bat.pack(0:int, 0:int); |
| 428 | X_7=[0]:bat[:int] := bat.new(nil:int); |
| 255 | X_8=[0]:bat[:int] := bat.new(nil:int); |
| 4474 | X_12=[3]:bat[:int] := sql.bind(X_1=0:int, "sys":str, "b":str, "b1":str, 0:int); |
| 7059 | X_11=[2]:bat[:str] := bat.pack("b1":str, "c1":str); |
| 5066 | X_9=[2]:bat[:int] := sql.bind(X_1=0:int, "sys":str, "a":str, "a1":str, 0:int); |
| 2933 | C_10=[2]:bat[:oid] := sql.tid(X_1=0:int, "sys":str, "a":str); |
| 8466 | X_13=[2]:bat[:str] := bat.pack("int":str, "int":str); |
| 32 | X_14=[3]:bat[:int] := algebra.projection(C_2=[3]:bat[:oid], X_12=[3]:bat[:int]); |
| 324 | C_15=[3]:bat[:oid] := bat.mirror(X_14=[3]:bat[:int]); |
| 1010 | X_16=[4]:bat[:int] := sql.bind(X_1=0:int, "sys":str, "c":str, "c1":str, 0:int); |
| 1554 | C_17=[2]:bat[:oid] := algebra.thetaselect(X_9=[2]:bat[:int], C_10=[2]:bat[:oid], 1:int, ">=":str); # select: sorted |
| 28 | X_18=[2]:bat[:int] := algebra.projection(C_17=[2]:bat[:oid], X_9=[2]:bat[:int]); |
| 6 | X_19=0@0:void := language.pass(X_9=[2]:bat[:int]); |
| 1259 | C_20=[3]:bat[:oid] := algebra.thetaselect(X_16=[4]:bat[:int], C_4=[4]:bat[:oid], 3:int, "<=":str); # select: densescan v >= vl && v <= vh (canditer_next_dense) |
| 23 | X_21=[3]:bat[:int] := algebra.projection(C_20=[3]:bat[:oid], X_16=[4]:bat[:int]); |
| 5 | X_22=0@0:void := language.pass(X_16=[4]:bat[:int]); |
| 1912 | X_23=[1]:bat[:oid] := algebra.join(X_21=[3]:bat[:int], X_18=[2]:bat[:int], nil:BAT, nil:BAT, false:bit, nil:lng); # mergejoin_int |
| 21 | X_24=[1]:bat[:int] := algebra.projection(X_23=[1]:bat[:oid], X_21=[3]:bat[:int]); |
| 8 | X_25=0@0:void := language.pass(X_21=[3]:bat[:int]); |
| 2852 | (X_26=[1]:bat[:oid], X_27=[1]:bat[:oid]) := algebra.join(X_14=[3]:bat[:int], X_24=[1]:bat[:int], nil:BAT, nil:BAT, false:bit, nil:lng); # selectjoin; select: sorted |
| 2631 | C_28=[2]:bat[:oid] := algebra.difference(C_15=[3]:bat[:oid], X_26=[1]:bat[:oid], nil:BAT, nil:BAT, false:bit, false:bit, nil:lng); # leftjoin; mergejoin_void; select: dense |
| 29 | X_29=[2]:bat[:int] := algebra.projection(C_28=[2]:bat[:oid], X_14=[3]:bat[:int]); |
| 170 | X_30=[1]:bat[:int] := algebra.projection(X_26=[1]:bat[:oid], X_14=[3]:bat[:int]); |
| 11 | X_31=0@0:void := language.pass(X_26=[1]:bat[:oid]); |
| 984 | X_32=[2]:bat[:int] := algebra.project(C_28=[2]:bat[:oid], nil:int); |
| 6 | X_33=0@0:void := language.pass(C_28=[2]:bat[:oid]); |
| 14 | X_35=0@0:void := language.pass(X_14=[3]:bat[:int]); |
| 587 | X_34=[1]:bat[:int] := algebra.projection(X_27=[1]:bat[:oid], X_24=[1]:bat[:int]); |
| 14 | X_36=0@0:void := language.pass(X_24=[1]:bat[:int]); |
| 299 | X_37=[1]:bat[:int] := bat.append(X_8=[1]:bat[:int], X_34=[1]:bat[:int], true:bit); |
| 18 | X_38=[3]:bat[:int] := bat.append(X_37=[3]:bat[:int], X_32=[2]:bat[:int], true:bit); |
| 2123 | X_39=[1]:bat[:int] := bat.append(X_7=[1]:bat[:int], X_30=[1]:bat[:int], true:bit); |
| 24 | X_40=[3]:bat[:int] := bat.append(X_39=[3]:bat[:int], X_29=[2]:bat[:int], true:bit); |
| 28377 | barrier X_41=false:bit := language.dataflow(); |
| 28 | X_42=1:int := sql.resultSet(X_6=[2]:bat[:str], X_11=[2]:bat[:str], X_13=[2]:bat[:str], X_5=[2]:bat[:int], X_3=[2]:bat[:int], X_40=[3]:bat[:int], X_38=[3]:bat[:int]);
|
对于嵌套内连接直接物化后进行连接 X_23=[1]:bat[:oid] := algebra.join(X_21=[3]:bat[:int], X_18=[2]:bat[:int], nil:BAT, nil:BAT, false:bit, nil:lng); # mergejoin_int
|
这里对条件的处理存在不同,stonedb完全是以条件为中心进行关联,而固化的操作则是以列的向量处理为依据 | 1554 | C_17=[2]:bat[:oid] := algebra.thetaselect(X_9=[2]:bat[:int], C_10=[2]:bat[:oid], 1:int, ">=":str); # select: sorted |
|
[2023-06-01 04:59:39.838678] [657043] [ERROR] [parallel_hash_join.cpp:376] MSG: PrepareBeforeJoin fail, traversed_dims_[3].Intersects(matched_dims_[3]) || !compatible[true] |
这里需要注意,当使用内连接物化的操作时,构建散列探测的过程将会出错 |
可以看出是先将条件从连接上推,然后是尽可能的下推到内连接的表中 |
修复完客户上一个紧急的查询错误,继续回到正题处理这个事情 |
在列存储的数据库中对于此处场景的处理与mysql/sql的处理逻辑有诸多不同, 这里仅关注比较经典的列存储数据库的设计策略,比较明显的地方有:
|
对于表连接的处理mysql/sql层使用嵌套循环并将条件做处理, 对于tianmu来说这种优化起到了反效果, 对于列的处理来说,执行逻辑类似于以下伪代码, 而不能采用嵌套循环的措施进行优化 left outer join (
| table("sys"."b") [ "b"."b1" ] ,
| join (
| | select (
| | | table("sys"."a") [ "a"."a1" ]
| | ) [ "a"."a1" >= int "1" ],
| | select (
| | | table("sys"."c") [ "c"."c1" ]
| | ) [ "c"."c1" <= int "3" ]
| ) [ "a"."a1" = "c"."c1" ]
) [ "b"."b1" = "c"."c1" ]
|
+---------------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| c1am_acct_day | 0 | PRIMARY | 1 | ROW_ID | A | 3251257 | NULL | NULL | | BTREE | | |
| c1am_acct_day | 0 | IDX_AMACCTDAY02 | 1 | ACCOUNT_ID | A | 813 | NULL | NULL | | BTREE | | |
| c1am_acct_day | 0 | IDX_AMACCTDAY02 | 2 | FISCAL_DATE | A | 2755421 | NULL | NULL | YES | BTREE | | |
| c1am_acct_day | 1 | IDX_AMACCTDAY01 | 1 | TENANT_ID | A | 9 | NULL | NULL | | BTREE | | |
| c1am_acct_day | 1 | IDX_AMACCTDAY03 | 1 | ACCOUNT_ID | A | 1524 | NULL | NULL | | BTREE | | |
| c1am_acct_day | 1 | IDX_AMACCTDAY04 | 1 | FISCAL_DATE | A | 3513 | NULL | NULL | YES | BTREE | | |
+---------------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
|
The text was updated successfully, but these errors were encountered: