[opt](nereids) move some topn-join rules from rbo to cbo #46773

Open
wants to merge 1 commit into
base: master

englefly
Contributor

What problem does this PR solve?

The following rules apply only to the pattern:
topn->outerJoin
If they are used as RBO rules, we miss the opportunity to optimize the plan when the initial plan pattern is topn->innerJoin.
To take advantage of join reorder, they are moved to the CBO rules, so that when a bottom outer join is reordered to the root of the join cluster, these rules can be applied.

PushDownTopNThroughJoin
PushDownLimitDistinctThroughJoin
PushDownTopNDistinctThroughJoin
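
A minimal sketch of why these rules are tied to the outer-join pattern, assuming a toy model in which rows are plain integers and the TopN orders by the left-side column only. The class and method names (`TopNThroughOuterJoinSketch`, `join`, `topN`, `topNLeft`) are hypothetical and are not Doris code. A left outer join emits every left row at least once, so keeping only the top n left rows below the join cannot change the top n of the join output. An inner join can drop left rows, so the same push-down may lose results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class TopNThroughOuterJoinSketch {
    // Equi-join on the integer value itself. With keepUnmatchedLeft = true this
    // behaves like a left outer join (unmatched left rows are padded with null);
    // with false it behaves like an inner join.
    static List<Integer[]> join(List<Integer> left, List<Integer> right, boolean keepUnmatchedLeft) {
        List<Integer[]> out = new ArrayList<>();
        for (Integer l : left) {
            boolean matched = false;
            for (Integer r : right) {
                if (l.equals(r)) {
                    out.add(new Integer[] {l, r});
                    matched = true;
                }
            }
            if (!matched && keepUnmatchedLeft) {
                out.add(new Integer[] {l, null});
            }
        }
        return out;
    }

    // TopN above the join: order by the left column, keep the first n rows.
    static List<Integer[]> topN(List<Integer[]> rows, int n) {
        return rows.stream()
                .sorted(Comparator.comparing((Integer[] row) -> row[0]))
                .limit(n)
                .collect(Collectors.toList());
    }

    // The copy of TopN pushed below the join, applied to the left input only.
    static List<Integer> topNLeft(List<Integer> left, int n) {
        return left.stream().sorted().limit(n).collect(Collectors.toList());
    }

    // Projects the left column so results are easy to compare and print.
    static List<Integer> leftColumn(List<Integer[]> rows) {
        return rows.stream().map(row -> row[0]).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> left = Arrays.asList(5, 1, 4, 2, 3);
        List<Integer> right = Arrays.asList(4, 4, 9);
        int n = 3;

        for (boolean outer : new boolean[] {true, false}) {
            List<Integer[]> original = topN(join(left, right, outer), n);
            List<Integer[]> pushed = topN(join(topNLeft(left, n), right, outer), n);
            System.out.println((outer ? "left outer join" : "inner join")
                    + " original=" + leftColumn(original)
                    + " pushed=" + leftColumn(pushed));
        }
    }
}
```

With this input the left outer join gives `[1, 2, 3]` both with and without the push-down, while the inner join gives `[4, 4]` without it and an empty result with it. The left keys are distinct here, so ties in the ordering column do not come into play.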

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@englefly
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32679 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 989d8c8efc27ed944f164daf32b98e1475cc2091, data reload: false

------ Round 1 ----------------------------------
q1	17579	6193	6046	6046
q2	2049	303	167	167
q3	10455	1243	759	759
q4	10308	863	437	437
q5	9122	2202	1967	1967
q6	217	178	145	145
q7	892	763	588	588
q8	9228	1390	1191	1191
q9	5299	4834	4932	4834
q10	6739	2276	1871	1871
q11	473	287	262	262
q12	345	359	215	215
q13	18292	3618	3062	3062
q14	249	251	216	216
q15	545	504	509	504
q16	644	614	592	592
q17	591	848	340	340
q18	6810	6489	6332	6332
q19	2181	953	561	561
q20	315	334	201	201
q21	2968	2216	2080	2080
q22	371	345	309	309
Total cold run time: 105672 ms
Total hot run time: 32679 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6337	6255	6278	6255
q2	234	325	249	249
q3	2243	2638	2289	2289
q4	1496	1852	1461	1461
q5	4303	4756	4847	4756
q6	185	172	142	142
q7	2051	1964	1840	1840
q8	2662	2860	2710	2710
q9	7222	7230	7257	7230
q10	3062	3249	2751	2751
q11	587	541	514	514
q12	731	755	658	658
q13	3581	3816	3325	3325
q14	294	308	289	289
q15	578	519	507	507
q16	649	683	652	652
q17	1205	1747	1264	1264
q18	7835	7444	7336	7336
q19	859	1164	1088	1088
q20	1995	2002	1905	1905
q21	5838	5147	5209	5147
q22	619	646	601	601
Total cold run time: 54566 ms
Total hot run time: 52969 ms

@doris-robot

TPC-DS: Total hot run time: 195740 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 989d8c8efc27ed944f164daf32b98e1475cc2091, data reload: false

query1	1305	976	938	938
query2	6395	2331	2362	2331
query3	11083	4758	4779	4758
query4	32535	23631	23281	23281
query5	3499	595	449	449
query6	271	188	177	177
query7	3972	489	304	304
query8	289	247	240	240
query9	9212	2744	2747	2744
query10	481	317	256	256
query11	17708	15448	14999	14999
query12	167	106	108	106
query13	1549	518	391	391
query14	9764	7289	7585	7289
query15	270	218	192	192
query16	7420	622	438	438
query17	1567	778	626	626
query18	2018	416	317	317
query19	207	191	194	191
query20	126	117	113	113
query21	203	126	104	104
query22	4486	4490	4403	4403
query23	34036	33811	33494	33494
query24	6449	2350	2366	2350
query25	508	463	404	404
query26	857	286	162	162
query27	2037	480	351	351
query28	5664	2494	2491	2491
query29	633	562	424	424
query30	214	209	164	164
query31	950	875	821	821
query32	91	58	55	55
query33	500	343	298	298
query34	794	864	525	525
query35	772	821	738	738
query36	1022	1051	999	999
query37	132	100	79	79
query38	4083	4336	4090	4090
query39	1509	1480	1430	1430
query40	207	127	101	101
query41	55	49	47	47
query42	127	106	106	106
query43	530	542	510	510
query44	1398	843	846	843
query45	183	178	170	170
query46	876	1053	664	664
query47	1864	1882	1842	1842
query48	392	415	320	320
query49	729	486	389	389
query50	675	673	399	399
query51	7044	7024	6951	6951
query52	101	100	89	89
query53	236	268	187	187
query54	493	512	430	430
query55	89	90	86	86
query56	248	251	248	248
query57	1266	1192	1180	1180
query58	250	235	250	235
query59	3163	3428	3143	3143
query60	269	263	257	257
query61	138	115	112	112
query62	820	792	709	709
query63	225	192	190	190
query64	3734	1035	671	671
query65	3333	3248	3195	3195
query66	898	412	302	302
query67	16398	15654	15413	15413
query68	9296	709	523	523
query69	475	285	253	253
query70	1214	1138	1094	1094
query71	428	351	246	246
query72	6463	3903	3813	3813
query73	672	753	369	369
query74	9876	9105	8921	8921
query75	3927	3157	2682	2682
query76	3586	1175	767	767
query77	761	376	363	363
query78	10036	10008	9323	9323
query79	3265	823	593	593
query80	752	516	432	432
query81	479	287	238	238
query82	600	149	132	132
query83	186	182	153	153
query84	283	89	77	77
query85	838	342	297	297
query86	361	307	282	282
query87	4491	4323	4436	4323
query88	3352	2229	2178	2178
query89	418	321	285	285
query90	1849	192	191	191
query91	134	148	110	110
query92	62	54	54	54
query93	1847	866	528	528
query94	652	383	291	291
query95	336	261	259	259
query96	501	612	277	277
query97	2818	2947	2835	2835
query98	214	203	193	193
query99	1639	1478	1347	1347
Total cold run time: 292330 ms
Total hot run time: 195740 ms

@doris-robot

ClickBench: Total hot run time: 31.34 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 989d8c8efc27ed944f164daf32b98e1475cc2091, data reload: false

query1	0.03	0.03	0.03
query2	0.08	0.03	0.03
query3	0.24	0.07	0.07
query4	1.62	0.11	0.12
query5	0.41	0.43	0.43
query6	1.14	0.65	0.64
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.59	0.50	0.50
query10	0.56	0.57	0.54
query11	0.14	0.10	0.10
query12	0.15	0.11	0.10
query13	0.60	0.60	0.60
query14	2.74	2.87	2.75
query15	0.89	0.83	0.81
query16	0.38	0.38	0.38
query17	1.01	1.07	1.06
query18	0.22	0.21	0.20
query19	1.95	1.82	2.00
query20	0.01	0.01	0.01
query21	15.36	0.93	0.61
query22	0.75	0.93	0.68
query23	15.13	1.46	0.53
query24	3.04	1.10	0.98
query25	0.12	0.17	0.12
query26	0.40	0.15	0.13
query27	0.06	0.05	0.04
query28	13.63	1.61	1.05
query29	12.58	3.95	3.27
query30	0.26	0.10	0.06
query31	2.80	0.60	0.38
query32	3.23	0.55	0.45
query33	3.13	3.06	3.04
query34	16.67	5.11	4.48
query35	4.54	4.56	4.50
query36	0.64	0.49	0.48
query37	0.10	0.06	0.06
query38	0.04	0.04	0.03
query39	0.03	0.03	0.02
query40	0.17	0.13	0.13
query41	0.08	0.03	0.03
query42	0.04	0.02	0.03
query43	0.04	0.03	0.02
Total cold run time: 105.66 s
Total hot run time: 31.34 s

@wm1581066 requested a review from morrySnow on January 13, 2025 02:07
@wm1581066 added the usercase label (Important user case type label) on Jan 13, 2025
@@ -395,11 +392,8 @@ public class Rewriter extends AbstractBatchJobExecutor {
topDown(new SplitLimit()),
topDown(
new PushDownLimit(),

  1. Don't remove the topn push-down rules from the RBO stage.
  2. Enhance the topn push-down with limit-threshold protection.
  3. Try to find out why topn is not pushed down to the inner join node.
  4. Putting this rule in the CBO stage to extend the push-down scenarios is fine.
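
One way to read the "limit threshold protection" in point 2 is a guard that skips the push-down when the TopN would keep too many rows below the join, since the extra sort then costs more than it saves. The following is only a sketch under that assumption: the class `TopNPushDownGuardSketch`, the method `shouldPushDown`, and the default value `1024` are all hypothetical and are not taken from this PR or from Doris.

```java
public class TopNPushDownGuardSketch {
    // Hypothetical default threshold, chosen only for illustration.
    static final long DEFAULT_LIMIT_THRESHOLD = 1024;

    // A TopN copied below the join has to keep limit + offset rows, because the
    // offset is only applied by the original TopN above the join. Push down only
    // when that row count stays within the threshold.
    static boolean shouldPushDown(long limit, long offset, long threshold) {
        if (limit < 0 || offset < 0) {
            return false; // no valid bound to push down
        }
        long rowsToKeep;
        try {
            rowsToKeep = Math.addExact(limit, offset);
        } catch (ArithmeticException overflow) {
            return false; // limit + offset overflows, treat as "too large"
        }
        return rowsToKeep <= threshold;
    }

    public static void main(String[] args) {
        System.out.println(shouldPushDown(10, 0, DEFAULT_LIMIT_THRESHOLD));
        System.out.println(shouldPushDown(1000, 100, DEFAULT_LIMIT_THRESHOLD));
    }
}
```

In a real rule the threshold would presumably come from a configuration or session setting rather than a constant. That choice is left open here.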

@englefly force-pushed the rewrite-rule-to-cbo branch from 989d8c8 to 41d9205 on January 13, 2025 03:47
5 participants