[nereids] consider numNulls in filter estimation #29184

Merged: 5 commits, Jan 2, 2024

xzj7019
Contributor

@xzj7019 xzj7019 commented Dec 27, 2023

Proposed changes

consider numNulls in filter estimation

@xzj7019 xzj7019 marked this pull request as draft December 27, 2023 13:58
@xzj7019
Contributor Author

xzj7019 commented Dec 27, 2023

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit cb60eb4f8498ecd684c0338c2f6515412617c36d, data reload: false

run tpch-sf100 query with default conf and session variables
q1	5032	4646	4654	4646
q2	364	151	158	151
q3	1471	1268	1162	1162
q4	1141	952	900	900
q5	3182	3150	3139	3139
q6	248	130	128	128
q7	1052	489	490	489
q8	2263	2255	2241	2241
q9	6715	6676	6686	6676
q10	3189	3279	3266	3266
q11	332	213	207	207
q12	344	209	207	207
q13	4128	3443	3433	3433
q14	242	210	223	210
q15	575	523	524	523
q16	443	384	383	383
q17	1050	773	571	571
q18	7188	6858	6788	6788
q19	1625	1629	1637	1629
q20	516	291	298	291
q21	3195	2723	2748	2723
q22	364	310	309	309
Total cold run time: 44659 ms
Total hot run time: 40072 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4589	4567	4560	4560
q2	273	164	168	164
q3	3384	3361	3350	3350
q4	2225	2198	2198	2198
q5	5704	5713	5708	5708
q6	242	122	119	119
q7	2379	1855	1844	1844
q8	3605	3604	3598	3598
q9	9028	8988	8961	8961
q10	3828	3908	3909	3908
q11	489	360	360	360
q12	765	603	590	590
q13	3895	3163	3217	3163
q14	287	248	246	246
q15	585	523	517	517
q16	493	442	464	442
q17	1976	1951	1957	1951
q18	8676	8215	8170	8170
q19	1776	1751	1743	1743
q20	2246	1931	1929	1929
q21	6094	5745	5751	5745
q22	538	454	452	452
Total cold run time: 63077 ms
Total hot run time: 59718 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.31 seconds
stream load tsv: 578 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 27.9 seconds inserted 10000000 Rows, about 358K ops/s
storage size: 17183613537 Bytes

double numNulls = stats.numNulls;
double ndv = stats.ndv;
if (numNulls > rowCount - ndv) {
    numNulls = rowCount - ndv > 0 ? rowCount - ndv : 0;
Contributor

"numNulls = rowCount - ndv" does not hold in general.
Consider the column values (1, 1, 1, null):
rowCount = 4, ndv = 1,
so this gives numNulls = 4 - 1 = 3, but the actual value is 1.
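
The counterexample above can be checked directly. The following is a minimal sketch, assuming ndv counts distinct non-null values; the class and method names are hypothetical and not part of Doris. Under that assumption, rowCount - ndv is only an upper bound on numNulls, not an estimate of it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical helper for illustration only; not Doris code.
class NullCountExample {
    // Number of NULL entries in the column.
    static long numNulls(List<Integer> col) {
        return col.stream().filter(Objects::isNull).count();
    }

    // Number of distinct non-NULL values (assumed meaning of ndv).
    static long ndv(List<Integer> col) {
        Set<Integer> distinct = new HashSet<>();
        for (Integer v : col) {
            if (v != null) {
                distinct.add(v);
            }
        }
        return distinct.size();
    }

    public static void main(String[] args) {
        List<Integer> col = Arrays.asList(1, 1, 1, null);
        long rowCount = col.size();
        // The two printed values differ for this column.
        System.out.println("rowCount - ndv = " + (rowCount - ndv(col)));
        System.out.println("numNulls       = " + numNulls(col));
    }
}
```

The gap between the two values grows with the number of duplicate non-null rows, so the formula is exact only when every non-null value is unique.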

Contributor Author

The current implementation can leave rowCount and ndv in an inconsistent state, so execution can enter this handling logic unexpectedly. I will comment it out for now and refine it in the future.

@xzj7019 xzj7019 force-pushed the fix_num_nulls branch 2 times, most recently from 5e004d0 to 0cad7be Compare December 28, 2023 09:43
@xzj7019
Contributor Author

xzj7019 commented Dec 28, 2023

run buildall

@xzj7019 xzj7019 marked this pull request as ready for review December 28, 2023 11:06
@xzj7019
Contributor Author

xzj7019 commented Dec 28, 2023

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 49.98 seconds
stream load tsv: 565 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.4 seconds inserted 10000000 Rows, about 340K ops/s
storage size: 17183446384 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit 752dc13a4eddd300c790cfa012ed6411da5ab076, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4982	4647	4630	4630
q2	375	164	155	155
q3	1468	1315	1248	1248
q4	1130	921	890	890
q5	3197	3180	3141	3141
q6	254	129	125	125
q7	1068	498	491	491
q8	2275	2258	2259	2258
q9	6747	6700	6723	6700
q10	3204	3297	3260	3260
q11	321	213	210	210
q12	347	206	207	206
q13	4186	3440	3442	3440
q14	240	211	223	211
q15	574	518	520	518
q16	445	390	390	390
q17	1047	789	566	566
q18	7121	6901	6789	6789
q19	1641	1641	1638	1638
q20	576	303	304	303
q21	3202	2719	2757	2719
q22	370	305	314	305
Total cold run time: 44770 ms
Total hot run time: 40193 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4592	4592	4546	4546
q2	274	168	165	165
q3	3377	3373	3352	3352
q4	2236	2213	2201	2201
q5	5722	5743	5717	5717
q6	241	118	118	118
q7	2385	1844	1898	1844
q8	3616	3620	3631	3620
q9	9053	8987	8942	8942
q10	3813	3882	3893	3882
q11	483	375	384	375
q12	770	588	590	588
q13	3881	3232	3187	3187
q14	290	253	260	253
q15	574	520	519	519
q16	493	440	469	440
q17	1969	1958	1957	1957
q18	8665	8228	8363	8228
q19	1776	1763	1764	1763
q20	2242	1940	1938	1938
q21	6128	5789	5746	5746
q22	563	469	463	463
Total cold run time: 63143 ms
Total hot run time: 59844 ms

@xzj7019
Contributor Author

xzj7019 commented Dec 28, 2023

run buildall

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 50.99 seconds
stream load tsv: 570 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 67 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.7 seconds inserted 10000000 Rows, about 336K ops/s
storage size: 17183927583 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit 8f80f8cc6be9e177354700fee5529cb2a318be39, data reload: false

run tpch-sf100 query with default conf and session variables
q1	5337	5137	5104	5104
q2	404	164	158	158
q3	1449	1270	1190	1190
q4	1080	856	834	834
q5	3162	3073	2927	2927
q6	232	138	139	138
q7	948	535	560	535
q8	2135	2278	2220	2220
q9	6840	6812	6802	6802
q10	3201	3158	3199	3158
q11	344	236	229	229
q12	388	241	241	241
q13	4408	3627	3619	3619
q14	263	215	222	215
q15	617	557	567	557
q16	457	398	391	391
q17	1046	600	561	561
q18	7083	6659	6754	6659
q19	1641	1610	1562	1562
q20	597	333	367	333
q21	2880	2402	2500	2402
q22	409	316	319	316
Total cold run time: 44921 ms
Total hot run time: 40151 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	5076	5046	5107	5046
q2	350	259	269	259
q3	3378	3324	3325	3324
q4	2173	2011	1965	1965
q5	5930	5957	5947	5947
q6	232	135	131	131
q7	2392	1925	1952	1925
q8	3533	3660	3661	3660
q9	8989	8900	8965	8900
q10	3918	3932	3947	3932
q11	589	469	474	469
q12	790	643	653	643
q13	3878	3210	3181	3181
q14	298	274	300	274
q15	627	554	556	554
q16	562	535	528	528
q17	2050	1816	1794	1794
q18	8672	8378	8397	8378
q19	1764	1697	1703	1697
q20	2294	2004	1982	1982
q21	5734	5334	5334	5334
q22	565	526	483	483
Total cold run time: 63794 ms
Total hot run time: 60406 ms

@@ -117,6 +118,11 @@ public Statistics visitCompoundPredicate(CompoundPredicate predicate, Estimation
colBuilder.setMinValue(union.getLow()).setMinExpr(union.getLowExpr())
        .setMaxValue(union.getHigh()).setMaxExpr(union.getHighExpr())
        .setNdv(union.getDistinctValues());
if (!(leftExpr instanceof IsNull || rightExpr instanceof IsNull)) {
    colBuilder.setNumNulls(0);
Contributor

@englefly englefly Dec 29, 2023

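Counterexample: (A is null and B > 0) or (A is not null and B > 0).
Neither child is an IsNull expression, so this sets A.numNulls = 0,
which is not true: rows where A is null can still pass the filter.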
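
To make the objection concrete, here is a small sketch that evaluates the reviewer's predicate over a few rows and counts the survivors whose A is NULL. The `Row` type and the sample data are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not Doris code.
class OrPredicateNullExample {
    // Minimal row type: a is nullable, b is not.
    static final class Row {
        final Integer a;
        final int b;

        Row(Integer a, int b) {
            this.a = a;
            this.b = b;
        }
    }

    // (A is null and B > 0) or (A is not null and B > 0)
    static boolean passes(Row r) {
        return (r.a == null && r.b > 0) || (r.a != null && r.b > 0);
    }

    // Count rows that pass the filter and still have A = NULL.
    static long nullsInAAfterFilter(List<Row> rows) {
        return rows.stream()
                .filter(OrPredicateNullExample::passes)
                .filter(r -> r.a == null)
                .count();
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(null, 1), new Row(null, -1), new Row(7, 2), new Row(8, 0));
        System.out.println("NULLs in A after filter: " + nullsInAAfterFilter(rows));
    }
}
```

Any nonzero count here means that resetting A.numNulls to 0 after this OR would understate the NULLs remaining in the output.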

Contributor Author

use the max numNulls instead.
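
One possible reading of this reply, sketched below, is that the OR result keeps the larger of the two branches' numNulls estimates instead of resetting to 0. This is an assumption about the intent; the method name and the sample values are hypothetical, and the merged code may differ.

```java
// Hypothetical sketch of "use the max numNulls"; not the actual Doris change.
class OrNumNullsSketch {
    // Keep the larger numNulls estimate of the two OR branches.
    static double numNullsForOr(double leftNumNulls, double rightNumNulls) {
        return Math.max(leftNumNulls, rightNumNulls);
    }

    public static void main(String[] args) {
        // Example: the left branch may retain some NULLs of A, the right branch none.
        System.out.println(numNullsForOr(40.0, 0.0));
    }
}
```

Taking the maximum avoids the underestimate in the counterexample, at the cost of possibly overstating NULLs when the branches overlap.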

@xzj7019
Contributor Author

xzj7019 commented Dec 29, 2023

run buildall

Comment on lines 224 to 225
} else if (statsForRight.isUnKnown) {
    selectivity = 0.0;
Contributor

The right side is a literal, so why could it be unknown?

Contributor Author

ok, will remove it.

Comment on lines +401 to +403
if (not.child().getInputSlots().size() == 1 && !(child instanceof IsNull)) {
    // only consider the single column numNull, otherwise, ignore
    rowCount = Math.max(rowCount - originColStats.numNulls, 1);
    statisticsBuilder.setRowCount(rowCount);
}
Contributor

@keanji-x keanji-x Dec 29, 2023

This could cause large estimation errors when the child's column statistics are unknown.

Contributor Author

The precondition has already been checked earlier.
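
For context, the adjustment in the quoted lines reflects SQL semantics: a predicate such as NOT (A = 5) evaluates to NULL rather than TRUE when A is NULL, so those rows are filtered out. The sketch below isolates the arithmetic of the quoted line; the class and method names are hypothetical, and the sample numbers are made up.

```java
// Hypothetical sketch of the row-count adjustment; not Doris code.
class NotPredicateRowCount {
    // Subtract the column's NULL count and keep at least one row,
    // mirroring Math.max(rowCount - numNulls, 1) in the quoted diff.
    static double rowsAfterDroppingNulls(double rowCount, double numNulls) {
        return Math.max(rowCount - numNulls, 1);
    }

    public static void main(String[] args) {
        System.out.println(rowsAfterDroppingNulls(100, 30));
        // If numNulls is unreliable and exceeds rowCount, the floor of 1 applies.
        System.out.println(rowsAfterDroppingNulls(10, 50));
    }
}
```

The second call shows why the reviewer's concern matters: an untrustworthy numNulls can collapse the estimate to the floor value, which is why the precondition check on the statistics is relevant.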

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G', run with scripts in https://github.com/apache/doris/tree/master/tools/tpch-tools

Tpch sf100 test result on commit 393db64fd453dcf349373cda22dac415d07f6d75, data reload: false

------ Round 1 ----------------------------------
q1	17687	5687	5095	5095
q2	2018	162	150	150
q3	10604	1113	1134	1113
q4	10214	776	834	776
q5	7807	2972	2822	2822
q6	216	140	141	140
q7	901	512	557	512
q8	9276	1969	1994	1969
q9	6811	6384	6338	6338
q10	8280	3065	2993	2993
q11	448	213	218	213
q12	402	242	245	242
q13	18004	3635	3622	3622
q14	260	220	211	211
q15	601	546	540	540
q16	455	396	393	393
q17	953	528	512	512
q18	7302	6686	6656	6656
q19	1591	1311	1382	1311
q20	729	350	354	350
q21	2784	2377	2425	2377
q22	387	313	331	313
Total cold run time: 107730 ms
Total hot run time: 38648 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5104	5006	5010	5006
q2	347	263	253	253
q3	3302	3272	3266	3266
q4	2134	2004	1991	1991
q5	5793	5778	5768	5768
q6	225	135	135	135
q7	2299	1907	1908	1907
q8	3593	3446	3462	3446
q9	8803	8793	8759	8759
q10	3801	3833	3864	3833
q11	604	488	478	478
q12	810	654	645	645
q13	6093	3257	3255	3255
q14	282	264	287	264
q15	619	564	542	542
q16	551	524	519	519
q17	1899	1764	1762	1762
q18	8701	8414	8316	8316
q19	1625	1613	1598	1598
q20	2197	2004	1960	1960
q21	5668	5304	5242	5242
q22	554	455	498	455
Total cold run time: 65004 ms
Total hot run time: 59400 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.72 seconds
stream load tsv: 576 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.4 seconds inserted 10000000 Rows, about 352K ops/s
storage size: 17184299542 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G', run with scripts in https://github.com/apache/doris/tree/master/tools/tpch-tools

Tpch sf100 test result on commit 393db64fd453dcf349373cda22dac415d07f6d75, data reload: false

run tpch-sf100 query with default conf and session variables
q1	5421	5170	5170	5170
q2	390	165	159	159
q3	1460	1153	1198	1153
q4	1094	851	816	816
q5	3159	3071	3112	3071
q6	231	140	139	139
q7	985	570	529	529
q8	2175	2315	2237	2237
q9	6728	6679	6685	6679
q10	3225	3201	3157	3157
q11	363	226	224	224
q12	383	251	242	242
q13	4359	3619	3663	3619
q14	247	222	220	220
q15	618	559	557	557
q16	463	422	423	422
q17	1051	539	580	539
q18	7085	6740	7131	6740
q19	1656	1581	1589	1581
q20	637	358	362	358
q21	2868	2435	2498	2435
q22	382	315	320	315
Total cold run time: 44980 ms
Total hot run time: 40362 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	5128	5056	5113	5056
q2	338	258	263	258
q3	3407	3355	3330	3330
q4	2155	2078	2029	2029
q5	5923	5893	5914	5893
q6	236	134	134	134
q7	2408	1927	1894	1894
q8	3557	3632	3665	3632
q9	9056	8996	9062	8996
q10	3909	3939	3946	3939
q11	596	491	478	478
q12	800	649	636	636
q13	3870	3189	3224	3189
q14	318	271	290	271
q15	629	547	549	547
q16	553	516	483	483
q17	2012	1849	1814	1814
q18	8724	8336	8386	8336
q19	1765	1708	1716	1708
q20	2299	1987	1980	1980
q21	5658	5307	5310	5307
q22	597	492	492	492
Total cold run time: 63938 ms
Total hot run time: 60402 ms

@doris-robot

TPC-DS test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G', run with scripts in https://github.com/apache/doris/tree/master/tools/tpcds-tools

TPC-DS sf100 test result on commit 393db64fd453dcf349373cda22dac415d07f6d75, data reload: false

run tpcds-sf100 query with default conf and session variables
query1	907	367	341	341
query2	6431	1947	2160	1947
query3	6662	223	225	223
query4	27184	22631	22458	22458
query5	5676	563	587	563
query6	275	188	193	188
query7	4570	284	287	284
query8	241	220	199	199
query9	8147	2709	2724	2709
query10	457	241	233	233
query11	16262	15531	15419	15419
query12	141	86	82	82
query13	1642	333	333	333
query14	11445	7327	7288	7288
query15	229	189	191	189
query16	6389	299	291	291
query17	1773	509	513	509
query18	1919	278	269	269
query19	216	145	145	145
query20	87	87	81	81
query21	186	100	97	97
query22	5017	4579	4784	4579
query23	32074	31392	31219	31219
query24	11567	2867	2797	2797
query25	602	364	350	350
query26	1575	153	153	153
query27	2892	292	289	289
query28	7032	1986	1963	1963
query29	1604	421	414	414
query30	285	145	149	145
query31	979	803	791	791
query32	93	67	61	61
query33	719	283	297	283
query34	839	452	464	452
query35	883	778	786	778
query36	1403	1214	1240	1214
query37	113	83	86	83
query38	3371	3268	3261	3261
query39	1337	1290	1275	1275
query40	300	99	98	98
query41	39	36	36	36
query42	102	94	92	92
query43	531	516	544	516
query44	1137	779	785	779
query45	204	187	190	187
query46	1087	656	694	656
query47	1625	1636	1569	1569
query48	346	269	261	261
query49	1168	340	339	339
query50	792	355	354	354
query51	5386	5197	5256	5197
query52	97	91	92	91
query53	225	151	155	151
query54	1406	634	647	634
query55	109	92	93	92
query56	220	208	206	206
query57	1018	987	963	963
query58	234	218	220	218
query59	2860	2649	2596	2596
query60	294	258	258	258
query61	94	100	94	94
query62	654	492	474	474
query63	178	155	156	155
query64	5968	1770	1800	1770
query65	3350	3259	3282	3259
query66	1119	346	336	336
query67	15529	15109	15064	15064
query68	13472	560	515	515
query69	604	277	262	262
query70	1659	1575	1539	1539
query71	575	235	239	235
query72	5531	3587	3637	3587
query73	1758	332	324	324
query74	7028	6427	6493	6427
query75	5714	2297	2298	2297
query76	5941	1140	1115	1115
query77	910	301	281	281
query78	9077	8682	8487	8487
query79	5606	523	525	523
query80	1122	386	371	371
query81	511	210	218	210
query82	306	129	125	125
query83	401	142	140	140
query84	255	58	61	58
query85	1145	288	271	271
query86	409	404	394	394
query87	3568	3410	3463	3410
query88	3600	2522	2503	2503
query89	342	274	275	274
query90	1912	240	247	240
query91	121	89	94	89
query92	73	61	59	59
query93	3041	448	497	448
query94	858	221	221	221
query95	524	470	476	470
query96	655	339	338	338
query97	4324	4150	4171	4150
query98	219	198	195	195
query99	1229	858	854	854
Total cold run time: 301786 ms
Total hot run time: 180812 ms

@@ -568,10 +581,33 @@ public Statistics visitLike(Like like, EstimationContext context) {
"col stats not found. slot=%s in %s",
like.left().toSql(), like.toSql());
ColumnStatisticBuilder colBuilder = new ColumnStatisticBuilder(origin);
colBuilder.setNdv(origin.ndv * DEFAULT_LIKE_COMPARISON_SELECTIVITY).setNumNulls(0);
double selectivity = origin.ndv * DEFAULT_LIKE_COMPARISON_SELECTIVITY;
Contributor

This value is an 'ndv', not a selectivity.
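
The distinction can be illustrated with a short sketch: multiplying ndv by the constant yields a count of distinct values, whereas a selectivity is a fraction that scales the row count. The constant's value below (0.2) is assumed for illustration and may not match the real `DEFAULT_LIKE_COMPARISON_SELECTIVITY`; the class and method names are hypothetical.

```java
// Hypothetical sketch; the constant value is an assumption, not taken from Doris.
class LikeEstimateSketch {
    static final double DEFAULT_LIKE_COMPARISON_SELECTIVITY = 0.2;

    // Estimated number of distinct values that survive the LIKE filter.
    static double estimatedNdv(double originNdv) {
        return originNdv * DEFAULT_LIKE_COMPARISON_SELECTIVITY;
    }

    // Estimated number of rows that survive the LIKE filter.
    static double estimatedRowCount(double rowCount) {
        return rowCount * DEFAULT_LIKE_COMPARISON_SELECTIVITY;
    }

    public static void main(String[] args) {
        double ndv = 50;
        double rowCount = 1000;
        // A distinct-value count can exceed 1, so it cannot serve as a selectivity.
        System.out.println("ndv estimate: " + estimatedNdv(ndv));
        System.out.println("row estimate: " + estimatedRowCount(rowCount));
    }
}
```

Using the first quantity where a fraction is expected would scale the row count up instead of down whenever the surviving ndv is greater than 1.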

@xzj7019
Copy link
Contributor Author

xzj7019 commented Dec 29, 2023

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G', run with scripts in https://github.com/apache/doris/tree/master/tools/tpch-tools

Tpch sf100 test result on commit 0f692ebd0e2d405b563ccc79601866e92e0e20aa, data reload: false

------ Round 1 ----------------------------------
q1	17640	5083	5094	5083
q2	2008	168	148	148
q3	10525	1114	1148	1114
q4	10174	814	822	814
q5	7817	2954	2885	2885
q6	214	137	141	137
q7	939	519	563	519
q8	9296	2008	2026	2008
q9	6845	6379	6348	6348
q10	8229	3055	3040	3040
q11	422	227	220	220
q12	400	246	245	245
q13	18026	3636	3617	3617
q14	243	217	223	217
q15	595	557	530	530
q16	449	390	424	390
q17	947	478	472	472
q18	7262	6655	6624	6624
q19	1591	1376	1382	1376
q20	712	346	348	346
q21	2827	2388	2442	2388
q22	375	323	330	323
Total cold run time: 107536 ms
Total hot run time: 38844 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5089	5081	5015	5015
q2	333	253	248	248
q3	3317	3282	3254	3254
q4	2117	2033	2015	2015
q5	5810	5772	5757	5757
q6	219	131	136	131
q7	2352	1920	1859	1859
q8	3363	3452	3456	3452
q9	8860	8798	8791	8791
q10	3808	3842	3834	3834
q11	599	479	483	479
q12	814	656	642	642
q13	7082	3259	3206	3206
q14	299	284	278	278
q15	610	546	555	546
q16	582	507	516	507
q17	1927	1772	1765	1765
q18	8696	8265	8325	8265
q19	1607	1574	1579	1574
q20	2186	1984	1969	1969
q21	5617	5203	5332	5203
q22	562	456	462	456
Total cold run time: 65849 ms
Total hot run time: 59246 ms

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G', run with scripts in https://github.com/apache/doris/tree/master/tools/tpch-tools

Tpch sf100 test result on commit 0f692ebd0e2d405b563ccc79601866e92e0e20aa, data reload: false

run tpch-sf100 query with default conf and session variables
q1	5407	5118	5125	5118
q2	387	186	158	158
q3	1456	1178	1113	1113
q4	1095	879	871	871
q5	3100	3165	3156	3156
q6	226	139	137	137
q7	981	543	545	543
q8	2181	2163	2289	2163
q9	6700	6672	6666	6666
q10	3202	3144	3215	3144
q11	362	229	216	216
q12	392	247	247	247
q13	4370	3667	3618	3618
q14	264	222	224	222
q15	631	569	559	559
q16	458	402	411	402
q17	1050	546	532	532
q18	7066	6787	6801	6787
q19	1649	1661	1611	1611
q20	626	383	360	360
q21	2897	2524	2471	2471
q22	398	317	319	317
Total cold run time: 44898 ms
Total hot run time: 40411 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	5130	5031	5065	5031
q2	330	247	256	247
q3	3375	3327	3312	3312
q4	2146	2038	2047	2038
q5	5967	5938	5935	5935
q6	232	130	133	130
q7	2409	1942	1945	1942
q8	3557	3660	3680	3660
q9	9092	9064	9044	9044
q10	3907	3907	3938	3907
q11	573	487	509	487
q12	820	635	660	635
q13	3897	3238	3194	3194
q14	297	277	271	271
q15	637	570	567	567
q16	570	502	522	502
q17	2036	1791	1782	1782
q18	8779	8330	8396	8330
q19	1767	1709	1683	1683
q20	2286	2006	1998	1998
q21	5736	5367	5338	5338
q22	551	490	523	490
Total cold run time: 64094 ms
Total hot run time: 60523 ms

@doris-robot

TPC-DS test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G', run with scripts in https://github.com/apache/doris/tree/master/tools/tpcds-tools

TPC-DS sf100 test result on commit 0f692ebd0e2d405b563ccc79601866e92e0e20aa, data reload: false

run tpcds-sf100 query with default conf and session variables
query1	920	356	342	342
query2	6420	2174	1965	1965
query3	6640	229	221	221
query4	26698	22482	22428	22428
query5	5209	588	569	569
query6	274	187	176	176
query7	4569	270	268	268
query8	231	219	205	205
query9	8154	2731	2743	2731
query10	445	247	259	247
query11	16161	15712	15557	15557
query12	146	79	77	77
query13	1629	318	336	318
query14	11833	7340	7274	7274
query15	245	188	193	188
query16	6493	303	293	293
query17	1920	528	528	528
query18	2223	303	273	273
query19	269	143	141	141
query20	87	82	80	80
query21	183	99	95	95
query22	5249	5085	5079	5079
query23	32128	31350	31494	31350
query24	11922	2855	2800	2800
query25	592	353	351	351
query26	1737	144	149	144
query27	2868	288	289	288
query28	6990	1981	1959	1959
query29	2088	423	412	412
query30	292	143	147	143
query31	960	757	788	757
query32	91	62	63	62
query33	729	286	263	263
query34	850	465	467	465
query35	880	763	754	754
query36	1327	1264	1288	1264
query37	192	84	89	84
query38	3396	3278	3246	3246
query39	1316	1300	1271	1271
query40	302	97	92	92
query41	40	35	34	34
query42	102	93	100	93
query43	549	541	469	469
query44	1127	765	774	765
query45	195	184	187	184
query46	1079	644	649	644
query47	1740	1620	1613	1613
query48	352	260	259	259
query49	1204	341	332	332
query50	823	375	379	375
query51	5358	5200	5264	5200
query52	98	83	89	83
query53	217	154	148	148
query54	1376	617	629	617
query55	109	93	86	86
query56	216	207	206	206
query57	1052	975	965	965
query58	233	211	212	211
query59	2860	2573	2676	2573
query60	284	246	275	246
query61	83	81	81	81
query62	660	473	462	462
query63	175	151	158	151
query64	5908	1730	1804	1730
query65	3350	3263	3300	3263
query66	1312	329	324	324
query67	15368	15452	15335	15335
query68	11154	520	520	520
query69	587	268	269	268
query70	1598	1600	1506	1506
query71	575	237	226	226
query72	5372	3647	3576	3576
query73	1318	334	322	322
query74	7453	6455	6518	6455
query75	5351	2294	2276	2276
query76	4835	1139	1099	1099
query77	892	263	299	263
query78	9040	8630	8543	8543
query79	6053	511	516	511
query80	1954	383	390	383
query81	521	211	210	210
query82	347	128	122	122
query83	255	144	150	144
query84	259	55	54	54
query85	2435	291	285	285
query86	388	367	370	367
query87	3527	3353	3366	3353
query88	3144	2422	2443	2422
query89	366	272	278	272
query90	2014	249	235	235
query91	120	97	96	96
query92	71	56	56	56
query93	2529	498	442	442
query94	892	213	216	213
query95	510	483	463	463
query96	653	335	333	333
query97	4295	4173	4183	4173
query98	212	211	192	192
query99	1183	792	799	792
Total cold run time: 300431 ms
Total hot run time: 181381 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.39 seconds
stream load tsv: 572 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.2 seconds inserted 10000000 Rows, about 342K ops/s
storage size: 17184045772 Bytes

//if (numNulls > rowCount - ndv) {
// numNulls = rowCount - ndv > 0 ? rowCount - ndv : 0;
//}
double notNullSel = rowCount <= 1.0 ? 1.0 : 1 - getValidSelectivity(numNulls / rowCount);
Contributor

If rowCount = 0, notNullSel is NaN, and this NaN pollutes the following derivation.
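
The NaN concern follows from floating-point arithmetic: in Java, 0.0 / 0.0 is NaN, and NaN propagates through later arithmetic. The sketch below contrasts an unguarded division with a guarded form shaped like the quoted line, which returns 1.0 when rowCount <= 1.0. The `clamp` helper is a hypothetical stand-in for `getValidSelectivity`, whose actual behavior is not shown in this thread.

```java
// Hypothetical sketch; clamp() only approximates what getValidSelectivity might do.
class NotNullSelectivity {
    // Stand-in for getValidSelectivity: force the value into [0, 1].
    static double clamp(double sel) {
        return Math.min(Math.max(sel, 0.0), 1.0);
    }

    // Unguarded: yields NaN when numNulls and rowCount are both 0.
    static double unguarded(double numNulls, double rowCount) {
        return 1 - numNulls / rowCount;
    }

    // Guarded: skip the division for empty or single-row inputs.
    static double guarded(double numNulls, double rowCount) {
        return rowCount <= 1.0 ? 1.0 : 1 - clamp(numNulls / rowCount);
    }

    public static void main(String[] args) {
        double bad = unguarded(0, 0);
        System.out.println("unguarded is NaN: " + Double.isNaN(bad));
        System.out.println("NaN propagates:   " + Double.isNaN(bad * 1000));
        System.out.println("guarded:          " + guarded(0, 0));
    }
}
```

Guarding before the division keeps NaN out of downstream row-count and ndv calculations, where it would otherwise be hard to trace.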

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 2, 2024
Contributor

github-actions bot commented Jan 2, 2024

PR approved by at least one committer and no changes requested.

Contributor

github-actions bot commented Jan 2, 2024

PR approved by anyone and no changes requested.

@xzj7019 xzj7019 requested review from englefly and keanji-x January 2, 2024 05:42
@englefly englefly merged commit 90b2ee9 into apache:master Jan 2, 2024
31 of 33 checks passed
seawinde pushed a commit to seawinde/doris that referenced this pull request Jan 3, 2024
HappenLee pushed a commit to HappenLee/incubator-doris that referenced this pull request Jan 12, 2024
Labels
approved Indicates a PR has been approved by one committer. dev/2.0.4-merged dev/3.0.0-merged reviewed
7 participants