[cleanup](move-memtable) remove namespace stream_load #27441

Merged 3 commits on Dec 30, 2023

@kaijchen kaijchen commented Nov 22, 2023

Proposed changes

The namespace stream_load is a leftover from legacy code.
Now that sink v2 has been converted to tablet writer v2, this namespace should be removed.
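
As a rough illustration of what removing a nested namespace involves, the sketch below uses a hypothetical class, ExampleTabletWriter, that is not taken from this PR; the enclosing doris namespace is also an assumption here. Declarations that previously sat in the nested namespace move up one level, and call sites drop the stream_load:: qualifier.

```cpp
// Hypothetical sketch: ExampleTabletWriter and its member are illustrative
// names only and do not come from the PR.
//
// Before the cleanup, the declaration would have been nested like this:
//
//   namespace doris {
//   namespace stream_load {
//   class ExampleTabletWriter { ... };
//   } // namespace stream_load
//   } // namespace doris
//
// and callers would have written doris::stream_load::ExampleTabletWriter.

namespace doris {

// After the cleanup, the class lives directly in the enclosing namespace.
class ExampleTabletWriter {
public:
    // Illustrative accessor, used only to show that the type is usable.
    constexpr int writer_version() const { return 2; }
};

} // namespace doris

// A call site after the cleanup: no stream_load:: qualifier is needed.
constexpr int example_writer_version() {
    return doris::ExampleTabletWriter{}.writer_version();
}
```

A change of this kind is mechanical: behavior stays the same, and only the qualified names seen by callers change.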


Contributor

clang-tidy review says "All clean, LGTM! 👍"

@kaijchen
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.57% (8449/23106)
Line Coverage: 28.86% (68679/237942)
Region Coverage: 27.83% (35518/127634)
Branch Coverage: 24.57% (18122/73766)
Coverage Report: http://coverage.selectdb-in.cc/coverage/432d6edc948cd9bfbd5c5fa5b5e1e0cf4771d080_432d6edc948cd9bfbd5c5fa5b5e1e0cf4771d080/report/index.html

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 432d6edc948cd9bfbd5c5fa5b5e1e0cf4771d080, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4927	4689	4656	4656
q2	355	160	160	160
q3	2022	1914	1889	1889
q4	1396	1293	1299	1293
q5	3954	3934	4035	3934
q6	249	133	132	132
q7	1451	882	878	878
q8	2769	2792	2780	2780
q9	9687	9545	10275	9545
q10	3476	3528	3554	3528
q11	378	248	252	248
q12	444	291	292	291
q13	4554	3821	3833	3821
q14	317	281	285	281
q15	597	532	519	519
q16	664	577	595	577
q17	1133	973	933	933
q18	7848	7421	7467	7421
q19	1683	1715	1674	1674
q20	582	295	309	295
q21	4454	4031	4061	4031
q22	473	377	364	364
Total cold run time: 53413 ms
Total hot run time: 49250 ms
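
The totals in the table above can be cross-checked from its rows. The column meanings are not stated in the bot output; the sketch below assumes the first column is a cold run, the next two are hot runs, and the last column is the faster of the two hot runs. Under that assumption, summing the first column gives the cold total and summing the per-query best hot time gives the hot total. The Row type and function names are made up for this example.

```cpp
// Sketch: recompute the totals of the first TPC-H table from its rows.
// Column meanings (cold run, two hot runs) are inferred from the numbers,
// not documented in the PR.
#include <cstddef>

struct Row {
    int cold;  // first run, in ms
    int hot1;  // second run, in ms
    int hot2;  // third run, in ms
};

// Rows q1..q22 copied from the table (default conf and session variables).
constexpr Row kRows[] = {
    {4927, 4689, 4656}, {355, 160, 160},    {2022, 1914, 1889},
    {1396, 1293, 1299}, {3954, 3934, 4035}, {249, 133, 132},
    {1451, 882, 878},   {2769, 2792, 2780}, {9687, 9545, 10275},
    {3476, 3528, 3554}, {378, 248, 252},    {444, 291, 292},
    {4554, 3821, 3833}, {317, 281, 285},    {597, 532, 519},
    {664, 577, 595},    {1133, 973, 933},   {7848, 7421, 7467},
    {1683, 1715, 1674}, {582, 295, 309},    {4454, 4031, 4061},
    {473, 377, 364},
};

constexpr std::size_t kRowCount = sizeof(kRows) / sizeof(kRows[0]);

// Sum of the cold-run column.
constexpr int total_cold_ms() {
    int total = 0;
    for (std::size_t i = 0; i < kRowCount; ++i) {
        total += kRows[i].cold;
    }
    return total;
}

// Sum of the faster of the two hot runs for each query.
constexpr int total_hot_ms() {
    int total = 0;
    for (std::size_t i = 0; i < kRowCount; ++i) {
        total += (kRows[i].hot1 < kRows[i].hot2) ? kRows[i].hot1 : kRows[i].hot2;
    }
    return total;
}
```

For these rows the two functions evaluate to 53413 and 49250, matching the reported cold and hot totals.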

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4578	4567	4570	4567
q2	337	223	279	223
q3	4015	4009	4003	4003
q4	2728	2688	2693	2688
q5	9671	9672	9631	9631
q6	244	124	129	124
q7	3010	2514	2483	2483
q8	4485	4504	4485	4485
q9	12926	12824	12878	12824
q10	4109	4158	4170	4158
q11	789	634	650	634
q12	978	804	810	804
q13	4271	3551	3555	3551
q14	374	356	355	355
q15	571	521	529	521
q16	732	672	696	672
q17	3876	3873	3883	3873
q18	9600	9100	9023	9023
q19	1813	1789	1795	1789
q20	2391	2057	2053	2053
q21	8844	8357	8708	8357
q22	894	821	769	769
Total cold run time: 81236 ms
Total hot run time: 77587 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.13 seconds
stream load tsv: 571 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17098458133 Bytes

@kaijchen
Contributor Author

run buildall

Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions bot added the "approved" label (indicates a PR has been approved by one committer) on Nov 23, 2023
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.57% (8449/23106)
Line Coverage: 28.86% (68671/237943)
Region Coverage: 27.83% (35524/127630)
Branch Coverage: 24.57% (18120/73762)
Coverage Report: http://coverage.selectdb-in.cc/coverage/432d6edc948cd9bfbd5c5fa5b5e1e0cf4771d080_432d6edc948cd9bfbd5c5fa5b5e1e0cf4771d080/report/index.html

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.77 seconds
stream load tsv: 567 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17099975954 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 432d6edc948cd9bfbd5c5fa5b5e1e0cf4771d080, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4924	4639	4660	4639
q2	359	172	160	160
q3	2044	1933	1944	1933
q4	1381	1265	1263	1263
q5	3994	3949	4026	3949
q6	246	131	132	131
q7	1434	898	887	887
q8	2769	2792	2773	2773
q9	9745	9563	9467	9467
q10	3441	3529	3530	3529
q11	390	257	240	240
q12	438	284	293	284
q13	4568	3777	3773	3773
q14	314	298	298	298
q15	591	533	520	520
q16	661	586	593	586
q17	1143	964	924	924
q18	7927	7434	7409	7409
q19	1672	1673	1680	1673
q20	568	307	314	307
q21	4429	3978	3983	3978
q22	480	373	380	373
Total cold run time: 53518 ms
Total hot run time: 49096 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4654	4589	4617	4589
q2	344	236	271	236
q3	4024	3994	4017	3994
q4	2713	2706	2695	2695
q5	9612	9623	9577	9577
q6	240	118	125	118
q7	3045	2456	2501	2456
q8	4446	4401	4487	4401
q9	12913	12886	12840	12840
q10	4070	4183	4189	4183
q11	767	757	646	646
q12	983	811	808	808
q13	4298	3605	3554	3554
q14	385	340	347	340
q15	580	518	524	518
q16	729	687	665	665
q17	3860	3886	4008	3886
q18	9623	9022	9167	9022
q19	1801	1781	1799	1781
q20	2407	2077	2041	2041
q21	8816	8415	8468	8415
q22	954	858	852	852
Total cold run time: 81264 ms
Total hot run time: 77617 ms

Contributor

@yiguolei yiguolei left a comment

lgtm

@yiguolei
Contributor

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit fb95bd13b5d767e1e3747431cdc69cc5bd3bce72, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4920	4652	4660	4652
q2	360	159	153	153
q3	2030	1915	1890	1890
q4	1403	1267	1220	1220
q5	3975	3952	4058	3952
q6	252	129	133	129
q7	1420	888	871	871
q8	2781	2799	2767	2767
q9	9757	9584	9379	9379
q10	3462	3521	3516	3516
q11	387	260	231	231
q12	432	287	291	287
q13	4566	3783	3806	3783
q14	328	288	281	281
q15	576	537	516	516
q16	666	586	588	586
q17	1135	992	939	939
q18	7944	7572	7493	7493
q19	1671	1665	1670	1665
q20	561	318	315	315
q21	4414	4028	4011	4011
q22	476	384	375	375
Total cold run time: 53516 ms
Total hot run time: 49011 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4611	4586	4592	4586
q2	333	220	245	220
q3	4046	4038	4023	4023
q4	2720	2714	2714	2714
q5	9711	9778	9784	9778
q6	245	121	125	121
q7	3041	2469	2535	2469
q8	4466	4447	4453	4447
q9	13008	12814	12929	12814
q10	4080	4162	4168	4162
q11	795	647	660	647
q12	973	808	828	808
q13	4287	3559	3568	3559
q14	378	347	346	346
q15	573	524	523	523
q16	733	661	658	658
q17	3878	3903	3798	3798
q18	9576	8935	8894	8894
q19	1804	1776	1771	1771
q20	2394	2077	2048	2048
q21	8942	8679	8703	8679
q22	884	817	810	810
Total cold run time: 81478 ms
Total hot run time: 77875 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.66 seconds
stream load tsv: 569 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.2 seconds inserted 10000000 Rows, about 354K ops/s
storage size: 17100726115 Bytes

@kaijchen changed the title from "[refactor](move-memtable) remove namespace stream_load" to "[cleanup](move-memtable) remove namespace stream_load" on Dec 7, 2023
@yiguolei
Contributor

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G', run with scripts in https://github.com/apache/doris/tree/master/tools/tpch-tools

Tpch sf100 test result on commit 758ba6ccc9919f27eba204d4e3b79aa5d16c8de9, data reload: false

------ Round 1 ----------------------------------
q1	17654	6017	5126	5126
q2	2020	158	144	144
q3	10529	1091	1169	1091
q4	10174	747	764	747
q5	7786	2947	2861	2861
q6	211	129	129	129
q7	931	564	507	507
q8	9276	2021	2065	2021
q9	6923	6415	6360	6360
q10	8239	3105	3018	3018
q11	425	231	227	227
q12	382	232	230	230
q13	17997	3619	3596	3596
q14	245	212	203	203
q15	585	535	530	530
q16	443	388	400	388
q17	976	464	485	464
q18	7335	6750	6601	6601
q19	1569	1370	1415	1370
q20	689	349	340	340
q21	2797	2368	2450	2368
q22	385	327	338	327
Total cold run time: 107571 ms
Total hot run time: 38648 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5115	5082	5054	5054
q2	334	253	256	253
q3	3330	3245	3274	3245
q4	2101	2013	2010	2010
q5	5805	5749	5752	5749
q6	215	124	123	123
q7	2301	1960	1896	1896
q8	3380	3429	3445	3429
q9	8809	8756	8731	8731
q10	3785	3858	3840	3840
q11	579	489	467	467
q12	796	643	638	638
q13	8013	3214	3162	3162
q14	295	271	272	271
q15	598	537	535	535
q16	542	489	501	489
q17	1933	1778	1750	1750
q18	8588	8430	8323	8323
q19	1619	1583	1611	1583
q20	2194	1973	1972	1972
q21	5555	5265	5354	5265
q22	539	511	456	456
Total cold run time: 66426 ms
Total hot run time: 59241 ms

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G', run with scripts in https://github.com/apache/doris/tree/master/tools/tpch-tools

Tpch sf100 test result on commit 758ba6ccc9919f27eba204d4e3b79aa5d16c8de9, data reload: false

run tpch-sf100 query with default conf and session variables
q1	5427	5143	5128	5128
q2	388	196	159	159
q3	1450	1198	1200	1198
q4	1102	882	822	822
q5	3108	3148	3064	3064
q6	222	131	129	129
q7	963	537	531	531
q8	2147	2293	2245	2245
q9	6713	6702	6658	6658
q10	3165	3187	3091	3091
q11	340	223	233	223
q12	381	231	237	231
q13	4396	3655	3628	3628
q14	253	216	216	216
q15	597	556	537	537
q16	454	393	435	393
q17	1048	537	515	515
q18	7063	6715	7019	6715
q19	1641	1571	1597	1571
q20	589	368	344	344
q21	2893	2500	2449	2449
q22	411	320	343	320
Total cold run time: 44751 ms
Total hot run time: 40167 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	5106	5134	5095	5095
q2	335	241	267	241
q3	3344	3338	3287	3287
q4	2148	2013	1990	1990
q5	5954	5924	5929	5924
q6	231	125	124	124
q7	2397	1907	1955	1907
q8	3573	3686	3666	3666
q9	9061	9030	8934	8934
q10	3875	3934	3927	3927
q11	587	503	472	472
q12	799	657	654	654
q13	3889	3195	3186	3186
q14	303	263	259	259
q15	610	546	535	535
q16	565	511	517	511
q17	2038	1773	1783	1773
q18	8795	8426	8390	8390
q19	1736	1669	1669	1669
q20	2285	2006	1966	1966
q21	5821	5369	5269	5269
q22	571	504	512	504
Total cold run time: 64023 ms
Total hot run time: 60283 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.64% (8616/23516)
Line Coverage: 28.69% (70048/244115)
Region Coverage: 27.68% (36251/130957)
Branch Coverage: 24.39% (18531/75976)
Coverage Report: http://coverage.selectdb-in.cc/coverage/758ba6ccc9919f27eba204d4e3b79aa5d16c8de9_758ba6ccc9919f27eba204d4e3b79aa5d16c8de9/report/index.html

@doris-robot

TPC-DS test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G', run with scripts in https://github.com/apache/doris/tree/master/tools/tpcds-tools

TPC-DS sf100 test result on commit 758ba6ccc9919f27eba204d4e3b79aa5d16c8de9, data reload: false

run tpcds-sf100 query with default conf and session variables
query1	919	367	346	346
query2	6438	1925	1891	1891
query3	6652	215	205	205
query4	27445	22608	22430	22430
query5	5596	546	530	530
query6	280	202	187	187
query7	4591	280	283	280
query8	225	212	196	196
query9	8246	2693	2749	2693
query10	458	254	261	254
query11	16198	15580	15581	15580
query12	134	80	80	80
query13	1660	329	331	329
query14	12340	7055	7333	7055
query15	251	184	195	184
query16	6433	288	277	277
query17	1795	496	502	496
query18	1920	273	271	271
query19	276	139	142	139
query20	83	81	79	79
query21	181	101	98	98
query22	4916	4617	4783	4617
query23	32038	31188	31256	31188
query24	11769	2810	2835	2810
query25	579	359	364	359
query26	1684	144	148	144
query27	2876	281	286	281
query28	7072	2001	1982	1982
query29	2030	404	393	393
query30	289	142	150	142
query31	961	761	785	761
query32	93	64	61	61
query33	736	289	279	279
query34	876	440	441	440
query35	907	790	767	767
query36	1302	1139	1258	1139
query37	194	76	77	76
query38	3356	3255	3301	3255
query39	1317	1294	1268	1268
query40	302	91	93	91
query41	38	36	35	35
query42	94	92	92	92
query43	530	525	502	502
query44	1058	719	730	719
query45	191	190	193	190
query46	1064	641	653	641
query47	1713	1502	1618	1502
query48	334	266	263	263
query49	1200	330	328	328
query50	735	372	329	329
query51	5415	5276	5311	5276
query52	88	93	90	90
query53	218	154	154	154
query54	1375	567	584	567
query55	96	89	85	85
query56	215	211	208	208
query57	1029	950	965	950
query58	233	207	211	207
query59	2795	2557	2549	2549
query60	258	226	232	226
query61	86	83	90	83
query62	644	465	466	465
query63	175	154	158	154
query64	5969	1748	1737	1737
query65	3343	3267	3279	3267
query66	1287	339	330	330
query67	15748	15170	15261	15170
query68	13115	516	515	515
query69	541	254	261	254
query70	2116	1557	1511	1511
query71	499	234	231	231
query72	5606	3646	3551	3551
query73	3377	311	318	311
query74	7111	6431	6531	6431
query75	5360	2342	2305	2305
query76	6338	1187	1179	1179
query77	660	286	303	286
query78	9097	8617	8643	8617
query79	1492	514	504	504
query80	562	388	348	348
query81	465	214	207	207
query82	215	112	104	104
query83	192	139	144	139
query84	247	54	53	53
query85	961	293	286	286
query86	392	361	354	354
query87	3489	3399	3392	3392
query88	2917	2275	2283	2275
query89	333	264	265	264
query90	1891	214	212	212
query91	125	90	96	90
query92	64	56	58	56
query93	1367	517	515	515
query94	870	197	201	197
query95	476	441	430	430
query96	631	314	319	314
query97	4282	4167	4159	4159
query98	208	207	206	206
query99	1101	871	842	842
Total cold run time: 297449 ms
Total hot run time: 179910 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.16 seconds
stream load tsv: 578 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.3 seconds inserted 10000000 Rows, about 353K ops/s
storage size: 17183863735 Bytes

Contributor

@yiguolei yiguolei left a comment

LGTM

@yiguolei yiguolei merged commit 7623b5c into apache:master Dec 30, 2023
26 checks passed
HappenLee pushed a commit to HappenLee/incubator-doris that referenced this pull request Jan 12, 2024
Labels: approved (indicates a PR has been approved by one committer), reviewed