branch-3.0: [Bug](dead lock) Fix dead lock in Tablet Stat Mgr #46959 #47418

Merged: 1 commit, Jan 26, 2025


github-actions[bot]
Contributor

Cherry-picked from #46959

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #28608

Problem Summary:

In TabletStatMgr, we call stream().parallel() or parallelStream() inside a
ForkJoinTask. When called, the stream splits its `ForEach` work across
multiple threads. However, when the stream runs within a ForkJoinTask, it
attempts to obtain its worker threads from the same ForkJoinPool that is
running the task. When that ForkJoinPool has only a few threads, the outer
tasks and the stream's subtasks compete for them, which is very likely to
end in a deadlock.

This commit abandons ForkJoinPool and uses a regular thread pool
instead.
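
The replacement pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class `TabletStatSketch`, the methods `fetchStat` and `collectStats`, the pool size, and the timeout are all hypothetical. It assumes the per-item work is independent, so each item can be submitted as its own task to a fixed-size `ExecutorService` instead of being driven by a parallel stream inside a ForkJoinTask.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class TabletStatSketch {
    // Hypothetical stand-in for the per-backend stat request.
    static long fetchStat(int backendId) {
        return backendId * 10L;
    }

    // Runs one task per backend on a plain fixed-size pool.
    // No task submits nested parallel work, so tasks never wait
    // on subtasks that need a thread from the same pool.
    static long collectStats(List<Integer> backendIds, int poolSize) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int id : backendIds) {
                Callable<Long> task = () -> fetchStat(id);
                futures.add(executor.submit(task));
            }
            long total = 0;
            for (Future<Long> future : futures) {
                // Bounded wait (assumed value) so a stuck task surfaces as a timeout.
                total += future.get(30, TimeUnit.SECONDS);
            }
            return total;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collectStats(List.of(1, 2, 3), 2));
    }
}
```

The design point is that waiting happens only in the submitting thread, which is outside the pool, so a small pool slows the work down but cannot block itself.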
@github-actions github-actions bot requested a review from dataroaring as a code owner January 24, 2025 08:46
@Thearas
Contributor

Thearas commented Jan 24, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Jan 24, 2025
@Thearas
Contributor

Thearas commented Jan 24, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 41476 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 491c6a5f372c5a4f5e5a9fb185dcd61e8b0331e1, data reload: false

------ Round 1 ----------------------------------
q1	17584	7573	7288	7288
q2	2032	165	181	165
q3	10559	1172	1275	1172
q4	10604	778	823	778
q5	7841	2896	2885	2885
q6	236	155	150	150
q7	986	627	631	627
q8	9880	1999	2074	1999
q9	6798	6517	6510	6510
q10	7592	2321	2369	2321
q11	542	269	270	269
q12	408	222	217	217
q13	17771	2978	2984	2978
q14	255	207	215	207
q15	575	542	536	536
q16	650	576	599	576
q17	976	614	555	555
q18	7329	6810	6673	6673
q19	1381	1115	1063	1063
q20	469	206	204	204
q21	4041	3316	3481	3316
q22	1092	987	1017	987
Total cold run time: 109601 ms
Total hot run time: 41476 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7284	7223	7227	7223
q2	331	238	235	235
q3	3184	2989	3022	2989
q4	2084	1875	1874	1874
q5	5764	5761	5836	5761
q6	228	149	147	147
q7	2289	1887	1852	1852
q8	3368	3509	3498	3498
q9	8853	8903	8839	8839
q10	3608	3535	3527	3527
q11	592	483	503	483
q12	807	620	572	572
q13	9624	3211	3141	3141
q14	295	268	275	268
q15	574	524	515	515
q16	693	653	629	629
q17	1825	1624	1603	1603
q18	8446	7710	7535	7535
q19	1726	1615	1578	1578
q20	2069	1785	1820	1785
q21	5426	5145	5150	5145
q22	1111	1021	1016	1016
Total cold run time: 70181 ms
Total hot run time: 60215 ms

@doris-robot

TPC-DS: Total hot run time: 192615 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 491c6a5f372c5a4f5e5a9fb185dcd61e8b0331e1, data reload: false

query1	963	375	368	368
query2	6536	2195	2141	2141
query3	6710	219	213	213
query4	34162	23614	23490	23490
query5	4375	469	481	469
query6	296	190	186	186
query7	4627	324	317	317
query8	288	228	223	223
query9	9659	2696	2698	2696
query10	484	270	269	269
query11	18239	15169	15166	15166
query12	161	104	106	104
query13	1648	434	417	417
query14	9718	7305	6955	6955
query15	251	172	180	172
query16	8116	481	488	481
query17	1607	559	536	536
query18	2142	300	294	294
query19	330	156	157	156
query20	114	105	106	105
query21	217	103	102	102
query22	4359	4157	4049	4049
query23	34819	33918	34183	33918
query24	11209	2947	2938	2938
query25	672	419	410	410
query26	1401	176	175	175
query27	2814	349	355	349
query28	8012	2438	2445	2438
query29	925	464	450	450
query30	330	162	173	162
query31	1024	792	793	792
query32	97	59	66	59
query33	808	314	314	314
query34	918	528	553	528
query35	865	713	730	713
query36	1087	966	971	966
query37	141	81	77	77
query38	4012	3889	3822	3822
query39	1504	1443	1448	1443
query40	291	108	106	106
query41	55	54	53	53
query42	118	108	105	105
query43	552	501	505	501
query44	1231	820	822	820
query45	185	177	173	173
query46	1172	733	735	733
query47	1932	1811	1832	1811
query48	468	375	392	375
query49	1203	419	405	405
query50	836	420	419	419
query51	7119	7174	7048	7048
query52	108	92	93	92
query53	266	194	193	193
query54	1238	479	471	471
query55	82	82	83	82
query56	280	268	274	268
query57	1248	1121	1105	1105
query58	253	217	227	217
query59	3412	3311	3355	3311
query60	298	276	274	274
query61	152	130	137	130
query62	910	686	661	661
query63	237	202	198	198
query64	5526	769	687	687
query65	3335	3222	3187	3187
query66	1450	321	318	318
query67	15972	15527	15495	15495
query68	4634	567	577	567
query69	462	285	290	285
query70	1180	1139	1143	1139
query71	405	267	264	264
query72	6389	4059	4050	4050
query73	768	346	366	346
query74	10237	8949	8974	8949
query75	3450	2663	2669	2663
query76	3158	1083	1085	1083
query77	432	284	283	283
query78	10570	9609	9652	9609
query79	1128	608	597	597
query80	784	442	448	442
query81	520	240	243	240
query82	1328	121	122	121
query83	221	165	150	150
query84	245	86	76	76
query85	1127	298	294	294
query86	311	295	311	295
query87	4375	4342	4414	4342
query88	3500	2403	2379	2379
query89	413	301	290	290
query90	2059	194	189	189
query91	180	151	170	151
query92	65	56	57	56
query93	1068	556	545	545
query94	778	311	293	293
query95	374	259	259	259
query96	611	293	280	280
query97	3325	3179	3215	3179
query98	214	218	194	194
query99	1677	1308	1304	1304
Total cold run time: 301557 ms
Total hot run time: 192615 ms

@doris-robot

ClickBench: Total hot run time: 32.91 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 491c6a5f372c5a4f5e5a9fb185dcd61e8b0331e1, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.03	0.03
query3	0.24	0.07	0.06
query4	1.61	0.11	0.11
query5	0.52	0.50	0.53
query6	1.14	0.74	0.73
query7	0.02	0.02	0.01
query8	0.03	0.03	0.03
query9	0.56	0.50	0.50
query10	0.55	0.54	0.57
query11	0.14	0.10	0.10
query12	0.13	0.11	0.11
query13	0.62	0.60	0.59
query14	2.86	2.84	2.82
query15	0.90	0.83	0.82
query16	0.38	0.37	0.38
query17	0.97	0.98	1.06
query18	0.24	0.22	0.22
query19	1.87	1.76	2.01
query20	0.01	0.01	0.02
query21	15.36	0.56	0.59
query22	2.34	2.32	2.08
query23	17.08	0.97	0.81
query24	2.79	0.71	1.46
query25	0.27	0.18	0.12
query26	0.33	0.14	0.14
query27	0.05	0.04	0.05
query28	10.95	1.11	1.08
query29	12.59	3.29	3.28
query30	0.25	0.07	0.06
query31	2.84	0.38	0.37
query32	3.28	0.46	0.46
query33	2.98	3.06	3.05
query34	17.17	4.52	4.50
query35	4.64	4.54	4.62
query36	0.64	0.49	0.50
query37	0.09	0.06	0.06
query38	0.04	0.03	0.03
query39	0.04	0.02	0.02
query40	0.17	0.13	0.12
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 106.94 s
Total hot run time: 32.91 s

@dataroaring dataroaring merged commit cc38fad into branch-3.0 Jan 26, 2025
22 checks passed
@github-actions github-actions bot deleted the auto-pick-46959-branch-3.0 branch January 26, 2025 03:40
4 participants