[fix](cloud-mow) schema change should retry when encountering TXN_CONFLICT in cloud mode #46748

Open
wants to merge 4 commits into
base: master

Conversation

hust-hhb
Contributor

For MoW tables, a schema change may encounter TXN_CONFLICT because two tablets try to modify the delete bitmap lock at the same time. This can cause the schema change to fail, so a retry should be added in the FE.
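
A minimal sketch of the retry idea described above, written in C++ to match the diff in this PR (the description places the actual retry in the FE, so this is an illustration and not the PR's code). The names `JobResult`, `run_with_conflict_retry`, `max_attempts`, and `backoff` are hypothetical and do not come from Doris.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical result codes for illustration; not the real Doris enum.
enum class JobResult { OK, TXN_CONFLICT, OTHER_ERROR };

// Runs `attempt` and retries only while it reports TXN_CONFLICT, for at most
// `max_attempts` tries in total, sleeping `backoff` between tries.
// Any other result (success or a different error) is returned immediately.
// If `max_attempts` is not positive, `attempt` is never called and
// OTHER_ERROR is returned.
inline JobResult run_with_conflict_retry(const std::function<JobResult()>& attempt,
                                         int max_attempts,
                                         std::chrono::milliseconds backoff) {
    JobResult result = JobResult::OTHER_ERROR;
    for (int i = 0; i < max_attempts; ++i) {
        result = attempt();
        if (result != JobResult::TXN_CONFLICT) {
            return result;  // success or a non-retryable error
        }
        if (i + 1 < max_attempts) {
            std::this_thread::sleep_for(backoff);  // wait before the next try
        }
    }
    return result;  // still TXN_CONFLICT after the last try
}
```

Bounding the number of tries keeps a persistent conflict from blocking the job forever; the caller still sees TXN_CONFLICT once the budget is exhausted and can fail the schema change at that point.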

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hust-hhb
Contributor Author

run buildall

BiteTheDDDDt
BiteTheDDDDt previously approved these changes Jan 10, 2025
@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Jan 10, 2025
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 39.34% (10252/26058)
Line Coverage: 30.52% (87402/286410)
Region Coverage: 29.57% (44552/150689)
Branch Coverage: 26.11% (22803/87328)
Coverage Report: http://coverage.selectdb-in.cc/coverage/d9535df4426ef1921c655445ec71e79aeb6a1efa_d9535df4426ef1921c655445ec71e79aeb6a1efa/report/index.html

@hust-hhb
Contributor Author

run buildall

@github-actions github-actions bot removed the "approved" label (Indicates a PR has been approved by one committer.) Jan 10, 2025
@doris-robot

TPC-H: Total hot run time: 32783 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 702e8c69bfb83d797b93e5b59cb66b7ff59363a1, data reload: false

------ Round 1 ----------------------------------
q1	17586	6168	6041	6041
q2	2062	308	172	172
q3	10405	1305	749	749
q4	10208	890	443	443
q5	7494	2203	2000	2000
q6	201	173	143	143
q7	893	748	618	618
q8	9261	1414	1234	1234
q9	5419	4872	4971	4872
q10	6754	2298	1853	1853
q11	483	291	264	264
q12	335	361	214	214
q13	17773	3698	3101	3101
q14	231	229	209	209
q15	553	523	489	489
q16	638	641	597	597
q17	583	864	327	327
q18	7081	6404	6424	6404
q19	1212	977	578	578
q20	324	332	193	193
q21	2813	2180	1970	1970
q22	361	328	312	312
Total cold run time: 102670 ms
Total hot run time: 32783 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6195	6285	6243	6243
q2	234	334	253	253
q3	2266	2642	2355	2355
q4	1430	1797	1335	1335
q5	4357	4818	5137	4818
q6	188	185	151	151
q7	2054	2000	1872	1872
q8	2673	2813	2716	2716
q9	7282	7275	7195	7195
q10	3140	3358	2840	2840
q11	599	530	489	489
q12	701	795	642	642
q13	3393	3926	3187	3187
q14	286	299	301	299
q15	565	506	499	499
q16	652	669	668	668
q17	1242	1741	1254	1254
q18	7799	7507	7003	7003
q19	802	1137	1051	1051
q20	1920	1997	1850	1850
q21	5527	5114	4935	4935
q22	595	604	544	544
Total cold run time: 53900 ms
Total hot run time: 52199 ms

@doris-robot

TPC-DS: Total hot run time: 188789 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 702e8c69bfb83d797b93e5b59cb66b7ff59363a1, data reload: false

query1	971	376	370	370
query2	6519	2420	2353	2353
query3	6717	215	219	215
query4	33825	23415	23304	23304
query5	4370	601	444	444
query6	304	206	213	206
query7	4629	495	310	310
query8	306	247	241	241
query9	9542	2668	2633	2633
query10	483	335	235	235
query11	18364	15155	15047	15047
query12	152	109	101	101
query13	1643	517	390	390
query14	10346	7047	7276	7047
query15	224	198	185	185
query16	8221	609	481	481
query17	1590	740	587	587
query18	2098	407	309	309
query19	221	191	160	160
query20	122	115	111	111
query21	242	117	101	101
query22	4197	4491	4241	4241
query23	34109	32941	33223	32941
query24	6484	2271	2238	2238
query25	478	446	380	380
query26	1204	300	152	152
query27	1960	457	326	326
query28	5326	2441	2414	2414
query29	682	542	410	410
query30	233	190	152	152
query31	976	853	785	785
query32	75	62	56	56
query33	519	358	290	290
query34	744	873	508	508
query35	785	819	718	718
query36	979	1048	945	945
query37	127	103	75	75
query38	4107	4062	4068	4062
query39	1453	1410	1400	1400
query40	213	109	99	99
query41	71	54	53	53
query42	127	100	99	99
query43	516	517	497	497
query44	1304	788	811	788
query45	185	175	168	168
query46	865	1032	658	658
query47	1830	1899	1775	1775
query48	376	390	312	312
query49	782	475	382	382
query50	646	647	402	402
query51	6769	6941	6833	6833
query52	101	103	87	87
query53	229	252	183	183
query54	478	481	413	413
query55	91	77	76	76
query56	248	291	246	246
query57	1187	1169	1109	1109
query58	251	233	238	233
query59	3221	3092	2920	2920
query60	291	287	281	281
query61	150	137	137	137
query62	840	768	734	734
query63	230	200	196	196
query64	4362	999	622	622
query65	3244	3193	3232	3193
query66	1053	426	337	337
query67	15859	15700	15597	15597
query68	8353	721	515	515
query69	467	286	263	263
query70	1206	1125	1136	1125
query71	428	294	264	264
query72	6172	3873	3842	3842
query73	664	772	354	354
query74	10307	8845	8814	8814
query75	4293	3218	2632	2632
query76	4186	1180	790	790
query77	779	361	274	274
query78	10841	9991	9468	9468
query79	3678	780	579	579
query80	681	582	434	434
query81	481	289	273	273
query82	604	153	127	127
query83	202	172	149	149
query84	285	97	76	76
query85	734	366	311	311
query86	348	318	308	308
query87	4487	4502	4305	4305
query88	4130	2158	2111	2111
query89	406	307	295	295
query90	1910	189	189	189
query91	140	138	106	106
query92	66	57	54	54
query93	1005	771	534	534
query94	666	395	309	309
query95	333	267	256	256
query96	493	595	275	275
query97	2918	2996	2830	2830
query98	224	203	197	197
query99	1655	1498	1377	1377
Total cold run time: 293988 ms
Total hot run time: 188789 ms

@doris-robot

ClickBench: Total hot run time: 31.59 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 702e8c69bfb83d797b93e5b59cb66b7ff59363a1, data reload: false

query1	0.04	0.03	0.03
query2	0.07	0.03	0.04
query3	0.23	0.08	0.06
query4	1.60	0.11	0.10
query5	0.44	0.43	0.41
query6	1.19	0.65	0.68
query7	0.02	0.01	0.01
query8	0.04	0.03	0.03
query9	0.59	0.50	0.51
query10	0.56	0.55	0.56
query11	0.14	0.10	0.10
query12	0.14	0.12	0.11
query13	0.60	0.61	0.61
query14	2.73	2.88	2.75
query15	0.91	0.83	0.82
query16	0.38	0.38	0.38
query17	1.06	1.05	1.07
query18	0.23	0.22	0.22
query19	1.85	1.79	1.93
query20	0.02	0.02	0.01
query21	15.38	0.97	0.59
query22	0.74	1.01	0.67
query23	15.04	1.46	0.56
query24	3.08	1.18	1.04
query25	0.18	0.09	0.18
query26	0.36	0.14	0.13
query27	0.07	0.05	0.04
query28	13.56	1.57	1.05
query29	12.61	3.92	3.26
query30	0.25	0.10	0.06
query31	2.81	0.62	0.39
query32	3.23	0.55	0.47
query33	3.10	3.09	3.17
query34	16.86	5.14	4.64
query35	4.50	4.53	4.50
query36	0.66	0.49	0.48
query37	0.10	0.06	0.06
query38	0.04	0.04	0.04
query39	0.03	0.02	0.02
query40	0.16	0.14	0.13
query41	0.08	0.02	0.02
query42	0.04	0.03	0.02
query43	0.03	0.03	0.04
Total cold run time: 105.75 s
Total hot run time: 31.59 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 39.35% (10255/26059)
Line Coverage: 30.52% (87421/286420)
Region Coverage: 29.55% (44532/150692)
Branch Coverage: 26.11% (22798/87330)
Coverage Report: http://coverage.selectdb-in.cc/coverage/702e8c69bfb83d797b93e5b59cb66b7ff59363a1_702e8c69bfb83d797b93e5b59cb66b7ff59363a1/report/index.html

@@ -385,6 +385,10 @@ Status retry_rpc(std::string_view op_name, const Request& req, Response* res,
         } else if (res->status().code() == MetaServiceCode::INVALID_ARGUMENT) {
             return Status::Error<ErrorCode::INVALID_ARGUMENT, false>("failed to {}: {}", op_name,
                                                                      res->status().msg());
         } else if (res->status().code() ==
                    MetaServiceCode::KV_TXN_CONFLICT_RETRY_EXCEEDED_MAX_TIMES) {
             return Status::Error<ErrorCode::DELETE_BITMAP_LOCK_ERROR, false>(
Contributor

Why return this error for every RPC? DELETE_BITMAP_LOCK_ERROR should only be used for RPCs related to the delete bitmap lock.
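
One possible way to address this comment, sketched under assumptions: map the conflict code to DELETE_BITMAP_LOCK_ERROR only when the operation is lock-related, and fall back to a generic error otherwise. The enums `MetaCode` and `BeError`, the helpers `is_delete_bitmap_lock_op` and `map_error`, the `INTERNAL_ERROR` fallback, and the op-name strings are all hypothetical stand-ins, not the real Doris types or the names passed to `retry_rpc`.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-ins for the meta-service and BE error codes.
enum class MetaCode { OK, INVALID_ARGUMENT, KV_TXN_CONFLICT_RETRY_EXCEEDED_MAX_TIMES };
enum class BeError { OK, INVALID_ARGUMENT, DELETE_BITMAP_LOCK_ERROR, INTERNAL_ERROR };

// Hypothetical predicate: the op names below are placeholders chosen for
// this example only.
inline bool is_delete_bitmap_lock_op(std::string_view op_name) {
    return op_name == "example get delete bitmap lock" ||
           op_name == "example update delete bitmap";
}

// Maps a meta-service code to a BE error, using the lock-specific error
// only for lock-related operations.
inline BeError map_error(std::string_view op_name, MetaCode code) {
    switch (code) {
    case MetaCode::OK:
        return BeError::OK;
    case MetaCode::INVALID_ARGUMENT:
        return BeError::INVALID_ARGUMENT;
    case MetaCode::KV_TXN_CONFLICT_RETRY_EXCEEDED_MAX_TIMES:
        return is_delete_bitmap_lock_op(op_name) ? BeError::DELETE_BITMAP_LOCK_ERROR
                                                 : BeError::INTERNAL_ERROR;
    }
    return BeError::INTERNAL_ERROR;  // unreachable for valid enum values
}
```

Keeping the check in one predicate means unrelated RPCs that hit the same conflict code are not reported as lock errors, which is the concern raised in the comment.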


4 participants