[storage] fix the replay-verify stuck #10930
testsuite/replay_verify.py
Outdated
while i < NUM_OF_RETRIES:
    i += 1
    (partition_number, code, msg) = func(*args, **kwargs)
    print(f"[partition {partition_number}] trying {i}th time")
The first print will be "2th"? 😂
Also, what I meant is that res doesn't need to be kept across iterations if we add the return under the code == 1 branch. (I updated my previous comment to use while True:, which makes more sense.)
There is a code inspection step run through poetry in CI/CD that enforces a return from this function, since it can't tell whether the code == 1 branch would eventually return. Updated to i = 0.
Okay, then print(f"try {i}") should come before func(...). It looks nicer to do:
(partition_number, code, msg) = (None, None, None)
for i in range(1, NUM_TRIES+1):
    print(f"try {i}")
    (partition_number, code, msg) = func(*args, **kwargs)
    # Any code other than 1 ends the retry loop.
    if code != 1:
        break
return (partition_number, code, msg)
Looks beautiful.
Accidental approval, sorry.
✅ Forge suite
Description
The problem was that the 2nd retry ran in the same process, with an old cache or a lock still held on the DB. Now the retry is done outside that process, and it can still leverage the resumable replay without redundant work.
Using the old versioned node cache causes all non-exe threads to panic and get stuck forever.
Also, the lock on the ledger db cannot be dropped even when the AptosDb goes out of scope in the 2nd iteration.
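A minimal sketch of the retry-outside-the-process idea, assuming the retry lives in a Python wrapper and each attempt launches a fresh child process. This is not the actual code in testsuite/replay_verify.py: NUM_TRIES follows the loop suggested in the review, while RETRY_CODE, replay_partition, and the command argument are hypothetical names introduced for illustration.

```python
import subprocess
import sys
from functools import wraps

NUM_TRIES = 3   # hypothetical retry budget
RETRY_CODE = 1  # assumption: a return code of 1 means "worth retrying"


def retry(func):
    """Call func up to NUM_TRIES times, stopping once code != RETRY_CODE."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        (partition_number, code, msg) = (None, None, None)
        for i in range(1, NUM_TRIES + 1):
            print(f"try {i}")
            (partition_number, code, msg) = func(*args, **kwargs)
            if code != RETRY_CODE:
                break
        return (partition_number, code, msg)

    return wrapper


@retry
def replay_partition(partition_number, command):
    # Hypothetical worker: every attempt spawns a new OS process, so no
    # in-memory cache or DB lock can survive from the previous attempt.
    proc = subprocess.run(command, capture_output=True, text=True)
    return (partition_number, proc.returncode, proc.stdout)
```

Calling replay_partition(3, [sys.executable, "-c", "print('done')"]) runs the command once and returns its exit code and output, while a command that keeps exiting with code 1 is relaunched until NUM_TRIES is exhausted. Because the child process exits between attempts, any resumable state has to live on disk rather than in memory.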
Test Plan
https://github.com/aptos-labs/aptos-core/actions/runs/6937303262