Interrupting Sync with ctrl+c leads to expiring everything #169
Comments
Hello, if you are still there, could you please tell me whether you did any further investigation? |
Uhhh, just saw this old bug and tested it. This one is really bad: interrupting the send, e.g. with Ctrl-C, also expires the last successfully sent snapshot and marks the canceled/failed one as sent locally. |
Played a while with this. Seems for the Ctrl-C problem But should there not also be a check if the receive was successful?
after all the send/recv??? |
Anything against adding trap "exit" INT, to exit the whole script on Ctrl-C? And for the second part: is there already a check somewhere else for whether send/receive was successful, i.e. for when send/receive fails for reasons other than Ctrl-C? If not, then after a failed send/receive the script will still set the sent property and call expire, which will destroy the "real" last successfully synced snapshot. |
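To make the second part of that question concrete, here is a minimal sketch of gating the sent marker and the expire step on the transfer's exit status. The function names and the SEND_STATUS variable are hypothetical stand-ins for illustration, not zrep's actual code; the real transfer would be a zfs send/receive pipeline.

```shell
#!/bin/sh
# Hypothetical stand-ins for illustration; none of these names are zrep functions.
# SEND_STATUS lets the sketch simulate a failed or interrupted transfer.
send_snapshot() { return "${SEND_STATUS:-0}"; }
mark_sent()     { echo "marked $1 as sent"; }
expire_old()    { echo "expired snapshots older than $1"; }

sync_once() {
    snap=$1
    # Gate everything that changes state on the transfer's exit status.
    if ! send_snapshot "$snap"; then
        echo "transfer of $snap failed; sent marker and old snapshots untouched" >&2
        return 1
    fi
    mark_sent "$snap"
    expire_old "$snap"
}
```

With SEND_STATUS=130 (the status a shell usually reports for a command killed by SIGINT), sync_once returns 1 without marking or expiring anything.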
I need to find time and headspace to sit down and examine this carefully.
Seems very odd on the face of it. I mean, if it works properly when it fails
"normally"... why would control-c somehow make it worse, and undo things
that go backwards??
Very odd.
UNLESS... control-c is interrupting the call of "what's the last
sendsnapshot name", and I'm not checking for validity of the return.
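A rough sketch of what checking the validity of that return could look like, assuming a hypothetical lookup helper (get_last_sent is made up for this example, as are the LAST_SENT and LOOKUP_STATUS variables that simulate its result). The only ZFS fact relied on is that snapshot names have the form dataset@snapname.

```shell
#!/bin/sh
# Hypothetical lookup; LAST_SENT and LOOKUP_STATUS simulate its output and exit status.
get_last_sent() {
    printf '%s' "${LAST_SENT:-}"
    return "${LOOKUP_STATUS:-0}"
}

expire_before_last_sent() {
    # The assignment's exit status is that of the command substitution.
    if ! last=$(get_last_sent); then
        echo "lookup of last sent snapshot failed; not expiring" >&2
        return 1
    fi
    # Reject anything that is not dataset@snapname, including an empty
    # string left behind by an interrupted lookup.
    case $last in
        ?*@?*) ;;
        *) echo "unexpected snapshot name '$last'; not expiring" >&2
           return 1 ;;
    esac
    echo "would expire snapshots older than $last"
}
```

The point of the design is that expire never runs on an empty or malformed name, whatever the reason the lookup came back wrong.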
…On Sat, Mar 19, 2022 at 11:31 PM crispyduck00 ***@***.***> wrote:
Anything against adding trap "exit" INT, to exit the whole script on
Ctrl-C?
And for the second part, is there already on another place a check if
send/receive was successful? If beside Ctrl-C send/receive fails? If not
after a failed send/receive the script will still set the sent property and
call expire, which will destroy the "real" last successfully synced
snapshot.
Which is bad, as the hosts will never get synced again without sending
over everything again.
|
This still isn't making sense to me. I don't have any output in this ticket supporting the claim; the output here shows that it sent sequence 00020, but only expired 06. |
I don't even know whether it is mis-expiring on the local side or the remote side. |
Oh, seems I forgot to send my comment last time; here it is: Tested it once again:
So the last successfully sent snapshot, zrep_000000, is locally expired. The idea with trap "exit" INT is also not really a working solution. |
I don't know why it thinks send/recv was successful. But since it thinks it was successful, it tries to set sent on the desthost snap; as that snap does not exist, it uses the fallback and afterwards sets sent on the local one, which is the bad part that later causes the "real" last sent snapshot to be expired. As a quick and dirty fix (till you come up with something better) I simply changed the fallback to:
As my hosts all support props on snapshots, I assume that if zrep can't set the prop on the desthost snapshot, the snap probably does not exist, so zrep calls zrep_errquit before sent can be set locally. So far this has worked in all my tests. |
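The ordering described in this comment can be sketched roughly as follows. This is not the actual change to zrep: errquit here is only a stand-in playing the role of the zrep_errquit mentioned above, and set_remote_sent, set_local_sent and REMOTE_STATUS are hypothetical names used to simulate the destination-side result.

```shell
#!/bin/sh
# Stand-in with the same role as the zrep_errquit mentioned above; the real one may differ.
errquit() { echo "error: $*" >&2; exit 1; }
# Hypothetical helpers; REMOTE_STATUS simulates the destination-side result.
set_remote_sent() { return "${REMOTE_STATUS:-0}"; }
set_local_sent()  { echo "local sent marker set on $1"; }

mark_sent() {
    # Destination first: if the snapshot is not there, stop before touching local state.
    set_remote_sent "$1" || errquit "cannot mark $1 on destination; local marker not set"
    set_local_sent "$1"
}
```

Because the script exits before the local marker is written, a later expire run still sees the previous snapshot as the last one sent.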
Seems like half the problem is with the "attempt fallback methods for older ZFS" modules. I haven't fully committed to that, but it's the direction I'm considering. |
FYI: the thing that is confusing me is that the ACTUAL send has fail checks. So if the send failed, it should just exit. That suggests to me that the problem is when other things get interrupted. |
I know, I don't really understand why the fail check is not working here. Tested with Ctrl-C, but also when running as a systemd service and stopping the service during a send. |
I now kinda recall that zfs send/receive sometimes jams up a system a bit longer than would be expected. |
Hi,
I have to investigate this further but here is my log:
As you can see, I hit ^C after a while (line 5), which cancels the sending process. However, it seems as if zrep thinks it completed successfully and then tries to continue with expiring snapshots instead of aborting the whole script. This is bad because it expired all common snapshots and now I have to delete my backup FS and do a full sync!
If I have time, I will try to reproduce this and see where to check the exit state or similar...