Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ebtables-tiny: Fix lockfile function and reduce wait on lock #275

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

SmithChart
Copy link

During investigation of an issue with ebtables we discovered that the locking mechanism in ebtables-tiny did not work as expected. These commits should fix this issue.

These commits were part of the Freifunk Braunschweig parker -releases for about an year now. We did not discover any negative impact with these changes.

Jan Luebbe and others added 2 commits August 4, 2024 12:11
flock without LOCK_NB never returns on a conflicting lock, so this
function doesn't conform to its comment.

Also close the lock file on error, so we can keep retrying.

Signed-off-by: Jan Luebbe <[email protected]>
With the fix form Shoragan to actually wait if a lock-file is present,
retries would only be done every second.
But: Interacting with ebtables should not take that long.
So we can drastically reduce the waiting time here.

With this change will will only wait 50ms (instead of 1s).
To not spam the log too much if a lock is kept for longer (or the other
process has more luck locking) this change also reduces the logging
rate.

Signed-off-by: Chris Fiege <[email protected]>
@SmithChart
Copy link
Author

Please also advise: Should I open a separate PR on the main gluon repo once this is merged? Or do you pick up changes from here automagically?

@rotanid rotanid requested a review from neocturne August 4, 2024 20:30
@neocturne
Copy link
Member

Waiting until the lock releases indefinitely is the intended behavior. The comment on the function is incorrect, not the code (the same incorrect comment exists in "full" ebtables). I believe the comment was correct at some point in the past, but the code was improved without updating the comment.

@jluebbe
Copy link

jluebbe commented Aug 14, 2024

Waiting until the lock releases indefinitely is the intended behavior. The comment on the function is incorrect, not the code (the same incorrect comment exists in "full" ebtables). I believe the comment was correct at some point in the past, but the code was improved without updating the comment.

The original code returned from lock_file with -1 if another process had the lock: https://git.netfilter.org/ebtables/commit/?id=8b5594d7c21f3056c8c194aea1f1519c006aeaee

When it was changed to use flock() it never returned as long another process had the lock, making the retry loop with sleep in ebt_get_kernel_table useless. No "Trying to obtain lock" is ever printed, making it hard to diagnose this case from logs.

https://github.com/freifunk-gluon/gluon/blob/main/package/gluon-radv-filterd/src/gluon-radv-filterd.c also has code to handle timeouts of ebtables-tiny. We're not sure how this is supposed to interact. :)

@neocturne
Copy link
Member

In theory, ebtables should never run long enough for any of this to matter. If the lock is ever held longer than a few milliseconds, that seems like a bug to me, not the code that waits for the lock to release.

@jluebbe
Copy link

jluebbe commented Aug 14, 2024

In theory, ebtables should never run long enough for any of this to matter. If the lock is ever held longer than a few milliseconds, that seems like a bug to me, not the code that waits for the lock to release.

Agreed. We added this change because we saw hanging ebtables-tiny processes and wanted to investigate where they got stuck (e.g on the lock or somewhere else). Without these log messages in a (working) retry loop, there is no clear indication which of those ebtables-tiny processes is hanging where.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants