Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core/cond.c: expect enabled interrupts in cond_wait() #20940

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

derMihai
Copy link
Contributor

@derMihai derMihai commented Oct 24, 2024

Contribution description

Bug 1: cond_wait() should never be called with interrupts disabled

The way cond_wait() is currently implemented suggests that it may be called with interrupts disabled. Going to sleep with interrupts disabled doesn't quite make sense (although see below) and doesn't work either, as e.g. on ARM thread_yield_higher() triggers an interrupt. For example, the following assertion is triggered on both ARM and native:

    static cond_t cond = COND_INIT;
    static mutex_t lock = MUTEX_INIT;

    mutex_lock(&lock);
    irq_disable();
    cond_wait(&cond, &lock); // this should block forever
    assert(0); // thread_yield_higher() couldn't trigger the interrupt
    mutex_unlock(&lock);

My suggestion is to always assume interrupts are enabled. However, there is a valid, albeit hacky use-case for calling cond_wait() with interrupts disabled, and that is to guarantee [ condition check + going to sleep ] is atomic w.r. to [ condition set + wake up ] when the latter is in interrupt context:

static cond_t cond = COND_INIT;
static mutex_t lock = MUTEX_INIT; // dummy mutex only for the cond API
static bool trigger = false;

void waiter(void) 
{
    mutex_lock(&lock); 
    // disable IRQ s.t. ISR doesn't fire between reading `trigger` and `cond_wait()`
    // and we end up blocking forever
    irq_disable(); 
    while (!trigger) {
       cond_wait(&cond, &lock); 
    }
    irq_enable();
    mutex_unlock(&lock);
}

void waker_isr(void) 
{
   trigger = true;
   cond_signal(&cond);
}

Then, we can fix cond_wait() by enabling interrupts just before calling thread_yield_higher(), and restoring the interrupts state afterwards.

    ...
    sched_set_status(me, STATUS_COND_BLOCKED);
    thread_add_to_list(&cond->queue, me);
    irq_enable();
    mutex_unlock(mutex);
    thread_yield_higher();
    irq_restore(irqstate);
    ...

I'm not sure which fix is the best. The first one (in the diffs) is more pure, the second enables more use-cases.

Bug 2: cond_wait() is calling mutex_unlock() with interrupts disabled

No matter whether cond_wait() is called with interrupts disabled or not, it calls mutex_unlock() with disabled interrupts. mutex_unock() will reschedule if a higher priority thread is already blocking on the mutex, leading to the same bug as above. I therefore moved mutex_unlock() after interrupts are enabled. This still guarantees that [ condition check + going to sleep ] is atomic w.r. to [ condition set + wake up ].

@github-actions github-actions bot added the Area: core Area: RIOT kernel. Handle PRs marked with this with care! label Oct 24, 2024
@derMihai derMihai changed the title code/cond.c: expect enabled interrupts in cond_wait() core/cond.c: expect enabled interrupts in cond_wait() Oct 24, 2024
@kaspar030
Copy link
Contributor

This is an issue also in other parts of core, e.g., in core/mutex and msg.

@kaspar030
Copy link
Contributor

My suggestion is to always assume interrupts are enabled.

I think this makes the most sense.

@derMihai
Copy link
Contributor Author

This is an issue also in other parts of core, e.g., in core/mutex and msg.

Indeed, and it seems the fix is the same everywhere. Probably we should also add an arch-independent wrapper around thread_yield*() to assert interrupts are enabled.

@kaspar030
Copy link
Contributor

This is an issue also in other parts of core, e.g., in core/mutex and msg.

Indeed, and it seems the fix is the same everywhere. Probably we should also add an arch-independent wrapper around thread_yield*() to assert interrupts are enabled.

The question would be if we should define those functions as interrupt-enabling, and drop the state handling inside them (by doing irq_disable()/irq_enable() directly).

Any alternative allows mis-use (e.g., by doing irq_disable(); mutex_lock(...)), which can only be runtime detected, but it is unclear how to handle that.

@derMihai
Copy link
Contributor Author

The question would be if we should define those functions as interrupt-enabling, and drop the state handling inside them (by doing irq_disable()/irq_enable() directly).

I don't see a problem with that. Any prior usage that conflicts with this was a bug anyway.

@derMihai derMihai marked this pull request as draft October 24, 2024 09:09
@kaspar030
Copy link
Contributor

Probably we should also add an arch-independent wrapper around thread_yield*() to assert interrupts are enabled.

That would also make sense. Currently, core code does irq_restore(...); thread_yield_higher(). depending on the platform, the other way around is allowed and might save a scheduler run.

@derMihai
Copy link
Contributor Author

This goes deeper. Also stuff like mutex_unlock() may not be called with interrupts disabled.

So to revise the definition, it's not only stuff that might block that may not be called with interrupts disabled, but also stuff that might reschedule. And I think this is sensible. The Linux kernel assumes the same for held spinlocks (which on UP do nothing more than disabling interrupts).

@derMihai derMihai marked this pull request as ready for review October 24, 2024 12:51
@benpicco benpicco added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Oct 24, 2024
@riot-ci
Copy link

riot-ci commented Oct 24, 2024

Murdock results

✔️ PASSED

a6b2b03 core/cond.c: cond_wait() expect enabled interrupts

Success Failures Total Runtime
10213 0 10215 17m:31s

Artifacts

@derMihai
Copy link
Contributor Author

I would limit the scope of this PR to the original intent, and that is to fix cond_wait(), as meanwhile I discovered a second, more important bug. I opened #20941 to deal with the more general and complex issue of locking with interrupts disabled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: core Area: RIOT kernel. Handle PRs marked with this with care! CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants