
Refactor scheduler and implement spinner thread for Partr. #56475

Open · wants to merge 6 commits into base: master

Conversation

@gbaraldi (Member) commented Nov 6, 2024

Also adds an option for child-first scheduling.
I'm splitting this off from the work-stealing PR to make review easier. This part should be much simpler to merge.

The spinner design is roughly based on Go's and mostly reuses the seq-cst barriers we currently have for sleeping. Unlike n_threads_running, though, the new counters rely on the thread actually having been woken up, so they will underreport.
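
To make that concrete, here is a rough Julia sketch of the wakeup protocol (all names below are illustrative stand-ins, not the PR's actual symbols):

# Illustrative counters only; not the PR's actual fields.
const n_spinning = Threads.Atomic{Int}(0)  # threads actively searching for work
const n_idle     = Threads.Atomic{Int}(0)  # threads parked in the sleep loop

# Called on schedule: wake at most one thread, and only when nobody is already
# spinning and at least one thread is actually asleep.
function maybe_wake_spinner(wake_one)
    n_spinning[] > 0 && return nothing  # an awake spinner will pick the task up
    n_idle[] == 0 && return nothing     # everyone is running; nothing to wake
    wake_one()
    return nothing
end

# Run by a thread after it wakes up: it only registers as a spinner once it is
# actually awake, which is why these counters can transiently underreport.
function spinner_loop(find_and_run_task)
    Threads.atomic_add!(n_spinning, 1)
    while find_and_run_task()
    end
    Threads.atomic_sub!(n_spinning, 1)
    return nothing
end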

function schedule(t::Task)
    if ChildFirst
        ct = current_task()
        if ct.sticky || t.sticky
Member

This should actually check whether set_task_tid succeeded, so that this isn't a data race (even though this is dead code right now).
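
For illustration, a sketch of the kind of check being asked for (hypothetical helper; the (Any, Cint) signature and the nonzero-on-success return convention for jl_set_task_tid are assumptions here):

# Hypothetical helper, not the PR's code: pin `t` to the current thread and
# only use the tid if claiming it actually succeeded.
function try_pin_to_current_thread!(t::Task)
    tid = Threads.threadid()
    # Assumed convention: nonzero return means the task's tid was claimed.
    if ccall(:jl_set_task_tid, Cint, (Any, Cint), t, tid - 1) != 0
        push!(Base.workqueue_for(tid), t)  # we own the pin, so tid is safe to use
        return true
    end
    return false  # another thread pinned it first; caller should enqueue normally
end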

Member Author

Actually, any use of yieldto seems to have this problem, so maybe it deserves another look.

src/scheduler.c
@oscardssmith added the multithreading (Base.Threads and related functionality) label on Jan 10, 2025
This also adds a counter for idle/sleeping threads to avoid checking every thread when everyone is running.
@gbaraldi (Member Author)

Using

function fib(n::Int)
    n <= 1 && return n
    t = Threads.@spawn fib(n - 2)
    return fib(n - 1) + fetch(t)::Int
end

as a benchmark this shows a pretty measurable improvement:
nightly

julia> @benchmark fib(20)
BenchmarkTools.Trial: 1571 samples with 1 evaluation.
 Range (min … max):  2.895 ms … 7.492 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.958 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.181 ms ± 544.458 μs  ┊ GC (mean ± σ):  6.30% ± 10.37%

  ▆█▃            ▁▁▁                                           
  ███▇▄▄▁▁▁▁▁▃▅███████▇▆▅▅▅▅▅▇▅▆▅▅▅▇▆▆▄▅▆▅▃▅▆▃▆▅▄▅▅▄▅▃▄▃▃▁▄▁▄ █
  2.9 ms       Histogram: log(frequency) by time      5.45 ms <

 Memory estimate: 3.71 MiB, allocs estimate: 67768

PR

julia> @benchmark fib(20)
BenchmarkTools.Trial: 1882 samples with 1 evaluation.
 Range (min … max):  2.403 ms … 5.918 ms  ┊ GC (min … max): 0.00% … 57.77%
 Time  (median):     2.456 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.655 ms ± 501.850 μs  ┊ GC (mean ± σ):  6.95% ± 11.26%

  ▆█▃            ▁▂▁                                           
  ███▇▃▁▁▁▁▁▁▁▅▆█████▇▇▇▄▅▅▃▅▆▆▅▃▅▆▅▇▅▅▅▆▆▅▅▅▅▅▅▆▄▅▅▅▃▆▄▅▃▅▅▅ █
  2.4 ms       Histogram: log(frequency) by time      4.75 ms <

 Memory estimate: 3.51 MiB, allocs estimate: 54732.

While this benchmark isn't super comprehensive, it is pretty much just measuring scheduler latency. Given that this doesn't really change any scheduler decisions and only changes the wakeup logic, it seems like a pretty nice improvement.

@gbaraldi requested a review from @vtjnash on January 20, 2025, 19:57
@oscardssmith added the performance (Must go faster) label on Jan 20, 2025
    return t
end

const ChildFirst = false
Member

Should we not add ChildFirst for now?

Member Author

Yeah, it's not even correct for now.

ccall(:jl_wakeup_thread, Cvoid, (Int16,), (tid - 1) % Int16)

if (tid == 0)
Core.Intrinsics.atomic_fence(:sequentially_consistent)
Member

Has the codegen for this been fixed on AMD?

Member Author

We have this in our branch, but llvm/llvm-project#106555 is still there

Comment on lines +429 to +433
JL_DLLEXPORT void jl_add_spinner(void)
{
    jl_task_t *ct = jl_current_task;
    add_spinner(ct);
}
Member

We could probably pass current_task in from Julia? It's cheaper to get there.
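
Concretely, the suggestion on the Julia side would look something like this, assuming jl_add_spinner were changed to accept the task (hypothetical signature):

# Hypothetical: only valid if the C entry point is changed to take the task.
ccall(:jl_add_spinner, Cvoid, (Any,), current_task())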

Comment on lines +157 to +174
# This task is stuck to a thread that's likely sleeping; move the task to its
# private queue and wake that thread up. We move it out of the queue to avoid
# spinning on it.
tid2 = Threads.threadid(task)
if tid2 != 0
    ntasks = heap.ntasks
    @atomic :monotonic heap.ntasks = ntasks - Int32(1)
    heap.tasks[1] = heap.tasks[ntasks]
    Base._unsetindex!(heap.tasks, Int(ntasks))
    prio1 = typemax(UInt16)
    if ntasks > 1
        multiq_sift_down(heap, Int32(1))
        prio1 = heap.tasks[1].priority
    end
    @atomic :monotonic heap.priority = prio1
    push!(Base.workqueue_for(tid2), task)
    unlock(heap.lock)
    ccall(:jl_wakeup_thread, Cvoid, (Int16,), (tid2 - 1) % Int16)
else
Member

Can you explain why this code is now needed? Maybe it would also make sense to factor it out?

Member Author

I'll add a more comprehensive comment, but the gist is:

  • Thread 2, running Task1, calls wait for some reason, maybe on a lock.
  • Thread 2 tries to find more work, fails, and goes to sleep (while still holding on to Task1).
  • Thread 1 notifies Task1, scheduling it.

This is where the new design might not work as expected: since we no longer wake every thread on every schedule, we can decide to try to run Task1 while it is still stuck to thread 2, which requires waking thread 2 up. To keep other threads from spinning on the stuck task, we push it to thread 2's private queue, since regardless of what happens that thread has to run the task at least once.

Member Author

As for factoring it out: I guess we could factor out a "delete task from the partr heap" helper, but I'm not sure we'd gain much.
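
A rough sketch of what such a helper could look like, based on the inlined code above (the name is made up, and the caller is assumed to hold heap.lock):

# Hypothetical helper: pop the task at the top of a partr heap, restore the
# heap invariant, update the cached priority, and return the removed task.
function multiq_delete_top!(heap)
    ntasks = heap.ntasks
    @atomic :monotonic heap.ntasks = ntasks - Int32(1)
    task = heap.tasks[1]
    heap.tasks[1] = heap.tasks[ntasks]
    Base._unsetindex!(heap.tasks, Int(ntasks))
    prio1 = typemax(UInt16)
    if ntasks > 1
        multiq_sift_down(heap, Int32(1))
        prio1 = heap.tasks[1].priority
    end
    @atomic :monotonic heap.priority = prio1
    return task
end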
