GH-126795: Increase the JIT threshold from 16 to 4096 #126816

brandtbucher · 2024-11-14T01:22:33Z

The core change itself is simple, and results in a 2.1% speed improvement and a 3.6% memory improvement for JIT builds. The bulk of this PR is just modifying most of the tests in test_capi.test_opt to remove assumptions about the warmup threshold.

Issue: Use a higher tier-up threshold for JIT code #126795

savannahostrowski · 2024-11-14T03:28:57Z

Misc/NEWS.d/next/Core_and_Builtins/2024-11-13-17-18-13.gh-issue-126795._JBX9e.rst

@@ -0,0 +1 @@
+Increase the threshold for JIT code warmup.


Do we want to include some mention of the performance improvements we've seen with the new threshold?

I agree with Savannah.

savannahostrowski · 2024-11-14T03:34:18Z

Lib/test/test_capi/test_opt.py

@@ -75,20 +75,19 @@ def loop():
                self.assertEqual(opt.get_count(), 0)
                with clear_executors(loop):
                    loop()
-                # Subtract because optimizer doesn't kick in sooner
-                self.assertEqual(opt.get_count(), 1000 - TIER2_THRESHOLD)
+                self.assertEqual(opt.get_count(), 1001)


I might be missing something. Why did we remove the piece about the TIER2_THRESHOLD?

I guess, more generally, how did you decide when to add/remove the threshold from a test? It seems we're wholesale replacing hardcoded values in tests but also adding/removing the threshold when basic arithmetic is used. Was this just trial and error, or is there some convention I'm not picking up on?

IIUC, we were iterating 1000 times, 1000 - TIER2_THRESHOLD of which happened after reaching the threshold.

But that doesn't work if TIER2_THRESHOLD > 1000, so now we iterate 1000 + TIER2_THRESHOLD times, 1000 of which happen after reaching the threshold.

The earlier logic assumed a low threshold (< 1000). With the new threshold, this logic needs to be adjusted to keep the test working. Now it's generic enough that the threshold could be any (positive) number.

Ahhhh, that makes sense. Thanks!

Yeah, generally my approach was to change the number of loops to use the constant, then fix up any math to use the constant (for tests that assert some result). This one was a bit different, since it needed to run more than TIER2_THRESHOLD times. 1000 used to be a "big" number, but now it's not.
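To make the arithmetic in this thread concrete, here is a minimal sketch. It is a toy model, not the interpreter's logic: `optimized_iterations` is a hypothetical helper that just subtracts the warmup from the loop count, and it does not try to reproduce the exact count (1001) asserted in the test.

```python
def optimized_iterations(total_iterations, threshold):
    """Toy model: iterations left once `threshold` warmup iterations are spent."""
    return max(0, total_iterations - threshold)


# Old test shape: a fixed 1000 iterations, regardless of the threshold.
old_low = optimized_iterations(1000, 16)      # plenty of iterations left
old_high = optimized_iterations(1000, 4096)   # the loop never warms up

# New test shape: the loop count scales with the threshold.
new_low = optimized_iterations(1000 + 16, 16)
new_high = optimized_iterations(1000 + 4096, 4096)

print(old_low, old_high, new_low, new_high)
```

With the fixed loop count, the number of post-warmup iterations depends on the threshold and collapses to zero once the threshold passes 1000. With the scaled loop count, it stays at 1000 for any positive threshold.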

markshannon

A few comments on how we name the consts, but looks good in principle.

markshannon · 2024-11-14T18:11:45Z

Lib/test/test_capi/test_opt.py

-                x0 = x1 = x2 = x3 = x4 = x5 = x6 = x7 = x8 = x9 = 42
-                y0 = y1 = y2 = y3 = y4 = y5 = y6 = y7 = y8 = y9 = 42
-                z0 = z1 = z2 = z3 = z4 = z5 = z6 = z7 = z8 = z9 = 42
+                a0 = a1 = a2 = a3 = a4 = a5 = a6 = a7 = a8 = a9 = {TIER2_THRESHOLD}


I don't think those 42s are anything to do with thresholds, just a Douglas Adams reference.

42 is always the right answer :)

markshannon · 2024-11-14T18:12:20Z

Lib/test/test_capi/test_opt.py

+                w0 = w1 = w2 = w3 = w4 = w5 = w6 = w7 = w8 = w9 = {TIER2_THRESHOLD}
+                x0 = x1 = x2 = x3 = x4 = x5 = x6 = x7 = x8 = x9 = {TIER2_THRESHOLD}
+                y0 = y1 = y2 = y3 = y4 = y5 = y6 = y7 = y8 = y9 = {TIER2_THRESHOLD}
+                z0 = z1 = z2 = z3 = z4 = z5 = z6 = z7 = z8 = z9 = {TIER2_THRESHOLD}


z9 does need to exceed the threshold though.

Shouldn't we add an assert somewhere to check this condition? If z9 is bigger than the threshold the test fails.

The test passes if z9 meets or exceeds the threshold. It fails if it doesn't. I think it's fine.

markshannon · 2024-11-14T18:15:59Z

Lib/test/test_capi/test_opt.py

@@ -1390,13 +1392,13 @@ def test_guard_type_version_not_removed(self):

        def thing(a):
            x = 0
-            for i in range(100):
+            for i in range(TIER2_THRESHOLD + 100):
                x += a.attr
                # for the first 90 iterations we set the attribute on this dummy function which shouldn't


This comment needs updating

markshannon · 2024-11-14T18:18:03Z

Modules/_testinternalcapi.c

@@ -2222,7 +2222,7 @@ module_exec(PyObject *module)
    }

    if (PyModule_Add(module, "TIER2_THRESHOLD",
-                        PyLong_FromLong(JUMP_BACKWARD_INITIAL_VALUE)) < 0) {
+                        PyLong_FromLong(JUMP_BACKWARD_INITIAL_VALUE + 1)) < 0) {


I'd rather not have the +1 here.
Could you add another const EXCEEDS_TIER2_THRESHOLD = TIER2_THRESHOLD + 1?
In the tests above, use EXCEEDS_TIER2_THRESHOLD where we want to exceed the threshold, and in the places where we need the actual threshold, and you currently have TIER2_THRESHOLD - 1, use TIER2_THRESHOLD.

In theory, the tests should continue to work if EXCEEDS_TIER2_THRESHOLD = TIER2_THRESHOLD + 5.

The thing is, the threshold to tier up is 4096 hits. We just have JUMP_BACKWARD_INITIAL_VALUE set to 4095 because we "count" the final zero.

This makes sense for me when looking at the tests, personally. It's intuitive that something that loops TIER2_THRESHOLD times would tier up, and something that loops TIER2_THRESHOLD - 1 wouldn't.
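The off-by-one described above can be sketched with a toy countdown. This is only an illustration under the assumption that a counter starts at the initial value, is decremented once per hit, and tiers up on the hit that finds it at zero; `hits_until_tier_up` is a hypothetical helper, and the real counters in the interpreter are more involved.

```python
def hits_until_tier_up(initial_value):
    """Count hits until a toy countdown counter triggers.

    The hit that observes the counter at zero is the one that tiers up,
    so the final zero is counted as a hit too.
    """
    counter = initial_value
    hits = 0
    while True:
        hits += 1
        if counter == 0:
            return hits
        counter -= 1


# An initial value of 4095 means tiering up on the 4096th hit.
new_threshold = hits_until_tier_up(4095)

# Assuming the same convention held before, 15 would give a threshold of 16.
old_threshold = hits_until_tier_up(15)

print(new_threshold, old_threshold)
```

Under this model, exporting the initial value plus one gives tests a constant where looping TIER2_THRESHOLD times tiers up and looping TIER2_THRESHOLD - 1 times does not.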

bedevere-app · 2024-11-14T18:19:21Z

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

brandtbucher · 2024-11-14T18:39:25Z

Results across platforms. Looks like the memory savings are more pronounced on AArch64 macOS and performance impact is more pronounced on AArch64 Linux. Windows seems to benefit less overall (we don't measure memory on Windows, though):

aarch64-apple-darwin: 2% faster, 5% less memory
aarch64-unknown-linux-gnu: 9% faster, 4% less memory
x86_64-unknown-linux-gnu: 1-2% faster, 3-4% less memory
x86_64-pc-windows-msvc: 1% faster
i686-pc-windows-msvc: 1% faster

diegorusso

This is more a question: are we going to make this threshold changeable at runtime?

diegorusso · 2024-11-14T23:13:04Z

Lib/test/test_capi/test_opt.py

@@ -75,20 +75,19 @@ def loop():
                self.assertEqual(opt.get_count(), 0)
                with clear_executors(loop):
                    loop()
-                # Subtract because optimizer doesn't kick in sooner
-                self.assertEqual(opt.get_count(), 1000 - TIER2_THRESHOLD)
+                self.assertEqual(opt.get_count(), 1001)


The earlier logic assumed a low threshold (< 1000). With the new threshold, this logic needs to be adjusted to keep the test working. Now it's generic enough that the threshold could be any (positive) number.

diegorusso · 2024-11-14T23:15:23Z

Lib/test/test_capi/test_opt.py

-                x0 = x1 = x2 = x3 = x4 = x5 = x6 = x7 = x8 = x9 = 42
-                y0 = y1 = y2 = y3 = y4 = y5 = y6 = y7 = y8 = y9 = 42
-                z0 = z1 = z2 = z3 = z4 = z5 = z6 = z7 = z8 = z9 = 42
+                a0 = a1 = a2 = a3 = a4 = a5 = a6 = a7 = a8 = a9 = {TIER2_THRESHOLD}


42 is always the right answer :)

diegorusso · 2024-11-14T23:17:10Z

Lib/test/test_capi/test_opt.py

+                w0 = w1 = w2 = w3 = w4 = w5 = w6 = w7 = w8 = w9 = {TIER2_THRESHOLD}
+                x0 = x1 = x2 = x3 = x4 = x5 = x6 = x7 = x8 = x9 = {TIER2_THRESHOLD}
+                y0 = y1 = y2 = y3 = y4 = y5 = y6 = y7 = y8 = y9 = {TIER2_THRESHOLD}
+                z0 = z1 = z2 = z3 = z4 = z5 = z6 = z7 = z8 = z9 = {TIER2_THRESHOLD}


Shouldn't we add an assert somewhere to check this condition? If z9 is bigger than the threshold the test fails.

diegorusso · 2024-11-14T23:25:35Z

Misc/NEWS.d/next/Core_and_Builtins/2024-11-13-17-18-13.gh-issue-126795._JBX9e.rst

@@ -0,0 +1 @@
+Increase the threshold for JIT code warmup.


I agree with Savannah.

diegorusso · 2024-11-14T23:29:07Z

Results across platforms. Looks like the memory savings are more pronounced on AArch64 macOS and performance impact is more pronounced on AArch64 Linux. Windows seems to benefit less overall (we don't measure memory on Windows, though):

aarch64-apple-darwin: 2% faster, 5% less memory

aarch64-unknown-linux-gnu: 9% faster, 4% less memory

x86_64-unknown-linux-gnu: 1-2% faster, 3-4% less memory

x86_64-pc-windows-msvc: 1% faster

i686-pc-windows-msvc: 1% faster

Anyway, great stuff! These results are incredible!

alonme · 2024-11-15T14:52:36Z

Results across platforms. Looks like the memory savings are more pronounced on AArch64 macOS and performance impact is more pronounced on AArch64 Linux. Windows seems to benefit less overall (we don't measure memory on Windows, though):

aarch64-apple-darwin: 2% faster, 5% less memory

aarch64-unknown-linux-gnu: 9% faster, 4% less memory

x86_64-unknown-linux-gnu: 1-2% faster, 3-4% less memory

x86_64-pc-windows-msvc: 1% faster

i686-pc-windows-msvc: 1% faster

Any idea why aarch64-unknown-linux-gnu would be affected ~4X more?

brandtbucher · 2024-11-15T23:26:24Z

This is more a question: are we going to make this threshold changeable at runtime?

Maybe eventually. I'll open an issue about exactly what interfaces people would like to control the lifecycle of JIT code.

The thing is that changing the thresholds of our counters at runtime is sort of tricky, since they are initialized to some constant and count down from there towards zero. There's also a lot of them.

That means that we need to re-initialize counters whenever the threshold is changed in order to make this work. It's doable, but not quite as simple as e.g. updating GC thresholds at runtime. An env var set at startup could be another option.
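A rough sketch of why this is tricky, using a toy model rather than anything from CPython: `ToyCounters`, `threshold_from_env`, and the `TOY_JIT_THRESHOLD` variable name are all hypothetical and exist only for illustration.

```python
class ToyCounters:
    """Toy model: one countdown counter per site, seeded from a threshold."""

    def __init__(self, sites, threshold):
        self.threshold = threshold
        self.counters = {site: threshold - 1 for site in sites}

    def hit(self, site):
        """Return True once the site's counter has run down to zero."""
        if self.counters[site] == 0:
            return True
        self.counters[site] -= 1
        return False

    def set_threshold(self, threshold):
        """Changing the threshold means re-seeding every existing counter."""
        self.threshold = threshold
        for site in self.counters:
            self.counters[site] = threshold - 1


def threshold_from_env(environ, default=4096):
    """Read a startup threshold from a mapping such as os.environ."""
    return int(environ.get("TOY_JIT_THRESHOLD", default))


counters = ToyCounters(["loop_a", "loop_b"], threshold=4)
counters.hit("loop_a")
counters.hit("loop_a")

# Naively updating the stored threshold leaves in-flight counters untouched.
counters.threshold = 8
after_naive_change = dict(counters.counters)

# Re-seeding applies the new threshold, but discards warmup progress.
counters.set_threshold(8)
after_reseed = dict(counters.counters)

print(after_naive_change, after_reseed)
```

Reading the value once at startup, as in `threshold_from_env(os.environ)`, sidesteps the re-seeding problem because every counter is created with the chosen threshold in the first place.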

brandtbucher · 2024-11-15T23:29:02Z

Any idea why aarch64-unknown-linux-gnu would be affected ~4X more?

Not 100% sure.

Anecdotally, AArch64 Linux code is the largest and least efficient to compile (lots of tricky instruction patching, emitting range-extending trampolines, etc.), so it would make sense that compiling less of it helps us.

brandtbucher · 2024-11-15T23:43:58Z

Okay, I think I've addressed everyone's comments.

I have made the requested changes; please review again (or don't, either way is fine).

bedevere-app · 2024-11-15T23:44:03Z

Thanks for making the requested changes!

@markshannon: please review the changes made to this pull request.

brandtbucher added 3 commits November 13, 2024 17:01

Crank the warmup up to 4096

afd78b4

Rework test_opt to remove assumptions about TIER2_THRESHOLD

a1474b4

blurb add

5dd5806

brandtbucher added performance Performance or resource usage interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-JIT labels Nov 14, 2024

brandtbucher self-assigned this Nov 14, 2024

bedevere-app bot added the awaiting core review label Nov 14, 2024

bedevere-app bot mentioned this pull request Nov 14, 2024

Use a higher tier-up threshold for JIT code #126795

Open

Touch file for JIT CI

2e7a174

brandtbucher requested a review from savannahostrowski as a code owner November 14, 2024 01:25

savannahostrowski reviewed Nov 14, 2024

markshannon requested changes Nov 14, 2024

bedevere-app bot added awaiting changes and removed awaiting core review labels Nov 14, 2024

diegorusso reviewed Nov 14, 2024

brandtbucher added 3 commits November 15, 2024 15:33

Clean up some comments and unused values

a3aee0b

Brag a bit

e2eec5e

Un-touch jit.c

a034966

bedevere-app bot added awaiting change review and removed awaiting changes labels Nov 15, 2024

bedevere-app bot requested a review from markshannon November 15, 2024 23:44
