Draft: gh-126868: Add freelist for compact int objects #126865

Draft: eendebakpt wants to merge 3 commits into main

eendebakpt (Contributor) commented Nov 15, 2024

We can add freelists for the int object to improve performance. With the new methods from #121934, the amount of code needed to add a freelist is quite small. We only implement the freelist for compact ints (i.e. ints that fit in a single digit). For multi-digit int objects, adding freelists is more complex (we need a size-based freelist) and the gains are smaller (for very large int objects, allocation is not a significant part of the computation time).
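To illustrate the idea only, here is a toy Python model of a bounded freelist. It is not the C implementation in this PR: the class name `ToyFreelist`, the `max_size` cap, and the use of a one-element list as a stand-in for a memory block are all hypothetical.

```python
class ToyFreelist:
    """Toy model of a bounded freelist (hypothetical, not CPython's code)."""

    def __init__(self, max_size=100):
        self.max_size = max_size  # hypothetical cap on cached blocks
        self._free = []           # blocks waiting to be reused
        self.hits = 0             # allocations served from the freelist
        self.misses = 0           # allocations that needed a fresh block

    def allocate(self):
        # Reuse a cached block when one is available.
        if self._free:
            self.hits += 1
            return self._free.pop()
        # Otherwise fall back to a fresh allocation (modelled as a new list).
        self.misses += 1
        return [None]

    def release(self, block):
        # Keep the block for reuse unless the freelist is already full;
        # a full freelist simply drops the block.
        if len(self._free) < self.max_size:
            self._free.append(block)


freelist = ToyFreelist(max_size=4)
for _ in range(10):
    block = freelist.allocate()
    freelist.release(block)

print(freelist.hits, freelist.misses)
```

In this allocate-then-release pattern only the first allocation is a miss. Every later one reuses the block that was just released, which is the situation short-lived temporary ints are expected to create.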

Notes:

  • To be done: run pyperformance benchmarks
  • To be done: run benchmarks on Linux (the gain seems to be smaller than on Windows)
  • long_dealloc contained special casing to avoid deallocating small ints. These are now immortal (with a fixed refcount value), so we removed that code
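As a rough illustration of the last note, small ints in CPython are shared, preallocated objects, so they never reach the deallocation path at all. The sketch below relies on CPython implementation details (the small-int cache), and the variable names are only illustrative.

```python
# Small ints are served from a preallocated cache in CPython
# (implementation detail), so equal values are the same object.
small_a = int("7")
small_b = int("3") + int("4")
print(small_a is small_b)

# Ints outside the small-int cache are created on demand. These are the
# objects that are allocated and freed repeatedly, and that a freelist
# could recycle.
other_a = int("1000")
other_b = int("999") + 1
print(other_a == other_b, other_a is other_b)
```

On CPython the first check is expected to report a shared object, while the second pair is typically equal in value but stored as two separate objects.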

Some references to discussions on freelists

The freelist improves performance of int operations in microbenchmarks:

bench_long: Mean +- std dev: [main_long] 106 ns +- 5 ns -> [pr_long1c] 99.8 ns +- 4.4 ns: 1.07x faster
bench_alloc: Mean +- std dev: [main_long] 210 us +- 6 us -> [pr_long1c] 177 us +- 10 us: 1.19x faster

Benchmark hidden because not significant (1): bench_collatz

Geometric mean: 1.08x faster
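As a sanity check, the summary line can be reproduced from the per-benchmark speedups. This sketch makes two assumptions that the output above does not confirm: the hidden bench_collatz has a ratio of about 1.00 (its exact value is not reported), and the geometric mean is taken over all three benchmarks, including the hidden one.

```python
from statistics import geometric_mean

# Speedup ratios from the output above. The bench_collatz value is an
# assumption: it was hidden as "not significant", so 1.00 is used here.
speedups = {
    "bench_long": 1.07,
    "bench_alloc": 1.19,
    "bench_collatz": 1.00,  # assumed, not reported
}

overall = geometric_mean(list(speedups.values()))
print(f"Geometric mean: {overall:.2f}x faster")
```

Under these assumptions the result rounds to the reported 1.08x.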
Benchmark script
# Quick benchmark for cpython long objects

import pyperf


def collatz(a):
    while a > 1:
        if a % 2 == 0:
            a = a // 2
        else:
            a = 3 * a + 1


def bench_collatz(loops):
    range_it = range(loops)
    t0 = pyperf.perf_counter()
    for ii in range_it:
        collatz(ii)
    return pyperf.perf_counter() - t0


def bench_long(loops):
    range_it = range(loops)
    t0 = pyperf.perf_counter()
    x = 10
    for ii in range_it:
        x = x * x
        y = x // 2
        x = y + ii + x
        if x > 10**10:
            x = x % 1000
    return pyperf.perf_counter() - t0


def bench_alloc(loops):
    range_it = range(loops)
    t0 = pyperf.perf_counter()
    for ii in range_it:
        for kk in range(20_000):
            del kk
    return pyperf.perf_counter() - t0


# %timeit bench_long(1000)

if __name__ == "__main__":
    runner = pyperf.Runner()
    runner.bench_time_func("bench_collatz", bench_collatz)
    runner.bench_time_func("bench_long", bench_long)
    runner.bench_time_func("bench_alloc", bench_alloc)

On the pyperformance test suite (actually a subset of the suite, since not all benchmarks run on my system), the percentage of successful freelist allocations increases significantly, from 39.8% to 58.6%:

Main:

Allocations from freelist 	2,004,971,371 	39.8%
Frees to freelist 	2,005,350,418 	
Allocations 	3,034,877,938 	60.2%
Allocations to 512 bytes 	3,008,791,812 	59.7%
Allocations to 4 kbytes 	18,648,072 	0.4%
Allocations over 4 kbytes 	7,438,054 	0.1%
Frees 	3,142,033,922

PR:

Allocations from freelist 	3,058,347,887 	58.6%
Frees to freelist 	3,058,576,117 	
Allocations 	2,159,771,546 	41.4%
Allocations to 512 bytes 	2,133,373,693 	40.9%
Allocations to 4 kbytes 	18,802,328 	0.4%
Allocations over 4 kbytes 	7,595,525 	0.1%
Frees 	2,267,538,686
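The percentages in both tables can be reproduced from the raw counts. This sketch assumes that "Allocations" counts the allocations that were not served from a freelist and that each percentage is relative to the sum of the two rows. The numbers above are consistent with that reading, but it remains an interpretation, and the helper name `freelist_share` is hypothetical.

```python
def freelist_share(from_freelist, from_allocator):
    """Percentage of all allocations that were served from a freelist."""
    total = from_freelist + from_allocator
    return 100.0 * from_freelist / total


# Counts copied from the "Main" and "PR" tables above.
main_share = freelist_share(2_004_971_371, 3_034_877_938)
pr_share = freelist_share(3_058_347_887, 2_159_771_546)

print(f"main: {main_share:.1f}%  PR: {pr_share:.1f}%")
```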

eendebakpt changed the title from "Draft: Add freelist of compact int objects" to "Draft: gh-126868: Add freelist for compact int objects" Nov 15, 2024
eendebakpt marked this pull request as draft November 15, 2024 12:50
mdboom (Contributor) commented Nov 15, 2024

I'm running this PR over pyperformance on our benchmarking hardware. It will take ~3 hours.

mdboom (Contributor) commented Nov 15, 2024

> I'm running this PR over pyperformance on our benchmarking hardware. It will take ~3 hours.

Actually, scratch that -- I'll wait until the tests are passing here. That's required for PGO builds.
