
Only ~16 bits of library ASLR entropy when vm.mmap_rnd_bits is 18 #323

Open
kenballus opened this issue Aug 26, 2024 · 4 comments

Comments

@kenballus

(This might be expected behavior. Apologies in advance.)

I saw this blog post about ASLR behaving weirdly on recent x86 Linux kernels, and was curious if Asahi was also affected.

By default, vm.mmap_rnd_bits is 18 on Asahi Fedora 40. Thus, we should expect something like $2^{18}$ equally likely possible values for the base address of libc.

Under that assumption, if we collect 603 libc base addresses, there should be about a 50% chance of at least one duplicate among them¹.
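The birthday-problem arithmetic behind the 603-sample figure is easy to sanity-check with the exact collision probability (a standalone sketch, separate from the measurement script):

```python
import math

def collision_prob(n_bins: int, k_samples: int) -> float:
    """Exact probability of at least one duplicate when drawing
    k_samples uniformly and independently from n_bins values."""
    p_no_dup = 1.0
    for i in range(k_samples):
        p_no_dup *= (n_bins - i) / n_bins
    return 1.0 - p_no_dup

print(collision_prob(2**18, 603))  # ≈ 0.5
```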

I wrote this little Python script to collect a bunch of libc base addresses and check for duplicates, then report how often it hits at least one duplicate:

```python
import multiprocessing
import subprocess

# Number of times we run the experiment
ITERATIONS: int = 1000

# Number of addresses we collect per experiment
SAMPLE_SIZE: int = 603

def experiment(sample_size: int) -> bool:
    """Grabs the libc base address sample_size times, returns whether any of them are duplicates"""
    seen: set[int] = set()
    for _ in range(sample_size):
        addr: int = int(
            subprocess.run(
                ["sh", "-c", "grep libc /proc/self/maps | head -n 1 | cut -d'-' -f 1"],
                capture_output=True,
            ).stdout,
            16,
        )
        if addr in seen:
            return True
        seen.add(addr)
    return False

with multiprocessing.Pool(multiprocessing.cpu_count() // 2) as pool:
    print(sum(pool.map(experiment, [SAMPLE_SIZE] * ITERATIONS)) / ITERATIONS)
```

The math would lead us to expect this script to print something around 0.5, but it reports something around 0.9 on my M1 MBA. (Feel free to run it yourself; it only takes a minute or two.) This suggests that the distribution of libc base addresses has fewer than 18 bits of entropy.

If you change SAMPLE_SIZE to 302, the script reports a value close to 0.5. By the same math as before, this suggests about 16 bits of effective entropy instead of 18.
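Inverting the birthday approximation gives the effective entropy implied by a 50% collision rate at a given sample size; a rough sketch using $k(k-1)/(2n) = \ln 2$:

```python
import math

def effective_bits(k_samples: int) -> float:
    """Entropy (in bits) for which k_samples uniform draws give a
    ~50% collision chance, from the approximation k(k-1)/(2n) = ln 2."""
    n = k_samples * (k_samples - 1) / (2 * math.log(2))
    return math.log2(n)

print(round(effective_bits(302), 1))  # 16.0
print(round(effective_bits(603), 1))  # 18.0
```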

I'm puzzled as to why we seem to observe only 25% as many possible addresses as we should. Some potential reasons:

  • ASLR addresses are either not independent or not drawn from a uniform distribution on possible values.
  • Libc doesn't fit in 75% of the address space.
    • Seems unlikely for a large address space.
  • I'm missing some big thing about the way ASLR works and this is totally expected behavior
    • Seems likely
  • There's a problem with my statistics
    • Seems likely, but I did check my work with a couple of statistician friends!

I'd really appreciate it if anyone has any ideas about the cause of this, or could suggest a more appropriate place to bring this up.

Thanks!

Footnotes

  1. This can be found by setting $n=2^{18}$ in the birthday-problem collision probability and solving for the number of samples $k$ that gives a 50% chance of a duplicate.

@AsahiLinux AsahiLinux deleted a comment Aug 26, 2024
@jannau
Member

jannau commented Aug 26, 2024

Another unlikely possibility: 2 bits are wasted randomizing 4k-aligned addresses instead of 16k.

No, the section alignment is 64k anyway, and the addresses are aligned to that.

@DavidBuchanan314

DavidBuchanan314 commented Aug 28, 2024

I can replicate this result. Just as a sanity check of the experimental setup, I tried replacing the subprocess with addr = secrets.randbits(18) and confirmed that it produced the expected result of ~0.5.

Here's my optimised version of the experiment, which spawns fewer subprocesses and therefore runs about 5x faster (not hugely important, but it makes testing things quicker):

```python
import multiprocessing
import subprocess

# Number of times we run the experiment
ITERATIONS: int = 1000

# Number of addresses we collect per experiment
SAMPLE_SIZE: int = 603

def experiment(sample_size: int) -> bool:
    """Grabs the libc base address sample_size times, returns whether any of them are duplicates"""
    seen: set[int] = set()
    for _ in range(sample_size):
        lines = subprocess.run(
            ["cat", "/proc/self/maps"],
            capture_output=True,
        ).stdout.decode().split("\n")
        addr = [int(l.split("-")[0], 16) for l in lines if "libc" in l][0]
        if addr in seen:
            return True
        seen.add(addr)
    return False

with multiprocessing.Pool(multiprocessing.cpu_count() // 2) as pool:
    print(sum(pool.map(experiment, [SAMPLE_SIZE] * ITERATIONS)) / ITERATIONS)
```

I noticed that checking the addresses of [stack] and /usr/bin/cat provides similar results, whereas [heap] actually does produce the expected value of ~0.5.

By observation, [heap] (which I think gets mmapped by libc?) appears to be 16k aligned rather than 64k aligned like the various LOAD segments. It does look like alignment might be something to do with it.

Unproven hypothesis: if you have a 16k kernel with 64k-aligned userland binaries, you throw away 2 bits of entropy relative to whatever vm.mmap_rnd_bits is supposed to provide.
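If that hypothesis holds, the bookkeeping is just log2 of the alignment-to-page-size ratio; a sketch of the arithmetic (illustrating the hypothesis, not a confirmed kernel behaviour):

```python
import math

def entropy_bits(mmap_rnd_bits: int, page_size: int, alignment: int) -> int:
    """Randomization covers mmap_rnd_bits page-sized slots; forcing the
    base to a coarser alignment discards log2(alignment / page_size) bits."""
    return mmap_rnd_bits - int(math.log2(alignment // page_size))

print(entropy_bits(18, 16 * 1024, 64 * 1024))  # 16: matches the Asahi observation
print(entropy_bits(18, 4 * 1024, 64 * 1024))   # 14: prediction for a 4k kernel
```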

@kenballus
Author

Thanks for the feedback.

I tried running the script on some other aarch64 platforms.

  1. QEMU aarch64 (virt-9.0)
    i. OS: Armbian aarch64, Linux 6.6.47
    ii. vm.mmap_rnd_bits: 18
    iii. Page size: 4k
    iv. Collision rate for sample size of 151, and 100 iterations: 0.5

  2. Raspberry Pi 3 Model B v1.2
    i. OS: Raspberry Pi OS 11 aarch64, Linux 6.1.0
    ii. vm.mmap_rnd_bits: 18
    iii. Page size: 4k
    iv. Collision rate for sample size of 151, and 1000 iterations: 0.517

  3. Raspberry Pi 3 Model B v1.2
    i. OS: Raspberry Pi OS 10 aarch64, Linux 5.4.51
    ii. vm.mmap_rnd_bits: 18
    iii. Page size: 4k
    iv. Collision rate for sample size of 151, and 100 iterations: 0.46

The takeaway here is that systems with 4k pages have this same problem, but even worse by another ~2 bits. I'm going to keep going back in time on the rpi to see if this has always been the behavior, or if this is a regression.

@DavidBuchanan314

@kenballus did the binaries you tested against on those systems have 4k-aligned segments, too?

If my hypothesis is correct, loading a 64k-aligned binary on a 4k kernel with vm.mmap_rnd_bits=18 will result in only ~14 bits of entropy.

(If you want a test binary, /usr/sbin/busybox from asahi fedora should work)
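For checking a binary's segment alignment, here's a minimal sketch that reads the p_align field of the PT_LOAD program headers (assumes a little-endian ELF64 file; sys.executable is just a convenient local example):

```python
import struct
import sys

def load_alignments(path: str) -> list[int]:
    """Return the p_align values of all PT_LOAD segments in a
    little-endian 64-bit ELF file."""
    with open(path, "rb") as f:
        data = f.read()
    # Magic, ELFCLASS64, ELFDATA2LSB
    assert data[:4] == b"\x7fELF" and data[4] == 2 and data[5] == 1
    e_phoff, = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    aligns = []
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        p_type, = struct.unpack_from("<I", data, off)
        if p_type == 1:  # PT_LOAD
            p_align, = struct.unpack_from("<Q", data, off + 0x30)
            aligns.append(p_align)
    return aligns

print([hex(a) for a in load_alignments(sys.executable)])
```

0x10000 means 64k-aligned LOAD segments; 0x1000 means 4k.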
