
Casting memmaps in ArrayProxies is slower than loading into memory first (optimization opportunity) #1371

Open
clane9 opened this issue Oct 7, 2024 · 6 comments

Comments


clane9 commented Oct 7, 2024

It seems that reading uncompressed NIfTI volumes takes significantly longer when memory mapping is enabled. In this example, I load four large 4D NIfTI volumes from the ABCD dataset, with and without memory mapping.

import time

import nibabel as nib
import numpy as np

print(nib.__version__)


def benchmark_nibabel_load(path: str, mmap: bool):
    tic = time.monotonic()
    img = nib.load(path, mmap=mmap)
    data = np.asarray(img.get_fdata())
    rt = time.monotonic() - tic
    print(f"mmap: {mmap}, run time: {rt:.3f}s")


paths = [
    "/ABCD/sub-NDARINV0CCVJ39W/ses-2YearFollowUpYArm1/func/sub-NDARINV0CCVJ39W_ses-2YearFollowUpYArm1_task-rest_run-04_bold.nii",
    "/ABCD/sub-NDARINV0YE7L9KU/ses-2YearFollowUpYArm1/func/sub-NDARINV0YE7L9KU_ses-2YearFollowUpYArm1_task-rest_run-04_bold.nii",
    "/ABCD/sub-NDARINV0F82C6R8/ses-2YearFollowUpYArm1/func/sub-NDARINV0F82C6R8_ses-2YearFollowUpYArm1_task-rest_run-02_bold.nii",
    "/ABCD/sub-NDARINV0V80916L/ses-baselineYear1Arm1/func/sub-NDARINV0V80916L_ses-baselineYear1Arm1_task-rest_run-04_bold.nii",
]

mmaps = [True, False, True, False]

for path, mmap in zip(paths, mmaps):
    benchmark_nibabel_load(path, mmap=mmap)

This is what I get, using nibabel v5.2.1:

mmap: True, run time: 90.764s
mmap: False, run time: 3.595s
mmap: True, run time: 37.405s
mmap: False, run time: 4.810s

Any idea why this might happen?
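For reference, the two access patterns can be sketched with numpy alone, no nibabel involved (the file, shape, and sizes below are made up for illustration): casting a memmap to float64 versus reading the raw bytes into memory first and then casting.

```python
# Numpy-only sketch of the two access patterns (hypothetical file/shape).
import os
import tempfile
import time

import numpy as np

# Write a synthetic int16 volume to disk, standing in for a .nii body.
shape = (64, 64, 32, 100)
with tempfile.NamedTemporaryFile(suffix=".dat", delete=False) as f:
    path = f.name
np.arange(np.prod(shape), dtype=np.int16).reshape(shape).tofile(path)

# Pattern 1: memmap, then cast -- the cast faults pages in as it goes
# (roughly what mmap=True followed by a float64 conversion does).
tic = time.monotonic()
mm = np.memmap(path, dtype=np.int16, mode="r", shape=shape)
a = mm.astype(np.float64)
t_mmap = time.monotonic() - tic

# Pattern 2: one sequential read, then cast (roughly mmap=False).
tic = time.monotonic()
b = np.fromfile(path, dtype=np.int16).reshape(shape).astype(np.float64)
t_read = time.monotonic() - tic

print(f"mmap+cast: {t_mmap:.3f}s, read+cast: {t_read:.3f}s")
assert np.array_equal(a, b)  # both paths produce identical data
del mm
os.remove(path)
```

On a warm page cache the two timings may be close; a large gap shows up mostly on cold reads from slow storage.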


effigies commented Oct 10, 2024

.get_fdata() casts to a floating point dtype, float64 by default. You may prefer just np.asanyarray(img.dataobj), which will ensure that scaling is applied but preserve the memmap if no scaling is necessary.

If you know the dtype you want to operate in, np.int16(img.dataobj) (for example) would work. If you're targeting float32, img.get_fdata(dtype='float32') will work and cache the result for the next call.


clane9 commented Oct 15, 2024

Thanks @effigies, I think these are good strategies for interacting with the memmap data. But I'm still confused about why the simple get_fdata() is so much slower when mmap=True (the default)?

Regardless, I will just set mmap=False for my case and avoid the issue, so it's not critical to get to the bottom of it.

effigies commented:

I don't really know why the memmap is slower. I would probably profile it and see where it's spending the time.


clane9 commented Oct 16, 2024

Yeah, I did some quick profiling. With mmap=True, almost all of the time is spent here, in what looks like the casting:

[profiler screenshot: nearly all time spent in the dtype cast]

With mmap=False, the casting takes less time, and most of the time is spent doing the read.

[profiler screenshot: most time spent in the file read]

I guess when the array is a memmap, astype() does an implicit read, which for some reason is slower than the explicit read you get with mmap=False.
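In line with that hypothesis, a numpy-only sketch of the workaround (hypothetical file and shape): materialize the memmap with one explicit copy before casting, rather than letting astype() drive the reads page by page.

```python
import os
import tempfile

import numpy as np

# Synthetic int16 file standing in for the uncompressed .nii body.
shape = (32, 32, 16, 50)
with tempfile.NamedTemporaryFile(suffix=".dat", delete=False) as f:
    path = f.name
np.ones(shape, dtype=np.int16).tofile(path)

mm = np.memmap(path, dtype=np.int16, mode="r", shape=shape)

direct = mm.astype(np.float64)            # cast pulls pages in itself
staged = np.array(mm).astype(np.float64)  # copy into RAM first, then cast

assert np.array_equal(direct, staged)     # same result either way
del mm
os.remove(path)
```

Note that np.asarray(mm) would only return a view of the mapped memory; np.array(mm) is used here because it forces an actual copy into RAM.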

effigies commented:

I'm not sure there's much to be done here. That said, if you're up for it, you could try looking into and benchmarking other approaches to scaling, e.g., pre-allocating an array with np.empty(shape, dtype=target_dtype) and assigning its values with np.copyto.

That will move more of the branching logic (e.g., are dtypes the same) into nibabel, and it has been nice to let numpy make those decisions and trust them to be efficient. It could be worth upstreaming the solution (if we find one) into numpy's memmap.astype() implementation.
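A rough sketch of that pre-allocate-and-copy idea (this is not nibabel's actual implementation, just the shape of the experiment, with a made-up file and array): allocate the float64 output buffer up front and let np.copyto cast while copying out of the memmap.

```python
import os
import tempfile

import numpy as np

# Synthetic int16 file standing in for the on-disk data.
shape = (16, 16, 8, 20)
with tempfile.NamedTemporaryFile(suffix=".dat", delete=False) as f:
    path = f.name
np.arange(np.prod(shape), dtype=np.int16).reshape(shape).tofile(path)

mm = np.memmap(path, dtype=np.int16, mode="r", shape=shape)

out = np.empty(mm.shape, dtype=np.float64)  # plain in-memory target buffer
np.copyto(out, mm)                          # int16 -> float64 cast on copy

assert np.array_equal(out, mm.astype(np.float64))  # matches the astype path
del mm
os.remove(path)
```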


clane9 commented Oct 16, 2024

Yeah, I agree, I don't think there's much worth doing at this point, especially since mmap=False is fine in this case. I just wanted to make a note, since it did confuse us for a little while before we tracked it down. Will let you know if we end up feeling motivated and decide to look for an improvement.

@effigies effigies changed the title Slow reading of uncompressed nifti volumes when memory mapping is enabled Casting memmaps in ArrayProxies is slower than loading into memory first (optimization opportunity) Oct 16, 2024