Memory usage #17
Hi @fjorka, let's first explore our memory-profiling options. I just created a script and ran it using memory_profiler:

```python
# script.py
import nd2
import numpy as np
from memory_profiler import profile

@profile
def main():
    nf = nd2.ND2File("big.nd2")  # handle named nf so the f = ... line below doesn't shadow it
    x = nf.to_xarray()
    # instead of a for loop... easier to see the effect of each line in the report
    a = x.isel(C=0, Z=0, T=np.arange(0, 10)).compute()
    b = x.isel(C=0, Z=0, T=np.arange(10, 20)).compute()
    c = x.isel(C=0, Z=0, T=np.arange(20, 30)).compute()
    d = x.isel(C=0, Z=0, T=np.arange(30, 40)).compute()
    e = x.isel(C=0, Z=0, T=np.arange(40, 50)).compute()
    f = x.isel(C=0, Z=0, T=np.arange(50, 60)).compute()
    g = x.isel(C=0, Z=0, T=np.arange(60, 70)).compute()
    h = x.isel(C=0, Z=0, T=np.arange(70, 80)).compute()
    i = x.isel(C=0, Z=0, T=np.arange(80, 90)).compute()
    j = x.isel(C=0, Z=0, T=np.arange(90, 100)).compute()

if __name__ == "__main__":
    main()
```

Running it, I get the following output:
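(For reference, a script decorated with memory_profiler's `@profile` is typically run with `python -m memory_profiler script.py`, which prints a per-line memory report.)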
Though the file is 15 GB, it looks to be allocating about what I'd expect for each chunk. If you get something dramatically different with your file, I might want to play with it 😬... I know it's a lot to ask, but let me know if you can share it somehow (Dropbox, etc.).
Hi @tlambert03,
Running the same profiling on my file gives the following profile:
I shared with you the file from the above example. The one from the previous example is ~0.5 TB (a multi-position time-lapse), but I can figure out sharing it too if you would like to work with it.
Thanks! I downloaded it. You know... one thing that is probably important to mention here, which I should have thought of earlier, is that nd2 files are not (natively) chunked along the channel axis. So when you load one channel for a given timepoint, you load them all. You should be able to save memory by only loading a Z or T subset... but chunking in channels will require some additional functionality that isn't natively supported by the nd2 format (still possible). One additional observation: try leaving xarray out of the loop. Use just
...and remember that if any of those dimensions are XY or C, it won't save memory (until that's added)
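A minimal sketch of the xarray-free approach, assuming T is the file's leading axis; `to_dask()` is part of the nd2 API, but the slice bounds here are arbitrary:

```python
import nd2

with nd2.ND2File("big.nd2") as nf:
    d = nf.to_dask()            # lazy dask array, no xarray wrapper
    # Slicing T (the leading axis here) only reads those frames from disk;
    # all channels of each selected frame are still loaded, per the note above.
    chunk = d[0:10].compute()   # materialize timepoints 0-9 as a numpy array
```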
Thanks for the explanation @tlambert03! I re-wrote the code to load only single time points and arrange them later.
A single `im` is around 4 GB, but this loop takes ~18-24 GB of RAM to execute (never less than 18 GB after the initial loading). In my mind, it should never hold more than a single time point and in general require around 4 GB of RAM. Do you have any insights about what I could do better here?
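The re-written loop itself isn't shown above; a minimal sketch of the pattern described, assuming an axis order of (T, C, Z, Y, X) and treating `im` and the final assembly step as hypothetical reconstructions:

```python
import gc
import numpy as np
import nd2

with nd2.ND2File("big.nd2") as nf:
    d = nf.to_dask()              # lazy array, assumed (T, C, Z, Y, X)
    planes = []
    for t in range(d.shape[0]):
        im = d[t].compute()       # one full timepoint (~4 GB in this case)
        planes.append(im[0, 0])   # keep only the C=0, Z=0 plane
        del im                    # drop the reference to the large buffer
        gc.collect()              # encourage prompt release before the next read
    stack = np.stack(planes)      # arrange the kept planes afterwards
```

Even with explicit `del`/`gc.collect()`, intermediate copies inside dask/xarray can keep the peak well above the size of a single timepoint, which would match the 18-24 GB observation.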
Something along those lines would be a nice addition to this library. It is possible to read just one xy-plane at a time from the nd2 file and discard the data for the irrelevant channels; the downside is that all the data has to be re-read for each channel (I'm not sure how it works with time series). To reduce RAM usage even more, the data could be streamed to disk as it is read, but that might be out of scope here.
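A rough sketch of that plane-by-plane streaming idea, assuming `read_frame(i)` returns all channels of plane `i` channel-first and that `attributes.sequenceCount`/`heightPx`/`widthPx` describe the frame layout; the output filename and channel index are placeholders:

```python
import numpy as np
import nd2

with nd2.ND2File("big.nd2") as nf:
    attrs = nf.attributes
    shape = (attrs.sequenceCount, attrs.heightPx, attrs.widthPx)
    # memory-mapped .npy output: planes are written to disk as they are read
    out = np.lib.format.open_memmap(
        "channel0.npy", mode="w+", dtype=nf.dtype, shape=shape
    )
    for i in range(attrs.sequenceCount):
        frame = nf.read_frame(i)  # all channels of this plane are loaded
        out[i] = frame[0] if frame.ndim == 3 else frame  # keep channel 0 only
    out.flush()
```

Writing into a memory-mapped array keeps only one plane in RAM at a time, with the OS flushing pages to disk as the loop proceeds.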
Description
I am trying to load selected parts of nd2 files, but too much memory is allocated for the objects that need to be computed. As a consequence, it fails to load objects bigger than roughly a quarter of the available memory, since about four times the requested size appears to be allocated.
What I Did
Test on a time-lapse experiment:
Test on a single-time-point big image:
In the second example, the memory allocation is correct when it has to compute the whole file.
It may be related to the problem of object sizes being calculated incorrectly, as shown here: