Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

niftiRead.m: Include an option to call different functions depending on nifti file size #257

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

htakemur
Copy link
Contributor

Currently, niftiRead.m produces problem to read a huge nifti file (e.g. 10GB). Revision includes that the niftiRead.m will call older (slower) code more robust for bigger file, if the nifti file size is over a certain threshold (3GB by default).

I'm not sure how much this makes sense to you or not, and the selection of 3GB is arbitrary. Also, this may affect fMRI processing as well.

Please give me any feedbacks if you had any thoughts.

@JWinawer
Copy link
Member

Hi Hiromasa,

You are the second person to point out problems with large nifti files. (The first problem was writing large files - yours is reading).

These problems do not manifest the same way on all machines. They seem to be dependent on hardware (probably RAM) and maybe the OS.

For example, I tested code that allows me to read and write a Nifti that is 12 GB and it succeeds perfectly (that is, the data gets created, written with niftiWrite, read back with niftiRead, and this is identical to the original data).

So I am not sure we should make this change mandatory for everyone.

One option would be to make a new Vista preference field.

You can see the current preferences with:
getpref('VISTA')

A better option, though will take some work, is to figure out why the mex function fails with large files whereas the matlab function succeeds. I am guessing that it has something to do with the way the C-code allocates memory, but I am not sure.

When your niftiRead fails for a large file, do you get an error?

By the way, this was my test function to read/write a 12 GB file. I would not advise running this unless you have a lot of RAM!

nii = niftiCreate;
 
data = rand(100,100,100,1500); 
s = whos('data');
fprintf('data is %5.3f GB \n', s.bytes * 1e-9);
 
 
nii = niftiSet(nii, 'data', data);
nii = niftiSet(nii, 'dim', size(data));
nii = niftiSet(nii, 'pixdim', [1 1 1 1]);
clear data;
 
fname = fullfile(tempdir, 'myNifit.nii.gz');
 
niftiWrite(nii, fname);
 
nii2 = niftiRead(fname);
 
assert(isequal(nii.data, nii2.data));

@htakemur
Copy link
Contributor Author

Hi Jon,

Thanks for comments!

I did not get any error message when the mex code fails to import large nifti files. It produces a matrix with identical size of input data, but some part of the data becomes zero.

This may be a source code of mex file (written in C).
https://github.com/vistalab/vistasoft/blob/60c9231ef00bb3ca859ee3b427275f5fba2a4330/fileFilters/nifti/C-code/readFileNifti.c

I have no experience in writing C-code and this might be hard to find out a solution. However it seems to be the code is splitting jobs to allocate memory. There might be something, but a bit difficult to tell.

Given the dependency on environment, the tentative solution may be vistasoft preferences, as you suggested.

@j-ales
Copy link
Contributor

j-ales commented Jul 27, 2017

This will be a problem caused by various potential issues. The total number of elements/voxels in the dataset, which is correlated with the size of the file could be the issue. If your hitting 32bit code from the 2000's. When a data set requires a number bigger than 2^32 somewhere. The exact problems arise depending on what part of the chain is limited to a 32 number.

Mex files need to be compiled with -largeArrayDims. Otherwise when a mex file attempts to create a matrix with more than 2^32-2 elements should throw an error "out of memory". Might be good to use that in compiling the mex files. But I don't think this is the problem you're hitting because there's no out of memory message.

readFileNifti relies on the niftilib to perform all the nifti reading.

nifti1_io.c uses a type "size_t" to represent the number of voxels in a file (As well as in lots of places). Size_t type is for representing memory addresses and should be 64 bit on 64 bit systems. But this can potentially cause differences between systems/compilers. Because what value this type takes depends on both the system used to compile the code and the options used. E.g. The *.mexmaci64 file would handle it fine but not the *.mexmaci binary even though both use the same c code (but that's a silly example because it would bomb out with an out of memory error as explained above).

Vistasfot is using an older version of the niftilib (1.37 10 Mar 2009). I'm guessing the real problem is caused by a bug in niftilib that caused it to choke on large gzipped files because of use of an "int" type incorrectly instead of "size_t".

From the current niftilib code on

"1.42 06 Jul 2010 [rickr]: trouble with large (gz) files\n",
" - noted/investigated by M Hanke and Y Halchenko\n"
" - fixed znzread/write, noting example by M Adler\n"
" - changed nifti_swap_* routines/calls to take size_t (6)\n"

Updating nifti* files to the versions from: https://nifti.nimh.nih.gov/pub/dist/src/niftilib/
and recompiling the mex files should fix the problem. I'd also suggest adding -largeArrayDims to the mex options.

@JWinawer
Copy link
Member

JWinawer commented Jul 27, 2017 via email

@j-ales
Copy link
Contributor

j-ales commented Jul 28, 2017

That's a limitation of the nifti1 specification. 2^15-1 is the maximum value for a signed 16 bit integer.
The nifiti format stores dimensions in the header as a short which is 16 bits. (See: https://nifti.nimh.nih.gov/pub/dist/src/niftilib/nifti1.h).
I haven't looked at the code to see how/if it handles an overflow in a dimension. But overflows can cause weird unspecified behaviours.

@JWinawer
Copy link
Member

Yes, @j-ales, you are right, and the 2^15-1 limit is probably unrelated to @htakemur 's problem. It is probably reasonable enough to assume that no dimension of the nifti needs more than 2^15 elements.

Separately, I noticed that readFileNifti casts dims as an int. I think this is wrong. I think it should be cast as mwSize. I changed this and recompiled the mex (for mexmaci64) and put this in a new branch called readFileNiftiFix.

@htakemur : Could you try this branch and see if you are able to read large files? If you are not using 64 bit Mac, then you will want to recompile readFileNifti after checking out the new branch. You could test this code (changing the variable GB to whatever value you want), or try reading your own large files.

GB = 5;
mx = round(GB*1e9/8);
data = reshape(1:mx, 100, 100, 10, []);
s = whos('data');
fprintf('data is %5.3f GB \n', s.bytes * 1e-9);

nii = niftiCreate('data', data); clear data;
fname = fullfile(tempdir, 'myNifti.nii');
niftiWrite(nii, fname); clear nii;
nii = niftiRead(fname);
disp(size(nii.data))

This works fine for me, even when GB is set to more than 10 (I tested up to 12 ).

@arokem
Copy link
Member

arokem commented Jul 29, 2017

A comment/question from the peanut gallery: is file IO really a performance bottle-neck for anyone? Would it make sense to get rid of the maintenance burden of keeping this C code up-to-date and error free, and rely only on the Matlab code for file IO? I imagine that file IO used to be very slow when this solution was first adopted. Is it still too slow to rely on the uncompiled version?

@JWinawer
Copy link
Member

JWinawer commented Jul 29, 2017 via email

@JWinawer
Copy link
Member

@arokem

Methinks you are correct. (Tested from 0.04GB to 4GB)
readfileniftilog

GB          = 4*(.1:.1:1).^2;
n           = length(GB);
niftiTime 	= zeros(1,n);
matTime     = zeros(1,n);
 
for ii = 1:n    
    disp(ii)
    dimsize = round((GB(ii)*1e9/8)^0.25);
    numElements = dimsize^4;
    data = reshape(1:numElements, dimsize, dimsize, dimsize, dimsize);
    s = whos('data');
    GB(ii) = s.bytes * 1e-9;

    nii   = niftiCreate('data', data); clear data;
    fname = fullfile(tempdir, 'myNifti.nii');
    
    disp('writing...')
    niftiWrite(nii, fname); clear nii;
    disp('reading...')
    
    tic, niftiRead(fname); niftiTime(ii)=toc;
    tic, niftiReadMatlab(fname); matTime(ii) = toc;


    
end

figure; set(gcf, 'Color', 'w');

plot(GB, niftiTime, 'o-', GB, matTime, 'x-', 'LineWidth', 4, 'MarkerSize', 12)
set(gca, 'FontSize', 16);
legend('niftiRead Mex', 'niftiRead M-file')
xlabel('File Size (GB)')
ylabel('Time (seconds)')

@htakemur
Copy link
Contributor Author

@JWinawer @arokem If the benefit to use C-code were the past issue, perhaps we could just simply rewrite niftiRead (and niftiWrite as well?) and make a new pull request?

@JWinawer
Copy link
Member

Yes, I agree with you both.

@htakemur, if you have a Linux machine, maybe you can run my code and test the speed of compiled v uncompiled?

@JWinawer
Copy link
Member

And in case anyone was wondering about writing, same story: no advantage for the compiled c-code.

readwriteniftis

So yes, I propose changing the niftiRead/Write wrappers to call m-files rather than mex files. We can leave the mex files in the repository in case anyone was calling them directly, or otherwise finds them useful. We need to validate the m-files to make sure they perform correctly (e.g., in terms of units, transforms, etc). I'll do this in the next week or two.

@htakemur
Copy link
Contributor Author

Hi @JWinawer

I have tested your code in my computer. The computer is Ubuntu 12.04 LTS. The MATLAB calls the readFileNifti.mexa64. I used matlab2016a.

In my environment, the mex file is significantly slower than m file. I don't know why, but here is the result. (I used the same code which you described above).

mexmatlabcomparison

@j-ales
Copy link
Contributor

j-ales commented Jul 30, 2017

That test only test the *.nii pipeline. It's the gzip pipeline that's the problem. The matlab code for reading nii is fine. It's what it does when it encounters a gz file that is slow: It copies the file to a new location, gunzips the file in that location, then reads the *.nii file. So instead of reading once it: reads,writes,reads,writes,reads which should be around 5x slower. But matlab gunzip() is also slow so for GZ files I get 15x slower operation in total. This depends somewhat on the compressibility of the file. If you want to test. I modified your benchmark code to include gzipping the file and made half of the data random so gz would compress it by a factor of ~2. For the c code there is no difference between gz and non gz files.

Note: In the below test the x-axis is for the uncompressed file size for the GZ case. The nii.gz file sizes are half the .gz files. I'm also using a version of readFileNifti compiled myself against the most recent version of niftilib.

Regarding @arokem comment about maintainability. You're right. But the vista c-code is just a mex wrapper for the nimh nifti reference code. The original problem is no one grabbed the new version of the niftilib code that fixed a gzip bug. Using the nifti reference code might be desirable as that's widely used code. However, it might be nice to have a matlab native version. The problem is in writing the code to efficiently handle directly reading a gzipped file. Bob wrote some code to use the java gzip class to read a gz file in matlab. Presumably that code could be used in the matlab load_nii commands. However, that trades compiled c for java. Java isn't guaranteed to be available to matlab code (although it's much more guaranteed than 10 years ago). So I'm not sure if that's a net positive.

@htakemur I get the opposite from you. I get the matlab code being 1.5x slower then the c-code for the *.nii case. Using OSX 10.9.5, and matlab 2014b.

gzippedcomp
gzippedcompnomatlabgz

GB          = 1*(.1:.1:1).^2;
n           = length(GB);
niftiTime 	= zeros(1,n);
matTime     = zeros(1,n);
 
for ii = 1:n    
    disp(ii)
    dimsize = round((GB(ii)*1e9/8)^0.25);
    numElements = dimsize^4;
    data = reshape(1:numElements, dimsize, dimsize, dimsize, dimsize);
    data(1:round(numElements/2)) = rand(round(numElements/2),1); %Make half the data random so not trivial for gz. 
    s = whos('data');
    GB(ii) = s.bytes * 1e-9;

    nii   = niftiCreate('data', data); clear data;
    fname = fullfile(tempdir, 'myNifti.nii');
    
    disp('writing...')
    niftiWrite(nii, fname); clear nii;
    disp('gzipping...')
    gzip(fname);
    
    disp('reading...')
    
    tic, niftiRead(fname); niftiTime(ii)=toc;
    tic, niftiReadMatlab(fname); matTime(ii) = toc;
    
    tic, niftiRead([fname '.gz']); niftiGzTime(ii)=toc;
    tic, niftiReadMatlab([fname '.gz']); matGzTime(ii) = toc;
    


    
end

figure; set(gcf, 'Color', 'w');

plot(GB, niftiTime, 'o-', GB, matTime, 'x-', 'LineWidth', 4, 'MarkerSize', 12);
hold on;
plot(GB, niftiGzTime, '+-', GB, matGzTime, 'd-', 'LineWidth', 4, 'MarkerSize', 12)

set(gca, 'FontSize', 16);
legend('niftiRead Mex', 'niftiRead M-file','GZIPPED niftiRead Mex', 'GZIPPED niftiRead M-file')

xlabel('File Size (GB)')
ylabel('Time (seconds)')

@JWinawer
Copy link
Member

This is helpful - I get the same result, meaning that for gzipped files, there is a big advantage for the compiled function.

To summarize, we have two different problems requiring two different solutions.

The first problem is that @htakemur could not read large nifti files. I am guessing that @j-ales is right that the reason is we are using out of data niftilib files. I updated these fils with the newer ones from NIH and out them in the branch readFileNiftiFix. This branch also has the newly compiled mex function for 64-bit mac (I cannot recompile the others as I don't have other types of machines). Hopefully, this will fix @htakemur 's problem. It should, given that the header comments in the newer nifti1_io.c indicate a 2010 change to deal with problems with large files.
@htakemur , can you checkout the branch readFileNiftiFix and tell us if this fixes your problem with large files? If you are using Ubuntu, you will need to re-compile the mex.

I'll create a pull request for that branch.

The second problem is that the m-file is slow, for all the reasons Justin mentioned. I can easily fix the problem of copying the zipped file and reading it multiple times, but that is only a small gain. The biggest problem is that Matlab's gunzip is slow, as you noted. I'd like to fix this too at some point but will leave it aside for now. Maybe java will be a solution. Not sure.

@htakemur
Copy link
Contributor Author

htakemur commented Aug 1, 2017

Hi @JWinawer

Thanks. I am now testing compiling, but it isn't very straightforward.

Following the notes for remembering for myself (not necessarily asking for a help).

  1. In many cases, the compiler installed in the system and supported by MATLAB does not match. In my MATLAB2015a (in Ubuntu 12), they support the gcc version until 4.7, not 4.8. But my computer installs 4.8. Then the MATLAB refuses to compile c-code because of this mismatch.

  2. Then I installed the older compiler, and type
    mex -v GCC='/usr/bin/gcc-4.7' readFileNifti.c nifti1_io.c znzlib.c -largeArrayDims

in order to call older version of gcc (4.7).

  1. Still, my matlab do not produce compilex mex file, and producing error message:

/usr/bin/gcc-4.7 -c -D_GNU_SOURCE -DMATLAB_MEX_FILE -I"/usr/local/MATLAB/R2015a/extern/include" -I"/usr/local/MATLAB/R2015a/simulink/include" -ansi -fexceptions -fPIC -fno-omit-frame-pointer -pthread -O -DNDEBUG /home/htakemur/matlab/git/vistasoft-readFileNiftiFix/fileFilters/nifti/C-code/readFileNifti.c -o /tmp/mex_1336443893019682_1693/readFileNifti.o Error using mex In file included from /home/htakemur/matlab/git/vistasoft-readFileNiftiFix/fileFilters/nifti/C-code/nifti1_io.h:20:0, from /home/htakemur/matlab/git/vistasoft-readFileNiftiFix/fileFilters/nifti/C-code/readFileNifti.c:54: /home/htakemur/matlab/git/vistasoft-readFileNiftiFix/fileFilters/nifti/C-code/znzlib.h:57:1: error: expected identifier or ‘(’ before ‘/’ token /home/htakemur/matlab/git/vistasoft-readFileNiftiFix/fileFilters/nifti/C-code/znzlib.h:57:6: warning: missing terminating ' character [enabled by default] /home/htakemur/matlab/git/vistasoft-readFileNiftiFix/fileFilters/nifti/C-code/znzlib.h:57:1: error: missing terminating ' character /home/htakemur/matlab/git/vistasoft-readFileNiftiFix/fileFilters/nifti/C-code/znzlib.h:58:10: warning: missing terminating ' character [enabled by default] /home/htakemur/matlab/git/vistasoft-readFileNiftiFix/fileFilters/nifti/C-code/znzlib.h:58:1: error: missing terminating ' character In file included from /usr/include/zlib.h:34:0, from /home/htakemur/matlab/git/vistasoft-readFileNiftiFix/fileFilters/nifti/C-code/znzlib.h:65, from /home/htakemur/matlab/git/vistasoft-readFileNiftiFix/fileFilters/nifti/C-code/nifti1_io.h:20, from /home/htakemur/matlab/git/vistasoft-readFileNiftiFix/fileFilters/nifti/C-code/readFileNifti.c:54: /usr/include/x86_64-linux-gnu/zconf.h:377:4: error: unknown type name ‘Byte’

@JWinawer
Copy link
Member

JWinawer commented Aug 2, 2017

I will look for an ubuntu machine in my dept and see if I can compile the c-code in the readFileNiftiFix branch.
I do not have one in my lab. @j-ales do you?

@htakemur
Copy link
Contributor Author

htakemur commented Aug 4, 2017

@JWinawer Sorry for taking your time...

@garikoitz
Copy link
Contributor

Maybe worth checking PR #330 for speed as well? It uses nii_tool instead of the mex file for reading, but I haven't check for speed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants