Huge gz file #44
Comments
Coincidentally, one of the ideas that I've had at the back of my head for a while was a "lazy" NIfTI volume implementation, which would not pull all of the volume from the file into memory. It would be backed by some sort of paged caching mechanism, thus restricting memory usage while preventing an excessive number of file reads. Combined with adaptor methods for retrieving volume slices, I believe that such a volume would solve that particular problem. On the other hand, it's in fact weird that the program can allocate enough memory for the volume, but fail to read it afterwards. Something else might be at play here, so I would like to look into this as well.
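A minimal sketch of what such a lazy, page-cached volume could look like; all of the names here (`LazyVolume`, `max_pages`, `get_f32`) are hypothetical and not part of nifti-rs, and a real implementation would want an LRU eviction policy plus the slice adaptors mentioned above:

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical lazily-loaded volume: voxel data stays on disk and is
/// pulled in fixed-size pages, keeping at most `max_pages` in memory.
struct LazyVolume {
    file: File,
    data_offset: u64,             // start of the voxel data (vox_offset)
    dims: [usize; 4],             // e.g. [181, 218, 181, 288]
    page_size: usize,             // bytes per cached page, multiple of 4
    max_pages: usize,             // simple cap on memory usage
    cache: HashMap<u64, Vec<u8>>, // page index -> page bytes
}

impl LazyVolume {
    /// Fetch one f32 voxel, reading its page from disk on a cache miss.
    /// Assumes `page_size` and `data_offset` are multiples of 4 (so a voxel
    /// never straddles two pages) and little-endian data, for simplicity.
    fn get_f32(&mut self, x: usize, y: usize, z: usize, t: usize) -> io::Result<f32> {
        let [nx, ny, nz, _nt] = self.dims;
        let linear = ((t * nz + z) * ny + y) * nx + x;
        let byte_pos = self.data_offset + (linear * 4) as u64;
        let page_id = byte_pos / self.page_size as u64;
        let in_page = (byte_pos % self.page_size as u64) as usize;

        if !self.cache.contains_key(&page_id) {
            if self.cache.len() >= self.max_pages {
                // Naive eviction: drop everything. A real implementation
                // would evict the least recently used page instead.
                self.cache.clear();
            }
            let mut page = vec![0u8; self.page_size];
            self.file
                .seek(SeekFrom::Start(page_id * self.page_size as u64))?;
            let mut filled = 0;
            while filled < page.len() {
                let n = self.file.read(&mut page[filled..])?;
                if n == 0 {
                    break; // EOF: the last page of the file may be short
                }
                filled += n;
            }
            page.truncate(filled);
            self.cache.insert(page_id, page);
        }

        let p = &self.cache[&page_id];
        Ok(f32::from_le_bytes([
            p[in_page],
            p[in_page + 1],
            p[in_page + 2],
            p[in_page + 3],
        ]))
    }
}
```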
I've been bitten by this bug while processing huge multishell DWI datasets. The workaround of using uncompressed NIFTI files works at the expense of disk space.
I also agree that this is an important (and fairly challenging) matter. So far, I've thought of two non-exclusive ways to overcome the problem.
The ideas in this issue might still be interesting, but the bug has been found and solved in flate2. tl;dr: there was an infinite loop on huge files.
Great to know! I say we keep the issue open nonetheless, as a more memory-efficient solution for reading large volumes may still be useful.
We received a big image from the Human Connectome Project. Nothing huge, but we needed to resample it to 1x1x1 and now it's 2.3Gb in `.nii.gz` and 8.0Gb in `.nii`. It's a 181x218x181x288 f32 image, thus allocating 8 227 466 496 bytes and reading from a Gz source here.

I tested and it doesn't seem to be a memory issue, in the sense that it does reach the `read_exact` line, but then it's stuck for, err, long enough that I kill the job. 7zip decodes it in ~1m40s, and nifti-rs reads the non-gz version in ~10s. For the gz version, it allocates ~3750Mb, then runs indefinitely (the longest we waited was 1 hour) while always using one process, so it's doing something.

We will probably work with HCP images in the future, so we might want to contribute a solution to this problem. I'm not sure how to solve it, though! Do you think a chunked version would work? Something like:
It might slow down the reading of "normal"/smaller images, but we can probably create a different code path for "big" images. What do you think?