
Huge gz file #44

Open
nilgoyette opened this issue Feb 21, 2019 · 5 comments

Comments

@nilgoyette
Collaborator

We received a big image from the Human Connectome Project. Nothing huge at first, but we needed to resample it to 1x1x1 and now it's 2.3 GB in .nii.gz and 8.0 GB in .nii. It's a 181x218x181x288 f32 image, so we end up allocating 8 227 466 496 bytes and reading from a Gz source, here:

let mut raw_data = vec![0u8; nb_bytes_for_data(header)?];
source.read_exact(&mut raw_data)?;

I tested and it doesn't seem to be a memory issue, in the sense that it does reach the read_exact line, but then it's stuck long enough that I kill the job. 7zip decodes it in ~1m40s and nifti-rs reads the non-gz version in ~10s. For the gz version, it allocates ~3750 MB, then runs indefinitely (the longest we waited was 1 hour) while keeping one CPU core busy, so it's doing something.

We will probably work with HCP images in the future, so we might want to contribute a solution to this problem. I'm not sure how to solve it, though! Do you think a chunked version would work? Something like:

out = image of right dimension
buffer = vec![0; 1024]
while not eof
    read chunk
    reinterpret to input type
    cast to requested type
    linear_transform
    assign to out at the right place
return out
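
A rough Rust sketch of this chunked idea (illustrative only: the chunk size, the little-endian assumption, and the slope/intercept parameters are mine, not the existing nifti-rs API):

use std::io::{self, Read};

// Illustrative chunked reader: pull the voxel data through a small fixed
// buffer instead of one giant read_exact, converting and scaling values
// as they arrive. Assumes little-endian f32 data and a slope/intercept
// linear transform.
fn read_f32_chunked<R: Read>(
    mut source: R,
    n_values: usize,
    slope: f32,
    intercept: f32,
) -> io::Result<Vec<f32>> {
    let mut out = Vec::with_capacity(n_values);
    let mut buffer = [0u8; 4096]; // multiple of 4, so no value is split across reads
    let mut remaining = n_values * 4; // bytes left to read
    while remaining > 0 {
        let want = remaining.min(buffer.len());
        source.read_exact(&mut buffer[..want])?;
        for chunk in buffer[..want].chunks_exact(4) {
            let v = f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
            out.push(v * slope + intercept); // the "cast + linear_transform" step
        }
        remaining -= want;
    }
    Ok(out)
}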

It might slow down the reading of "normal"/smaller images, but we can probably create a different code path for "big" images. What do you think?

@Enet4
Owner

Enet4 commented Feb 21, 2019

Coincidentally, one of the ideas that I've had at the back of my head for a while was a "lazy" NIfTI volume implementation, which would not pull the whole volume from the file into memory. It would be backed by some sort of paged caching mechanism, thus restricting memory usage while preventing an excessive number of file reads. Combined with adaptor methods for retrieving volume slices, I believe that such a volume would solve that particular problem.

On the other hand, it's in fact weird that the program can allocate enough memory for the volume, but fail to read it afterwards. Something else might be at play here, so I would like to look into this as well.
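
To give a feel for the shape this could take, a minimal sketch of a page-cached volume over an uncompressed, seekable file (all names are hypothetical; a GZip-backed version cannot seek and would need to re-decode from the start, as discussed later in this thread):

use std::collections::HashMap;
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

const PAGE_SIZE: u64 = 64 * 1024;

// Hypothetical lazy volume: voxel bytes stay on disk and fixed-size pages
// are read in only when first touched. No eviction or bounds checks here.
struct LazyVolume {
    file: File,
    data_offset: u64,             // where the voxel data starts in the file
    pages: HashMap<u64, Vec<u8>>, // page index -> raw page bytes
}

impl LazyVolume {
    fn byte_at(&mut self, pos: u64) -> io::Result<u8> {
        let page = pos / PAGE_SIZE;
        if !self.pages.contains_key(&page) {
            let mut buf = vec![0u8; PAGE_SIZE as usize];
            self.file
                .seek(SeekFrom::Start(self.data_offset + page * PAGE_SIZE))?;
            let n = self.file.read(&mut buf)?; // the last page may be short
            buf.truncate(n);
            self.pages.insert(page, buf);
        }
        Ok(self.pages[&page][(pos % PAGE_SIZE) as usize])
    }
}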

@fmorency

I've been bitten by this bug while processing huge multishell DWI datasets. The workaround of using uncompressed NIFTI files works at the expense of disk space.

@Enet4
Owner

Enet4 commented Apr 12, 2019

I also agree that this is an important (and fairly challenging) matter. So far, I've thought of two non-exclusive ways to overcome the problem:

  1. As I stated above, one could have a lazy implementation that would keep the file open and consume the stream as values are requested, keeping only a few portions of past data in memory. This comes with a caveat for GZip-compressed volumes: if the user wishes to read a value far back in the volume data, the program would have to reopen the file and decompress the byte stream from the beginning. This could be automated, albeit with some implications for performance predictability.

  2. Recently, I've also been thinking about providing an alternative public API, with no arbitrary indexing capabilities, but still with a means to iterate through slices of the volume. This is easier to implement and should still reflect most use cases without becoming unergonomic.
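
A rough sketch of what that forward-only API could look like (hypothetical types, not the current crate API): an iterator that yields one slice worth of raw bytes at a time from any Read source, such as a GzDecoder, so at most one slice is in memory at once.

use std::io::{self, Read};

// Hypothetical sequential slice reader over any byte source.
struct SliceReader<R: Read> {
    source: R,
    slice_len: usize, // bytes per slice: dim_x * dim_y * bytes_per_voxel
    remaining: usize, // number of slices left in the volume
}

impl<R: Read> Iterator for SliceReader<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        let mut buf = vec![0u8; self.slice_len];
        // Propagate read errors to the caller instead of panicking.
        Some(self.source.read_exact(&mut buf).map(|()| buf))
    }
}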

@nilgoyette
Collaborator Author

The ideas in this issue might still be interesting, but the bug has been found and solved in flate2. tl;dr: there was an infinite loop on huge files.

@Enet4
Owner

Enet4 commented Jun 21, 2019

Great to know! I say we keep the issue open nonetheless, as a more memory-efficient solution for reading large volumes may still be useful.

Enet4 mentioned this issue Jun 27, 2019