Downsample 8bit particle #113

joonp9 · 2022-03-31T21:32:58Z

Hi,

I have about 3 million particles that contribute to at least 3 conformations. At first, I tried to simply downsample my particles to a box size of 160, but it later gave me an error saying I need hundreds of GB of GPU memory, which I do not have. After hearing a talk, I decided to convert my particles to 8-bit using e2proc2d.py --mrc8bit (I know this is deprecated, but I wasn't sure whether I should use --outmode int8 or uint8). When I try to downsample the new 8-bit particles, I get the following error:

Is my approach completely wrong? I don't have much experience, so I wanted to get some insight on this.

If there are any other suggestions for dealing with a very large particle set, please let me know.

Thank you,
Joon

michal-g · 2024-09-09T00:16:41Z

In older versions of cryoDRGN the float32 number format was hard-coded in a lot of places in the code for .mrc files, thus leading to the error above. I tested the more recent v3.4.0 version of cryoDRGN using int8 and int16 files produced using e.g. e2proc2d.py ... --outmode=int8 and found that there were still errors due to an incomplete specification of the MRCHeader.DTYPE_FOR_MODE class variable:

old

DTYPE_FOR_MODE = {
    0: np.int8,
    1: np.int16,
    2: np.float32,
    3: "2h",  # complex number from 2 shorts
    4: np.complex64,
    6: np.uint16,
    12: np.float16,
    16: "3B",  # RGB values
}

new (ae6458d)

DTYPE_FOR_MODE = {
    0: np.uint8,
    1: np.int16,
    2: np.float32,
    3: "2h",  # complex number from 2 shorts
    4: np.complex64,
    6: np.uint16,
    12: np.float16,
    16: "3B",  # RGB values
    17: np.int8,
}
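For anyone wondering how this table gets used: the MRC header stores the mode as its fourth 32-bit integer (after nx, ny, nz), and the reader looks up the array dtype from that value. Here is a minimal sketch of that lookup, assuming a little-endian 1024-byte header. `dtype_from_header` is a hypothetical helper written for illustration, not a cryoDRGN function, and the table is just a copy of the new mapping above.

```python
import struct

import numpy as np

# Copy of the updated mapping quoted above (mode 17 -> int8 as in that table).
DTYPE_FOR_MODE = {
    0: np.uint8,
    1: np.int16,
    2: np.float32,
    3: "2h",  # complex number from 2 shorts
    4: np.complex64,
    6: np.uint16,
    12: np.float16,
    16: "3B",  # RGB values
    17: np.int8,
}


def dtype_from_header(header_bytes: bytes) -> np.dtype:
    """Hypothetical helper: map the MRC mode field to a NumPy dtype.

    Assumes a little-endian header whose first 16 bytes hold the four
    int32 values nx, ny, nz, mode.
    """
    nx, ny, nz, mode = struct.unpack("<4i", header_bytes[:16])
    if mode not in DTYPE_FOR_MODE:
        raise ValueError(f"Unsupported MRC mode: {mode}")
    return np.dtype(DTYPE_FOR_MODE[mode])


# Build a fake header for a 10-image stack of 160x160 pixels in mode 17.
fake_header = struct.pack("<4i", 160, 160, 10, 17) + bytes(1008)
print(dtype_from_header(fake_header))
```

A mode that is missing from the table raises an error up front, rather than the data being silently read with the wrong dtype.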

The downsample command also forced its output into float32 format even when the input was, for example, int8; this has been fixed as well, e.g. in dc182ff:

old

header = MRCHeader.make_default_header(nz=src.n, ny=new_D, nx=new_D, Apix=new_apix, data=None, is_vol=False)

new

header = MRCHeader.make_default_header(nz=src.n, ny=new_D, nx=new_D, Apix=new_apix, dtype=src.dtype, is_vol=False)
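To illustrate what keeping the source dtype means for the image data itself, here is a rough standalone sketch of downsampling by Fourier cropping that returns the same dtype it was given. This is not cryoDRGN's implementation: `fourier_crop` is a hypothetical function, and the rescaling, rounding and clipping choices are my own assumptions about how one might safely write the result back into an integer format.

```python
import numpy as np


def fourier_crop(imgs: np.ndarray, new_D: int) -> np.ndarray:
    """Hypothetical sketch: downsample an (N, D, D) stack to (N, new_D, new_D)
    by cropping the centred Fourier transform, keeping the input dtype."""
    D = imgs.shape[-1]
    if D % 2 or new_D % 2 or new_D > D:
        raise ValueError("box sizes must be even and new_D <= D")

    # Work in floating point, with the zero frequency shifted to the centre.
    ft = np.fft.fftshift(np.fft.fft2(imgs.astype(np.float32)), axes=(-2, -1))

    # Keep only the central new_D x new_D block of low frequencies.
    start = D // 2 - new_D // 2
    cropped = ft[..., start:start + new_D, start:start + new_D]

    out = np.fft.ifft2(np.fft.ifftshift(cropped, axes=(-2, -1))).real
    out = out * (new_D / D) ** 2  # assumption: rescale so mean intensity is preserved

    # For integer inputs, round and clip so the cast back cannot wrap around.
    if np.issubdtype(imgs.dtype, np.integer):
        info = np.iinfo(imgs.dtype)
        out = np.clip(np.rint(out), info.min, info.max)
    return out.astype(imgs.dtype)


stack = np.full((2, 8, 8), 5, dtype=np.int8)
small = fourier_crop(stack, 4)
print(small.shape, small.dtype)
```

Without the clipping step, values that overshoot the integer range after filtering would wrap around when cast back, which is one reason 8-bit stacks need more care than float32 ones.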

Finally, in newer versions of cryoDRGN we have implemented the -b batch processing flag for downsample, which processes the input images in smaller chunks (as opposed to --chunk, which writes the downsampled images to smaller chunked files). We recommend trying this first to avoid memory issues, rather than changing the number format of your image stack.
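To give a feel for why batching helps, here is a small sketch of the general idea. It is not the code behind -b: `stack_gigabytes` and `process_in_batches` are hypothetical helpers, and the 2x subsampling is only a stand-in for the real downsampling step. The size estimate assumes the whole stack is held in memory at once, which may not be exactly what triggered the original error.

```python
import numpy as np


def stack_gigabytes(n_images: int, box_size: int, dtype) -> float:
    """Rough size of an image stack in GB (10**9 bytes)."""
    return n_images * box_size * box_size * np.dtype(dtype).itemsize / 1e9


def process_in_batches(src, out, fn, batch_size: int):
    """Apply fn to src in slices of batch_size images, writing into out.

    Only one batch is materialised at a time, so peak memory scales with
    batch_size rather than with the full stack (src and out could be
    np.memmap arrays backed by files on disk).
    """
    n = src.shape[0]
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        out[start:stop] = fn(np.asarray(src[start:stop]))
    return out


# 3 million 160x160 images: float32 vs int8 storage.
print(stack_gigabytes(3_000_000, 160, np.float32))
print(stack_gigabytes(3_000_000, 160, np.int8))

# Toy run: 10 images processed 3 at a time.
stack = np.arange(10 * 4 * 4, dtype=np.int16).reshape(10, 4, 4)
result = np.empty((10, 2, 2), dtype=stack.dtype)
process_in_batches(stack, result, lambda b: b[:, ::2, ::2], batch_size=3)
```

Switching to 8-bit only cuts the footprint by a factor of four, whereas batching makes peak memory independent of the total number of particles.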

Sorry for the much belated reply, and hope this is of help to others who are running into problems with memory usage or number formats when using downsample!

michal-g self-assigned this Sep 9, 2024

michal-g mentioned this issue Sep 11, 2024

v3.4.0: Plotting class labels, RELION 3.1 support, and phase-randomization for FSCs #399

Merged

michal-g closed this as completed Sep 13, 2024
