Downsample 8bit particle #113

Closed
joonp9 opened this issue Mar 31, 2022 · 1 comment

joonp9 commented Mar 31, 2022

Hi,

I have about 3 million particles that contribute to at least 3 conformations. At first, I tried to simply downsample my particles to a box size of 160, but that later gave me an error saying I needed hundreds of GB of GPU memory, which I do not have. After hearing a talk, I decided to convert my particles to 8-bit using e2proc2d.py --mrc8bit (I know this is deprecated, but I wasn't sure whether I should use --outmode int8 or uint8). When I try to downsample the new 8-bit particles, I get the following error:

[Screenshot of the error message: Screen Shot 2022-03-31 at 4 31 41 PM]

Is my approach completely wrong? I don't have much experience, so I wanted to get some insight on this.

If there are any other suggestions for dealing with a very large particle set, please let me know.

Thank you,
Joon

Collaborator

michal-g commented Sep 9, 2024

In older versions of cryoDRGN the float32 number format was hard-coded in a lot of places in the code for .mrc files, thus leading to the error above. I tested the more recent v3.4.0 version of cryoDRGN using int8 and int16 files produced using e.g. e2proc2d.py ... --outmode=int8 and found that there were still errors due to an incomplete specification of the MRCHeader.DTYPE_FOR_MODE class variable:

old

DTYPE_FOR_MODE = {
        0: np.int8,
        1: np.int16,
        2: np.float32,
        3: "2h",  # complex number from 2 shorts
        4: np.complex64,
        6: np.uint16,
        12: np.float16,
        16: "3B",
    }  # RBG values

new (ae6458d)

DTYPE_FOR_MODE = {
        0: np.uint8,
        1: np.int16,
        2: np.float32,
        3: "2h",  # complex number from 2 shorts
        4: np.complex64,
        6: np.uint16,
        12: np.float16,
        16: "3B",
        17: np.int8,
    }  # RBG values
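
To make the signed/unsigned distinction concrete, here is a minimal standard-library sketch of a mode lookup for a tiny MRC-style buffer. It is not cryoDRGN's code: `STRUCT_CODE_FOR_MODE`, `make_stack` and `read_stack` are hypothetical names, the mode numbers are copied from the "new" table above, and the sketch assumes a little-endian file with a 1024-byte main header (whose first four 32-bit words are nx, ny, nz, mode) and no extended header.

```python
import struct

# Struct format characters standing in for the NumPy dtypes in the "new" table
# (B=uint8, h=int16, f=float32, H=uint16, e=float16, b=int8).
# Hypothetical helper table, not part of cryoDRGN.
STRUCT_CODE_FOR_MODE = {0: "B", 1: "h", 2: "f", 6: "H", 12: "e", 17: "b"}

HEADER_BYTES = 1024  # size of the main MRC header (assumes no extended header)


def make_stack(pixels, nx, ny, nz, mode):
    """Build a minimal little-endian MRC-style buffer for illustration."""
    code = STRUCT_CODE_FOR_MODE[mode]
    # Only the first four header words are filled in; the rest is zero padding.
    header = struct.pack("<4i", nx, ny, nz, mode).ljust(HEADER_BYTES, b"\x00")
    return header + struct.pack(f"<{len(pixels)}{code}", *pixels)


def read_stack(buf):
    """Return (shape, struct code, pixel list), decoding pixels by header mode."""
    nx, ny, nz, mode = struct.unpack_from("<4i", buf, 0)
    if mode not in STRUCT_CODE_FOR_MODE:
        raise ValueError(f"unsupported MRC mode {mode}")
    code = STRUCT_CODE_FOR_MODE[mode]
    pixels = struct.unpack_from(f"<{nx * ny * nz}{code}", buf, HEADER_BYTES)
    return (nx, ny, nz), code, list(pixels)


signed = make_stack([-128, -1, 0, 127], nx=2, ny=2, nz=1, mode=17)
shape, code, values = read_stack(signed)
```

Relabelling the same data bytes as mode 0 would decode them as unsigned values (128, 255, 0, 127), which is why a table that maps a mode to the wrong signedness silently corrupts the pixel values rather than failing outright.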

The downsample command also forced its output to float32 even when the input was e.g. int8. This has also been fixed, e.g. in dc182ff:

old

header = MRCHeader.make_default_header(nz=src.n, ny=new_D, nx=new_D, Apix=new_apix, data=None, is_vol=False)

new

header = MRCHeader.make_default_header(nz=src.n, ny=new_D, nx=new_D, Apix=new_apix, dtype=src.dtype, is_vol=False)
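
The idea behind passing the source dtype through can be illustrated with a small standard-library sketch. This is not how cryoDRGN resamples images: the sketch swaps in plain block averaging in real space to stay short, and `downsample_keep_type` and `INT_CODES` are hypothetical names. The only point it makes is that the output container reuses the input's element type instead of being promoted to a 4-byte float.

```python
from array import array

# array typecodes that hold integers; means are rounded back for these types.
INT_CODES = set("bBhHiIlLqQ")


def downsample_keep_type(img, size, factor):
    """Block-average a square, row-major image and keep img.typecode.

    img    -- array.array with size * size pixels
    size   -- edge length of the input image
    factor -- integer shrink factor (assumed to divide size)
    """
    new_size = size // factor
    out = array(img.typecode)  # same element type as the source
    for r in range(new_size):
        for c in range(new_size):
            block = [
                img[(r * factor + i) * size + (c * factor + j)]
                for i in range(factor)
                for j in range(factor)
            ]
            mean = sum(block) / len(block)
            # The mean of in-range values is itself in range, so no clipping
            # is needed; integers are rounded to the nearest value.
            out.append(round(mean) if img.typecode in INT_CODES else mean)
    return out


# 4x4 signed 8-bit image ("b" is int8), shrunk to 2x2.
src = array("b", [0, 2, 4, 6,
                  2, 4, 6, 8,
                  -4, -4, 10, 10,
                  -4, -4, 10, 10])
small = downsample_keep_type(src, size=4, factor=2)
```

Here `small` is still a one-byte-per-pixel array, whereas an output forced to float32 would take four bytes per pixel regardless of the input format.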

Finally, newer versions of cryoDRGN implement the -b batch processing flag for downsample, which processes the input images in smaller chunks (as opposed to --chunk, which writes the downsampled images to smaller chunked files). We recommend trying this first to avoid memory issues instead of messing with the number format of your image stack.
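
As a rough illustration of why batching helps more than changing the number format, the sketch below estimates the size of a whole image stack versus a single batch, and shows the index ranges a batched loop would walk over. It is a back-of-the-envelope sketch, not cryoDRGN's implementation: `stack_gigabytes` and `batch_ranges` are hypothetical helpers, the particle count and box size are taken from the question above, and the batch size of 5000 is an arbitrary example value.

```python
def stack_gigabytes(n_images, box, bytes_per_pixel):
    """Size in GB (10**9 bytes) of n_images square images with edge length box."""
    return n_images * box * box * bytes_per_pixel / 1e9


def batch_ranges(n_images, batch_size):
    """Yield (start, end) index pairs that cover n_images in order."""
    for start in range(0, n_images, batch_size):
        yield start, min(start + batch_size, n_images)


n, box = 3_000_000, 160  # figures from the original question

full_float32 = stack_gigabytes(n, box, 4)     # whole stack as float32
full_int8 = stack_gigabytes(n, box, 1)        # whole stack as int8
one_batch = stack_gigabytes(5_000, box, 4)    # one float32 batch of 5000 images

ranges = list(batch_ranges(n, 5_000))
print(f"float32 stack: {full_float32:.1f} GB, int8 stack: {full_int8:.1f} GB")
print(f"one batch: {one_batch:.3f} GB across {len(ranges)} batches")
```

Under these assumptions the 8-bit format shrinks the full stack by a factor of four, while batching bounds the working set by the batch size rather than by the total number of particles.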

Sorry for the much belated reply, and hope this is of help to others who are running into problems with memory usage or number formats when using downsample!
