This repository has been archived by the owner on Apr 19, 2023. It is now read-only.

[BUG] Scrublet OverflowError: cannot serialize a bytes object larger than 4 GiB #214

Closed
IrinaMate opened this issue May 15, 2020 · 2 comments · Fixed by #224
Labels
bug Something isn't working

Comments

@IrinaMate

Pipeline: v.0.19
Running Scrublet on a 10x sample of 23k cells: the expression matrix gets too big, and the run fails when the Scrublet object is pickled (see the traceback below).

WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Error executing process > 'single_sample_scrublet:SCRUBLET__DOUBLET_REMOVAL:SC__SCRUBLET__DOUBLET_DETECTION (1)'

Caused by:
Process single_sample_scrublet:SCRUBLET__DOUBLET_REMOVAL:SC__SCRUBLET__DOUBLET_DETECTION (1) terminated with an error exit status (1)

Command executed:

/user/leuven/330/vsc33098/.nextflow/assets/vib-singlecell-nf/vsn-pipelines/src/scrublet/bin/sc_doublet_detection.py --use-variable-features False --technology 10x --h5ad-with-variable-features-info TEW__88a474__TotSeqA_Hashing_Nuclei_MCF7_PC3_MDAMB231_DU145_cdna___TEW__ad4622__TotSeqA_Hashing_Nuclei_MCF7_PC3_MDAMB231_DU145_hto.SC__SCANPY__DIM_REDUCTION_PCA.h5ad --output-prefix "TEW__88a474__TotSeqA_Hashing_Nuclei_MCF7_PC3_MDAMB231_DU145_cdna___TEW__ad4622__TotSeqA_Hashing_Nuclei_MCF7_PC3_MDAMB231_DU145_hto.SC__SCRUBLET__DOUBLET_DETECTION" TEW__88a474__TotSeqA_Hashing_Nuclei_MCF7_PC3_MDAMB231_DU145_cdna___TEW__ad4622__TotSeqA_Hashing_Nuclei_MCF7_PC3_MDAMB231_DU145_hto.SC__FILE_CONVERTER.h5ad

Command exit status:
1

Command output:
Preprocessing...
Simulating doublets...
Embedding transcriptomes using PCA...
Calculating doublet scores...
Automatically set threshold at doublet score = 0.22
Detected doublet rate = 12.8%
Estimated detectable doublet fraction = 74.5%
Overall doublet rate:
Expected = 10.0%
Estimated = 17.2%
Elapsed time: 65.0 seconds

Command error:
Traceback (most recent call last):
  File "/user/leuven/330/vsc33098/.nextflow/assets/vib-singlecell-nf/vsn-pipelines/src/scrublet/bin/sc_doublet_detection.py", line 233, in <module>
    pickle.dump(scrub, f)
OverflowError: cannot serialize a bytes object larger than 4 GiB

Work dir:
/ddn1/vol1/staging/leuven/stg_00002/lcb/lcb_projects/TEW/Hashing/Cells/10x/vsn_pipeline/TEW__88a474__TotSeqA_Hashing_Nuclei_MCF7_PC3_MDAMB231_DU145/work/77/f116c577cbb00a6d5b3e000ec25313

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

IrinaMate added the bug label on May 15, 2020
@dweemx
Contributor

dweemx commented May 15, 2020

Hey @IrinaMate,
I added a possible fix in the develop branch; could you give it a try?

@IrinaMate
Author

Hey @dweemx, it works. Thanks a lot for the fast fix.
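
For reference: the thread does not show what the develop-branch fix actually changed. This particular OverflowError is raised when pickle runs with a protocol below 4, because those protocols cannot store a single bytes object larger than 4 GiB (protocol 4 only became the default in Python 3.8). A minimal sketch of one common workaround, assuming that is the cause here, is to pass protocol=4 explicitly. The helper name dump_large and the small stand-in object are hypothetical and not taken from sc_doublet_detection.py.

```python
import io
import pickle


def dump_large(obj, fileobj, protocol=4):
    """Pickle obj to fileobj with protocol 4 or higher.

    Protocols below 4 cannot store a single bytes object larger than
    4 GiB, which is what triggers the OverflowError in the traceback.
    """
    if protocol < 4:
        raise ValueError("protocol must be >= 4 for objects larger than 4 GiB")
    pickle.dump(obj, fileobj, protocol=protocol)


# Small stand-in for the Scrublet object (hypothetical, illustration only).
scrub = {"doublet_scores": [0.1, 0.22, 0.5], "threshold": 0.22}

buf = io.BytesIO()
dump_large(scrub, buf)

# Round-trip to confirm the object can be read back.
buf.seek(0)
restored = pickle.load(buf)
```

In the script itself the equivalent change would be pickle.dump(scrub, f, protocol=4). Whether that matches the merged fix is not confirmed by this thread.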
