Missing test h5ad files needed for continue pretraining pipeline #12

Closed
WhenMelancholy opened this issue Dec 19, 2024 · 2 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@WhenMelancholy

Hi Team, I greatly appreciate your work on this project. The code is very well organized and the documentation is comprehensive, which made it easy to get started. Thank you for making this valuable contribution open source.

While implementing the continue pretraining pipeline, I noticed that some h5ad files required for testing are missing. Specifically, I encountered the following error when trying to run the validation:

FileNotFoundError: [Errno 2] Unable to synchronously open file (unable to open file: name = '.../data/gNNpgpo6gATjuxTE7CCp.h5ad', errno = 2, error message = 'No such file or directory')

I also found there are some other datasets used for validation but not included in the repository:

# testdatasets=['/R4ZHoQegxXdSFNFY5LGe.h5ad', '/SHV11AEetZOms4Wh7Ehb.h5ad',
# '/V6DPJx8rP3wWRQ43LMHb.h5ad', '/Gz5G2ETTEuuRDgwm7brA.h5ad', '/YyBdEsN89p2aF4xJY1CW.h5ad',
# '/SO5yBTUDBgkAmz0QbG8K.h5ad', '/r4iCehg3Tw5IbCLiCIbl.h5ad', '/SqvXr3i3PGXM8toXzUf9.h5ad',
# '/REIyQZE6OMZm1S3W2Dxi.h5ad', '/rYZ7gs0E0cqPOLONC8ia.h5ad', '/FcwMDDbAQPNYIjcYNxoc.h5ad',
# '/fvU5BAMJrm7vrgDmZM0z.h5ad', '/gNNpgpo6gATjuxTE7CCp.h5ad'],
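For anyone hitting the same error, a quick way to see which of these files are absent locally is a small check like the one below. This is only a sketch: the IDs are copied from the list above, while the helper name `find_missing_h5ad`, the flat `<id>.h5ad` layout, and the `data` directory are my assumptions, not something taken from the repository.

```python
from pathlib import Path

# Dataset IDs copied from the commented-out `testdatasets` list above.
TEST_DATASET_IDS = [
    "R4ZHoQegxXdSFNFY5LGe", "SHV11AEetZOms4Wh7Ehb", "V6DPJx8rP3wWRQ43LMHb",
    "Gz5G2ETTEuuRDgwm7brA", "YyBdEsN89p2aF4xJY1CW", "SO5yBTUDBgkAmz0QbG8K",
    "r4iCehg3Tw5IbCLiCIbl", "SqvXr3i3PGXM8toXzUf9", "REIyQZE6OMZm1S3W2Dxi",
    "rYZ7gs0E0cqPOLONC8ia", "FcwMDDbAQPNYIjcYNxoc", "fvU5BAMJrm7vrgDmZM0z",
    "gNNpgpo6gATjuxTE7CCp",
]


def find_missing_h5ad(data_dir, dataset_ids=TEST_DATASET_IDS):
    """Return the IDs whose '<id>.h5ad' file is not present in data_dir.

    Hypothetical helper: assumes every file sits directly in data_dir
    and is named '<id>.h5ad'.
    """
    data_dir = Path(data_dir)
    return [
        dataset_id
        for dataset_id in dataset_ids
        if not (data_dir / f"{dataset_id}.h5ad").is_file()
    ]
```

Calling `find_missing_h5ad("data")` before launching validation lists every missing ID at once, instead of failing on the first `FileNotFoundError`.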

I wasn't able to locate these test files in the repository. Would you be willing to provide these h5ad files or instructions on how to obtain them? This would be extremely helpful for those of us looking to build upon your excellent work.

Thanks again for creating and sharing this project!

@WhenMelancholy WhenMelancholy added enhancement New feature or request question Further information is requested labels Dec 19, 2024
@jkobject
Collaborator

jkobject commented Jan 6, 2025

Hi @WhenMelancholy,

Oh right, this is quite an omission on my side. To explain, these files come from the cellxgene lamindb dataset used during my run. I got them from cellxgene, but their IDs as they stand make them completely useless for you.

I thus need to share these files with you. I will also update the package and README to showcase this:

  • gNNpgpo6gATjuxTE7CCp #denoising test
  • yBCKp6HmXuHa0cZptMo7 #omnipath grn test

Respectively, they are these 2 datasets on cellxgene, preprocessed with scdataloader's preprocessor:

@jkobject
Collaborator

jkobject commented Jan 7, 2025

To make this simpler for you, I have also moved them to Hugging Face (hf):
https://huggingface.co/jkobject/scPRINT/tree/main
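If it helps, here is a rough sketch of pulling the two test files from that repo with `huggingface_hub`. The repo ID comes from the link above and the two IDs from my earlier comment. The `<id>.h5ad` file names inside the repo, the `data` target directory, and the helper names are assumptions, so check the repo's file listing before relying on this.

```python
from pathlib import Path

REPO_ID = "jkobject/scPRINT"  # taken from the Hugging Face link above

# IDs and their purpose, as listed earlier in this thread.
TEST_IDS = {
    "gNNpgpo6gATjuxTE7CCp": "denoising test",
    "yBCKp6HmXuHa0cZptMo7": "omnipath grn test",
}


def remote_filename(dataset_id):
    """Assumed file name in the repo; verify against the file listing."""
    return f"{dataset_id}.h5ad"


def fetch_test_files(local_dir="data"):
    """Download each test file into local_dir and return {id: local path}.

    Needs network access and `pip install huggingface_hub`.
    """
    # Imported lazily so the helpers above work without the package.
    from huggingface_hub import hf_hub_download

    paths = {}
    for dataset_id in TEST_IDS:
        downloaded = hf_hub_download(
            repo_id=REPO_ID,
            filename=remote_filename(dataset_id),
            local_dir=local_dir,
        )
        paths[dataset_id] = Path(downloaded)
    return paths
```

Running `fetch_test_files("data")` should leave the files where the validation step looks for them, provided your data directory matches the path in the error message.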

@jkobject jkobject closed this as completed Jan 7, 2025