Create sample dataset #72
Comments
which crops, how much padding, and where should it be saved?
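One way to make the padding question concrete is to treat each crop as a voxel bounding box and grow it by a fixed margin, clipped to the volume. The sketch below is only an illustration: `pad_box`, the crop extents, the padding value, and the volume shape are all hypothetical and not taken from the actual dataset.

```python
from typing import Tuple

# One (start, stop) pair per axis, in voxels, with half-open intervals.
Box = Tuple[Tuple[int, int], ...]


def pad_box(box: Box, padding: int, volume_shape: Tuple[int, ...]) -> Box:
    """Grow `box` by `padding` voxels on every side, clipped to the volume."""
    if padding < 0:
        raise ValueError("padding must be non-negative")
    return tuple(
        (max(start - padding, 0), min(stop + padding, size))
        for (start, stop), size in zip(box, volume_shape)
    )


# Hypothetical crop and volume sizes, for illustration only.
crop = ((10, 50), (0, 40), (90, 100))
padded = pad_box(crop, padding=16, volume_shape=(100, 100, 100))
```

Clipping matters for crops near the volume edge, where the requested padding cannot be fully satisfied.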
so far for the small datasets to pull from s3:// via a script:
@d-v-b @yuriyzubov the general format should follow our schema
em data: jrc_hela-2.zarr/recon-1/em/...
explicit list of crops to be included:
also, replace the current jrc_hela-2.zarr on s3
the data on s3 is now correct (i.e., the
obviously this will eventually need to a) do all the things it's supposed to do, and b) be integrated into dacapo. but I don't think I can do either of those things today. @yuriyzubov (or anyone else), if you want to hack on this script feel free, just let me know, so that we can avoid duplicated effort. Specifically, if it becomes part of dacapo, please link that PR or commit to this issue so I know about it. Otherwise I can finish it up over the weekend.
@avweigel two of the crops in this list overlap (6 and 113), is that OK?
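For reference, an overlap like this can be detected by comparing bounding boxes axis by axis. This is a minimal sketch, assuming each crop is described by half-open `(start, stop)` voxel intervals; `boxes_overlap` and the example extents are hypothetical and are not the real extents of crops 6 and 113.

```python
from typing import Sequence, Tuple

Interval = Tuple[int, int]  # half-open (start, stop) along one axis


def boxes_overlap(a: Sequence[Interval], b: Sequence[Interval]) -> bool:
    """Return True if two axis-aligned boxes share at least one voxel."""
    if len(a) != len(b):
        raise ValueError("boxes must have the same number of axes")
    # Two boxes overlap only if their intervals overlap along every axis.
    return all(
        a_start < b_stop and b_start < a_stop
        for (a_start, a_stop), (b_start, b_stop) in zip(a, b)
    )


# Hypothetical extents, for illustration only.
box_a = ((0, 100), (0, 100), (0, 100))
box_b = ((50, 150), (50, 150), (50, 150))   # shares a corner region with box_a
box_c = ((100, 200), (0, 100), (0, 100))    # touches box_a but shares no voxel

overlap_ab = boxes_overlap(box_a, box_b)
overlap_ac = boxes_overlap(box_a, box_c)
```

Whether overlapping crops are acceptable is a separate question; the check only reports that the overlap exists.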
I updated the gist with a fully-functioning script. it's pretty slow -- running it took several hours on my workstation -- but it does work. If the crappy performance is a problem, we can explore some performance optimizations. I am already doing some parallelism, but it's pretty coarse-grained and could surely benefit from some tooling. @rhoadesScholar if I wanted this script to be added to dacapo, where would we put it in the source tree?
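As a rough illustration of finer-grained parallelism, chunks can be copied independently with a thread pool, since each chunk is a separate key in the store. This is a sketch under assumptions, not the script from the gist: `copy_chunks` is a hypothetical helper, and plain dictionaries stand in for the remote and local stores.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Mapping, MutableMapping


def copy_chunks(
    source: Mapping[str, bytes],
    dest: MutableMapping[str, bytes],
    keys: Iterable[str],
    max_workers: int = 8,
) -> List[str]:
    """Copy each key from `source` to `dest` using a pool of threads."""

    def copy_one(key: str) -> str:
        # Each key is written exactly once, so workers never touch the same entry.
        dest[key] = source[key]
        return key

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input keys.
        return list(pool.map(copy_one, keys))


# In-memory stand-ins for a remote and a local store (hypothetical keys).
remote = {f"chunk/{i}": bytes([i]) for i in range(20)}
local: dict = {}
copied = copy_chunks(remote, local, remote.keys())
```

Threads suit this kind of work because the time is dominated by waiting on I/O rather than by computation.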
@d-v-b The idea was to put it in the examples folder. But perhaps this should be done with a more minimal list of crops to speed things up. I imagine users might start getting frustrated after 5+ minutes if they're just trying to run an example notebook. 😬 Are you downloading the whole scale pyramids? Because that would explain a lot of the slowness, and the lower-resolution levels could be safely omitted for simple example cases imo.
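A minimal sketch of skipping the rest of the pyramid: filter the list of array paths so that only the full-resolution level is downloaded. It assumes scale levels are named `s0`, `s1`, ... with `s0` being full resolution; that naming, the helper `keep_full_resolution`, and the example paths are assumptions for illustration.

```python
from typing import Iterable, List


def keep_full_resolution(
    array_paths: Iterable[str], full_res_name: str = "s0"
) -> List[str]:
    """Keep only paths whose last component is the full-resolution level."""
    return [
        path
        for path in array_paths
        if path.rstrip("/").split("/")[-1] == full_res_name
    ]


# Hypothetical array paths, for illustration only.
paths = [
    "group_a/s0",
    "group_a/s1",
    "group_a/s2",
    "group_b/s0",
    "group_b/s10",
]
to_download = keep_full_resolution(paths)
```

Comparing the whole last path component, rather than a prefix, avoids accidentally keeping a level such as `s10`.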
I will see how things run with a reduced number of crops + only downloading
Sorry for jumping in late,
Pytorch
DaCapo Example:
I think |
if we want to give them the best experience. I would recommend the hello world example to fine-tune
what's the type of |
It is a DaCapo DataSplit |
Revisiting this @d-v-b and @yuriyzubov. We need a script to essentially do this for the segmentation challenge as well. |
We need a sample dataset in a predefined format for testing and demoing