Create organized starfish.spacetx bucket #1314

Closed
shanaxel42 opened this issue May 6, 2019 · 10 comments

@shanaxel42
Collaborator

shanaxel42 commented May 6, 2019

We need to define how we want to organize both the published datasets @dganguli will be working with and the spaceTX datasets and results for the benchmarking.

@shanaxel42 shanaxel42 self-assigned this May 6, 2019
@ambrosejcarr
Member

ambrosejcarr commented May 6, 2019

I think #1287 is relevant to the discussion of the formatted/processed separation.

@shanaxel42 shanaxel42 added this to the SpaceTX milestone May 7, 2019
@shanaxel42 shanaxel42 added the feature New work label May 7, 2019
@ttung
Collaborator

ttung commented May 8, 2019

My proposal:

Top level: A list of datasets. NOT ASSAYS.
Second level: {raw, spacetx-formatted, outputs}.
Each second-level folder has timestamped folders.
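A minimal sketch of how a prefix under this proposal could be built. The function name, the `YYYYMMDD` timestamp format, and the example dataset name are assumptions for illustration only; the proposal says just that the folders are timestamped.

```python
from datetime import date
from typing import Optional

# Second-level folder names taken from the proposal above.
SECOND_LEVEL = ("raw", "spacetx-formatted", "outputs")


def timestamped_prefix(dataset: str, stage: str, when: Optional[date] = None) -> str:
    """Return '<dataset>/<stage>/<YYYYMMDD>/' (timestamp format is an assumption)."""
    if stage not in SECOND_LEVEL:
        raise ValueError(f"stage must be one of {SECOND_LEVEL}, got {stage!r}")
    when = when or date.today()
    return f"{dataset}/{stage}/{when:%Y%m%d}/"


# Hypothetical dataset name, used only to show the shape of the result.
print(timestamped_prefix("example-dataset", "raw", date(2019, 5, 8)))
```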

@ambrosejcarr
Member

Makes sense. Do we want a naming convention for the datasets? What information should be encoded in the name?

Hoping to avoid things like:

ISS_1
ISS_ambrose
ISS_deep
...

@ttung
Collaborator

ttung commented May 8, 2019

I would say something more about the tissue type and chemistry?

@shanaxel42
Collaborator Author

shanaxel42 commented May 14, 2019

proposal!

starfish.data.published
        assays
            datasets (some sort of schema for naming tissue type and chemistry)
                raw
                    raw_data
                formatted
                    date
                        experiment.json
                processed
                    date
                        decoded.csv (or whatever)
starfish.data.unpublished
        assays
            datasets (some sort of schema for naming tissue type and chemistry)
                raw
                    raw_data
                formatted
                    date
                        experiment.json
                processed
                    date
                        decoded.csv (or whatever)
starfish.data.spaceTX
        assays
            datasets (some sort of schema for naming tissue type and chemistry)
                raw
                    raw_data
                formatted
                    date
                        experiment.json
                processed
                    date
                        decoded.csv (or whatever)

We hand off everything underneath the spaceTX directory.
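As a rough sketch, the layout above could be encoded as a small key builder. The function name, argument order, and the example assay and dataset names are hypothetical. The only things taken from the tree are the level order and that `raw` has no date folder while `formatted` and `processed` do.

```python
from typing import Optional

# Third-level folder names from the proposed tree.
STAGES = ("raw", "formatted", "processed")


def layout_key(assay: str, dataset: str, stage: str, filename: str,
               date_folder: Optional[str] = None) -> str:
    """Build '<assay>/<dataset>/<stage>/[<date>/]<filename>' within a bucket."""
    if stage not in STAGES:
        raise ValueError(f"stage must be one of {STAGES}, got {stage!r}")
    parts = [assay, dataset, stage]
    if stage == "raw":
        # In the tree, raw data sits directly under raw/ with no date level.
        if date_folder is not None:
            raise ValueError("raw data has no date folder in this layout")
    else:
        if date_folder is None:
            raise ValueError(f"{stage} data needs a date folder")
        parts.append(date_folder)
    parts.append(filename)
    return "/".join(parts)


# Hypothetical assay and dataset names, for illustration only.
print(layout_key("ISS", "example-dataset", "formatted", "experiment.json", "20190514"))
print(layout_key("ISS", "example-dataset", "raw", "raw_data"))
```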

@ttung
Collaborator

ttung commented May 14, 2019

I'm generally supportive of this approach.

What's with the top-level starfish thing?

@shanaxel42
Collaborator Author

What's with the top-level starfish thing?

Oh, that's just whatever the top level is, so I guess right now it's starfish.data.public, but it could be whatever. I don't really have a preference for that.

@ambrosejcarr
Member

I am downloading some slide-seq data to work with. Based on the above proposal, I intend to put it in:

starfish.data.public/published/rodriques_science_2019_slide-seq_perkinje-cerebellum/20190605/<data>

The "datasets" level corresponds to <first-author-last-name>_<journal>_<year>_<assay-type>_<tissue-type>

How does this sound for the "datasets" schema?
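One way to keep such names consistent would be to generate them rather than type them by hand. The sketch below is an assumption about how the schema could be enforced, not something agreed in this thread. The function name and the normalization rules (lowercase, hyphens inside a field, underscores between fields) are illustrative, and the second field is read as the journal (`science` in the example path).

```python
import re


def dataset_name(first_author: str, journal: str, year: int,
                 assay_type: str, tissue_type: str) -> str:
    """Join the five schema fields with underscores."""

    def slug(text: str) -> str:
        # Lowercase and turn whitespace or underscores inside a field into
        # hyphens, so underscores only ever separate fields.
        return re.sub(r"[\s_]+", "-", text.strip().lower())

    fields = [slug(first_author), slug(journal), str(year),
              slug(assay_type), slug(tissue_type)]
    return "_".join(fields)


# Reproduces the dataset folder name from the example path above.
print(dataset_name("Rodriques", "Science", 2019, "slide-seq", "perkinje cerebellum"))
```

Reserving underscores for field boundaries means a name can later be split back into its five fields unambiguously.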

@shanaxel42
Collaborator Author

The scheme sounds fine, but it should go in starfish.data.published, not starfish.data.public; the former is the new bucket.

@shanaxel42
Collaborator Author

shanaxel42 commented Jun 13, 2019

starfish.data.spacetx/ now contains all the current spaceTX data and results we have, in an organized structure. The structure, along with the original locations the data was copied from, is described here:

ISS_30
	mouse		
		formatted: 
			https://console.aws.amazon.com/s3/buckets/spacetx.starfish.data.upload/xiaoyan_qian/

	human
		formatted: 
			https://console.aws.amazon.com/s3/buckets/spacetx.starfish.data.upload/xiaoyan_qian/ISS_human_HCA_07_MultiFOV/main_files/?region=us-east-1&tab=overview

		starfish_results: 
			https://console.aws.amazon.com/s3/buckets/spacetx.starfish.data.upload/xiaoyan_qian/ISS_human_HCA_07_MultiFOV/main_files/*iss_spacetx_*

ISS_120 
	mouse		
		formatted: 
			https://console.aws.amazon.com/s3/object/spacetx.starfish.data.upload/xiaoyan_qian/ISS_m_brain_03/README.txt?region=us-east-1&tab=overview

	human

		formatted: 
			https://console.aws.amazon.com/s3/buckets/spacetx.starfish.data.upload/xiaoyan_qian/ISS_h_brain_03/?region=us-east-1&tab=overview

FISSEQ
	mouse: 
		contributor_results: 
			spacetx.starfish.data.upload/samuel_inverso/20181203-mouse-71


BaristaSEQ: 
	mouse:  
		formatted: 
			https://console.aws.amazon.com/s3/buckets/spacetx.starfish.data.public/browse/formatted/20190319/baristaseq/?region=us-east-1&tab=overview

		contributor_results: 
			https://console.aws.amazon.com/s3/object/spacetx.starfish.data.upload/xiaoyin_chen/resultsandcode.zip?region=us-east-1&tab=overview

seqFISH
	mouse:
		formatted_multiplexed: 
			https://console.aws.amazon.com/s3/buckets/spacetx.starfish.data.upload/nico_pierson/multiplexed/

		formatted_sequential:
			spacetx.starfish.data.upload/nico_pierson/sequential/

		contributor_results_multiplexed: 
			spacetx.starfish.data.upload/nico_pierson/multiplexed/output/ 

smFISH
	mouse: 
		formatted:
			https://console.aws.amazon.com/s3/buckets/starfish.data.spacetx/smFISH/mouse/formatted/20190214/?region=us-east-1&tab=overview


spatial transcriptomics
	mouse: 
		formatted:
			Ambrose TODO
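
The smFISH link above points at `smFISH/mouse/formatted/20190214/` inside starfish.data.spacetx, which suggests an `<assay>/<organism>/<kind>/<date>` ordering. Assuming that ordering holds for the rest of the bucket (this listing does not confirm it), a prefix could be split into its parts as sketched below. The type and function names are hypothetical.

```python
from typing import NamedTuple


class DataLocation(NamedTuple):
    assay: str
    organism: str
    kind: str   # e.g. "formatted"
    date: str   # eight-digit folder name such as "20190214"


def parse_prefix(prefix: str) -> DataLocation:
    """Split '<assay>/<organism>/<kind>/<date>/...' into named fields."""
    parts = [p for p in prefix.split("/") if p]
    if len(parts) < 4:
        raise ValueError(f"expected at least 4 path levels, got {len(parts)}")
    assay, organism, kind, date = parts[:4]
    if not (len(date) == 8 and date.isdigit()):
        raise ValueError(f"date folder should be 8 digits, got {date!r}")
    return DataLocation(assay, organism, kind, date)


print(parse_prefix("smFISH/mouse/formatted/20190214/"))
```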

@shanaxel42 shanaxel42 changed the title from "Organize aws buckets" to "Create organized starfish.spacetx bucket" Jun 13, 2019