We are building cellxgene, a data publishing and exploration platform that maximizes the reusability of published data by standardizing it and making it findable and accessible. Cellxgene will provide cell-centric data access APIs for computational reuse and interactive graphical interfaces so that biologists and physicians can ask questions directly of the data.
Our technology roadmap has two themes:
- Theme 1: Make it easier to find data and evaluate whether existing data can be used to answer novel research questions
- Theme 2: Make it easier for biologists, physicians, and computational biologists to use data to answer questions
We are planning to develop the following features for the Data Portal application. The Status column takes one of three values:
- Development: Our engineering team is currently working hard to deliver this feature.
- Discovery: We're in the process of defining requirements and prototyping this feature.
- Not Started: Self-explanatory! We haven't started Discovery yet.
Expected Release | Feature | Description | Supports Theme | Status |
---|---|---|---|---|
2021 | Find datasets | Users can filter by metadata to find relevant datasets | 1 | Development |
2022 | Gene sets | Users can publish gene sets containing marker genes, pathways, or mechanisms along with datasets. Scientists coming to cellxgene can download and explore data with published gene sets | 1 | Discovery |
2022 | Differential expression scale and speed improvements | Differential expression runs on million-cell scRNA-seq datasets in tens of seconds | 2 | Discovery |
2022 | Where's my gene? | Users can view the expression of gene(s) across cell types in our public data corpus spanning all available human tissues | 2 | Discovery |
2022 | Protein & splicing data support | Users can publish and explore datasets that contain multiple data modalities (e.g., protein and mRNA) | 1, 2 | Discovery |
2022 | Download selected cells from the corpus | Users can select cells that match metadata in our schema and download a concatenated set of cells from our UI or in their local compute environment using our API | 2 | Not Started |
2022 | Integrate selections of cells and explore the results | Users can select cells to integrate and then explore them using cellxgene's visual tools | 2 | Not Started |
2022+ | Annotate and persist integrated datasets | Once a user integrates multiple datasets, they can create cell labels and gene sets. The integrated dataset, cell labels, and gene sets will be downloadable, and will persist the next time they log in | 2 | Not Started |
2022+ | Integrate my dataset with public datasets | Users can (privately) upload their dataset, select [cell types or datasets] to integrate with from the data corpus, predict labels and annotations for their data, and explore or download the result | 2 | Not Started |
2022+ | Gene set enrichment analysis | Users can submit a list of genes and receive gene sets that are likely matches. Users can view the gene sets that are most differentially expressed between two populations of cells | 2 | Not Started |
2022+ | Where's my pathway? | Users can view expression of predefined pathways (expressed as a gene set mean) for cell types across the data corpus | 2 | Not Started |
2022+ | Gene expression based indexing | When selecting cells to integrate or download, users can refine their selection with gene-expression based queries in addition to using metadata values. For example, users could select only T-cells that express CD4 and FOXP3 | 2 | Not Started |
- Our roadmap is subject to change based on community feedback and updates to our strategy.
- We update our roadmap every six months; it represents our best assessment of our future plans.
- We are fairly confident in delivering the short-term features. We are less certain about which longer-term features we will deliver, but we include them to illustrate our vision for cellxgene.
We welcome your feedback! Please see the Contact Us section for how to get in touch.