
Adding in clustering #20

Open
seichhorn opened this issue Oct 1, 2019 · 3 comments
Labels
enhancement New feature or request


@seichhorn
Collaborator

I was gearing up to update the clustering code that I and others use, to make it 1) compatible with the most recent stable release of scanpy instead of the development version I was originally wrapping and 2) integrated into MERlin. On reflection, though, it doesn't fit as cleanly into the MERlin framework as I originally expected, largely because the clustering will often be performed on several datasets.

I felt like the two cleanest options were 1) to make a "metaanalysis" class in MERlin that takes in many datasets and performs analysis tasks on the aggregated data, with clustering as one such analysis, or 2) to not integrate the clustering code at all and instead make it easy to port the data from one to the other. Do you have any thoughts on this?

I feel like the metaanalysis class would only be worth it if we were going to use it for more than just clustering analyses. It's also possible to let the clustering be a normal analysis task tied to a particular dataset, but let the user pass in multiple datasets via parameters. I suspected this is something you wouldn't like, and I don't really favor it either.

@seichhorn seichhorn added the enhancement New feature or request label Oct 1, 2019
@emanuega
Owner

emanuega commented Oct 2, 2019

I anticipate that all MERFISH experiments will have to take into account more than a single measurement, so the ability to perform meta-analysis on multiple datasets would be useful to incorporate into MERlin.

I do prefer the first option, but rather than a metaanalysis class, I would prefer a metadataset class (such as MetaMERFISHDataSet) that organizes multiple datasets and saves all the analysis performed on the metadataset into the corresponding metadataset directory. The metadataset can still be a subclass of the dataset class, but instead of using the name of the raw data folder, a name will have to be specified for the metadataset. With this, very little would have to change in the analysis task structure, and the tasks should even be executable in nearly the same way that analysis tasks are currently run on a MERFISHDataSet with an appropriate analysis parameters file. The tasks that run on a MetaMERFISHDataSet could be placed in merlin.metaanalysis instead of merlin.analysis to help distinguish them from the analysis tasks that run on a MERFISHDataSet, and we could add a check within each analysis task to make sure the right kind of DataSet is passed. The CLI will also have to be updated so that MetaMERFISHDataSets can be created, since currently only one dataset can be specified.

Perhaps each dataset within the metadataset could also include a label indicating the group it belongs to, such as an experimental condition, so that the analysis tasks operating on the metadataset can analyze gene expression differences between conditions.
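A minimal Python sketch of the proposed structure, for illustration only. Everything here is a hypothetical stand-in, not MERlin's actual API: the `DataSet` class below is a simplified placeholder, and the constructor arguments, the `groups` mapping, the `datasets_in_group` helper, and the `MetaAnalysisTask` base class are invented names. It shows a metadataset that is named explicitly rather than after a raw data folder, writes its analysis under its own directory, carries an optional group label per dataset, and a task that checks it was given the right kind of DataSet.

```python
from pathlib import Path
from typing import Dict, List, Optional


class DataSet:
    """Hypothetical stand-in for a single-experiment dataset (not MERlin's class)."""

    def __init__(self, name: str, analysis_home: Path):
        self.name = name
        # Analysis outputs for this dataset live under <analysis_home>/<name>
        self.analysis_path = Path(analysis_home) / name


class MetaMERFISHDataSet(DataSet):
    """Hypothetical metadataset grouping several datasets under an explicit name."""

    def __init__(self, name: str, analysis_home: Path,
                 datasets: Dict[str, DataSet],
                 groups: Optional[Dict[str, str]] = None):
        # The name is chosen by the user, so analyses on the metadataset
        # are saved in their own directory rather than a raw data folder's
        super().__init__(name, analysis_home)
        self.datasets = datasets
        # Optional label per dataset key, e.g. an experimental condition
        self.groups = groups or {}

    def datasets_in_group(self, group: str) -> List[DataSet]:
        """Return the member datasets labeled with the given group."""
        return [ds for key, ds in self.datasets.items()
                if self.groups.get(key) == group]


class MetaAnalysisTask:
    """Hypothetical base for tasks that would live in merlin.metaanalysis."""

    def __init__(self, dataset: DataSet):
        # Reject plain datasets so meta-analysis tasks only run on metadatasets
        if not isinstance(dataset, MetaMERFISHDataSet):
            raise TypeError(
                f"{type(self).__name__} requires a MetaMERFISHDataSet, "
                f"got {type(dataset).__name__}")
        self.dataset = dataset


if __name__ == "__main__":
    a = DataSet("exp_a", Path("analysis"))
    b = DataSet("exp_b", Path("analysis"))
    meta = MetaMERFISHDataSet("cortex_meta", Path("analysis"),
                              {"exp_a": a, "exp_b": b},
                              {"exp_a": "control", "exp_b": "treated"})
    print([ds.name for ds in meta.datasets_in_group("treated")])  # ['exp_b']
```

Because the metadataset subclasses the dataset class, code that only needs a name and an output directory can treat both the same way, while the `isinstance` check keeps single-dataset inputs out of meta-analysis tasks.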

@seichhorn
Collaborator Author

Yeah, this works for me conceptually. I'll start working on the MetaMERFISHDataSet class now.

@seichhorn
Collaborator Author

@emanuega, to close out this long-running issue: I spent a while implementing the metaMERFISHDataSet class and clustering metaanalyses in the mercluster branch, but ultimately felt that these contributions were decreasing the clarity and quality of MERlin because of the extra baggage that had to come along with the changes needed to support these functions. I decided to leave MERlin alone in this regard and instead implement the MERlin architecture to support just these types of metaanalysis tasks in a separate project, which is now at a functional but early stage in my MERCluster repo. If in the end you want to put something like this into MERlin, I'm happy to help.

A separate thought: if we were going to extend MERlin, it might be more generally useful to add some additional visualization features so people can better interact with the data. This type of analysis tends to be of more general interest to others in the lab.

One missing piece is that I think MERlin would benefit from a final data-aggregation task to merge your exported barcodes, sequential genes, etc., into a single, normalized output file. I've started writing that and will submit a PR with it for MERlin at some point.
