Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propagate obs #3

Closed
wants to merge 4 commits into from
Closed

Propagate obs #3

wants to merge 4 commits into from

Conversation

mffrank
Copy link
Contributor

@mffrank mffrank commented Oct 21, 2021

Implements the features from #2

Tested with this:

import mudata as md
import anndata as ad
import numpy as np

mod1 = ad.AnnData(np.arange(0, 100, 0.1).reshape(-1, 10))
mod2 = ad.AnnData(np.arange(101, 2101, 1).reshape(-1, 20))
mods = {"mod1": mod1, "mod2": mod2}
# Make var_names different in different modalities
for m in ["mod1", "mod2"]:
    mods[m].var_names = [f"{m}_var{i}" for i in range(mods[m].n_vars)]
    mods[m].obs['common'] = 1
mod1.obs['individual1'] = 2
mod2.obs['individual2'] = 3
mdata = md.MuData(mods)

# Now add some global columns
mdata.obs['to_propagate'] = 1
mdata.obs['mod2:individualadded'] = 4
mdata.propagate_obs()

mdata.obs
mdata.mod['mod1'].obs

Has some minimal tests and docstring, please let me know what's missing.

@gtca
Copy link
Collaborator

gtca commented Nov 10, 2021

Thanks for going ahead to implement it! I think there is still place for some discussion about how this should behave.
Let's consider different scenarios.

simple global column

mdata.obs["simple"] = np.arange(mdata.n_obs)
mdata.propagate_obs()
mdata.obs.columns  
# => Index(['rna:simple', 'atac:simple'], dtype='object')
mdata.obs.head(2)
#  =>                 rna:simple   atac:simple
# AAACAGCCAAATATCC-1           0             0
# AAACAGCCAGGAACTG-1           1             1

The column is successfully propagated but since it is now present in individual modalities, mdata.obs["simple"] is removed and modname:simple columns are added from modalities upon the update. Yes, this would allow to have modality.obs["simple"] but this removes mdata.obs["simple"], which did not have modality specificity before. Depends on the use case... but in general this does not seem ideal to me.

It might be that we consider changing .update() to include comparisons to decide if modname:simple can be collapsed to simple but I think we would like to avoid that (deterministic behaviour, speed, etc.).

Maybe the simplest way here is to keep propagated columns such as simple as global columns as well, or at least to have this as an option.

modality-specific change

mdata.obs["rna:simple"] = "x"
mdata.propagate_obs()

mdata.obs.head(2)
#  =>                 rna:simple   atac:simple
# AAACAGCCAAATATCC-1           0             0
# AAACAGCCAGGAACTG-1           1             1

While the doc currently indicates that the function will overwrite existing columns with the same name in individual modalities, it doesn't seem to be the case here. Is this intended?

It will overwrite the "simple" column when it's added to mdata.obs without modname though:

mdata.obs["simple"] = "x"
mdata.propagate_obs()
mdata.obs.head(2)
# =>                 rna:simple atac:simple
# AAACAGCCAAATATCC-1          x           x
# AAACAGCCAGGAACTG-1          x           x

Let me know what you think!

@gtca
Copy link
Collaborator

gtca commented Sep 20, 2023

This is closed in favour of #57.

@gtca gtca closed this Sep 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants