[BPT-155] Spatial Data Refactor #104

Merged
merged 71 commits into from
Apr 5, 2024
Changes from 15 commits
f0f5329
Update README.md
ckmah Jul 12, 2023
7d3ec21
Merge pull request #103 from ckmah/download-badge
ckmah Jul 12, 2023
1d582ac
Delete Non-refactored Files
dylanclam12 Aug 8, 2023
22e1141
SpatialData Modified Query Script
dylanclam12 Aug 8, 2023
a7ff251
SpatialData Formatting in _io.py and _geometry.py
dylanclam12 Aug 8, 2023
efda092
_shape_features.py Refactor
dylanclam12 Aug 8, 2023
f1912d5
_point_features.py Refactor
dylanclam12 Aug 8, 2023
7ff7f69
_lp.py Refactor Part 1
dylanclam12 Aug 11, 2023
ed6e1e9
_lp.py Refactor Part 2
dylanclam12 Aug 15, 2023
e47a265
_neighborhoods.py Addition
dylanclam12 Aug 15, 2023
f83ff83
_utils.py sync_points function and minor fixes
dylanclam12 Aug 17, 2023
e724986
_flux.py Refactor
dylanclam12 Aug 22, 2023
cb29d2d
Removed Dask Functionality
dylanclam12 Aug 24, 2023
55766cf
_flux_enrichment.py Refactor
dylanclam12 Aug 28, 2023
7068c43
_colocation.py, _decomposition.py, _composition.py Refactor
dylanclam12 Aug 29, 2023
0a33e1b
_plotting.py, _colors.py, _layers.py, _utils.py Refactor
dylanclam12 Oct 5, 2023
f17bf6e
_lp.py, _multidimensional.py Plotting Refactor
dylanclam12 Oct 13, 2023
31e2fc5
Addressing review comments on PR [BPT-155]
dylanclam12 Jan 16, 2024
10f1b33
Parse refactor for _utils.py, _geometry.py, and _io.py
dylanclam12 Jan 30, 2024
49f81f8
Parse refactor for _shape_features.py
dylanclam12 Jan 30, 2024
0585f38
Parse refactor for _points_features.py
dylanclam12 Jan 30, 2024
b28217a
Parse refactor for _flux.py and _flux_enrichment.py
dylanclam12 Jan 30, 2024
5e8cee9
_signatures.py Refactor
dylanclam12 Jan 30, 2024
f1a3980
_colocation.py minor fix
dylanclam12 Jan 31, 2024
8e82f13
Remove cell_boundaries_key parameters
dylanclam12 Feb 20, 2024
49512aa
cleanup typos, delete misc code, comments
ckmah Feb 22, 2024
1562b8d
cleanup formatting, fix docstring return info
ckmah Feb 22, 2024
0d5f7e7
shape sjoin index added to cell_boundaries df; use full shape names
ckmah Feb 22, 2024
dbdc570
cleanup format with ruff
ckmah Feb 22, 2024
ae5ef79
sjoins require key for cell shape
ckmah Feb 22, 2024
2ad8165
use full cell shape key for prefixing shape feature keys
ckmah Feb 22, 2024
35d07ae
sync sjoin data types, restrict points to 2D
ckmah Feb 28, 2024
0e6a929
remove shape prefixing
ckmah Feb 28, 2024
f790681
remove import unused to_tensor fn
ckmah Feb 28, 2024
b48f167
use instance_key to track cell shape
ckmah Feb 29, 2024
ec9ce51
test_io seems to work, test_shape_features wip, commented out other t…
ckmah Mar 1, 2024
a24e171
cast index to str always
ckmah Mar 19, 2024
26b1785
fixed point and shape sync (untested)
ckmah Mar 19, 2024
e90076a
save instance_key to proper points attr field
ckmah Mar 19, 2024
f9d3ccb
Merge remote-tracking branch 'origin/sjoin-index-type-fix' into spati…
dylanclam12 Mar 19, 2024
e13e73e
_geometry.py setters and getters
dylanclam12 Mar 20, 2024
d1a253d
test_io.py
dylanclam12 Mar 20, 2024
f234817
_shape_feature.py remove overwrite
dylanclam12 Mar 20, 2024
d7b7424
_shape_features.py remove overwrite
dylanclam12 Mar 20, 2024
c966c93
Merge remote-tracking branch 'origin/spatialdata_refactor' into spati…
dylanclam12 Mar 20, 2024
f9d5b7d
_shape_features.py Tests
dylanclam12 Mar 20, 2024
e73470b
_points_features.py: shape_names --> shape_keys
dylanclam12 Mar 22, 2024
06b615a
_points_features.py: instance_key
dylanclam12 Mar 22, 2024
abc9f51
_points_features.py: shape_key_index
dylanclam12 Mar 22, 2024
0470639
_point_features.py: analyze_points refactor
dylanclam12 Mar 22, 2024
495fdec
test_point_features.py
dylanclam12 Mar 22, 2024
cc2113e
simplifying test_point_features.py
dylanclam12 Mar 22, 2024
4eb7435
simplifying test_shape_features.py
dylanclam12 Mar 22, 2024
f0dfe93
Change PATTERN_FEATURES
dylanclam12 Mar 22, 2024
bf5f276
_lp.py: instance_key
dylanclam12 Mar 22, 2024
ebf3e36
_lp.py Tests
dylanclam12 Mar 22, 2024
9fa35a3
_geometry.py: set_metadata documentation
dylanclam12 Mar 28, 2024
79cb21e
_flux.py: instance_key
dylanclam12 Mar 28, 2024
f6e44af
_neighborhoods.py: feature_name
dylanclam12 Mar 28, 2024
0fbd7a0
_flux.py Tests
dylanclam12 Mar 28, 2024
cc18e88
_flux_enrichment.py: set_points_metadata
dylanclam12 Apr 1, 2024
a179937
_flux_enrichement.py: instance_key
dylanclam12 Apr 1, 2024
9034dd5
_flux_enrichment.py Tests
dylanclam12 Apr 1, 2024
dad41c4
remove import random test_flux.py and test_flux_enrichment.py
dylanclam12 Apr 1, 2024
285ee56
_colocation.py: colocation instance_key and feature_key
dylanclam12 Apr 1, 2024
48c8d62
_colocation.py: coloc_quotient keys
dylanclam12 Apr 1, 2024
411f62f
_colocation.py Tests
dylanclam12 Apr 1, 2024
30ecb71
_geometry.py Tests
dylanclam12 Apr 5, 2024
ee1f007
cleanup misc constants
ckmah Apr 5, 2024
8fdfdf1
Merge branch 'v2.1' into spatialdata_refactor
ckmah Apr 5, 2024
fcb72ec
more merge conflicts
ckmah Apr 5, 2024
14 changes: 7 additions & 7 deletions bento/_constants.py
@@ -2,17 +2,17 @@
PATTERN_NAMES = ["cell_edge", "cytoplasmic", "none", "nuclear", "nuclear_edge"]
PATTERN_PROBS = [f"{p}_p" for p in PATTERN_NAMES]
PATTERN_FEATURES = [
"cell_inner_proximity",
"nucleus_inner_proximity",
"nucleus_outer_proximity",
"cell_inner_asymmetry",
"nucleus_inner_asymmetry",
"nucleus_outer_asymmetry",
"cell_boundaries_inner_proximity",
"nucleus_boundaries_inner_proximity",
"nucleus_boundaries_outer_proximity",
"cell_boundaries_inner_asymmetry",
"nucleus_boundaries_inner_asymmetry",
"nucleus_boundaries_outer_asymmetry",
"l_max",
"l_max_gradient",
"l_min_gradient",
"l_monotony",
"l_half_radius",
"point_dispersion_norm",
"nucleus_dispersion_norm",
"nucleus_boundaries_dispersion_norm",
]
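The renamed entries above follow a `<shape key>_<feature>` pattern, where the prefix is now the full shape key (`cell_boundaries`, `nucleus_boundaries`) rather than the short form (`cell`, `nucleus`). A minimal sketch of that naming scheme follows; `prefixed_features` is a hypothetical helper written for illustration and is not part of bento.

```python
from typing import Dict, List


def prefixed_features(suffixes_by_shape: Dict[str, List[str]]) -> List[str]:
    """Hypothetical helper: build '<shape_key>_<feature>' names."""
    return [
        f"{shape_key}_{suffix}"
        for shape_key, suffixes in suffixes_by_shape.items()
        for suffix in suffixes
    ]


# Suffixes taken from the PATTERN_FEATURES entries shown in the diff.
names = prefixed_features(
    {
        "cell_boundaries": ["inner_proximity", "inner_asymmetry"],
        "nucleus_boundaries": ["inner_proximity", "outer_proximity"],
    }
)
print(names)
```

Any feature table computed with the old short prefixes would presumably fail the `PATTERN_FEATURES` column check in `lp` and need to be recomputed.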
24 changes: 20 additions & 4 deletions bento/geometry/_geometry.py
@@ -276,7 +276,8 @@ def get_shape_metadata(
def set_points_metadata(
sdata: SpatialData,
points_key: str,
metadata: Union[pd.Series, pd.DataFrame],
metadata: Union[List, pd.Series, pd.DataFrame],
column_names: Optional[Union[str, List[str]]] = None,
):
"""Write metadata in SpatialData points element as column(s). Aligns metadata index to points index.

@@ -291,10 +292,17 @@ def set_points_metadata(
"""
if points_key not in sdata.points.keys():
raise ValueError(f"{points_key} not found in sdata.points")

if isinstance(metadata, list):
metadata = pd.Series(metadata, index=sdata.points[points_key].index)

# Set metadata as columns in sdata.points[points_key]
if isinstance(metadata, pd.Series):
metadata = pd.DataFrame(metadata)

if column_names is not None:
if isinstance(column_names, str):
column_names = [column_names]
metadata = metadata.rename(columns={metadata.columns[0]: column_names[0]})

sdata.points[points_key] = sdata.points[points_key].reset_index(drop=True)
for name, series in metadata.items():
@@ -305,7 +313,8 @@ def set_points_metadata(
def set_shape_metadata(
sdata: SpatialData,
shape_key: str,
metadata: Union[pd.Series, pd.DataFrame],
metadata: Union[List, pd.Series, pd.DataFrame],
column_names: Optional[Union[str, List[str]]] = None,
):
"""Write metadata in SpatialData shapes element as column(s). Aligns metadata index to shape index.

@@ -320,11 +329,18 @@ def set_shape_metadata(
"""
if shape_key not in sdata.shapes.keys():
raise ValueError(f"Shape {shape_key} not found in sdata.shapes")

if isinstance(metadata, list):
metadata = pd.Series(metadata, index=sdata.shapes[shape_key].index)

# Set metadata as columns in sdata.shapes[shape_key]
if isinstance(metadata, pd.Series):
metadata = pd.DataFrame(metadata)

if column_names is not None:
if isinstance(column_names, str):
column_names = [column_names]
metadata = metadata.rename(columns={metadata.columns[0]: column_names[0]})

sdata.shapes[shape_key].loc[:, metadata.columns] = metadata.reindex(
sdata.shapes[shape_key].index
).fillna("")
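The new `metadata` handling in both setters accepts a list, a Series, or a DataFrame and normalizes it to a DataFrame before writing columns. The sketch below reproduces that normalization with plain pandas, without a SpatialData object. `normalize_metadata` is a hypothetical name, and unlike the diff (which renames only the first column) it maps every supplied name in `column_names`; treat that as an assumption about the intended behavior, not what the PR does. The `fillna("")` step from `set_shape_metadata` is omitted.

```python
from typing import List, Optional, Union

import pandas as pd


def normalize_metadata(
    metadata: Union[List, pd.Series, pd.DataFrame],
    index: pd.Index,
    column_names: Optional[Union[str, List[str]]] = None,
) -> pd.DataFrame:
    """Hypothetical helper mirroring the list -> Series -> DataFrame steps."""
    if isinstance(metadata, list):
        # A bare list carries no index, so it is assumed to be in element order.
        metadata = pd.Series(metadata, index=index)
    if isinstance(metadata, pd.Series):
        metadata = pd.DataFrame(metadata)
    if column_names is not None:
        if isinstance(column_names, str):
            column_names = [column_names]
        # Assumption: rename all columns pairwise, not only the first one.
        metadata = metadata.rename(columns=dict(zip(metadata.columns, column_names)))
    # Align rows to the target element's index.
    return metadata.reindex(index)


index = pd.Index(["cell_0", "cell_1", "cell_2"])
out = normalize_metadata([1.5, 2.5, 3.5], index, column_names="area")
print(out)
```

Passing a list relies on the caller keeping the values in the same order as the element's index, since there is nothing to align on.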
66 changes: 41 additions & 25 deletions bento/tools/_lp.py
@@ -1,3 +1,4 @@
from typing import List, Optional, Union
import pickle
import warnings

@@ -13,21 +14,25 @@
from tqdm.auto import tqdm
from spatialdata._core.spatialdata import SpatialData

#from .._utils import track
from .._constants import PATTERN_NAMES, PATTERN_FEATURES

tqdm.pandas()

def lp(sdata: SpatialData, groupby: str = "gene"):
def lp(
sdata: SpatialData,
instance_key: str = "cell_boundaries",
groupby: Optional[Union[str, List[str]]] = "gene"
):
"""Predict transcript subcellular localization patterns.
Patterns include: cell edge, cytoplasmic, nuclear edge, nuclear, none

Parameters
----------
sdata : SpatialData
Spatial formatted SpatialData object

groupby : str or list of str, optional (default: "gene")
Key in `data.points['transcripts'] to groupby, by default None. Always treats each cell separately
Key in `sdata.points[points_key]` to group by, by default "gene". Always treats each cell separately

Returns
-------
@@ -42,7 +47,7 @@ def lp(sdata: SpatialData, groupby: str = "gene"):
groupby = [groupby]

# Compute features
feature_key = f"cell_{'_'.join(groupby)}_features"
feature_key = f"{instance_key}_{'_'.join(groupby)}_features"
if feature_key not in sdata.table.uns.keys() or not all(
f in sdata.table.uns[feature_key].columns for f in PATTERN_FEATURES
):
@@ -78,7 +83,7 @@ def lp(sdata: SpatialData, groupby: str = "gene"):
)

# Add cell and groupby identifiers
pattern_prob.index = sdata.table.uns[feature_key].set_index(["cell", *groupby]).index
pattern_prob.index = sdata.table.uns[feature_key].set_index([instance_key, *groupby]).index

# Threshold probabilities to get indicator matrix
thresholds = [0.45300, 0.43400, 0.37900, 0.43700, 0.50500]
@@ -87,13 +92,15 @@ def lp(sdata: SpatialData, groupby: str = "gene"):
sdata.table.uns["lp"] = indicator_df.reset_index()
sdata.table.uns["lpp"] = pattern_prob.reset_index()
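The tail of `lp` builds a feature key from `instance_key` and `groupby`, indexes the probabilities by `[instance_key, *groupby]`, and thresholds them into an indicator matrix. The thresholding line itself is not visible in the diff, so the sketch below assumes a per-pattern `>=` comparison. The probabilities are made-up illustration values; only `PATTERN_NAMES`, the threshold list, and the key format come from the source.

```python
import pandas as pd

PATTERN_NAMES = ["cell_edge", "cytoplasmic", "none", "nuclear", "nuclear_edge"]
THRESHOLDS = [0.45300, 0.43400, 0.37900, 0.43700, 0.50500]

instance_key = "cell_boundaries"
groupby = ["gene"]
feature_key = f"{instance_key}_{'_'.join(groupby)}_features"

# Illustrative probabilities, indexed the way lp indexes pattern_prob.
index = pd.MultiIndex.from_tuples(
    [("cell_0", "geneA"), ("cell_0", "geneB")], names=[instance_key, *groupby]
)
pattern_prob = pd.DataFrame(
    [[0.9, 0.1, 0.0, 0.2, 0.6], [0.1, 0.5, 0.3, 0.1, 0.1]],
    index=index,
    columns=PATTERN_NAMES,
)

# Assumption: a pattern is called when its probability meets its threshold.
cutoffs = pd.Series(THRESHOLDS, index=PATTERN_NAMES)
indicator = pattern_prob.ge(cutoffs, axis="columns").astype(int)

print(feature_key)
print(indicator.reset_index())
```

Because each pattern is thresholded independently, a single cell and gene pair can be assigned more than one pattern, as `geneA` is here.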

def lp_stats(sdata: SpatialData):
def lp_stats(sdata: SpatialData, instance_key: str = "cell_boundaries"):
Collaborator review comment: should default value be set for instance_key?

"""Computes frequencies of localization patterns across cells and genes.

Parameters
----------
data : SpatialData
sdata : SpatialData
Spatial formatted SpatialData object.
instance_key : str
cell boundaries instance key

Returns
-------
@@ -104,18 +111,20 @@ def lp_stats(sdata: SpatialData):

cols = lp.columns
groupby = list(cols[~cols.isin(PATTERN_NAMES)])
groupby.remove("cell")
groupby.remove(instance_key)

g_pattern_counts = lp.groupby(groupby).apply(lambda df: df[PATTERN_NAMES].sum().astype(int))
sdata.table.uns["lp_stats"] = g_pattern_counts
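`lp_stats` infers the grouping columns as everything that is neither a pattern name nor the instance key, then sums the indicator columns per group. A standalone sketch with a toy `lp` table follows; `pattern_counts` is a hypothetical wrapper, and it uses column selection followed by `sum()` where the diff uses `apply`, which should give the same counts.

```python
import pandas as pd

PATTERN_NAMES = ["cell_edge", "cytoplasmic", "none", "nuclear", "nuclear_edge"]


def pattern_counts(lp: pd.DataFrame, instance_key: str = "cell_boundaries") -> pd.DataFrame:
    """Hypothetical wrapper: count pattern calls per group across cells."""
    cols = lp.columns
    groupby = list(cols[~cols.isin(PATTERN_NAMES)])
    # Raises ValueError if instance_key is not a column of lp.
    groupby.remove(instance_key)
    return lp.groupby(groupby)[PATTERN_NAMES].sum().astype(int)


# Toy indicator table shaped like sdata.table.uns["lp"].
lp = pd.DataFrame(
    [
        ["cell_0", "geneA", 1, 0, 0, 0, 1],
        ["cell_1", "geneA", 1, 0, 0, 0, 0],
        ["cell_0", "geneB", 0, 1, 0, 0, 0],
    ],
    columns=["cell_boundaries", "gene", *PATTERN_NAMES],
)
counts = pattern_counts(lp)
print(counts)
```

The `groupby.remove(instance_key)` call is why the `instance_key` passed here has to match the one used when `lp` was run.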

def _lp_logfc(sdata, phenotype=None):
def _lp_logfc(sdata, instance_key, phenotype=None):
"""Compute pairwise log2 fold change of patterns between groups in phenotype.

Parameters
----------
sdata : SpatialData
Spatial formatted SpatialData object.
instance_key : str
cell boundaries instance key
phenotype : str
Variable grouping cells for differential analysis. Must be in sdata.shapes[instance_key].columns.

@@ -126,22 +135,19 @@ def _lp_logfc(sdata, phenotype=None):
"""
stats = sdata.table.uns["lp_stats"]

if phenotype not in sdata.shapes["cell_boundaries"].columns:
if phenotype not in sdata.shapes[instance_key].columns:
raise ValueError("Phenotype is invalid.")

phenotype_vector = sdata.shapes["cell_boundaries"][phenotype]
phenotype_vector = sdata.shapes[instance_key][phenotype]

pattern_df = sdata.table.uns["lp"].copy()
groups_name = stats.index.name
'''pattern_df[["cell", groups_name]] = data.uns[f"cell_{groups_name}_features"][
["cell", groups_name]
]'''

gene_fc_stats = []
for c in PATTERN_NAMES:
# save pattern frequency to new column, one for each group
group_freq = (
pattern_df.pivot(index="cell", columns=groups_name, values=c)
pattern_df.pivot(index=instance_key, columns=groups_name, values=c)
.replace("none", np.nan)
.astype(float)
.groupby(phenotype_vector)
@@ -184,15 +190,17 @@ def log2fc(group_col):

return gene_fc_stats
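In `_lp_logfc`, each pattern column is pivoted to a cell-by-group matrix keyed on `instance_key` and then grouped by the phenotype vector. The aggregation and the `log2fc` helper are elided in the diff, so the sketch below assumes a per-group mean followed by a simple two-group log2 ratio. The data, the `condition` name, and the group labels are invented for illustration.

```python
import numpy as np
import pandas as pd

instance_key = "cell_boundaries"

# Toy indicator table for a single pattern ("nuclear") and a single gene.
pattern_df = pd.DataFrame(
    {
        instance_key: ["c0", "c1", "c2", "c3"],
        "gene": ["geneA"] * 4,
        "nuclear": [1, 1, 1, 0],
    }
)

# Phenotype per cell, indexed by cell id like sdata.shapes[instance_key][phenotype].
phenotype_vector = pd.Series(
    ["treated", "treated", "control", "control"],
    index=["c0", "c1", "c2", "c3"],
    name="condition",
)

# Cell-by-gene matrix, then pattern frequency per phenotype group.
group_freq = (
    pattern_df.pivot(index=instance_key, columns="gene", values="nuclear")
    .astype(float)
    .groupby(phenotype_vector)
    .mean()  # assumption: the elided aggregation is a mean
)

# Assumption: fold change as a plain ratio of the two group frequencies.
log2fc = np.log2(group_freq.loc["treated"] / group_freq.loc["control"])
print(group_freq)
print(log2fc)
```

Grouping by a Series aligns on the index, so the pivot index must hold the same cell ids as the shapes element named by `instance_key`.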

def _lp_diff_gene(cell_by_pattern, phenotype_series):
def _lp_diff_gene(cell_by_pattern, phenotype_series, instance_key):
"""Perform pairwise comparison between groupby and every class.

Parameters
----------
cell_by_pattern : DataFrame
Cell by pattern matrix.
phenotype_vector : Series
phenotype_series : Series
Series of cell groupings.
instance_key : str
cell boundaries instance key

Returns
-------
@@ -204,7 +212,7 @@ def _lp_diff_gene(cell_by_pattern, phenotype_series):
# One hot encode categories
group_dummies = pd.get_dummies(phenotype_series)
group_names = group_dummies.columns.tolist()
group_data = cell_by_pattern.set_index("cell").join(group_dummies, how='inner')
group_data = cell_by_pattern.set_index(instance_key).join(group_dummies, how='inner')
group_data.columns = group_data.columns.astype(str)

# Perform one group vs rest logistic regression
@@ -245,14 +253,18 @@ def _lp_diff_gene(cell_by_pattern, phenotype_series):
return results if len(results) > 0 else None
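Before fitting the one-vs-rest models, `_lp_diff_gene` one-hot encodes the phenotype and inner-joins it to the per-cell pattern table on `instance_key`. The sketch below shows only that preparation step with toy data; the cell ids and the `condition` name are invented, and the regression itself is left out because its details are elided in the diff.

```python
import pandas as pd

instance_key = "cell_boundaries"

# Toy per-cell pattern calls for one gene.
cell_by_pattern = pd.DataFrame(
    {instance_key: ["c0", "c1", "c2"], "nuclear": [1, 0, 1]}
)

# Phenotype is missing for c2 and includes a cell (c3) with no pattern calls.
phenotype_series = pd.Series(
    ["treated", "control", "treated"],
    index=["c0", "c1", "c3"],
    name="condition",
)

# One hot encode categories
group_dummies = pd.get_dummies(phenotype_series)
group_names = group_dummies.columns.tolist()

# The inner join keeps only cells present in both tables.
group_data = cell_by_pattern.set_index(instance_key).join(group_dummies, how="inner")
group_data.columns = group_data.columns.astype(str)
print(group_data)
```

With `how="inner"`, cells lacking either pattern calls or a phenotype value are dropped silently, so the number of rows reaching the regression can be smaller than the number of cells.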

def lp_diff_discrete(
sdata: SpatialData, phenotype: str = None
sdata: SpatialData,
instance_key: str = "cell_boundaries",
phenotype: str = None
):
"""Gene-wise test for differential localization across phenotype of interest.

Parameters
----------
sdata : SpatialData
Spatial formatted SpatialData object.
instance_key : str
cell boundaries instance key
phenotype : str
Variable grouping cells for differential analysis. Must be in sdata.shapes[instance_key].columns.

@@ -266,7 +278,7 @@ def lp_diff_discrete(
stats = sdata.table.uns["lp_stats"]

# Retrieve cell phenotype
phenotype_series = sdata.shapes["cell_boundaries"][phenotype]
phenotype_series = sdata.shapes[instance_key][phenotype]
if is_numeric_dtype(phenotype_series):
raise KeyError(f"Phenotype dtype must not be numeric | dtype: {phenotype_series.dtype}")

@@ -276,7 +288,7 @@ def lp_diff_discrete(

diff_output = (
pattern_df.groupby(groups_name)
.progress_apply(lambda gp: _lp_diff_gene(gp, phenotype_series))
.progress_apply(lambda gp: _lp_diff_gene(gp, phenotype_series, instance_key))
.reset_index()
)

@@ -294,7 +306,7 @@ def lp_diff_discrete(
results.loc[results["-log10padj"] == np.inf, "-log10padj"] = results.loc[results["-log10padj"] != np.inf]["-log10padj"].max()

# Group-wise log2 fold change values
log2fc_stats = _lp_logfc(sdata, phenotype)
log2fc_stats = _lp_logfc(sdata, instance_key, phenotype)

# Join log2fc results to p value df
results = (
@@ -310,14 +322,18 @@ def lp_diff_discrete(
sdata.table.uns[f"diff_{phenotype}"] = results
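One step in `lp_diff_discrete` replaces infinite `-log10padj` values, which arise when an adjusted p-value is exactly zero, with the largest finite value in the column. A small sketch of that capping follows. The `padj` column name and the p-values are assumptions made for illustration; only the `-log10padj` column name and the replacement logic come from the diff.

```python
import numpy as np
import pandas as pd

# Illustrative adjusted p-values; the zero yields an infinite -log10 value.
results = pd.DataFrame({"padj": [0.05, 0.0, 0.001]})

with np.errstate(divide="ignore"):
    results["-log10padj"] = -np.log10(results["padj"])

# Cap infinities at the largest finite value, as the diff does.
finite_max = results.loc[results["-log10padj"] != np.inf, "-log10padj"].max()
results.loc[results["-log10padj"] == np.inf, "-log10padj"] = finite_max
print(results)
```

If every value in the column were infinite, `finite_max` would be NaN and the column would end up all NaN, a case the visible code does not appear to guard against.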

def lp_diff_continuous(
sdata: SpatialData, phenotype: str = None
sdata: SpatialData,
instance_key: str = "cell_boundaries",
phenotype: str = None
):
"""Gene-wise test for differential localization across phenotype of interest.

Parameters
----------
sdata : SpatialData
Spatial formatted SpatialData object.
instance_key : str
cell boundaries instance key
phenotype : str
Variable grouping cells for differential analysis. Must be in sdata.shapes[instance_key].columns.

@@ -331,14 +347,14 @@ def lp_diff_continuous(
stats = sdata.table.uns["lp_stats"]
lpp = sdata.table.uns["lpp"]
# Retrieve cell phenotype
phenotype_series = sdata.shapes["cell_boundaries"][phenotype]
phenotype_series = sdata.shapes[instance_key][phenotype]


pattern_dfs = {}
# Compute correlation for each point group along cells
for p in PATTERN_NAMES:
groups_name = stats.index.name
p_labels = lpp.pivot(index="cell", columns=groups_name, values=p)
p_labels = lpp.pivot(index=instance_key, columns=groups_name, values=p)
p_corr = p_labels.corrwith(phenotype_series, axis=0, drop=True)

pattern_df = pd.DataFrame(p_corr).reset_index(drop = False)
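For a continuous phenotype, `lp_diff_continuous` pivots each pattern's probabilities to a cell-by-group matrix and correlates every column with the phenotype using `corrwith`. The sketch below runs that step on toy data; the cell ids, genes, and values are invented, and the `gene` grouping column stands in for `stats.index.name`.

```python
import pandas as pd

instance_key = "cell_boundaries"

# Toy pattern probabilities shaped like sdata.table.uns["lpp"] for one pattern.
lpp = pd.DataFrame(
    {
        instance_key: ["c0", "c1", "c2", "c0", "c1", "c2"],
        "gene": ["geneA"] * 3 + ["geneB"] * 3,
        "nuclear": [0.1, 0.5, 0.9, 0.9, 0.5, 0.1],
    }
)

# Continuous phenotype per cell, indexed by cell id.
phenotype_series = pd.Series([1.0, 2.0, 3.0], index=["c0", "c1", "c2"])

# Cell-by-gene probability matrix, then per-gene Pearson correlation.
p_labels = lpp.pivot(index=instance_key, columns="gene", values="nuclear")
p_corr = p_labels.corrwith(phenotype_series, axis=0, drop=True)

pattern_df = pd.DataFrame(p_corr).reset_index(drop=False)
print(pattern_df)
```

`corrwith` aligns on the row index, so the phenotype Series has to be indexed by the same cell ids that appear in the `instance_key` column.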